

## 0) Returns, Covariance, PCA, and Standardization

* **Objects and shapes**

  * **Returns:** $R_t\in\mathbb{R}^N$ (excess returns).
  * **Mean:** $\mu=\mathbb{E}[R_t]\in\mathbb{R}^N$.
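  * **Covariance:** $\Sigma=\operatorname{Var}(R_t)=\mathbb{E}[(R_t-\mu)(R_t-\mu)^\top]\in\mathbb{R}^{N\times N}$, symmetric p.s.d.; assumed positive definite ($\Sigma\succ0$) throughout, so that $\Psi^{-1}$ and $\Sigma^{-1}$ exist.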

* **PCA / spectral decomposition**

  * Write

    $$
    \Sigma = F\Psi^2 F^\top,
    $$

    where:

    * $F\in\mathbb{R}^{N\times N}$ is **orthogonal**: $F^\top F=FF^\top=I$. Columns are eigenvectors of $\Sigma$.
    * $\Psi=\operatorname{diag}(\psi_1,\dots,\psi_N)\succ 0$ with $\psi_n=\sqrt{\lambda_n}$ (square roots of eigenvalues).

* **Rotate into PC space**

  * Define **PC coordinates**:

    $$
    Y_t \;\equiv\; F^\top(R_t-\mu)\in\mathbb{R}^N.
    $$
  * Compute the covariance step-by-step:

    $$
    \begin{aligned}
    \operatorname{Cov}(Y_t)
    &= \mathbb{E}\!\big[Y_t Y_t^\top\big]
     = \mathbb{E}\!\big[F^\top(R_t-\mu)(R_t-\mu)^\top F\big] \\
    &= F^\top\,\mathbb{E}\!\big[(R_t-\mu)(R_t-\mu)^\top\big]\,F
     = F^\top \Sigma F \\
    &= F^\top (F\Psi^2 F^\top) F
     = (F^\top F)\Psi^2(F^\top F)
     = \Psi^2.
    \end{aligned}
    $$

    So PCs are **uncorrelated** with variances $\psi_n^2$.

* **Standardize PCs to unit variance**

  * Define **standardized shocks**:

    $$
    v_t \;\equiv\; \Psi^{-1}Y_t
    \;=\; \Psi^{-1}F^\top(R_t-\mu)\in\mathbb{R}^N.
    $$
  * Verify mean and covariance:

    $$
    \mathbb{E}[v_t] = \Psi^{-1}\underbrace{\mathbb{E}[Y_t]}_{0}=\mathbf{0},\qquad
    \operatorname{Cov}(v_t) = \Psi^{-1}\operatorname{Cov}(Y_t)\Psi^{-1}
    = \Psi^{-1}\Psi^2\Psi^{-1} = I.
    $$

    Each component of $v_t$ has **variance 1** and components are **uncorrelated**.

* **Reconstruct returns from standardized shocks**

  * From $Y_t=\Psi v_t$ and $Y_t=F^\top(R_t-\mu)$:

    $$
    F^\top(R_t-\mu) = \Psi v_t \;\Rightarrow\; R_t-\mu = F\Psi v_t.
    $$
  * **Working representation:**

    $$
    \boxed{R_t = \mu + F\Psi v_t, \quad \mathbb{E}[v_t]=0,\; \operatorname{Cov}(v_t)=I.}
    $$
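
The steps of this section can be checked numerically. The following is a minimal NumPy sketch, assuming a made-up positive-definite `Sigma` and mean `mu` (both purely illustrative); it verifies $\operatorname{Cov}(Y_t)=\Psi^2$, $\operatorname{Cov}(v_t)=I$, and the round trip $R_t \to v_t \to R_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# Hypothetical positive-definite covariance and mean (illustrative numbers only).
A = rng.standard_normal((N, N))
Sigma = A @ A.T + N * np.eye(N)
mu = rng.standard_normal(N)

# Spectral decomposition Sigma = F Psi^2 F^T.
# eigh is meant for symmetric matrices and returns ascending eigenvalues.
lam, F = np.linalg.eigh(Sigma)
Psi = np.diag(np.sqrt(lam))
Psi_inv = np.diag(1.0 / np.sqrt(lam))

# Covariances of Y = F^T (R - mu) and v = Psi^{-1} Y, computed analytically.
cov_Y = F.T @ Sigma @ F
cov_v = Psi_inv @ cov_Y @ Psi_inv

# Round trip on an arbitrary return vector: R -> v -> R.
R = mu + rng.standard_normal(N)
v = Psi_inv @ F.T @ (R - mu)
R_rebuilt = mu + F @ Psi @ v

print(np.allclose(cov_Y, Psi @ Psi))
print(np.allclose(cov_v, np.eye(N)))
print(np.allclose(R, R_rebuilt))
```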

---

## 1) Covariance and Its Inverse (with proof)

* From PCA:

  $$
  \Sigma = F\Psi^2 F^\top.
  $$
* **Inverse identity** for orthogonal $F$ and diagonal positive $\Psi^2$:

  $$
  \Sigma^{-1} = F\Psi^{-2}F^\top.
  $$

  **Verification:**

  $$
  \begin{aligned}
  \Sigma \,\big(F\Psi^{-2}F^\top\big)
  &= \big(F\Psi^2F^\top\big)\big(F\Psi^{-2}F^\top\big)
   = F\underbrace{\Psi^2\Psi^{-2}}_{I}F^\top
   = FF^\top
   = I,\\
  \big(F\Psi^{-2}F^\top\big)\Sigma
  &= F\Psi^{-2}\underbrace{F^\top F}_{I}\Psi^2F^\top
   = F\underbrace{\Psi^{-2}\Psi^2}_{I}F^\top
   = I.
  \end{aligned}
  $$
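
As a quick numerical cross-check of the inverse identity, the sketch below compares $F\Psi^{-2}F^\top$ with a direct matrix inverse. The covariance is again a hypothetical, randomly generated positive-definite matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Hypothetical positive-definite covariance (illustrative only).
A = rng.standard_normal((N, N))
Sigma = A @ A.T + N * np.eye(N)

lam, F = np.linalg.eigh(Sigma)                      # lam holds psi_n^2
Sigma_inv_spectral = F @ np.diag(1.0 / lam) @ F.T   # F Psi^{-2} F^T
Sigma_inv_direct = np.linalg.inv(Sigma)

print(np.allclose(Sigma_inv_spectral, Sigma_inv_direct))
print(np.allclose(Sigma @ Sigma_inv_spectral, np.eye(N)))
```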

---

## 2) Information Sources and Conditional Expected Returns

* **Signals**

  $$
  s_t\in\mathbb{R}^M,\qquad \mathbb{E}[s_t]=0,\qquad \operatorname{Cov}(s_t)=I_M.
  $$

* **Skill matrix (PC–signal cross-covariance)**

  $$
  K \;\equiv\; \mathbb{E}[\,v_t s_t^\top\,]\in\mathbb{R}^{N\times M}.
  $$

* **Joint second moments of $(v_t,s_t)$ (for clarity)**

  $$
  \operatorname{Cov}\!\begin{pmatrix} v_t \\ s_t \end{pmatrix}
  = \begin{pmatrix}
      \operatorname{Cov}(v_t) & \operatorname{Cov}(v_t,s_t) \\
      \operatorname{Cov}(s_t,v_t) & \operatorname{Cov}(s_t)
    \end{pmatrix}
  = \begin{pmatrix}
      I_N & K \\
      K^\top & I_M
    \end{pmatrix}.
  $$

* **Conditional mean of $v_t$ given $s_t$** (best linear predictor; exact under joint normality):

  $$
  \boxed{\;\mathbb{E}[v_t\mid s_t]
  = \operatorname{Cov}(v_t,s_t)\,\operatorname{Cov}(s_t)^{-1}\,s_t
  = K\,I_M^{-1}\,s_t
  = K s_t.\;}
  $$

  *(Shapes: $K\in\mathbb{R}^{N\times M}$, $s_t\in\mathbb{R}^M$ ⇒ $Ks_t\in\mathbb{R}^N$.)*

* **Signal-driven conditional expected return (alpha)**

  $$
  \begin{aligned}
  a(s_t)
  &\;\equiv\; \mathbb{E}[R_t\mid s_t]-\mu
   \;=\; \mathbb{E}[F\Psi v_t\mid s_t]
   \;=\; F\Psi\,\mathbb{E}[v_t\mid s_t] \\
  &= F\Psi K s_t.
  \end{aligned}
  $$

  $$
  \boxed{\;a(s_t)=F\Psi K s_t.\;}
  $$
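
The conditional-mean formula can be illustrated by simulation. This is a sketch under assumptions that are not in the text: a small hand-picked `K`, jointly normal draws built as $v_t = K s_t + e_t$ with $e_t\sim N(0, I_N-KK^\top)$ (which gives $\operatorname{Cov}(v_t)=I_N$ and $\mathbb{E}[v_t s_t^\top]=K$), and a hypothetical `F` and `Psi` for the alpha. The regression coefficient of $v_t$ on $s_t$ should then be close to $K$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, T = 3, 2, 200_000

# Hypothetical skill matrix; its singular values are below 1,
# so I - K K^T is positive definite.
K = np.array([[0.3, 0.0],
              [0.1, 0.2],
              [0.0, -0.1]])

# Draw s ~ N(0, I_M) and v = K s + e with e ~ N(0, I_N - K K^T).
L = np.linalg.cholesky(np.eye(N) - K @ K.T)
s = rng.standard_normal((T, M))
v = s @ K.T + rng.standard_normal((T, N)) @ L.T

# Least-squares coefficient of v on s (no intercept: both are zero-mean).
K_hat = np.linalg.lstsq(s, v, rcond=None)[0].T

# Alpha for one signal realization, with a hypothetical F and Psi.
F = np.linalg.qr(rng.standard_normal((N, N)))[0]   # random orthogonal matrix
Psi = np.diag([0.05, 0.03, 0.02])
a = F @ Psi @ K @ s[0]

print(np.round(K_hat, 3))
```

The estimate `K_hat` only approximates `K` up to sampling error, which shrinks as `T` grows.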

---

## 3) Mean–Variance Portfolio for Given Signals

* **Problem**

  $$
  \max_{p\in\mathbb{R}^N}
  \;\; a(s_t)^\top p - \frac{\lambda}{2}\,p^\top \Sigma\,p,
  \qquad \lambda>0.
  $$
* **FOC and solution (strict concavity from $\Sigma\succ0$)**

  $$
  \nabla_p:\; a(s_t)-\lambda\Sigma p=\mathbf{0}
  \;\Longrightarrow\;
  \boxed{\;p(s_t)=\frac{1}{\lambda}\Sigma^{-1}a(s_t).\;}
  $$
* **(Optional) Value at optimum**
  Substitute $p(s_t)$ back:

  $$
  a^\top p - \tfrac{\lambda}{2}p^\top\Sigma p
  = \tfrac{1}{\lambda}a^\top\Sigma^{-1}a - \tfrac{\lambda}{2}\cdot \tfrac{1}{\lambda^2}a^\top\Sigma^{-1}\Sigma\Sigma^{-1}a
  = \frac{1}{2\lambda}\,a^\top\Sigma^{-1}a.
  $$
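
The closed-form solution and the optimal value can be sanity-checked numerically. A minimal sketch, assuming an illustrative `Sigma`, alpha vector `a`, and risk aversion `lam_risk` (all hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(3)
N, lam_risk = 4, 5.0      # lam_risk plays the role of lambda

A = rng.standard_normal((N, N))
Sigma = A @ A.T + N * np.eye(N)     # hypothetical positive-definite covariance
a = 0.01 * rng.standard_normal(N)   # hypothetical alpha vector a(s_t)

def objective(p):
    """Mean-variance objective a^T p - (lambda / 2) p^T Sigma p."""
    return a @ p - 0.5 * lam_risk * p @ Sigma @ p

# Closed form p = Sigma^{-1} a / lambda, via a linear solve instead of an explicit inverse.
p_star = np.linalg.solve(Sigma, a) / lam_risk

foc_residual = a - lam_risk * Sigma @ p_star
value_closed_form = a @ np.linalg.solve(Sigma, a) / (2.0 * lam_risk)

# Strict concavity: random perturbations of p_star should lower the objective.
perturbed = [objective(p_star + 1e-3 * rng.standard_normal(N)) for _ in range(100)]

print(np.allclose(foc_residual, 0.0))
print(np.isclose(objective(p_star), value_closed_form))
print(max(perturbed) < objective(p_star))
```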

---

## 4) Generalized Information Ratio (GIR)

* **Definition (ex-ante Sharpe of $p(s_t)$)**

  $$
  GIR \;\equiv\;
  \frac{\mathbb{E}[\,a(s_t)^\top p(s_t)\,]}
       {\sqrt{\mathbb{E}[\,p(s_t)^\top \Sigma\, p(s_t)\,]}}.
  $$
* **Compute numerator and denominator explicitly**

  $$
  a^\top p
  = a(s_t)^\top \big(\tfrac{1}{\lambda}\Sigma^{-1}a(s_t)\big)
  = \frac{1}{\lambda}\,a(s_t)^\top\Sigma^{-1}a(s_t).
  $$

  $$
  p^\top \Sigma p
  = \big(\tfrac{1}{\lambda}\Sigma^{-1}a(s_t)\big)^\top \Sigma
    \big(\tfrac{1}{\lambda}\Sigma^{-1}a(s_t)\big)
  = \frac{1}{\lambda^2}\,a(s_t)^\top\Sigma^{-1}a(s_t).
  $$
* **Take expectations; the common factor $1/\lambda$ cancels in the ratio** (so $GIR$ does not depend on $\lambda$)

  $$
  \boxed{\;GIR
  = \sqrt{\;\mathbb{E}\!\left[a(s_t)^\top\Sigma^{-1}a(s_t)\right]\,}. \;}
  $$
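
Written out in full, the cancellation uses only the two expressions computed above and $\lambda>0$:

$$
GIR
= \frac{\tfrac{1}{\lambda}\,\mathbb{E}\!\left[a^\top\Sigma^{-1}a\right]}
       {\sqrt{\tfrac{1}{\lambda^2}\,\mathbb{E}\!\left[a^\top\Sigma^{-1}a\right]}}
= \frac{\tfrac{1}{\lambda}\,\mathbb{E}\!\left[a^\top\Sigma^{-1}a\right]}
       {\tfrac{1}{\lambda}\,\sqrt{\mathbb{E}\!\left[a^\top\Sigma^{-1}a\right]}}
= \sqrt{\mathbb{E}\!\left[a^\top\Sigma^{-1}a\right]}.
$$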

---

## 5) Evaluate $\mathbb{E}[\,a^\top \Sigma^{-1} a\,]$ (no steps skipped)

* **Substitute $a(s_t)=F\Psi K s_t$** and $\Sigma^{-1}=F\Psi^{-2}F^\top$:

  $$
  \begin{aligned}
  a^\top \Sigma^{-1} a
  &= (F\Psi K s_t)^\top \,(F\Psi^{-2}F^\top)\,(F\Psi K s_t) \\
  &\stackrel{(1)}{=} s_t^\top K^\top \Psi^\top F^\top F \Psi^{-2} F^\top F \Psi K s_t \\
  &\stackrel{(2)}{=} s_t^\top K^\top \Psi \underbrace{(F^\top F)}_{I}\Psi^{-2}\underbrace{(F^\top F)}_{I}\Psi K s_t \\
  &\stackrel{(3)}{=} s_t^\top K^\top \big(\Psi\,\Psi^{-2}\,\Psi\big) K s_t \\
  &\stackrel{(4)}{=} s_t^\top K^\top K s_t,
  \end{aligned}
  $$

  where:

  * (1) uses $(AB)^\top=B^\top A^\top$, so $(F\Psi K s)^\top = s^\top K^\top \Psi^\top F^\top$.
  * (2) uses $\Psi^\top=\Psi$ (diagonal) and marks the two factors $F^\top F=I$ (orthogonality).
  * (3) drops those identity factors and groups the diagonal matrices; $\Psi\Psi^{-2}\Psi=\operatorname{diag}(\psi_n\cdot\psi_n^{-2}\cdot\psi_n)=I$.
  * (4) removes the identity factor.

* **Take expectation using $\mathbb{E}[s_t s_t^\top]=I_M$**

  $$
  \begin{aligned}
  \mathbb{E}\!\left[s_t^\top K^\top K s_t\right]
  &= \operatorname{tr}\!\Big(K^\top K\,\mathbb{E}[s_t s_t^\top]\Big)
   \quad\big(\text{identity: } \mathbb{E}[x^\top A x]=\operatorname{tr}(A\,\mathbb{E}[xx^\top])\big)\\
  &= \operatorname{tr}\!\big(K^\top K\cdot I_M\big)
   = \operatorname{tr}(K^\top K).
  \end{aligned}
  $$

* **Therefore**

  $$
  \boxed{\;GIR
    = \sqrt{\operatorname{tr}(K^\top K)} 
    = \|K\|_F.\;}
  $$

  *(Frobenius norm: $\|K\|_F^2=\sum_{n,m}k_{n,m}^2$.)*
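
Both steps of this section can be verified numerically. The sketch below assumes randomly generated, purely illustrative `F`, `psi`, and `K`; it checks the pointwise identity $a^\top\Sigma^{-1}a=s_t^\top K^\top K s_t$ for one signal vector and then compares $\sqrt{\operatorname{tr}(K^\top K)}$ with NumPy's Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 5, 3

# Hypothetical ingredients: orthogonal F, positive diagonal Psi, small skill matrix K.
F = np.linalg.qr(rng.standard_normal((N, N)))[0]
psi = rng.uniform(0.01, 0.05, size=N)
Psi = np.diag(psi)
K = 0.1 * rng.standard_normal((N, M))

Sigma_inv = F @ np.diag(psi ** -2.0) @ F.T   # F Psi^{-2} F^T

# Pointwise identity: a^T Sigma^{-1} a = s^T K^T K s for any signal vector s.
s = rng.standard_normal(M)
a = F @ Psi @ K @ s
lhs = a @ Sigma_inv @ a
rhs = s @ K.T @ K @ s

# With E[s s^T] = I_M the expectation is tr(K^T K), so GIR = ||K||_F.
gir_trace = np.sqrt(np.trace(K.T @ K))
gir_fro = np.linalg.norm(K, "fro")

print(np.isclose(lhs, rhs), np.isclose(gir_trace, gir_fro))
```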


---

## 6) Interpreting $GIR$ (PC view and signal view)

* **Entrywise (components)**

  $$
  \operatorname{tr}(K^\top K) = \sum_{n=1}^{N}\sum_{m=1}^{M} k_{n,m}^2.
  $$

* **PC aggregation**

  * Define per-PC skill:

    $$
    \kappa_n^2 \;\equiv\; \sum_{m=1}^M k_{n,m}^2,\qquad
    \bar{\kappa}^2 \;\equiv\; \frac{1}{N}\sum_{n=1}^N \kappa_n^2.
    $$
  * Then

    $$
    GIR^2 = \sum_{n=1}^N \kappa_n^2 = N\,\bar{\kappa}^2
    \;\Longrightarrow\;
    \boxed{\;GIR=\bar{\kappa}\sqrt{N}.\;}
    $$

* **Signal aggregation**

  * Eigen-decompose

    $$
    K^\top K = E C^2 E^\top,\qquad
    C=\operatorname{diag}(c_1,\dots,c_M),\; E^\top E=I_M.
    $$
  * Then

    $$
    GIR^2=\operatorname{tr}(K^\top K)
    = \sum_{m=1}^M c_m^2
    = M\,\bar{c}^2,\quad
    \bar{c}^2 \equiv \frac{1}{M}\sum_{m=1}^M c_m^2,
    $$

    hence

    $$
    \boxed{\;GIR=\bar{c}\sqrt{M}.\;}
    $$

* **Sanity checks**

  * $K=0 \Rightarrow GIR=0$.
  * Adding a signal with zero loadings (a zero column in $K$) leaves $GIR$ unchanged.
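
The two aggregations and the zero-column sanity check can be illustrated on a small example. The matrix `K` below is hypothetical (3 PCs, 2 signals); the sketch computes $GIR^2$ three ways and then appends an uninformative signal.

```python
import numpy as np

# Hypothetical skill matrix with N = 3 PCs and M = 2 signals.
K = np.array([[0.10, 0.05],
              [0.00, 0.20],
              [0.15, 0.00]])
N, M = K.shape

gir_sq = np.trace(K.T @ K)

# PC view: row sums of squares give the per-PC skills kappa_n^2.
kappa_sq = (K ** 2).sum(axis=1)
kappa_bar = np.sqrt(kappa_sq.mean())

# Signal view: the eigenvalues of K^T K are the c_m^2.
c_sq = np.linalg.eigvalsh(K.T @ K)
c_bar = np.sqrt(c_sq.mean())

# Sanity check: a zero column (uninformative signal) leaves GIR unchanged.
K_padded = np.hstack([K, np.zeros((N, 1))])
gir_padded = np.linalg.norm(K_padded, "fro")

print(np.sqrt(gir_sq), kappa_bar * np.sqrt(N), c_bar * np.sqrt(M), gir_padded)
```

All four printed numbers should agree up to floating-point round-off.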

---

## Mini reference: identities used (so no re-deriving needed)

* **Orthogonal similarity inverse:** $(F A F^\top)^{-1} = F A^{-1} F^\top$ for orthogonal $F$ ($F^\top F=FF^\top=I$) and invertible $A$.
* **Transpose of a product:** $(AB)^\top = B^\top A^\top$.
* **Trace–quadratic form:** $\mathbb{E}[x^\top A x] = \operatorname{tr}(A\,\mathbb{E}[xx^\top])$.
* **Diagonal cancellations:** For diagonal $\Psi\succ0$, $\Psi\Psi^{-2}\Psi=I$.
* **Orthogonality:** $F^\top F = I$.




## 7) Source–Portfolio Representation 

### 7.0 Conventions (per-period, standardized)

* **Returns:** $R_t = \mu + F\,\Psi\,v_t$, with $F^\top F = I$, $\Psi=\mathrm{diag}(\psi_1,\dots,\psi_N)\succ0$.
* **Standardized shocks:** $\mathbb{E}[v_t]=0$, $\operatorname{Cov}(v_t)=I$.
* **Signals:** $\mathbb{E}[s_t]=0$, $\operatorname{Cov}(s_t)=I$.
* **Skill matrix:** $K := \mathbb{E}[v_t s_t^\top]\in\mathbb{R}^{N\times M}$.
* **Skill geometry:** **define** the eigensystem

  $$
  K^\top K \;=\; E\,C^2\,E^\top, 
  \quad E^\top E = I_M, 
  \quad C := \mathrm{diag}(c_1,\dots,c_M)\succeq 0.
  $$

  Here, **$c_m$** are the **effective skills**.

---

### 7.1 Source portfolios

* **Definition (C51).**

  $$
  \boxed{P \;\equiv\; F\,\Psi^{-1}\,K\,E \in \mathbb{R}^{N\times M}}
  $$

  * Column $p_m$ is the $m$-th **source portfolio** in asset space.
    
* **Explanation of construction:**

  1. $K$ links signals $s_t$ to standardized return shocks $v_t$.
  2. Multiplying by $E$ rotates the signal space onto the eigenvectors of $K^\top K$, so that the sources are orthogonal, each with its own eigenvalue $c_m^2$.
  3. Multiplying by $\Psi^{-1}$ and $F$ maps those directions back into **asset space**.

* **Result:** each column $p_m$ of $P$ is a **tradable portfolio** that captures exactly one source of information, uncorrelated with the other sources (shown in 7.3).

---

### 7.2 Surprise return of source portfolios

* **Goal:** express portfolio returns in terms of $v_t$.
* Start from $R_t-\mu = F\Psi v_t$.
* Compute (C52):

  $$
  \begin{align}
  P^\top F\Psi v_t
  &= (F\Psi^{-1} K E)^\top\,F\Psi v_t 
   && \text{(definition of \(P\))} \\
  &= E^\top K^\top \Psi^{-1} F^\top F \Psi\, v_t 
   && \text{(transpose of a product; \(\Psi^{-1}\) is symmetric)}\\
  &= E^\top K^\top \underbrace{\Psi^{-1}\Psi}_{I}\, v_t 
   && \text{(orthogonality \(F^\top F=I\))} \\
  &= \boxed{E^\top K^\top v_t}. \tag{C52}
  \end{align}
  $$

---

### 7.3 Return variance in the source basis

Two equivalent routes (both shown).

**Route A (via $\Sigma$)**
Since $\Sigma = F\Psi^2 F^\top$:

$$
\begin{align}
P^\top \Sigma P
&= (F\Psi^{-1} K E)^\top\,(F\Psi^2 F^\top)\,(F\Psi^{-1} K E) \\
&= E^\top K^\top \Psi^{-1} \underbrace{(F^\top F)}_{I}\Psi^2 \underbrace{(F^\top F)}_{I}\Psi^{-1} K E \\
&= E^\top K^\top \underbrace{\Psi^{-1}\Psi^2\Psi^{-1}}_{I} K E \\
&= E^\top (K^\top K) E 
 = \boxed{C^2}. \tag{C53}
\end{align}
$$

**Route B (direct from 7.2)**
With $u := P^\top F\Psi v_t = E^\top K^\top v_t$:

$$
\operatorname{Var}(u) 
= E^\top K^\top\,\underbrace{\operatorname{Var}(v_t)}_{I}\,K E
= E^\top (K^\top K) E 
= \boxed{C^2}.
$$

**Conclusion:** source-portfolio returns are **uncorrelated**, and $\operatorname{Var}(p_m^\top R_t)=c_m^2$.
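
The construction of $P$ in 7.1 and both routes to (C53) can be checked with a short NumPy sketch. The inputs `F`, `psi`, and `K` are hypothetical random draws; note that `np.linalg.eigh` returns eigenvalues in ascending order, so the sources come out sorted from weakest to strongest.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 5, 3

F = np.linalg.qr(rng.standard_normal((N, N)))[0]   # hypothetical orthogonal F
psi = rng.uniform(0.01, 0.05, size=N)              # hypothetical PC volatilities
K = 0.1 * rng.standard_normal((N, M))              # hypothetical skill matrix

Sigma = F @ np.diag(psi ** 2) @ F.T

# Eigensystem K^T K = E C^2 E^T.
c_sq, E = np.linalg.eigh(K.T @ K)

# Source portfolios P = F Psi^{-1} K E, one column per source.
P = F @ np.diag(1.0 / psi) @ K @ E

# Route A: P^T Sigma P should equal C^2 = diag(c_1^2, ..., c_M^2).
route_a = P.T @ Sigma @ P

# Route B: Var(E^T K^T v) with Var(v) = I.
route_b = E.T @ K.T @ np.eye(N) @ K @ E

print(np.allclose(route_a, np.diag(c_sq)), np.allclose(route_b, np.diag(c_sq)))
```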

---

### 7.4 Forecasts for source portfolios

* **Alpha in asset space:** **define**

  $$
  a(s_t) \;\equiv\; F\,\Psi\,K\,s_t.
  $$

* **Project onto the source portfolios** (C54):

  $$
  \begin{align}
  a(s_t)^\top P
  &= (F\Psi K s_t)^\top (F\Psi^{-1} K E) \\
  &= s_t^\top K^\top \Psi F^\top F \Psi^{-1} K E \\
  &= s_t^\top K^\top \underbrace{\Psi\Psi^{-1}}_{I} K E \\
  &= s_t^\top (K^\top K) E \\
  &= \boxed{s_t^\top E\,C^2}. \tag{C54}
  \end{align}
  $$

* **Forecast variance** (C56):

  $$
  \begin{align}
  \operatorname{Var}\!\big(a(s_t)^\top P\big)
  &= \operatorname{Var}\!\big(s_t^\top E C^2\big) \\
  &= C^2 \underbrace{E^\top \operatorname{Var}(s_t) E}_{I} C^2 \\
  &= \boxed{C^4}. \tag{C56}
  \end{align}
  $$

---

### 7.5 Return–forecast covariance in the source basis

* **Define** vectors:

  $$
  u \;\equiv\; P^\top F\Psi v_t 
  \;=\; E^\top K^\top v_t, 
  \qquad
  w \;\equiv\; P^\top a(s_t) 
  \;=\; E^\top (K^\top K) s_t.
  $$
* **Compute** (C55):

  $$
  \begin{align}
  \operatorname{Cov}(u,w)
  &= \mathbb{E}[u\,w^\top] 
   && \text{(zero means)} \\
  &= E^\top K^\top\,\underbrace{\mathbb{E}[v_t s_t^\top]}_{K}\,(K^\top K) E \\
  &= E^\top (K^\top K)\,(K^\top K) E \\
  &= E^\top (K^\top K)^2 E 
   = \boxed{C^4}. \tag{C55}
  \end{align}
  $$
* **Diagonal:** covariance is diagonal in the source basis.

---

### 7.6 Per-source correlation equals effective skill

For the $m$-th source portfolio $p_m$:

* $\operatorname{Var}(\text{return}_m)=c_m^2$ (from $C^2$),
* $\operatorname{Var}(\text{forecast}_m)=c_m^4$ (from $C^4$),
* $\operatorname{Cov}(\text{return}_m,\text{forecast}_m)=c_m^4$ (from $C^4$).

Hence

$$
\boxed{\ \operatorname{Corr}\!\big(p_m^\top R_t,\; a(s_t)^\top p_m\big)
= \frac{c_m^4}{\sqrt{c_m^2}\,\sqrt{c_m^4}} 
= c_m.\ }
$$

**Interpretation:** **$c_m$** is exactly the **return–forecast correlation** of the $m$-th source portfolio.
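
A simulation makes the result of 7.2 to 7.6 concrete. This is a sketch under assumptions not in the text: jointly normal draws generated as $v_t=Ks_t+e_t$ with $e_t\sim N(0,I_N-KK^\top)$, and hand-picked illustrative values for `F`, `psi`, and `K`. The sample correlation between each source portfolio's surprise return and its forecast should then be close to $c_m$.

```python
import numpy as np

rng = np.random.default_rng(6)
N, M, T = 4, 2, 200_000

F = np.linalg.qr(rng.standard_normal((N, N)))[0]   # hypothetical orthogonal F
psi = np.array([0.05, 0.04, 0.03, 0.02])           # hypothetical PC volatilities
K = np.array([[0.30, 0.00],
              [0.10, 0.20],
              [0.00, -0.10],
              [0.05, 0.05]])                        # hypothetical skill matrix

c_sq, E = np.linalg.eigh(K.T @ K)
c = np.sqrt(c_sq)
P = F @ np.diag(1.0 / psi) @ K @ E

# Simulate s ~ N(0, I) and v = K s + e, so Cov(v) = I and E[v s^T] = K.
L = np.linalg.cholesky(np.eye(N) - K @ K.T)
s = rng.standard_normal((T, M))
v = s @ K.T + rng.standard_normal((T, N)) @ L.T

surprise = v @ (F @ np.diag(psi)).T       # rows are R_t - mu = F Psi v_t
alpha = s @ (F @ np.diag(psi) @ K).T      # rows are a(s_t) = F Psi K s_t

u = surprise @ P     # source-portfolio surprise returns, shape (T, M)
w = alpha @ P        # source-portfolio forecasts, shape (T, M)

corr = np.array([np.corrcoef(u[:, m], w[:, m])[0, 1] for m in range(M)])
print(np.round(corr, 3), np.round(c, 3))
```

The two printed vectors agree only up to sampling error, which shrinks as `T` grows.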

---

### 7.7 GIR in the source–portfolio basis

* **Fact:** source-portfolio returns and forecasts are **uncorrelated across $m$**, because $C^2$ (C53) and $C^4$ (C55, C56) are diagonal.
* **Therefore,** the squared ex-ante Sharpe **adds across sources**:

  $$
  \begin{aligned}
  GIR^2
  &= \sum_{m=1}^M 
     \operatorname{Corr}^2\!\big(p_m^\top R_t,\; a(s_t)^\top p_m\big) \\
  &= \sum_{m=1}^M c_m^2 
   = \operatorname{tr}(K^\top K).
  \end{aligned}
  $$
* **Result (C58):**

  $$
  \boxed{\ GIR \;=\; \|K\|_F \;=\; \sqrt{ \sum_{m=1}^M c_m^2 } 
  \;=\; \bar{c}\,\sqrt{M},\quad 
  \bar{c}^2 := \tfrac{1}{M}\sum_{m=1}^M c_m^2.\ }
  $$
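
The additivity can be made explicit with a short worked step. Here $w_m \equiv a(s_t)^\top p_m$ is the forecast for source $m$, and $h_m$ is a hypothetical symbol (not in the original text) for the mean–variance position in source $m$, assuming $c_m>0$. Using $\mathbb{E}[w_m^2]=c_m^4$ and $\operatorname{Var}(p_m^\top R_t)=c_m^2$:

$$
\begin{aligned}
h_m &\equiv \frac{w_m}{\lambda\,c_m^2},\\
\mathbb{E}[w_m h_m] &= \frac{\mathbb{E}[w_m^2]}{\lambda\,c_m^2}
 = \frac{c_m^4}{\lambda\,c_m^2} = \frac{c_m^2}{\lambda},\\
\mathbb{E}[h_m^2]\;c_m^2 &= \frac{c_m^4}{\lambda^2 c_m^4}\;c_m^2 = \frac{c_m^2}{\lambda^2},\\
GIR &= \frac{\sum_{m} c_m^2/\lambda}{\sqrt{\sum_{m} c_m^2/\lambda^2}}
 = \sqrt{\sum_{m=1}^M c_m^2}.
\end{aligned}
$$

The risk terms add across $m$ without cross terms because the source-portfolio returns are uncorrelated.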

---

### 7.8 Redundancy / rank condition

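* **Claim:** $K^\top K \succeq 0$ by construction, since $x^\top K^\top K x=\|Kx\|^2\ge 0$ for every $x$; its rank is at most $\min(N,M)$.
* **If not full rank:** some $c_m=0$, so $K e_m=0$ for the corresponding column $e_m$ of $E$, and the source portfolio $p_m=F\Psi^{-1}Ke_m$ is the zero vector. That source is **redundant** (no contribution to GIR) and can be eliminated without loss; the correlation in 7.6 is defined only for $c_m>0$.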
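
A rank-deficient case is easy to construct by taking more signals than PCs. The sketch below uses a hypothetical $2\times 3$ skill matrix, with `F` set to the identity and made-up volatilities `psi`; one $c_m$ then vanishes, its source portfolio is numerically zero, and dropping it leaves GIR unchanged.

```python
import numpy as np

# Hypothetical case with more signals (M = 3) than PCs (N = 2), so rank(K) <= 2.
K = np.array([[0.20, 0.10, 0.05],
              [0.00, 0.10, 0.15]])
N, M = K.shape
F = np.eye(N)                    # hypothetical: assets already in PC coordinates
psi = np.array([0.04, 0.02])     # hypothetical PC volatilities

c_sq, E = np.linalg.eigh(K.T @ K)
c_sq = np.clip(c_sq, 0.0, None)  # remove tiny negative round-off
P = F @ np.diag(1.0 / psi) @ K @ E

redundant = c_sq < 1e-12         # sources with c_m = 0 (up to round-off)
gir_all = np.sqrt(c_sq.sum())
gir_kept = np.sqrt(c_sq[~redundant].sum())

print(int(redundant.sum()), np.isclose(gir_all, gir_kept))
```

The threshold `1e-12` is an arbitrary tolerance chosen for this example; a real application would need a threshold suited to its own scale.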

---

### Quick identity list used (so you never have to rethink)

* **Orthogonality:** $F^\top F = I$, $E^\top E = I$.
* **Similarity inverse:** $(F\Psi^2 F^\top)^{-1} = F\Psi^{-2} F^\top$.
* **Transpose of product:** $(AB)^\top = B^\top A^\top$.
* **Variance under linear map:** $\operatorname{Var}(A x) = A\,\operatorname{Var}(x)\,A^\top$.
* **Trace–quadratic form:** $\mathbb{E}[x^\top A x] = \operatorname{tr}(A\,\operatorname{Cov}(x))$ for zero-mean $x$ (in general, $\operatorname{tr}(A\,\mathbb{E}[xx^\top])$).
* **Diagonal cancellations:** $\Psi^{-1}\Psi^2\Psi^{-1}=I$.

---

**End state:** every step is explicit, definitions (“**define**”) are kept separate from derived equalities, and the final result

$$
GIR=\|K\|_F=\bar{c}\sqrt{M}
$$

follows directly in the **source–portfolio** basis, with no leftover time-scaling factors, since every quantity is per-period (7.0).


## 11) The Realized Information Coefficient (IC)

We want the correlation between realized shocks in returns and forecasts based on signals.

---

### 11.1 General definition

* **Definition (C59):**

  $$
  IC\{s, v\} 
  := \frac{v^\top K s}
  {\sqrt{v^\top v}\;\sqrt{s^\top (K^\top K)\, s}}.
  \tag{C59}
  $$

* **Interpretation:**
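  This is the uncentered sample correlation (cosine similarity), taken across the $N$ components, between the realized standardized shocks $v$ and their forecasts $Ks$; note that $\sqrt{s^\top (K^\top K)\,s}=\|Ks\|$.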
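
Definition (C59) translates directly into code. A minimal sketch follows; the function name `realized_ic` and the inputs are illustrative, not from the original text. It checks the two extreme cases where the shocks are exactly proportional to the forecast.

```python
import numpy as np

def realized_ic(v, s, K):
    """Realized IC of (C59): cosine between shocks v and forecast K s."""
    forecast = K @ s
    return (v @ forecast) / (np.linalg.norm(v) * np.linalg.norm(forecast))

# Hypothetical inputs: N = 3 standardized shocks, M = 2 signals.
K = np.array([[0.3, 0.0],
              [0.1, 0.2],
              [0.0, -0.1]])
s = np.array([1.0, -0.5])

v_aligned = 2.0 * (K @ s)    # shocks exactly proportional to the forecast
v_opposed = -v_aligned       # shocks exactly opposite to the forecast

ic_aligned = realized_ic(v_aligned, s, K)
ic_opposed = realized_ic(v_opposed, s, K)
print(ic_aligned, ic_opposed)
```

By the Cauchy–Schwarz inequality the value always lies in $[-1, 1]$; it is undefined when $v=0$ or $Ks=0$.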

---

### 11.2 Approximations (legacy version)

The original text replaces each random term in (C59) by its expectation (or by the square root of its expectation), which is a crude approximation:

1. For the denominator involving $v^\top v$, use $\mathbb{E}[v^\top v]=\operatorname{tr}(\operatorname{Cov}(v))$:

   $$
   \mathbb{E}[v^\top v] = N \quad\Rightarrow\quad \sqrt{v^\top v} \approx \sqrt{N}. 
   \tag{C60'}
   $$

2. For the numerator involving $v^\top K s$:

   $$
   \begin{align}
   \mathbb{E}[v^\top K s]
   &= \operatorname{tr}(K^\top \mathbb{E}[v s^\top]) \\
   &= \operatorname{tr}(K^\top K) \\
   &= \kappa^2 N. \tag{C61'}
   \end{align}
   $$

   where we **define**

   $$
   \kappa^2 := \frac{1}{N}\,\operatorname{tr}(K^\top K),
   \tag{C31'}
   $$

   which is the $\bar{\kappa}^2$ of Section 6.

3. For the denominator involving $s^\top (K^\top K)s$:

   $$
   \mathbb{E}[s^\top (K^\top K)s]
   = \operatorname{tr}(K^\top K)
   = \kappa^2 N \quad\Rightarrow\quad
   \sqrt{s^\top (K^\top K)s} \approx \kappa \sqrt{N}.
   \tag{C62'}
   $$

---

### 11.3 Substitution into IC formula

* **Plugging (C60'), (C61'), (C62') into (C59):**

  $$
  IC\{s, v\}
  \;\approx\;
  \frac{\kappa^2 N}{\sqrt{N}\,\kappa \sqrt{N}}.
  $$

* Simplify:

  $$
  IC \approx \frac{\kappa^2 N}{\kappa N}
  = \kappa.
  \tag{C63'}
  $$

---

### 11.4 Clean result

$$
\boxed{IC \;\approx\; \kappa}
$$

* **Meaning:** The realized information coefficient is approximately equal to the **root-mean-square per-component skill** $\kappa$ (since $\kappa^2$ is the average of the per-PC $\kappa_n^2$).
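
How good the approximation $IC\approx\kappa$ is can be probed by Monte Carlo. The sketch below rests on assumptions that are not in the text: jointly normal draws $v=Ks+e$ with $e\sim N(0,I_N-KK^\top)$, and a hypothetical skill matrix $K=0.3\,Q$ with orthonormal columns $Q$, so that $K^\top K=0.09\,I_M$. It averages the realized IC of (C59) over many draws and compares it with $\kappa$.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, T = 200, 100, 2000

# Hypothetical skill matrix K = 0.3 * Q with orthonormal columns Q.
Q = np.linalg.qr(rng.standard_normal((N, M)))[0]
K = 0.3 * Q
kappa = np.sqrt(np.trace(K.T @ K) / N)

# Draw s ~ N(0, I_M) and v = K s + e with e ~ N(0, I_N - K K^T).
L = np.linalg.cholesky(np.eye(N) - K @ K.T)
s = rng.standard_normal((T, M))
forecast = s @ K.T
v = forecast + rng.standard_normal((T, N)) @ L.T

# Realized IC (C59) for every draw, then its Monte Carlo average.
ic = (v * forecast).sum(axis=1) / (
    np.linalg.norm(v, axis=1) * np.linalg.norm(forecast, axis=1)
)
ic_mean = ic.mean()

print(round(float(ic_mean), 4), round(float(kappa), 4))
```

The approximation relies on both quadratic forms in the denominator concentrating around their means, which is more plausible when $N$ and $M$ are large; with only a few signals it can be noticeably cruder.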


---

**Interpretation:**

* In practice, the IC measures how well signals predict return shocks.
* Under this model, the IC collapses to the root-mean-square per-component skill $\kappa$ (the $\bar{\kappa}$ of Section 6).
* This ties back to the GIR result:

  $$
  GIR = \kappa \sqrt{N},
  $$

  where the IC plays the role of per-component skill and GIR scales it by the square root of breadth, $\sqrt{N}$.
