Modern portfolio theory was born with the efficient frontier analysis of Markowitz (1952). Unfortunately, early applications of the technique, based on naïve estimates of the input parameters, have been found to be of little use because they often lead to non-sensible portfolio allocations.

The focus of this module is on bridging the gap between portfolio theory and portfolio construction by showing how to generate enhanced parameter estimates so as to improve the quality of the portfolio optimization outputs (optimal portfolio weights), with a focus on risk parameter estimates. We first address sample risk and explain how to improve covariance matrix estimation via the use of **factor and/or Bayesian techniques and statistical shrinkage estimators**.

We then move on to addressing stationarity risk, namely the fact that risk parameters are not constant but move over time. We start with basic rolling-window and exponentially-weighted moving average analysis and then move on to the estimation of covariance parameters with autoregressive conditional heteroskedasticity and state-dependent models.

----

# The Curse of Dimensionality

To obtain the efficient frontier of $n$ securities, you have to estimate

- $n$ expected returns
- $n$ volatilities
- $n (n-1) / 2$ correlations (this grows quadratically!)

So how do we improve the ratio of observations to parameters?

- Reduce the number of stocks you analyse
  - bad solution though :)
- Increase sample size
  - Increase the sample period
  - **or:** Increase the frequency

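Remember: We face a trade-off between sample risk (estimation error from too little data per parameter) and model risk (error from imposing a structure that is not exactly true).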

### Extreme example 1: No model risk - high sample risk

The sample covariance:

$$\hat{S}_{ij} = \frac{1}{T} \sum_{t=1}^T (R_{it}-\bar{R}_i) (R_{jt}-\bar{R}_j)$$
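As a minimal sketch, the sample covariance above can be computed with NumPy as follows. The function name `sample_covariance` and the simulated returns are illustrative assumptions, not part of the course material.

```python
import numpy as np

def sample_covariance(returns: np.ndarray) -> np.ndarray:
    """Sample covariance with the 1/T normalisation used in the formula above.

    returns: array of shape (T, n), one row per period, one column per asset.
    """
    T = returns.shape[0]
    demeaned = returns - returns.mean(axis=0)   # R_it - R_bar_i
    return demeaned.T @ demeaned / T

# Simulated returns, for illustration only (4 assets, 250 periods).
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, size=(250, 4))
S = sample_covariance(returns)
```

The result should match `np.cov(returns, rowvar=False, bias=True)`, which uses the same $1/T$ normalisation.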

### Extreme example 2: High model risk - low sample risk

Constant correlation model:

Assume that all correlations $\rho_{ij}$ are equal: $\rho$

$$\hat{\sigma}_{ij} = \hat{\sigma}_i \hat{\sigma}_j \hat{\rho}$$

Obviously this is not true. But it is the other extreme: Try to estimate fewer parameters accurately instead of many parameters poorly.
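A possible sketch of the constant correlation estimator follows. It assumes that $\hat{\rho}$ is taken as the average of the off-diagonal sample correlations, which is one common choice but is not specified in these notes. The function name and the simulated data are hypothetical.

```python
import numpy as np

def constant_correlation_cov(returns: np.ndarray) -> np.ndarray:
    """Covariance estimate sigma_i * sigma_j * rho_hat with a single rho_hat.

    Assumption: rho_hat is the mean of the off-diagonal sample correlations.
    """
    corr = np.corrcoef(returns, rowvar=False)
    n = corr.shape[0]
    rho_hat = (corr.sum() - np.trace(corr)) / (n * (n - 1))
    vols = returns.std(axis=0)                  # sample volatilities (1/T)
    const_corr = np.full((n, n), rho_hat)
    np.fill_diagonal(const_corr, 1.0)           # keep each asset's own variance
    return const_corr * np.outer(vols, vols)

# Simulated returns, for illustration only.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, size=(250, 5))
C = constant_correlation_cov(returns)
```

Only the $n$ volatilities and one correlation are estimated here, instead of $n(n-1)/2$ separate correlations.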

**And:** A 1973 paper by Elton and Gruber (NYU) showed that the minimum-variance portfolio built from this constant correlation model fared better than the one built from "full" sample covariance estimates.

So this constant correlation model is already an "improvement", but we can do better:

----

# Estimating the Covariance Matrix with a Factor Model

Using a factor model is a nice way to reduce the number of risk parameters while introducing only a reasonable amount of model risk.

----

Assume stock returns are driven by a limited set of factors

$$ R_{it} = \mu_i + \beta_{i1}F_{1t} + \ldots + \beta_{iK}F_{Kt} + \epsilon_{it} $$

where $\beta_{ik}$ is the sensitivity of asset $i$ with respect to factor $k$.

Then, with an example of only two factors, express the variance and covariance based on only these factors:

**Variance:**

$$ \sigma^2_i = \beta_{i1}^2 \sigma_{F1}^2 + \beta_{i2}^2 \sigma_{F2}^2 + 2\beta_{i1}\beta_{i2}\text{Cov}(F_1,F_2) + \sigma_{\epsilon i}^2$$

**Covariance:**

$$ \sigma_{ij} = \beta_{i1}\beta_{j1} \sigma_{F1}^2 + \beta_{i2}\beta_{j2} \sigma_{F2}^2 + (\beta_{i1}\beta_{j2}+\beta_{i2}\beta_{j1}) \text{Cov}(F_1,F_2) + \text{Cov}(\epsilon_{it}, \epsilon_{jt})$$

Now: Introduce structure by assuming that residuals are uncorrelated. We'll introduce some *model risk* with this assumption:

$$\text{Cov}(\epsilon_{it}, \epsilon_{jt}) = 0$$
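The factor-based covariance with uncorrelated residuals can be written in matrix form as $B \Sigma_F B^\top + D$, where $B$ holds the betas, $\Sigma_F$ is the factor covariance and $D$ is the diagonal matrix of residual variances. Below is a rough sketch under the assumption that the betas are estimated by ordinary least squares on observed factor returns. The function name and the simulated data are illustrative only.

```python
import numpy as np

def factor_model_cov(returns: np.ndarray, factors: np.ndarray) -> np.ndarray:
    """Covariance implied by a linear factor model with uncorrelated residuals.

    returns: (T, n) asset returns; factors: (T, K) factor returns.
    Assumption: betas come from an OLS regression with an intercept.
    """
    T = returns.shape[0]
    X = np.column_stack([np.ones(T), factors])            # intercept + factors
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)    # shape (K + 1, n)
    betas = coef[1:].T                                    # shape (n, K)
    residuals = returns - X @ coef
    resid_var = residuals.var(axis=0)                     # diagonal of D
    factor_cov = np.atleast_2d(np.cov(factors, rowvar=False, bias=True))
    return betas @ factor_cov @ betas.T + np.diag(resid_var)

# Simulated two-factor world, for illustration only.
rng = np.random.default_rng(1)
T, n, K = 500, 6, 2
factors = rng.normal(0.0, 0.01, size=(T, K))
true_betas = rng.normal(1.0, 0.3, size=(n, K))
returns = factors @ true_betas.T + rng.normal(0.0, 0.01, size=(T, n))
sigma_factor = factor_model_cov(returns, factors)
```

With OLS and an intercept, the diagonal of the result reproduces the sample variances exactly; only the off-diagonal entries are restricted by the model.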

----

This way, in a universe of 500 stocks with two factors, you need to estimate only 1,502 parameters:

- 500 volatility estimates for individual stock returns
- 500 estimates of betas of stocks with respect to factor 1
- 500 estimates of betas of stocks with respect to factor 2
- 2 volatility estimates for factor returns

This gives a total of 500+500+500+2 = 1,502 (one more if you also estimate $\text{Cov}(F_1,F_2)$ rather than treating the factors as uncorrelated), which compares favorably to the 500×499/2 = 124,750 pairwise correlations needed for the sample covariance matrix estimate.
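The arithmetic can be checked with a small helper. Both function names are hypothetical, and the `correlated_factors` switch is an added assumption covering the case where the factor covariances are estimated as well.

```python
def sample_correlation_count(n: int) -> int:
    """Number of pairwise correlations among n assets."""
    return n * (n - 1) // 2

def factor_model_parameter_count(n: int, k: int, correlated_factors: bool = False) -> int:
    """n stock volatilities + n*k betas + k factor volatilities.

    If correlated_factors is True, the k*(k-1)/2 factor covariances are counted too.
    """
    count = n + n * k + k
    if correlated_factors:
        count += k * (k - 1) // 2
    return count

print(factor_model_parameter_count(500, 2))   # 1502
print(sample_correlation_count(500))          # 124750
```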

----

### Which factor model to choose?

- Simplest model: Sharpe's (1963) single-factor model: only one $\beta$ for the market return
- Three families of multi-factor model:
  - **Explicit macro factor models**: Use macro factors such as inflation, growth, interest rates, term spreads.
  - **Micro factors**: Attributes or characteristics of individual stocks: country, industry, size, book-to-market (i.e. *value*)
  - **Implicit factor model**: Statistical factors, e.g. through a Principal Component Analysis. You make no prior assumptions on what the factors could be.

----

# Honey I Shrunk the Covariance Matrix!

(the heading borrows from the title of a [2004 paper by Ledoit and Wolf](http://jpm.iijournals.com/content/30/4/110), "Honey, I Shrunk the Sample Covariance Matrix")

Computing **shrinkage** estimators is a way to find an optimal trade-off between model risk and sample risk.

$$\hat{S}_{\text{shrink}} = \hat{\delta}^* \hat{F} + (1-\hat{\delta}^*) \hat{S}$$

- $\hat{F}$ is the factor-model-based estimator of the covariance matrix. It has model risk, but low sample risk.
- $\hat{S}$ is the sample covariance matrix. It has no model risk at all, but high sample risk.

$\hat{\delta}^*$ is the mixing parameter (the shrinkage intensity), a number between 0 and 1. Ledoit and Wolf explain how to estimate it.
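The convex combination itself is simple to sketch. The example below does not implement the Ledoit and Wolf estimate of the optimal intensity; `delta` is supplied by hand, and the target used in the demo (sample variances with zero correlations) is a deliberately crude stand-in for a factor-based $\hat{F}$. All names are illustrative.

```python
import numpy as np

def shrink_covariance(sample_cov: np.ndarray, target_cov: np.ndarray, delta: float) -> np.ndarray:
    """Return delta * target + (1 - delta) * sample, with delta in [0, 1]."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * target_cov + (1.0 - delta) * sample_cov

# Simulated returns, for illustration only (short sample, hence noisy S_hat).
rng = np.random.default_rng(2)
returns = rng.normal(0.0, 0.02, size=(60, 5))
S_hat = np.cov(returns, rowvar=False, bias=True)
F_hat = np.diag(np.diag(S_hat))      # stand-in structured target (assumption)
S_shrink = shrink_covariance(S_hat, F_hat, delta=0.5)
```

With `delta=0` you recover the sample covariance and with `delta=1` the structured target; values in between pull the noisy off-diagonal entries towards the target.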

----

Imposing constraints on weights is equivalent to statistical shrinkage (Jagannathan and Ma, 2003)


----

# Portfolio Construction with Time-Varying Risk Parameters

- Now, the second problem: **The curse of non-stationarity**
  - Parameters might vary with time


- For high-frequency data, we often assume the mean return $\bar{R}$ is zero.
- Then the simplified variance is $\sigma^2_T = \frac{1}{T} \sum_{t=1}^T R_t^2$

### Volatility is not constant over time

The annualized market volatility over the trailing year, computed from weekly returns, varies considerably over time.

This implies that increasing the frequency (from monthly to daily data) might be better than lengthening the sample period, because the sample then stays within relatively recent times.

### Expanding window vs. sliding window analysis

An expanding window is better if asset returns are stationary (i.e. the volatility is constant), because every estimate then uses all the data available up to that point, so later estimates rest on more observations.

But the volatilities are not constant :)
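To make the difference concrete, here is a small sketch comparing the two window types on simulated returns whose volatility triples halfway through the sample. It uses the zero-mean variance estimate from above. The function names and the simulated regime shift are assumptions made for illustration.

```python
import numpy as np

def rolling_vol(returns: np.ndarray, window: int) -> np.ndarray:
    """Zero-mean volatility over the most recent `window` observations."""
    csum = np.concatenate([[0.0], np.cumsum(returns ** 2)])
    return np.sqrt((csum[window:] - csum[:-window]) / window)

def expanding_vol(returns: np.ndarray, min_periods: int) -> np.ndarray:
    """Zero-mean volatility over all observations up to each date."""
    counts = np.arange(1, len(returns) + 1)
    return np.sqrt(np.cumsum(returns ** 2) / counts)[min_periods - 1:]

# Simulated regime shift: 1% volatility, then 3% volatility.
rng = np.random.default_rng(3)
returns = np.concatenate([rng.normal(0.0, 0.01, 500), rng.normal(0.0, 0.03, 500)])
window = 60
roll = rolling_vol(returns, window)
expand = expanding_vol(returns, min_periods=window)
print(roll[-1], expand[-1])
```

At the end of the sample the rolling estimate sits near the new volatility level, while the expanding estimate is still dragged down by the old, calmer regime.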

One way out is to upweight recent data, e.g. by using exponentially weighted averages:

----

# Exponentially weighted average

- Use this to estimate covariance parameters when you have reason to believe they are time-varying
- Most basic estimate for the variance: $\sigma^2_T = \frac{1}{T} \sum^T_{t=1} R^2_t$
  - Problem: You treat the most recent returns the same as very old returns
  - Solution: Weight the data: $\sigma^2_T = \sum^T_{t=1} \alpha_t R^2_t$
    - where $\sum_t \alpha_t = 1$
- **Exponential** weighting: The weights decline exponentially with the age of the observation

$$\alpha_t = \frac{\lambda^{T-t}}{\sum_{s=1}^T \lambda^{T-s}}$$

where $\lambda$ is a parameter between 0 and 1. A lower $\lambda$ makes the weights decay faster; $\lambda=1$ gives equal weighting.

$\lambda=0.9$ is a common choice.

Then, a covariance estimator is:

$$\text{Cov}(R_i, R_j) = \sum_t \alpha_t (R_{i,t}-\bar{R}_i) (R_{j,t}-\bar{R}_j)$$
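The weights and the weighted covariance can be sketched as below. The sketch assumes that $\bar{R}_i$ is the ordinary (equally weighted) sample mean, which the notes do not state explicitly. Function names and data are illustrative.

```python
import numpy as np

def ewma_weights(T: int, lam: float) -> np.ndarray:
    """Weights alpha_t proportional to lam**(T - t), normalised to sum to 1."""
    exponents = np.arange(T - 1, -1, -1)     # T - t for t = 1, ..., T
    raw = lam ** exponents
    return raw / raw.sum()

def ewma_covariance(returns: np.ndarray, lam: float) -> np.ndarray:
    """Exponentially weighted covariance of a (T, n) return array.

    Assumption: returns are demeaned with the equally weighted sample mean.
    """
    w = ewma_weights(returns.shape[0], lam)
    demeaned = returns - returns.mean(axis=0)
    return (demeaned * w[:, None]).T @ demeaned

# Simulated returns, for illustration only.
rng = np.random.default_rng(4)
returns = rng.normal(0.0, 0.02, size=(250, 3))
cov_ewma = ewma_covariance(returns, lam=0.9)
```

Setting `lam=1.0` reproduces the equally weighted sample covariance, which is a convenient sanity check.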

----

Nice: You can now use expanding windows instead of sliding windows again. You don't have to throw away the oldest observations, you just downweight them heavily.

A *pure* rolling window has drawbacks: An outlier is in with "full" (i.e. equal) weight one day, and completely out the next day when it falls out of the window.

----

# ARCH and GARCH models

- powerful methodologies for estimating time-varying risk parameters

### ARCH

- In an ARCH model, we also assign some of the weight to the long-term variance $V_L$:

$$ \sigma^2_T = \gamma V_L +  \sum_{t=1}^T \alpha_t R_t^2$$

where $\gamma + \sum_t \alpha_t = 1$

### GARCH

- Extending ARCH by also assigning some weight to the previous variance estimate $\sigma^2_{T-1}$ to capture **volatility clustering**
- Example: GARCH(1,1):

$$\sigma_T^2 = \gamma V_L + \alpha R_T^2 + \beta \sigma_{T-1}^2$$

with $\gamma+\alpha+\beta=1$
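The GARCH(1,1) recursion can be sketched as follows, using the timing convention of the equation above. The parameters are supplied by hand here; in practice they are usually fitted by maximum likelihood, which this sketch does not attempt. Starting the recursion at the long-term variance is an assumption, and the function name is hypothetical.

```python
import numpy as np

def garch11_variance(returns, long_run_var, alpha, beta, initial_var=None):
    """Iterate sigma2_T = gamma * V_L + alpha * R_T**2 + beta * sigma2_{T-1}.

    gamma is implied by gamma + alpha + beta = 1.
    Assumption: the recursion starts from V_L unless initial_var is given.
    """
    gamma = 1.0 - alpha - beta
    if alpha < 0 or beta < 0 or gamma <= 0:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    prev = long_run_var if initial_var is None else initial_var
    sigma2 = np.empty(len(returns))
    for t, r in enumerate(returns):
        prev = gamma * long_run_var + alpha * r ** 2 + beta * prev
        sigma2[t] = prev
    return sigma2

# Simulated daily returns and hand-picked parameters, for illustration only.
rng = np.random.default_rng(5)
returns = rng.normal(0.0, 0.01, size=500)
sigma2 = garch11_variance(returns, long_run_var=1e-4, alpha=0.1, beta=0.8)
```

A large squared return raises the next variance estimate through `alpha`, and `beta` makes that increase persist, which is how the model captures volatility clustering.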

**GARCH(p,q)**

$$\sigma_T^2 = \omega + \sum_{i=1}^p \alpha_i R^2_{T-i+1} + \sum_{j=1}^q \beta_j \sigma^2_{T-j}$$

Here, $\omega = \gamma V_L$, and setting $p=q=1$ recovers the GARCH(1,1) equation above.

----

GARCH is good when you suspect time-varying parameters. But: You worsen the curse of dimensionality. For each risk parameter, you introduce $p+q+1$ new parameters.

There is a way out: Sparse GARCH models, like the factor GARCH or even better, the Orthogonal (factor) GARCH model:

$$\hat{\sigma}_{ij}^{\text{OGARCH}} = \hat{\sigma}_{ij}(t) = \sum_{k=1}^K \hat{\beta}_{ik}\hat{\beta}_{jk}\hat{\sigma}_{F_k}^2(t)$$

Then, in a 500-stock universe, when using a 2-factor model with a GARCH(1,1) model for the volatility of each of the two factors, we need 1,506 parameters: 500 volatility estimates for individual stock returns, plus 500 estimates of betas of stocks with respect to factor 1, 500 estimates of betas of stocks with respect to factor 2, and finally 3 GARCH parameter estimates for each factor. That gives a total of 500+500+500+2×3 = 1,506, which is not much more than the 1,502 needed under constant volatility parameters.
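As a rough sketch, the covariance formula and the parameter count can be written as below. The factor variances at time $t$ are taken as given (for example from one GARCH(1,1) model per factor). Adding stock-specific variances on the diagonal is an assumption that mirrors the "500 volatility estimates" in the count. All names and numbers are illustrative.

```python
import numpy as np

def factor_garch_cov(betas: np.ndarray, factor_vars_t: np.ndarray, specific_vars: np.ndarray) -> np.ndarray:
    """Covariance at time t from orthogonal factors with time-varying variances.

    Off-diagonal: sum_k beta_ik * beta_jk * sigma2_Fk(t).
    Diagonal: same sum plus a stock-specific variance (assumption).
    """
    common = (betas * factor_vars_t) @ betas.T
    return common + np.diag(specific_vars)

def factor_garch_parameter_count(n: int, k: int, garch_params_per_factor: int = 3) -> int:
    """n stock volatilities + n*k betas + 3 GARCH(1,1) parameters per factor."""
    return n + n * k + k * garch_params_per_factor

# Hand-picked numbers, for illustration only (3 stocks, 2 factors).
betas = np.array([[1.0, 0.5], [0.8, -0.2], [1.2, 0.1]])
factor_vars_t = np.array([0.04, 0.01])
specific_vars = np.array([0.02, 0.03, 0.01])
cov_t = factor_garch_cov(betas, factor_vars_t, specific_vars)
print(factor_garch_parameter_count(500, 2))   # 1506
```

Only `factor_vars_t` changes through time, so the whole covariance matrix becomes time-varying at the cost of a handful of extra parameters.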

----

# Lab session: Covariance estimation

see `lab_22.ipynb`