# Matrix shrinkage analysis (covariance matrix)

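### Objective: explore and develop a method for dealing with large-$N$ covariance matrices, such as those encountered in portfolio optimization. Keep in mind that an $N \times N$ covariance matrix contains $N(N+1)/2$ distinct values ($N$ variances plus $N(N-1)/2$ covariances), so the number of parameters to estimate grows quadratically in $N$.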
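As a quick illustration of how fast the parameter count grows, the sketch below compares the number of distinct covariance entries, $N(N+1)/2$, with the number of return observations in a $T \times N$ sample. The helper names and the choice $T = 250$ (roughly one year of daily data) are illustrative assumptions, not part of the analysis.

```python
# Hypothetical helpers for illustration only


def n_cov_params(n_assets: int) -> int:
    """Distinct entries of an N x N covariance matrix: N variances + N(N-1)/2 covariances."""
    return n_assets * (n_assets + 1) // 2


def params_per_observation(n_assets: int, n_periods: int) -> float:
    """Ratio of parameters to estimate to data points in a T x N return sample."""
    return n_cov_params(n_assets) / (n_assets * n_periods)


for n in (10, 100, 500):
    # T = 250 is an assumed sample length (about one year of daily returns)
    print(n, n_cov_params(n), round(params_per_observation(n, 250), 3))
```

The ratio simplifies to $(N+1)/(2T)$, so with $T = 250$ and $N = 500$ there are already slightly more parameters (125,250) than observations (125,000).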
#### ->> Use matrix shrinkage for covariance estimation (scikit-learn)
```
from sklearn.covariance import LedoitWolf, OAS, ShrunkCovariance
```
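A minimal usage sketch of the three imported estimators on simulated data, assuming a recent scikit-learn. The simulated Gaussian returns, the sample sizes and the fixed `shrinkage=0.1` passed to `ShrunkCovariance` are arbitrary illustration choices, not values from this analysis. `LedoitWolf` and `OAS` choose the shrinkage intensity from the data and expose it as `shrinkage_`, while `ShrunkCovariance` uses whatever intensity it is given.

```python
import numpy as np
from sklearn.covariance import OAS, LedoitWolf, ShrunkCovariance, empirical_covariance

rng = np.random.default_rng(0)
n_obs, n_assets = 60, 40                                # few observations per asset (assumption)
X = rng.normal(scale=0.01, size=(n_obs, n_assets))      # simulated returns (assumption)

estimators = {
    "LedoitWolf": LedoitWolf(),
    "OAS": OAS(),
    "ShrunkCovariance": ShrunkCovariance(shrinkage=0.1),  # fixed, hand-picked intensity
}
for est in estimators.values():
    est.fit(X)


def shrinkage_of(est):
    # LedoitWolf and OAS store the data-driven intensity in shrinkage_;
    # ShrunkCovariance simply uses the shrinkage parameter it was given
    return est.shrinkage_ if hasattr(est, "shrinkage_") else est.shrinkage


def condition_number(cov):
    # Ratio of largest to smallest eigenvalue of a symmetric matrix
    eig = np.linalg.eigvalsh(cov)  # ascending order
    return eig[-1] / eig[0]


print(f"{'Empirical':18s} cond={condition_number(empirical_covariance(X)):10.1f}")
for name, est in estimators.items():
    print(f"{name:18s} cond={condition_number(est.covariance_):10.1f}"
          f"  shrinkage={shrinkage_of(est):.3f}")
```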

#### ->> Idea: The empirical covariance matrix is the MLE of the population covariance matrix and a reasonable estimator of it entry by entry, but a poor estimator of its eigenvalues when $N$ is large relative to the number of observations: the largest sample eigenvalues are biased upwards and the smallest downwards. We therefore introduce shrinkage, pulling the covariance eigenvalues $\{ \lambda_1, \dots, \lambda_N \}$ towards their mean so that the ratio between the largest and smallest eigenvalues (the condition number) is reduced.
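To illustrate the eigenvalue problem, the sketch below simulates returns whose true covariance is the identity (all true eigenvalues equal 1), so any spread in the sample eigenvalues is pure estimation noise. It then applies the linear shrinkage form $\hat\Sigma = (1-\delta) S + \delta \mu I$ with $\mu = \operatorname{tr}(S)/N$, which maps each eigenvalue to $(1-\delta)\lambda_i + \delta\mu$ and therefore compresses the spectrum. The sample sizes and the intensity $\delta = 0.5$ are arbitrary assumptions; Ledoit & Wolf (2004) estimate $\delta$ from the data.

```python
import numpy as np

rng = np.random.default_rng(42)
T, N = 100, 50                          # observations, assets (assumed sizes)
X = rng.standard_normal((T, N))         # true covariance = identity
S = X.T @ X / T                         # empirical covariance, zero mean assumed

lam = np.linalg.eigvalsh(S)             # ascending sample eigenvalues
print("sample eigenvalues: min %.2f, max %.2f" % (lam[0], lam[-1]))

delta = 0.5                             # hypothetical shrinkage intensity
mu = np.trace(S) / N                    # average eigenvalue = scale of the target mu*I
S_shrunk = (1 - delta) * S + delta * mu * np.eye(N)

lam_shrunk = np.linalg.eigvalsh(S_shrunk)
print("condition number: %.1f -> %.1f" % (lam[-1] / lam[0], lam_shrunk[-1] / lam_shrunk[0]))
```

The trace, and hence the mean eigenvalue, is unchanged by this transformation; only the spread around the mean shrinks.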

#### ->> From Ledoit & Wolf (2015) Spectrum Estimation ...
#### The authors note that there are numerous approaches to estimating covariance matrices. These include the empirical covariance matrix, which provides an unbiased estimator of the true covariance matrix; the linear shrinkage estimator of Ledoit & Wolf (2004); and the non-linear shrinkage methods proposed by the same authors (2012).

#### ->> We choose to implement the linear shrinkage approach detailed in Ledoit & Wolf (2004), as it provides a well-founded starting point for the analysis that follows.
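A from-scratch sketch of the Ledoit & Wolf (2004) linear shrinkage estimator is given below, written for the zero-mean (`assume_centered=True`) case used in this notebook. It computes the target scale $m = \operatorname{tr}(S)/N$, the dispersion $d^2 = \lVert S - mI \rVert^2$ and the sampling error $b^2$, using the normalized Frobenius norm $\lVert A \rVert^2 = \operatorname{tr}(AA^\top)/N$, and sets $\delta = b^2/d^2$. The function name is hypothetical; to our understanding it mirrors scikit-learn's `ledoit_wolf_shrinkage` without the block-wise computation, but treat it as an illustration rather than a reference implementation.

```python
import numpy as np


def ledoit_wolf_centered(X):
    """Hypothetical helper: linear shrinkage towards m*I for zero-mean data X (T x N).

    Returns (shrunk covariance, shrinkage intensity delta).
    """
    X = np.asarray(X, dtype=float)
    T, N = X.shape
    S = X.T @ X / T                                   # empirical covariance (zero mean)
    m = np.trace(S) / N                               # scale of the target m*I
    d2 = np.sum((S - m * np.eye(N)) ** 2) / N         # ||S - mI||^2, normalized Frobenius
    # b_bar^2 = (1/T^2) * sum_k ||x_k x_k' - S||^2, simplified using sum_k x_k x_k' = T*S
    sq_norms = np.sum(X ** 2, axis=1)                 # ||x_k||^2 for each observation
    b2_bar = (np.sum(sq_norms ** 2) / T - np.sum(S ** 2)) / (N * T)
    b2 = min(b2_bar, d2)                              # cap so that delta <= 1
    delta = 0.0 if d2 == 0 else b2 / d2
    return (1 - delta) * S + delta * m * np.eye(N), delta


# Usage on simulated data (assumption): 60 observations of 40 assets
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 40)) * 0.01
cov, delta = ledoit_wolf_centered(X)
print("shrinkage intensity: %.3f" % delta)
```

If scikit-learn is available, `LedoitWolf(assume_centered=True).fit(X)` should report a `shrinkage_` and `covariance_` that agree with this sketch up to floating-point error.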

```
# coding=utf-8
# Always run this block first!

# Python 3 style division and print() in Python 2; __future__ imports must be
# the first statement in the cell, otherwise a SyntaxError is raised
from __future__ import division, print_function

%matplotlib inline

import sys
sys.path.extend(['/Users/Dim/Desktop/school_folder/masters_thesis/gitCodeRepo/codePython/collateralOptimizer/'])
from dataPreProcess import *  # project-specific helpers

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.covariance import LedoitWolf, empirical_covariance

plt.style.use('ggplot')
```

```
# Import returns data from .csv (semicolon separated) and index it by date
df = pd.read_csv('/Users/Dim/Desktop/school_folder/masters_thesis/gitCodeRepo/data/noStaleX_returnsData_20160825.csv', sep=';')
df = df.set_index('date')

# Ledoit-Wolf shrinkage estimate; assume_centered=True treats returns as zero-mean
lw = LedoitWolf(assume_centered=True)
lwFitted = lw.fit(X=df).covariance_

# Empirical (MLE) covariance estimate under the same zero-mean assumption
mleFitted = empirical_covariance(X=df, assume_centered=True)
```
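Both estimators above are fitted with `assume_centered=True`, i.e. the returns are treated as having zero mean and the covariance is computed as $X^\top X / T$ rather than around the sample mean. The numpy sketch below, on simulated data (an assumption, not the thesis data), checks the identity $\frac{1}{T}X^\top X = S_{\text{centered}} + \bar{x}\bar{x}^\top$: the two conventions differ only by the outer product of the sample means, which is typically small for daily returns.

```python
import numpy as np

rng = np.random.default_rng(7)
T, N = 250, 5
X = rng.normal(loc=0.0005, scale=0.01, size=(T, N))    # simulated daily returns (assumption)

S_uncentered = X.T @ X / T                              # assume_centered=True convention
xbar = X.mean(axis=0)
S_centered = (X - xbar).T @ (X - xbar) / T              # assume_centered=False, 1/T scaling

gap = S_uncentered - S_centered                         # equals the mean outer product
print("max |gap|: %.2e, max |S|: %.2e" % (np.abs(gap).max(), np.abs(S_centered).max()))
```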

```
# Test positive definiteness of the covariance matrix, as suggested by the TOBAM paper.
# eigvalsh is used because covariance matrices are symmetric (real eigenvalues)
def is_pos_def(x):
    return np.all(np.linalg.eigvalsh(x) > 0)

print("MLE Method")
print(is_pos_def(mleFitted))

print("LW Method")
print(is_pos_def(lwFitted))
```

Output:

```
MLE Method
True
LW Method
True
```
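The positive-definiteness test passes for both estimators here, but for the empirical estimator this depends on having more observations than assets. The sketch below, on simulated data with $T < N$ (an assumption chosen to provoke the problem), shows that the empirical covariance then has rank at most $T$ and is singular, while the linearly shrunk matrix $(1-\delta)S + \delta\mu I$ has every eigenvalue at least $\delta\mu > 0$. It uses a Cholesky factorization as an alternative test, since `np.linalg.cholesky` raises `LinAlgError` for matrices that are not positive definite; $\delta = 0.2$ is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 30, 50                              # fewer observations than assets (assumption)
X = rng.standard_normal((T, N)) * 0.01
S = X.T @ X / T                            # empirical covariance, rank <= T < N

delta = 0.2                                # hypothetical shrinkage intensity
mu = np.trace(S) / N
S_shrunk = (1 - delta) * S + delta * mu * np.eye(N)


def is_pos_def_chol(a):
    # A symmetric matrix is positive definite iff its Cholesky factorization exists
    try:
        np.linalg.cholesky(a)
        return True
    except np.linalg.LinAlgError:
        return False


print("rank of S:", np.linalg.matrix_rank(S), "of", N)
print("shrunk matrix positive definite:", is_pos_def_chol(S_shrunk))
```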


```
# Compare portfolio variance under the two estimators for an equally
# weighted portfolio: pVar = w' S w
N = df.shape[1]
w = np.full(N, 1.0 / N)
pVarMLE = w.dot(mleFitted).dot(w)
pVarLW = w.dot(lwFitted).dot(w)
```
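As a sanity check on the quadratic form: with the zero-mean covariance $S = X^\top X / T$, the value $w^\top S w$ equals the average squared portfolio return, $\frac{1}{T}\sum_t (x_t^\top w)^2$. The sketch below verifies this on simulated returns; the data and the helper name `portfolio_variance` are illustrative assumptions.

```python
import numpy as np


def portfolio_variance(weights, cov):
    # Hypothetical helper: quadratic form w' S w
    w = np.asarray(weights, dtype=float)
    return float(w @ cov @ w)


rng = np.random.default_rng(5)
T, N = 250, 20
X = rng.normal(scale=0.01, size=(T, N))        # simulated daily returns (assumption)
S = X.T @ X / T                                # zero-mean (assume_centered=True) covariance
w = np.full(N, 1.0 / N)                        # equally weighted portfolio

port_returns = X @ w
print(portfolio_variance(w, S), np.mean(port_returns ** 2))
```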

```
pVarMLE
# Out: 0.00011339943392707333

pVarLW
# Out: 9.7997861138801709e-05
```

# Conclusions
#### ->> The L&W method produces a smaller in-sample variance for the equally weighted portfolio than the empirical/MLE method on the constrained sample (i.e. without stale time series). It is well documented and readily usable in Python.
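Part of this gap follows mechanically from the form of the estimator. Assuming scikit-learn's `LedoitWolf` returns $\hat\Sigma_{LW} = (1-\delta) S + \delta \mu I$ with $\mu = \operatorname{tr}(S)/N$ (the linear shrinkage form of Ledoit & Wolf, 2004), and since both estimates here use the same zero-mean $S$, the equally weighted portfolio $w = \frac{1}{N}\mathbf{1}$ gives:

$$
\begin{aligned}
w^\top S w &= \frac{1}{N^2}\sum_{i,j} S_{ij} = \frac{\mu}{N} + \frac{1}{N^2}\sum_{i \neq j} S_{ij}, \\
w^\top \hat\Sigma_{LW} w &= (1-\delta)\, w^\top S w + \delta \mu\, w^\top w = (1-\delta)\, w^\top S w + \frac{\delta\mu}{N} \\
&= w^\top S w - \frac{\delta}{N^2}\sum_{i \neq j} S_{ij}.
\end{aligned}
$$

So whenever the average sample covariance between assets is positive, the L&W portfolio variance is lower by construction; the in-sample comparison does not by itself show that L&W forecasts out-of-sample risk better.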

#### ->> We propose to continue using it throughout the entire estimation procedure.

#### ->> For further empirical evidence, see: http://scikit-learn.org/stable/auto_examples/covariance/plot_covariance_estimation.html