
=======================================================================<br>
Shrinkage covariance estimation: LedoitWolf vs OAS and max-likelihood<br>
=======================================================================<br>
When working with covariance estimation, the usual approach is to use<br>
a maximum likelihood estimator, such as the<br>
:class:`sklearn.covariance.EmpiricalCovariance`. It is asymptotically<br>
unbiased, i.e. it converges to the true (population) covariance when given many<br>
observations. However, it can also be beneficial to regularize it, in<br>
order to reduce its variance; this, in turn, introduces some bias. This<br>
example illustrates the simple regularization used in<br>
:ref:`shrunk_covariance` estimators. In particular, it focuses on how to<br>
set the amount of regularization, i.e. how to choose the bias-variance<br>
trade-off.<br>
Here we compare 3 approaches:<br>
* Setting the parameter by cross-validating the likelihood on three folds<br>
  according to a grid of potential shrinkage parameters.<br>
* A closed-form formula proposed by Ledoit and Wolf to compute<br>
  the asymptotically optimal regularization parameter (minimizing a MSE<br>
  criterion), yielding the :class:`sklearn.covariance.LedoitWolf`<br>
  covariance estimate.<br>
* An improvement of the Ledoit-Wolf shrinkage, the<br>
  :class:`sklearn.covariance.OAS`, proposed by Chen et al. Its<br>
  convergence is significantly better under the assumption that the data<br>
  are Gaussian, in particular for small samples.<br>
To quantify estimation error, we plot the likelihood of unseen data for<br>
different values of the shrinkage parameter. We also show the shrinkage<br>
values selected by cross-validation and by the LedoitWolf and OAS estimators.<br>
Note that the maximum likelihood estimate corresponds to no shrinkage,<br>
and thus performs poorly. The Ledoit-Wolf estimate performs really well,<br>
as it is close to the optimum and is not computationally costly. In this<br>
example, the OAS estimate is a bit further away. Interestingly, both<br>
approaches outperform cross-validation, which is significantly more<br>
computationally costly.<br>
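
As a rough illustration of the regularization discussed above, the sketch below shrinks an empirical covariance toward a scaled identity using NumPy only. The helper name `shrink_covariance` and the toy data are assumptions made for this illustration; the convex-combination form is the usual textbook one and is not copied from the library source.

```python
import numpy as np


def shrink_covariance(emp_cov, shrinkage):
    """Hypothetical helper: blend emp_cov with a scaled identity matrix."""
    n_features = emp_cov.shape[0]
    mu = np.trace(emp_cov) / n_features  # average variance across features
    return (1.0 - shrinkage) * emp_cov + shrinkage * mu * np.eye(n_features)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 40))        # fewer samples than features
Xc = X - X.mean(axis=0)
emp_cov = Xc.T @ Xc / X.shape[0]     # maximum likelihood estimate
shrunk = shrink_covariance(emp_cov, 0.1)

# The unregularized estimate is rank deficient; the shrunk one is not.
rank_emp = np.linalg.matrix_rank(emp_cov)
min_eig_shrunk = np.linalg.eigvalsh(shrunk).min()
print(rank_emp, min_eig_shrunk)
```

With fewer samples than features the empirical covariance cannot be inverted, while any positive shrinkage lifts the smallest eigenvalue above zero and keeps the trace unchanged.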


In [None]:
print(__doc__)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg

In [None]:
from sklearn.covariance import (LedoitWolf, OAS, ShrunkCovariance,
                                empirical_covariance, log_likelihood)
from sklearn.model_selection import GridSearchCV

#############################################################################<br>
Generate sample data

In [None]:
n_features, n_samples = 40, 20
np.random.seed(42)
base_X_train = np.random.normal(size=(n_samples, n_features))
base_X_test = np.random.normal(size=(n_samples, n_features))

Color samples

In [None]:
coloring_matrix = np.random.normal(size=(n_features, n_features))
X_train = np.dot(base_X_train, coloring_matrix)
X_test = np.dot(base_X_test, coloring_matrix)
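
The coloring step above multiplies white Gaussian samples by a matrix `C`, which gives them population covariance `C.T @ C`. The sketch below checks this numerically on a small, hypothetical problem; the dimensions and sample count are chosen only to make the check fast.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 3
coloring = rng.normal(size=(n_features, n_features))
white = rng.normal(size=(200_000, n_features))  # identity covariance
colored = white @ coloring                      # each row is x_i^T C

target_cov = coloring.T @ coloring
# The samples have zero mean by construction, so no centering is applied.
sample_cov = colored.T @ colored / colored.shape[0]
rel_err = np.linalg.norm(sample_cov - target_cov) / np.linalg.norm(target_cov)
print(rel_err)
```

The relative error shrinks as the number of samples grows, which is why `np.dot(coloring_matrix.T, coloring_matrix)` can serve as the ground-truth covariance later in this example.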

#############################################################################<br>
Compute the likelihood on test data

spanning a range of possible shrinkage coefficient values

In [None]:
shrinkages = np.logspace(-2, 0, 30)
negative_logliks = [-ShrunkCovariance(shrinkage=s).fit(X_train).score(X_test)
                    for s in shrinkages]

under the ground-truth model, which we would not have access to in real<br>
settings

In [None]:
real_cov = np.dot(coloring_matrix.T, coloring_matrix)
emp_cov = empirical_covariance(X_train)
loglik_real = -log_likelihood(emp_cov, linalg.inv(real_cov))
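
The scores above are Gaussian log-likelihoods of a sample covariance under a candidate precision matrix. A minimal NumPy sketch of that quantity follows, assuming a zero-mean Gaussian model and a per-sample average; the function name is hypothetical, and the exact normalization used by the library is an assumption here.

```python
import numpy as np


def gaussian_log_likelihood(emp_cov, precision):
    """Per-sample log-likelihood of data with covariance emp_cov (sketch)."""
    p = precision.shape[0]
    _, logdet = np.linalg.slogdet(precision)
    # sum(emp_cov * precision) equals trace(emp_cov @ precision)
    # because the precision matrix is symmetric.
    ll = -np.sum(emp_cov * precision) + logdet - p * np.log(2 * np.pi)
    return ll / 2.0


emp = np.diag([1.0, 4.0])
ll_matched = gaussian_log_likelihood(emp, np.diag([1.0, 0.25]))
ll_mismatched = gaussian_log_likelihood(emp, np.eye(2))
print(ll_matched, ll_mismatched)
```

The precision matrix that matches the data covariance yields the higher likelihood, which is the property the plot in this example relies on.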

#############################################################################<br>
Compare different approaches to setting the parameter

GridSearch for an optimal shrinkage coefficient

In [None]:
tuned_parameters = [{'shrinkage': shrinkages}]
# use three folds explicitly, as described in the introduction
cv = GridSearchCV(ShrunkCovariance(), tuned_parameters, cv=3)
cv.fit(X_train)
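
To make the grid search above less opaque, here is a hand-rolled sketch of three-fold cross-validation over a shrinkage grid, written with NumPy only. The helpers `shrunk_cov` and `heldout_loglik`, the toy data, and the contiguous fold layout are all assumptions for illustration; they are not the library's implementation.

```python
import numpy as np


def shrunk_cov(X, shrinkage):
    """Return a shrunk covariance estimate and the training mean (sketch)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    emp = Xc.T @ Xc / X.shape[0]
    mu = np.trace(emp) / emp.shape[0]
    return (1 - shrinkage) * emp + shrinkage * mu * np.eye(emp.shape[0]), mean


def heldout_loglik(X_val, cov, mean):
    """Average Gaussian log-likelihood of held-out rows under (mean, cov)."""
    Xc = X_val - mean
    emp_val = Xc.T @ Xc / X_val.shape[0]
    precision = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(precision)
    p = cov.shape[0]
    return 0.5 * (-np.sum(emp_val * precision) + logdet - p * np.log(2 * np.pi))


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10)) @ rng.normal(size=(10, 10))
grid = np.logspace(-2, 0, 10)
folds = np.array_split(np.arange(X.shape[0]), 3)  # three contiguous folds

mean_scores = []
for s in grid:
    scores = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(X.shape[0]), val_idx)
        cov, mean = shrunk_cov(X[train_idx], s)
        scores.append(heldout_loglik(X[val_idx], cov, mean))
    mean_scores.append(np.mean(scores))

best_shrinkage = grid[int(np.argmax(mean_scores))]
print(best_shrinkage)
```

Every grid value requires one fit per fold, which is where the extra computational cost of cross-validation comes from compared with a closed-form estimate.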

Ledoit-Wolf optimal shrinkage coefficient estimate

In [None]:
lw = LedoitWolf()
loglik_lw = lw.fit(X_train).score(X_test)
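
For intuition about where a closed-form shrinkage coefficient can come from, the sketch below follows the general recipe of Ledoit and Wolf: compare the sampling noise of the empirical covariance with its distance from a scaled identity. The function name and toy data are hypothetical, and the result is not guaranteed to match `lw.shrinkage_` digit for digit.

```python
import numpy as np


def ledoit_wolf_shrinkage_sketch(X):
    """Closed-form shrinkage coefficient in the spirit of Ledoit-Wolf."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    mu = np.trace(S) / p
    # Squared distance between S and the shrinkage target mu * I.
    delta = np.sum((S - mu * np.eye(p)) ** 2) / p
    # Average squared distance of per-sample outer products from S,
    # scaled by 1/n: an estimate of the sampling noise in S.
    beta_bar = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / (n ** 2 * p)
    beta = min(beta_bar, delta)  # never shrink by more than 1
    return 0.0 if delta == 0 else beta / delta


rng = np.random.default_rng(0)
coloring = rng.normal(size=(10, 10))
alpha_small = ledoit_wolf_shrinkage_sketch(rng.normal(size=(20, 10)) @ coloring)
alpha_large = ledoit_wolf_shrinkage_sketch(rng.normal(size=(2000, 10)) @ coloring)
print(alpha_small, alpha_large)
```

The coefficient needs a single pass over the data and no refitting, and it decreases as more samples become available because the empirical covariance becomes more trustworthy.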

OAS coefficient estimate

In [None]:
oa = OAS()
loglik_oa = oa.fit(X_train).score(X_test)

#############################################################################<br>
Plot results

In [None]:
fig = plt.figure()
plt.title("Regularized covariance: likelihood and shrinkage coefficient")
plt.xlabel('Regularization parameter: shrinkage coefficient')
plt.ylabel('Error: negative log-likelihood on test data')
# negative log-likelihood across the range of shrinkage values
plt.loglog(shrinkages, negative_logliks, label="Negative log-likelihood")

In [None]:
plt.plot(plt.xlim(), 2 * [loglik_real], '--r',
         label="Real covariance likelihood")

adjust view

In [None]:
lik_max = np.amax(negative_logliks)
lik_min = np.amin(negative_logliks)
ymin = lik_min - 6. * np.log((plt.ylim()[1] - plt.ylim()[0]))
ymax = lik_max + 10. * np.log(lik_max - lik_min)
xmin = shrinkages[0]
xmax = shrinkages[-1]
# LW likelihood
plt.vlines(lw.shrinkage_, ymin, -loglik_lw, color='magenta',
           linewidth=3, label='Ledoit-Wolf estimate')
# OAS likelihood
plt.vlines(oa.shrinkage_, ymin, -loglik_oa, color='purple',
           linewidth=3, label='OAS estimate')
# best CV estimator likelihood
plt.vlines(cv.best_estimator_.shrinkage, ymin,
           -cv.best_estimator_.score(X_test), color='cyan',
           linewidth=3, label='Cross-validation best estimate')

In [None]:
plt.ylim(ymin, ymax)
plt.xlim(xmin, xmax)
plt.legend()

In [None]:
plt.show()