# Introduction

This notebook demonstrates the graphical lasso (GLasso) algorithm for estimating a covariance matrix.

I use the scikit-learn implementation, with the regularization strength chosen by automated cross-validation.
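For reference, here is a sketch of the optimization problem that the graphical lasso is usually described as solving, in the form given in the scikit-learn documentation. The symbols are notation introduced here, not taken from the notebook: $S$ is the empirical covariance of the data, $\Theta$ is the candidate precision (inverse covariance) matrix, and $\alpha \ge 0$ is the penalty strength that cross-validation selects.

```latex
\hat{\Theta} \;=\; \underset{\Theta \succ 0}{\operatorname{arg\,min}}
\Big( \operatorname{tr}(S\,\Theta) \;-\; \log\det\Theta \;+\; \alpha\,\lVert \Theta \rVert_{1} \Big),
\qquad
\lVert \Theta \rVert_{1} \;=\; \sum_{i \neq j} \lvert \Theta_{ij} \rvert
```

With $\alpha = 0$ the penalty vanishes and the minimizer is $S^{-1}$, the inverse of the empirical covariance; larger values of $\alpha$ push more off-diagonal entries of $\Theta$ to zero.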

In [1]:
import pandas as pd
# Note: scikit-learn 0.20 renamed these classes to GraphicalLasso / GraphicalLassoCV.
from sklearn.covariance import GraphLasso, GraphLassoCV, EmpiricalCovariance
import numpy as np
from numpy.linalg import inv
%matplotlib inline

In [2]:
df_mc = pd.read_csv("data/X_test.csv", header=None)

In [3]:
df_mc.head()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | -0.199058 | 0.155569 | -0.067543 | 1.434591 |
| 1 | 0.309032 | -0.046133 | -1.78538 | 0.681475 |
| 2 | 0.09717 | -0.376027 | 0.327607 | -0.81416 |
| 3 | -1.237077 | -1.328336 | 1.955127 | -1.544141 |
| 4 | 1.161363 | 1.067888 | 0.02233 | 0.603614 |


In [4]:
df_mc.describe()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | -1.643130e-17 | -5.984102e-17 | 1.144917e-16 | 1.104672e-17 |
| std | 1.0005 | 1.0005 | 1.0005 | 1.0005 |
| min | -2.991931 | -3.013649 | -3.469819 | -3.652721 |
| 25% | -0.7167359 | -0.7111839 | -0.6850007 | -0.6896844 |
| 50% | 0.030826 | -0.00417444 | -0.004400892 | -0.01310217 |
| 75% | 0.7093195 | 0.6905593 | 0.659335 | 0.6756227 |
| max | 3.198116 | 3.164379 | 2.749569 | 3.101828 |


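The columns are already standardized (mean approximately 0, standard deviation approximately 1), so the covariance matrices below are effectively correlation matrices. Now let's fit a cross-validated graphical lasso estimator.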

In [11]:
# The L1 penalty strength (alpha) is chosen by cross-validation during fit().
glasso = GraphLassoCV()
glasso.fit(df_mc)

GraphLassoCV(alphas=4, assume_centered=False, cv=None, enet_tol=0.0001,
       max_iter=100, mode='cd', n_jobs=1, n_refinements=4, tol=0.0001,
       verbose=False)
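The printed parameters show `alphas=4` and `n_refinements=4`: the estimator searches a grid of 4 penalty values and refines that grid 4 times. A minimal sketch of how the selected penalty could be inspected follows. It is not part of the original notebook: it uses synthetic Gaussian data as a stand-in for `data/X_test.csv` (an assumption), and it falls back to the old class name only if the newer `GraphicalLassoCV` is unavailable.

```python
import numpy as np

try:
    # Class name in scikit-learn >= 0.20
    from sklearn.covariance import GraphicalLassoCV
except ImportError:
    # Older releases, as used in this notebook
    from sklearn.covariance import GraphLassoCV as GraphicalLassoCV

# Hypothetical stand-in data: 1000 samples from a 4-dimensional Gaussian
# whose covariance has the same shape as the estimates in this notebook.
rng = np.random.RandomState(0)
stand_in_cov = np.array([
    [1.0, 0.8, -0.4, 0.4],
    [0.8, 1.0, -0.5, 0.5],
    [-0.4, -0.5, 1.0, -0.2],
    [0.4, 0.5, -0.2, 1.0],
])
X = rng.multivariate_normal(np.zeros(4), stand_in_cov, size=1000)

model = GraphicalLassoCV().fit(X)

# alpha_ is the penalty value selected by cross-validation.
print("selected alpha:", model.alpha_)
# precision_ is the estimated inverse covariance (where sparsity would show up).
print(np.round(model.precision_, 3))
```

On the notebook's own fitted object, the same attributes would be read as `glasso.alpha_` and `glasso.precision_`.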

In [12]:
glasso.covariance_

array([[ 1.        ,  0.82703486, -0.441698  ,  0.38985575],
       [ 0.82703486,  1.        , -0.5340743 ,  0.47138974],
       [-0.441698  , -0.5340743 ,  1.        , -0.23098084],
       [ 0.38985575,  0.47138974, -0.23098084,  1.        ]])
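The covariance matrix itself is dense; the graphical lasso penalty acts on its inverse, the precision matrix. As a sketch of how to look at that structure, the snippet below inverts the matrix printed above (copied by hand, so it carries only the printed precision) and rescales it into partial correlations. The variable names are illustrative; on the fitted estimator, `glasso.precision_` would give the precision matrix directly.

```python
import numpy as np

# glasso.covariance_ as printed above (copied by hand)
glasso_cov = np.array([
    [1.0, 0.82703486, -0.441698, 0.38985575],
    [0.82703486, 1.0, -0.5340743, 0.47138974],
    [-0.441698, -0.5340743, 1.0, -0.23098084],
    [0.38985575, 0.47138974, -0.23098084, 1.0],
])

# Precision matrix: inverse of the covariance matrix.
precision = np.linalg.inv(glasso_cov)

# Partial correlation between i and j given all other variables:
# rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)
scale = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(scale, scale)
np.fill_diagonal(partial_corr, 1.0)  # diagonal set to 1 by convention

print(np.round(precision, 3))
print(np.round(partial_corr, 3))
```

Off-diagonal entries of the precision matrix that are close to zero indicate pairs of variables that are approximately conditionally independent given the others.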

In [13]:
# score() returns the mean Gaussian log-likelihood of the data under the fitted model.
glasso.score(df_mc)

-4.7938536946241177

Let's compare this to the empirical (maximum-likelihood) covariance estimate.

In [14]:
empir = EmpiricalCovariance()
empir.fit(df_mc)

EmpiricalCovariance(assume_centered=False, store_precision=True)

In [15]:
empir.covariance_

array([[ 1.        ,  0.82999523, -0.4436062 ,  0.38911997],
       [ 0.82999523,  1.        , -0.5370343 ,  0.47434828],
       [-0.4436062 , -0.5370343 ,  1.        , -0.22802231],
       [ 0.38911997,  0.47434828, -0.22802231,  1.        ]])
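To make the comparison concrete, the two printed matrices can be subtracted element by element. This is a small sketch using the values copied by hand from the two outputs; the variable names are illustrative.

```python
import numpy as np

# glasso.covariance_ as printed earlier (copied by hand)
glasso_cov = np.array([
    [1.0, 0.82703486, -0.441698, 0.38985575],
    [0.82703486, 1.0, -0.5340743, 0.47138974],
    [-0.441698, -0.5340743, 1.0, -0.23098084],
    [0.38985575, 0.47138974, -0.23098084, 1.0],
])

# empir.covariance_ as printed above (copied by hand)
empir_cov = np.array([
    [1.0, 0.82999523, -0.4436062, 0.38911997],
    [0.82999523, 1.0, -0.5370343, 0.47434828],
    [-0.4436062, -0.5370343, 1.0, -0.22802231],
    [0.38911997, 0.47434828, -0.22802231, 1.0],
])

# Element-wise difference between the regularized and empirical estimates.
diff = glasso_cov - empir_cov
max_abs_diff = np.max(np.abs(diff))

print(np.round(diff, 5))
print("largest absolute difference:", max_abs_diff)
```

The two estimates agree on the diagonal and differ only in the third decimal place off the diagonal, which suggests that cross-validation selected a small penalty for this data set.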

In [16]:
empir.score(df_mc)

-4.7936693172926006
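The two scores are nearly identical, with the empirical estimate marginally higher. That is expected when scoring on the same data used for fitting, because the empirical covariance is the maximum-likelihood estimate. The sketch below recomputes both scores from the printed matrices, assuming that `score` is the mean Gaussian log-likelihood $\tfrac{1}{2}\left(\log\det\Theta - \operatorname{tr}(S\Theta) - p\log 2\pi\right)$, where $S$ is the empirical covariance of the scored data and $\Theta$ is the model's precision matrix. The helper name `mean_gaussian_loglik` is hypothetical, and the matrices carry only the printed precision.

```python
import numpy as np

def mean_gaussian_loglik(emp_cov, precision):
    """Mean per-sample Gaussian log-likelihood of data with empirical
    covariance `emp_cov` under a model with precision matrix `precision`."""
    p = emp_cov.shape[0]
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * (logdet - np.trace(emp_cov @ precision) - p * np.log(2 * np.pi))

# glasso.covariance_ as printed earlier (copied by hand)
glasso_cov = np.array([
    [1.0, 0.82703486, -0.441698, 0.38985575],
    [0.82703486, 1.0, -0.5340743, 0.47138974],
    [-0.441698, -0.5340743, 1.0, -0.23098084],
    [0.38985575, 0.47138974, -0.23098084, 1.0],
])

# empir.covariance_ as printed above (copied by hand); since both models
# are scored on df_mc, this also plays the role of S in the formula.
empir_cov = np.array([
    [1.0, 0.82999523, -0.4436062, 0.38911997],
    [0.82999523, 1.0, -0.5370343, 0.47434828],
    [-0.4436062, -0.5370343, 1.0, -0.22802231],
    [0.38911997, 0.47434828, -0.22802231, 1.0],
])

score_empir = mean_gaussian_loglik(empir_cov, np.linalg.inv(empir_cov))
score_glasso = mean_gaussian_loglik(empir_cov, np.linalg.inv(glasso_cov))

print("empirical:", score_empir)
print("glasso:   ", score_glasso)
```

A fairer comparison of the two estimators would score them on held-out data, where regularization can pay off.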