# Lecture 9

* Cross validation
* Regularization

Import necessary libraries:

In [1]:
import numpy as np
import pandas as pd
import random
import math
import sklearn.datasets as ds
%matplotlib inline

# Cross validation

Cross validation is one of the most important procedures for selecting a model or tuning its parameters. ```scikit-learn``` offers several ways to perform it.

In [2]:
from sklearn.cross_validation import train_test_split
from sklearn.cross_validation import StratifiedKFold
from sklearn.cross_validation import LeaveOneOut

The simplest way to build cross validation splits is to use the method ```train_test_split```:

In [3]:
boston = ds.load_boston()
X = boston.data
y = boston.target/50.
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.33, 
                                                    random_state=42)

In [4]:
print('Number of training examples: ' + str(X_train.shape[0]))
print('Number of validation examples: ' + str(X_test.shape[0]))

Number of training examples: 339
Number of validation examples: 167
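
The import path used above belongs to older ```scikit-learn``` releases; from version 0.18 on, the same helper is available in ```sklearn.model_selection```. The following is a minimal sketch under that assumption. The arrays ```X_demo``` and ```y_demo``` are hypothetical random stand-ins with the same shape as the Boston data (506 rows, 13 features), so only the split sizes are comparable to the output above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: random values with the same shape
# as the Boston housing data (506 samples, 13 features).
rng = np.random.RandomState(42)
X_demo = rng.rand(506, 13)
y_demo = rng.rand(506)

# Hold out 33% of the samples for validation.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42)

print('Number of training examples: ' + str(X_tr.shape[0]))
print('Number of validation examples: ' + str(X_te.shape[0]))
```

Fixing ```random_state``` makes the split reproducible, which matters when different models are compared on the same validation set.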


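Another way is to use K-fold or leave-one-out cross validation. Note that we use the ```StratifiedKFold``` class, which tries to keep the distribution of the target variable the same across folds. Stratification works on class labels, so it is best suited to classification targets; with a continuous target like the one here, the fold sizes can come out uneven, as the output below shows.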

In [5]:
skf = StratifiedKFold(y, n_folds=3, shuffle=True, random_state=21387)
for train_index, test_index in skf:
    X_train = X[train_index]
    X_test = X[test_index]
    print('Number of training examples: ' + str(X_train.shape[0]))
    print('Number of validation examples: ' + str(X_test.shape[0]))

Number of training examples: 323
Number of validation examples: 183
Number of training examples: 332
Number of validation examples: 174
Number of training examples: 357
Number of validation examples: 149
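
To make the effect of stratification easier to see, here is a small sketch on a classification target, written against the newer ```sklearn.model_selection``` interface (an assumption: version 0.18 or later, where the splitter takes ```n_splits``` and the data is passed to ```split```). The arrays ```labels``` and ```features``` are hypothetical toy data, not part of the lecture.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy labels: 60 samples of class 0 and 30 of class 1.
labels = np.array([0] * 60 + [1] * 30)
features = np.arange(len(labels)).reshape(-1, 1)

skf_demo = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

fold_class_counts = []
for train_idx, test_idx in skf_demo.split(features, labels):
    # Count how many samples of each class land in the validation fold.
    counts = np.bincount(labels[test_idx], minlength=2)
    fold_class_counts.append(counts.tolist())
    print('Validation fold class counts: ' + str(counts.tolist()))
```

Each validation fold keeps the 2:1 class ratio of the full label vector, which is the property stratification is meant to provide.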




The ```LeaveOneOut``` class performs leave-one-out cross validation in a similar way.
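
A minimal sketch of leave-one-out splitting, again assuming the newer ```sklearn.model_selection``` interface. The array ```X_small``` is a hypothetical five-sample dataset used only to keep the output short.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Hypothetical tiny dataset: 5 samples with 2 features each.
X_small = np.arange(10).reshape(5, 2)

loo = LeaveOneOut()
test_sizes = []
for train_idx, test_idx in loo.split(X_small):
    # Every split holds out exactly one sample for validation.
    test_sizes.append(len(test_idx))
    print('train: ' + str(train_idx.tolist()) +
          ', validation: ' + str(test_idx.tolist()))
```

Leave-one-out produces as many splits as there are samples, so it becomes expensive on large datasets, where K-fold with a small number of folds is the usual choice.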

# Regularization

In this section we show how regularization affects the weights of a linear regression model. We generate data with 10 features, 6 of which are informative:

In [6]:
X,y = ds.make_regression(n_samples=100, 
                         n_features=10, 
                         n_informative=6, 
                         noise=1.0, 
                         bias=0, 
                         random_state=2016)
y = y/10.

Several types of regularization are implemented in the ```scikit-learn``` package. We will use the ```Lasso``` class, which implements L1 regularization.
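
For reference, the objective that ```Lasso``` minimizes, as stated in the ```scikit-learn``` documentation, can be written as follows. Here $n$ is the number of samples, $X$ the feature matrix, $y$ the target vector and $w$ the weight vector (symbols chosen for this note), and $\alpha$ is the regularization parameter:

$$\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$$

The second term penalizes the sum of the absolute values of the weights, so a larger $\alpha$ pushes more weights to exactly zero.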

In [7]:
from sklearn.linear_model import Lasso

Now let us observe how the model coefficients change for different values of the regularization parameter $\alpha$.

In [8]:
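# alpha = 0.0 switches the penalty off (plain least squares);
# scikit-learn warns about this setting.
for a in [0.0, 0.1, 0.5, 1.0, 10.0]:
    print('alpha = ' + str(a) + ':')
    # Large max_iter so that coordinate descent can converge
    model = Lasso(alpha=a, max_iter=1000000)
    model.fit(X, y)
    print(model.coef_)
    print('***************')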

alpha = 0.0:
[ -5.52107945e-03  -3.67860089e-03   7.64629577e+00   1.73280696e-01
   2.82826703e-01   2.28833747e+00   5.71482110e-03   4.49741291e+00
   5.56384364e+00   1.73914964e-03]
***************
alpha = 0.1:
[ 0.          0.          7.51109466  0.09366317  0.14180965  2.18308828
 -0.          4.39417767  5.4250924  -0.        ]
***************
alpha = 0.5:
[ 0.          0.          7.00822133  0.          0.          1.77766907
 -0.          3.9454249   4.87546516 -0.        ]
***************
alpha = 1.0:
[ 0.          0.          6.39221165  0.          0.          1.27978247
 -0.          3.36381678  4.19028074 -0.        ]
***************
alpha = 10.0:
[ 0.  0.  0.  0. -0.  0.  0.  0.  0. -0.]
***************
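
In the output above, the number of nonzero coefficients drops from six at $\alpha = 0.1$ to four at $\alpha = 0.5$ and $\alpha = 1.0$, and to zero at $\alpha = 10.0$. The sketch below counts the nonzero coefficients directly. It regenerates the data with the same ```make_regression``` call under the hypothetical names ```X_reg``` and ```y_reg```; exact values may differ between ```scikit-learn``` versions, so no output is quoted here.

```python
import numpy as np
import sklearn.datasets as ds
from sklearn.linear_model import Lasso

# Same generator settings as in the lecture; names are illustrative.
X_reg, y_reg = ds.make_regression(n_samples=100,
                                  n_features=10,
                                  n_informative=6,
                                  noise=1.0,
                                  bias=0,
                                  random_state=2016)
y_reg = y_reg / 10.

nonzero_counts = {}
for a in [0.1, 0.5, 1.0, 10.0]:
    model = Lasso(alpha=a, max_iter=1000000)
    model.fit(X_reg, y_reg)
    # L1 regularization sets some coefficients to exactly zero.
    nonzero_counts[a] = int(np.count_nonzero(model.coef_))
    print('alpha = ' + str(a) + ': ' +
          str(nonzero_counts[a]) + ' nonzero coefficients')
```

Counting exact zeros works here because L1 regularization produces coefficients that are exactly zero rather than merely small.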


