The purpose of this notebook is to build a clear understanding of cross-validation and grid search cross-validation.

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
df = pd.read_csv('/content/drive/MyDrive/UNZIP_FOR_NOTEBOOKS_FINAL/08-Linear-Regression-Models/Advertising.csv')
df.head()

,TV,radio,newspaper,sales
0,230.1,37.8,69.2,22.1
1,44.5,39.3,45.1,10.4
2,17.2,45.9,69.3,9.3
3,151.5,41.3,58.5,18.5
4,180.8,10.8,58.4,12.9


----
## Train | Validation | Test Split Procedure 

The test set is often called a "hold-out" set: you should not adjust parameters based on it, but instead use it *only* for reporting final expected performance.

0. Clean and adjust data as necessary for X and y
1. Split Data in Train/Validation/Test for both X and y
2. Fit/Train Scaler on Training X Data
3. Scale X Eval and X Test Data (using the scaler fit on the training data)
4. Create Model
5. Fit/Train Model on X Train Data
6. Evaluate Model on X Evaluation Data (by creating predictions and comparing to y_eval)
7. Adjust Parameters as Necessary and repeat steps 5 and 6
8. Get final metrics on Test set (not allowed to go back and adjust after this!)
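The steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's own workflow: it uses synthetic data (an assumption, so that it runs without the Advertising file), and the names `X_demo`, `y_demo`, `val_mse` and `best_alpha` are made up for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 0: synthetic stand-in data (assumption: not the Advertising dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([3.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=200)

# Step 1: 70% train, then split the remaining 30% evenly into validation and test
X_tr, X_other, y_tr, y_other = train_test_split(X_demo, y_demo, test_size=0.3, random_state=101)
X_val, X_te, y_val, y_te = train_test_split(X_other, y_other, test_size=0.5, random_state=101)

# Steps 2-3: fit the scaler on training data only, then transform every split
demo_scaler = StandardScaler().fit(X_tr)
X_tr_s, X_val_s, X_te_s = (demo_scaler.transform(part) for part in (X_tr, X_val, X_te))

# Steps 4-7: try several alpha values, scoring each on the validation split
val_mse = {}
for alpha in [0.1, 1, 10, 100]:
    candidate = Ridge(alpha=alpha).fit(X_tr_s, y_tr)
    val_mse[alpha] = mean_squared_error(y_val, candidate.predict(X_val_s))
best_alpha = min(val_mse, key=val_mse.get)

# Step 8: report the test metric once, with the chosen alpha
final_model = Ridge(alpha=best_alpha).fit(X_tr_s, y_tr)
test_mse = mean_squared_error(y_te, final_model.predict(X_te_s))
print(best_alpha, test_mse)
```

The test split is touched only in the last step, which is what keeps the reported number an honest estimate.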

In [5]:
X = df.drop('sales',axis=1)
y = df['sales']

In [6]:
from sklearn.model_selection import train_test_split

# 70% of data is training data, set aside other 30%
X_train, X_OTHER, y_train, y_OTHER = train_test_split(X, y, test_size=0.3, random_state=101)

# Remaining 30% is split into evaluation and test sets
# Each is 15% of the original data size
X_eval, X_test, y_eval, y_test = train_test_split(X_OTHER, y_OTHER, test_size=0.5, random_state=101)

## Scaling

In [8]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_eval = scaler.transform(X_eval)
X_test = scaler.transform(X_test)


In [10]:
from sklearn.linear_model import Ridge
model = Ridge(alpha=100)  # choose a model and an initial alpha

# Cross Validation with cross_val_score

In [11]:
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model,X_train,y_train, scoring = 'neg_mean_squared_error', cv = 5)

In [12]:
scores

array([ -9.32552967,  -4.9449624 , -11.39665242,  -7.0242106 ,
        -8.38562723])

In [14]:
abs(scores.mean())

8.215396464543606
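To make clearer what `cross_val_score` computes, here is a rough hand-written equivalent on synthetic data (an assumption, so the block runs on its own). It relies on the fact that, for a regressor, an integer `cv=5` corresponds to an unshuffled 5-fold `KFold`. The scores are negated because scikit-learn scorers follow a "greater is better" convention, which is why the notebook takes `abs()` of the mean.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption: not the Advertising dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1)

# Manual 5-fold loop: fit a fresh copy on 4 folds, score on the held-out fold
manual_scores = []
for train_idx, val_idx in KFold(n_splits=5).split(X_demo):
    fold_model = clone(ridge).fit(X_demo[train_idx], y_demo[train_idx])
    fold_mse = mean_squared_error(y_demo[val_idx], fold_model.predict(X_demo[val_idx]))
    manual_scores.append(-fold_mse)  # negate to match 'neg_mean_squared_error'
manual_scores = np.array(manual_scores)

# The library call should produce the same five numbers
library_scores = cross_val_score(ridge, X_demo, y_demo,
                                 scoring='neg_mean_squared_error', cv=5)
print(np.allclose(manual_scores, library_scores))
```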

Adjusting the model's hyperparameters to improve performance (here, lowering alpha from 100 to 1):

In [15]:
model = Ridge(alpha = 1)

scores = cross_val_score(model,X_train,y_train, scoring = 'neg_mean_squared_error', cv = 5)

abs(scores.mean())

3.344839296530695
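One caveat worth noting: the scaler in this notebook was fit on the whole training set before cross-validation, so each validation fold has already influenced the scaling statistics. A stricter variant, sketched below on synthetic data (an assumption), wraps the scaler and model in a `Pipeline` so that the scaler is refit inside every fold. The name `scaled_ridge` is made up for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, unscaled stand-in data (assumption: not the Advertising dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50, scale=20, size=(100, 3))
y_demo = X_demo @ np.array([0.05, 0.1, 0.0]) + rng.normal(scale=0.5, size=100)

# The pipeline is cloned and refit per fold, so scaling uses only that fold's training rows
scaled_ridge = make_pipeline(StandardScaler(), Ridge(alpha=1))
pipeline_scores = cross_val_score(scaled_ridge, X_demo, y_demo,
                                  scoring='neg_mean_squared_error', cv=5)
print(abs(pipeline_scores.mean()))
```

On a small, well-behaved dataset the difference is usually minor, but the pipeline form avoids the leak by construction.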


# Cross Validation with cross_validate

The cross_validate function differs from cross_val_score in two ways:

* It allows specifying multiple metrics for evaluation.
* It returns a dict containing fit times, score times (and optionally training scores as well as fitted estimators) in addition to the test scores.
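The optional outputs mentioned above are enabled with the `return_train_score` and `return_estimator` arguments. The sketch below shows the resulting dictionary keys on synthetic data (an assumption, so the block is self-contained).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption: not the Advertising dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

results = cross_validate(
    Ridge(alpha=1), X_demo, y_demo,
    scoring=['neg_mean_absolute_error', 'neg_mean_squared_error'],
    cv=5,
    return_train_score=True,   # adds train_<metric> entries
    return_estimator=True,     # adds the fitted model from each fold
)

# One array (or list) of length 5 per key
print(sorted(results.keys()))
```

Comparing the `train_` and `test_` entries for the same metric is a quick way to spot overfitting or underfitting.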

In [16]:
from sklearn.model_selection import cross_validate

In [17]:
scores = cross_validate(model,X_train,y_train,
                         scoring=['neg_mean_absolute_error','neg_mean_squared_error','max_error'],cv=5)

In [21]:
pd.DataFrame(scores)

,fit_time,score_time,test_neg_mean_absolute_error,test_neg_mean_squared_error,test_max_error
0,0.001312,0.001112,-1.547117,-3.155132,-3.0883
1,0.001075,0.000916,-1.026044,-1.58087,-2.817441
2,0.007902,0.001191,-1.400793,-5.404556,-9.353209
3,0.001055,0.000933,-1.154251,-2.216545,-4.055856
4,0.001023,0.000912,-1.470222,-4.367094,-6.490922


In [22]:
pd.DataFrame(scores).mean()

fit_time                        0.002473
score_time                      0.001013
test_neg_mean_absolute_error   -1.319685
test_neg_mean_squared_error    -3.344839
test_max_error                 -5.161145
dtype: float64

## Final Evaluation

In [24]:
model.fit(X_train,y_train)

Ridge(alpha=1)

In [25]:
y_final_test_pred = model.predict(X_test)

In [27]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test,y_final_test_pred)

2.2542600838005176

# GridSearch


We can search through a variety of hyperparameter combinations with a grid search. While many linear models are quite simple and even come with their own specialized versions that do the search for you, a grid search can be applied to any model from sklearn.
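As an example of the specialized versions mentioned above, scikit-learn provides `ElasticNetCV` (alongside `RidgeCV` and `LassoCV`), which cross-validates over its own regularization parameters. The sketch below uses synthetic data and a small candidate list, both assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in data (assumption: not the Advertising dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

demo_alphas = [0.1, 1, 10]
demo_l1_ratios = [0.1, 0.5, 0.9, 1]

# The estimator itself runs 5-fold CV over every (alpha, l1_ratio) pair
enet_cv = ElasticNetCV(l1_ratio=demo_l1_ratios, alphas=demo_alphas, cv=5)
enet_cv.fit(X_demo, y_demo)

# Selected hyperparameters are exposed as fitted attributes
print(enet_cv.alpha_, enet_cv.l1_ratio_)
```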

In [39]:
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

In [30]:
base_model = ElasticNet()

In [None]:
help(ElasticNet)

A search consists of:

* an estimator (base_model);
* a parameter space;
* a method for searching or sampling candidates;
* a cross-validation scheme;
* a score function.
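To show how these five pieces fit together, the sketch below runs an exhaustive search by hand with `ParameterGrid` and `cross_val_score`, then compares it with `GridSearchCV`. The data is synthetic and the small grid `demo_grid` is made up for the example; it is an illustration of the idea, not a description of GridSearchCV's internals.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score

# Synthetic stand-in data (assumption: not the Advertising dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

# Parameter space
demo_grid = {'alpha': [0.1, 1, 10], 'l1_ratio': [0.1, 0.5, 1]}

# Search method: try every combination in the grid
manual_best, manual_best_score = None, -np.inf
for params in ParameterGrid(demo_grid):
    candidate = ElasticNet(**params)                      # estimator
    mean_score = cross_val_score(
        candidate, X_demo, y_demo,
        scoring='neg_mean_squared_error',                 # score function
        cv=5,                                             # cross-validation scheme
    ).mean()
    if mean_score > manual_best_score:
        manual_best, manual_best_score = params, mean_score

# GridSearchCV bundles the same five ingredients into one object
search = GridSearchCV(ElasticNet(), demo_grid, scoring='neg_mean_squared_error', cv=5)
search.fit(X_demo, y_demo)
print(manual_best, search.best_params_)
```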

In [32]:
param_grid = {'alpha':[0.1,1,5,10,50,100],
              'l1_ratio':[.1, .5, .7, .9, .95, .99, 1]}

In [43]:
grid_search = GridSearchCV(estimator = base_model,
                           param_grid=param_grid,
                           scoring = 'neg_mean_squared_error',
                           cv = 5, verbose = 2)

In [44]:
grid_search.fit(X_train,y_train)

Fitting 5 folds for each of 42 candidates, totalling 210 fits
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.7; total time=   0.0s
[CV] END ............................alpha=0.1,

GridSearchCV(cv=5, estimator=ElasticNet(),
             param_grid={'alpha': [0.1, 1, 5, 10, 50, 100],
                         'l1_ratio': [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]},
             scoring='neg_mean_squared_error', verbose=2)

In [46]:
grid_search.best_params_

{'alpha': 0.1, 'l1_ratio': 1}

In [47]:
pd.DataFrame(grid_search.cv_results_)

,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_alpha,param_l1_ratio,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.001222,0.000645,0.000469,0.0002,0.1,0.1,"{'alpha': 0.1, 'l1_ratio': 0.1}",-3.453021,-1.40519,-5.789125,-2.187302,-4.645576,-3.496043,1.591601,7
1,0.000838,1e-05,0.000354,7e-06,0.1,0.5,"{'alpha': 0.1, 'l1_ratio': 0.5}",-3.32544,-1.427522,-5.59561,-2.163089,-4.451679,-3.392668,1.506827,6
2,0.000808,1.9e-05,0.000351,6e-06,0.1,0.7,"{'alpha': 0.1, 'l1_ratio': 0.7}",-3.26988,-1.442432,-5.502437,-2.16395,-4.356738,-3.347088,1.462765,5
3,0.000827,4.7e-05,0.000372,6e-06,0.1,0.9,"{'alpha': 0.1, 'l1_ratio': 0.9}",-3.221397,-1.465339,-5.416447,-2.173493,-4.263887,-3.308112,1.417693,4
4,0.000803,6e-06,0.000348,4e-06,0.1,0.95,"{'alpha': 0.1, 'l1_ratio': 0.95}",-3.213052,-1.472417,-5.396258,-2.177452,-4.24108,-3.300052,1.406248,3
5,0.000786,1.1e-05,0.000371,1.7e-05,0.1,0.99,"{'alpha': 0.1, 'l1_ratio': 0.99}",-3.208124,-1.478489,-5.380242,-2.181097,-4.222968,-3.294184,1.396953,2
6,0.000808,1.1e-05,0.000344,6e-06,0.1,1.0,"{'alpha': 0.1, 'l1_ratio': 1}",-3.206943,-1.480065,-5.376257,-2.182076,-4.21846,-3.29276,1.394613,1
7,0.000803,1.1e-05,0.000365,1e-05,1.0,0.1,"{'alpha': 1, 'l1_ratio': 0.1}",-9.827475,-5.261525,-11.875347,-7.449195,-8.542329,-8.591174,2.222939,14
8,0.000815,9e-06,0.00035,5e-06,1.0,0.5,"{'alpha': 1, 'l1_ratio': 0.5}",-8.707071,-4.214228,-10.879261,-6.204545,-7.173031,-7.435627,2.255532,13
9,0.001304,0.000581,0.000507,0.000123,1.0,0.7,"{'alpha': 1, 'l1_ratio': 0.7}",-7.92087,-3.549562,-10.024877,-5.379553,-6.324836,-6.63994,2.206213,12
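The full `cv_results_` table is wide, so it often helps to keep only a few columns and sort by rank. The sketch below does this on a small synthetic example (the data, the grid, and the names `demo_grid` and `top_candidates` are assumptions for illustration).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: not the Advertising dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

demo_grid = GridSearchCV(ElasticNet(),
                         {'alpha': [0.1, 1, 10], 'l1_ratio': [0.1, 0.5, 1]},
                         scoring='neg_mean_squared_error', cv=5)
demo_grid.fit(X_demo, y_demo)

# Keep the columns that matter for model selection and show the best candidates first
summary_cols = ['param_alpha', 'param_l1_ratio',
                'mean_test_score', 'std_test_score', 'rank_test_score']
results = pd.DataFrame(demo_grid.cv_results_)
top_candidates = results.sort_values('rank_test_score')[summary_cols].head(3)
print(top_candidates)

# best_score_ is the mean_test_score of the rank-1 candidate
print(demo_grid.best_score_)
```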


In [51]:
y_pred_grid = grid_search.predict(X_test)
mean_squared_error(y_test,y_pred_grid)

2.304617137424956
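Calling `predict` directly on the search object works because `GridSearchCV` defaults to `refit=True`: after the search it refits the best configuration on all the data passed to `fit` and stores it as `best_estimator_`. A minimal check of that behavior, on synthetic data (an assumption):

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: not the Advertising dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

demo_grid = GridSearchCV(ElasticNet(),
                         {'alpha': [0.1, 1, 10], 'l1_ratio': [0.1, 0.5, 1]},
                         scoring='neg_mean_squared_error', cv=5)
demo_grid.fit(X_demo, y_demo)

# The search object delegates predict() to the refit best estimator
same_predictions = np.allclose(demo_grid.predict(X_demo),
                               demo_grid.best_estimator_.predict(X_demo))
print(same_predictions)
```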

## Conclusion: Through grid search with ElasticNet, the best cross-validated setting among the candidates searched was alpha = 0.1 with l1_ratio = 1, which corresponds to pure L1 (Lasso) regularization.