### OVERVIEW:

* Cross Validation in Detail:
    * Train|Test Split
    * Train|Validation|Test Split
    * Scikit-Learn cross_val_score
    * Scikit-Learn cross_validate
* Grid Search

### TRAIN | TEST SPLIT PROCEDURE
    
    0. Clean and adjust data as necessary for X and y
    1. Split the data into train and test sets for both X and y
    2. Fit the scaler on the training X data only
    3. Transform both the training and the test X data with the fitted scaler
    4. Create the model
    5. Fit the model on the training X data
    6. Evaluate the model on the test X data (by creating predictions and comparing them to y_test)
    7. Adjust hyperparameters as necessary and repeat steps 4 to 6
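Steps 2 and 3 are the easiest to get wrong: the scaler's statistics should come from the training data only and then be reused, unchanged, for the test data. The following is a minimal pure-Python sketch of that idea, not scikit-learn's implementation. The numbers are made up for illustration, and `pstdev` is used on the assumption that we want to mirror StandardScaler, which divides by the population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical single-feature data, invented for illustration only.
x_train = [10.0, 20.0, 30.0, 40.0]
x_test = [25.0, 50.0]

# Step 2: "fit" the scaler, i.e. learn the statistics from the training data only.
mu = mean(x_train)
sigma = pstdev(x_train)  # population standard deviation (ddof=0)

# Step 3: transform both sets with the training statistics.
x_train_scaled = [(v - mu) / sigma for v in x_train]
x_test_scaled = [(v - mu) / sigma for v in x_test]

print(mu, sigma)
print(x_train_scaled)
print(x_test_scaled)
```

The scaled training values have mean 0 and standard deviation 1 by construction; the scaled test values generally do not, which is expected because no test statistics were used.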

### cross_val_score Function

In [4]:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
df= pd.read_csv('Advertising.csv') # reads the csv file

In [7]:
X = df.drop('sales', axis =1) # Drops the sales column

In [8]:
y=df['sales']

In [9]:
from sklearn.model_selection import train_test_split  # Imports train test split

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101) # Holds out 30% of the rows as the test set; random_state makes the split reproducible

In [11]:
from sklearn.preprocessing import StandardScaler # Importing the scaler

In [12]:
scaler = StandardScaler() # Creating a scaler instance

In [13]:
scaler.fit(X_train) # Learns each feature's mean and standard deviation from X_train only

StandardScaler(copy=True, with_mean=True, with_std=True)

In [15]:
X_train = scaler.transform(X_train) # Scales X_train with the fitted scaler

In [16]:
X_test = scaler.transform(X_test) # Scales X_test with the statistics learned from X_train

In [18]:
from sklearn.linear_model import Ridge

In [19]:
model= Ridge(alpha =100)

In [20]:
from sklearn.model_selection import cross_val_score

In [22]:
# Cross-validates `model` on the training data only: cv=5 splits it into 5 folds and returns one score per fold.
# scoring='neg_mean_squared_error' returns the MSE negated, so that a higher score always means a better model.
scores = cross_val_score(model, X_train, y_train, scoring='neg_mean_squared_error', cv=5)

In [23]:
scores # One negative mean squared error per fold
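To make `cv=5` concrete: for a regressor, an integer `cv` splits the rows into consecutive, unshuffled folds, and each fold serves once as the validation set while the model is fit on the remaining rows. Below is a rough pure-Python sketch of that index bookkeeping. `k_fold_indices` is a hypothetical helper written for this note, not a scikit-learn function, and ten rows are assumed only to keep the printout short.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs over consecutive, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional row when the split is uneven.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop

folds = list(k_fold_indices(n_samples=10, n_splits=5))
for train_idx, val_idx in folds:
    print("validate on", val_idx, "| train on", train_idx)
```

Every row lands in exactly one validation fold, which is why five scores come back from a single call.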

array([ -9.32552967,  -4.9449624 , -11.39665242,  -7.0242106 ,
        -8.38562723])

In [24]:
abs(scores.mean()) # Mean MSE across the 5 folds, with the sign flipped back to positive

8.215396464543607
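As a quick check on the sign convention, the five fold scores printed above can be negated and averaged by hand. The sketch below copies those values from the output; taking the square root of the mean MSE is an extra step added here (an assumption about what is convenient to report), since an RMSE is expressed in the same units as `sales`.

```python
from math import sqrt
from statistics import mean

# Fold scores copied from the cross_val_score output above (negative MSE per fold).
neg_mse_scores = [-9.32552967, -4.9449624, -11.39665242, -7.0242106, -8.38562723]

# Flip the sign to recover the ordinary (positive) MSE of each fold.
mse_per_fold = [-s for s in neg_mse_scores]

mean_mse = mean(mse_per_fold)
rmse = sqrt(mean_mse)  # root of the mean MSE, not the mean of per-fold RMSEs

print(mean_mse)
print(rmse)
```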

In [25]:
model = Ridge(alpha=1) # The mean MSE with alpha=100 was high, so try a model with weaker regularization

In [26]:
scores = cross_val_score(model, X_train, y_train, scoring='neg_mean_squared_error', cv=5) # Same 5-fold cross-validation, now with alpha=1

In [27]:
abs(scores.mean()) # The mean MSE is much lower with alpha=1

3.344839296530695

In [28]:
model.fit(X_train, y_train) # Fits the chosen model on the full training set

Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='auto', tol=0.001)

In [29]:
y_final_predictions = model.predict(X_test) # Predicts on the held-out test set

In [30]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [31]:
mean_squared_error(y_test, y_final_predictions) # Final MSE between y_test and the test-set predictions

2.3190215794287514

### cross_validate Function

* The cross_validate function allows us to view multiple performance metrics from cross validation on a model and explore how much time fitting and testing took.

In [33]:
# Create X and y
X = df.drop('sales', axis =1) # Drops the sales column
y = df['sales']

# TRAIN TEST SPLIT
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# SCALE DATA
from sklearn.preprocessing import StandardScaler 
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train) # X_train is the portion of the data used for cross-validation
X_test = scaler.transform(X_test) # X_test is the final hold-out set for the final performance metrics, so hyperparameters are never tuned against it

In [34]:
from sklearn.model_selection import cross_validate

In [35]:
model = Ridge(alpha=100)

In [36]:
score = cross_validate(model, X_train, y_train, scoring=['neg_mean_squared_error', 'neg_mean_absolute_error'], cv=10) # Runs 10-fold cross-validation and records every metric listed in scoring for each fold

In [40]:
score # Gives the scores as a dictionary output

{'fit_time': array([0.00299788, 0.00199676, 0.00199914, 0.00199866, 0.00199842,
        0.00199842, 0.00199914, 0.00199866, 0.00199795, 0.00299644]),
 'score_time': array([0.00200009, 0.00199842, 0.00199819, 0.00199866, 0.00199914,
        0.0019989 , 0.00100017, 0.00199866, 0.00200033, 0.0010004 ]),
 'test_neg_mean_squared_error': array([ -6.06067062, -10.62703078,  -3.99342608,  -5.00949402,
         -9.14179955, -13.08625636,  -3.83940454,  -9.05878567,
         -9.05545685,  -5.77888211]),
 'test_neg_mean_absolute_error': array([-1.8102116 , -2.54195751, -1.46959386, -1.86276886, -2.52069737,
        -2.45999491, -1.45197069, -2.37739501, -2.44334397, -1.89979708])}

In [42]:
score = pd.DataFrame(score) # Creates a dataframe from the dictionary output obtained above
score # One row per fold: timings plus the negative MSE and negative MAE computed on X_train and y_train

| | fit_time | score_time | test_neg_mean_squared_error | test_neg_mean_absolute_error |
|---|---|---|---|---|
| 0 | 0.002998 | 0.002 | -6.060671 | -1.810212 |
| 1 | 0.001997 | 0.001998 | -10.627031 | -2.541958 |
| 2 | 0.001999 | 0.001998 | -3.993426 | -1.469594 |
| 3 | 0.001999 | 0.001999 | -5.009494 | -1.862769 |
| 4 | 0.001998 | 0.001999 | -9.1418 | -2.520697 |
| 5 | 0.001998 | 0.001999 | -13.086256 | -2.459995 |
| 6 | 0.001999 | 0.001 | -3.839405 | -1.451971 |
| 7 | 0.001999 | 0.001999 | -9.058786 | -2.377395 |
| 8 | 0.001998 | 0.002 | -9.055457 | -2.443344 |
| 9 | 0.002996 | 0.001 | -5.778882 | -1.899797 |


In [43]:
score.mean() # Gives the mean of every column

fit_time                        0.002198
score_time                      0.001799
test_neg_mean_squared_error    -7.565121
test_neg_mean_absolute_error   -2.083773
dtype: float64

In [44]:
model = Ridge(alpha=1) # Repeat the same steps with alpha=1 to see whether the model performs better

In [46]:
score = cross_validate(model, X_train, y_train, scoring =['neg_mean_squared_error','neg_mean_absolute_error'], cv=10)
score = pd.DataFrame(score)
score

| | fit_time | score_time | test_neg_mean_squared_error | test_neg_mean_absolute_error |
|---|---|---|---|---|
| 0 | 0.002998 | 0.001002 | -2.962508 | -1.457174 |
| 1 | 0.001999 | 0.001 | -3.057378 | -1.555308 |
| 2 | 0.001998 | 0.000999 | -2.17374 | -1.23877 |
| 3 | 0.000999 | 0.001999 | -0.833034 | -0.768938 |
| 4 | 0.002998 | 0.0 | -3.464018 | -1.434489 |
| 5 | 0.000999 | 0.0 | -8.232647 | -1.494316 |
| 6 | 0.001 | 0.000999 | -1.905864 | -1.081362 |
| 7 | 0.001 | 0.001 | -2.765048 | -1.250011 |
| 8 | 0.000999 | 0.002002 | -4.989505 | -1.580971 |
| 9 | 0.001999 | 0.000999 | -2.846438 | -1.223326 |


In [47]:
score.mean() # Both mean errors are lower with alpha=1 than with alpha=100

fit_time                        0.001699
score_time                      0.001000
test_neg_mean_squared_error    -3.323018
test_neg_mean_absolute_error   -1.308467
dtype: float64

In [48]:
model.fit(X_train,y_train)

Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='auto', tol=0.001)

In [49]:
y_final_pred = model.predict(X_test)

In [50]:
mean_squared_error(y_test,y_final_pred)

2.3190215794287514
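One caveat about the workflow above: the scaler was fit on all of `X_train` before cross-validation, so every validation fold contributed to the scaling statistics. A possible refinement, sketched below, wraps the scaler and the model in a pipeline so that the scaler is refit on the training part of each fold. The data here is synthetic (`make_regression` stands in for the advertising data, purely as an assumption for illustration), so the numbers will not match the notebook's.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the unscaled training data (illustration only).
X_demo, y_demo = make_regression(n_samples=140, n_features=3, noise=5.0, random_state=0)

# The pipeline refits the scaler on the training part of every fold.
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1))

cv_results = cross_validate(
    pipe, X_demo, y_demo,
    scoring=['neg_mean_squared_error', 'neg_mean_absolute_error'],
    cv=10,
)

print(sorted(cv_results))
print(-cv_results['test_neg_mean_squared_error'].mean())
```

With a pipeline, the unscaled features are passed to `cross_validate`; the separate `scaler.fit` and `scaler.transform` calls are no longer needed for the cross-validation step.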

### GRID SEARCH

* A grid search is a way of training and validating a model on every possible combination of multiple hyperparameter options.
* Scikit-learn includes a GridSearchCV class capable of testing a dictionary of multiple hyperparameter options through cross-validation.
* This allows both cross-validation and a grid search to be performed in a generalized way for any model.

* Grid search handles both the hyperparameter choices (here, alpha and l1_ratio) and the cross-validation in one go: each alpha is paired with each l1_ratio, and every pair is cross-validated.

In [51]:
from sklearn.linear_model import ElasticNet

In [52]:
base_elastic_net_model = ElasticNet() # Creating an instance of ElasticNet

In [53]:
param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100], 'l1_ratio': [.1, .5, .7, .95, .99, 1]} # Dictionary mapping each hyperparameter name to the list of values to try
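To see how large this grid is before fitting anything, the combinations can be enumerated by hand. This is a small standard-library sketch of the counting only, not how GridSearchCV builds its candidates internally; the fold count of 5 is taken from the `cv=5` used in this notebook.

```python
from itertools import product

# Same grid as in the notebook.
param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100],
              'l1_ratio': [.1, .5, .7, .95, .99, 1]}
cv = 5  # number of folds used for each candidate

# Every pairing of one alpha with one l1_ratio is a candidate.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

total_fits = len(candidates) * cv
print(len(candidates), total_fits)  # 36 180
```

The cost grows multiplicatively with each added hyperparameter or value, which is worth keeping in mind when widening the grid.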

In [54]:
from sklearn.model_selection import GridSearchCV

In [55]:
grid_model = GridSearchCV(estimator=base_elastic_net_model, param_grid=param_grid, scoring='neg_mean_squared_error', cv=5, verbose=2)
# estimator is the model whose hyperparameters are being tuned
# param_grid is the dictionary of alpha and l1_ratio values to try
# scoring is the metric used to compare candidates (here, negative MSE)
# cv is the number of cross-validation folds
# verbose controls how much progress output is printed during fitting

In [56]:
grid_model.fit(X_train, y_train) # Fits every combination in the grid on the training set

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV] alpha=0.1, l1_ratio=0.1 .........................................
[CV] .......................... alpha=0.1, l1_ratio=0.1, total=   0.2s
[CV] alpha=0.1, l1_ratio=0.1 .........................................
[CV] .......................... alpha=0.1, l1_ratio=0.1, total=   0.0s
[CV] alpha=0.1, l1_ratio=0.1 .........................................
[CV] .......................... alpha=0.1, l1_ratio=0.1, total=   0.0s
[CV] alpha=0.1, l1_ratio=0.1 .........................................
[CV] .......................... alpha=0.1, l1_ratio=0.1, total=   0.0s
[CV] alpha=0.1, l1_ratio=0.1 .........................................
[CV] .......................... alpha=0.1, l1_ratio=0.1, total=   0.0s
[CV] alpha=0.1, l1_ratio=0.5 .........................................
[CV] .......................... alpha=0.1, l1_ratio=0.5, total=   0.0s
[CV] alpha=0.1, l1_ratio=0.5 .........................................
[...]

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s remaining:    0.0s


[CV] ............................ alpha=1, l1_ratio=0.1, total=   0.1s
[CV] alpha=1, l1_ratio=0.1 ...........................................
[CV] ............................ alpha=1, l1_ratio=0.1, total=   0.0s
[CV] alpha=1, l1_ratio=0.1 ...........................................
[CV] ............................ alpha=1, l1_ratio=0.1, total=   0.0s
[CV] alpha=1, l1_ratio=0.1 ...........................................
[CV] ............................ alpha=1, l1_ratio=0.1, total=   0.0s
[CV] alpha=1, l1_ratio=0.1 ...........................................
[CV] ............................ alpha=1, l1_ratio=0.1, total=   0.0s
[CV] alpha=1, l1_ratio=0.5 ...........................................
[CV] ............................ alpha=1, l1_ratio=0.5, total=   0.0s
[CV] alpha=1, l1_ratio=0.5 ...........................................
[CV] ............................ alpha=1, l1_ratio=0.5, total=   0.0s
[CV] alpha=1, l1_ratio=0.5 ...........................................
[...]

[CV] alpha=10, l1_ratio=0.1 ..........................................
[CV] ........................... alpha=10, l1_ratio=0.1, total=   0.0s
[CV] alpha=10, l1_ratio=0.1 ..........................................
[CV] ........................... alpha=10, l1_ratio=0.1, total=   0.0s
[CV] alpha=10, l1_ratio=0.5 ..........................................
[CV] ........................... alpha=10, l1_ratio=0.5, total=   0.0s
[CV] alpha=10, l1_ratio=0.5 ..........................................
[CV] ........................... alpha=10, l1_ratio=0.5, total=   0.0s
[CV] alpha=10, l1_ratio=0.5 ..........................................
[CV] ........................... alpha=10, l1_ratio=0.5, total=   0.0s
[CV] alpha=10, l1_ratio=0.5 ..........................................
[CV] ........................... alpha=10, l1_ratio=0.5, total=   0.0s
[CV] alpha=10, l1_ratio=0.5 ..........................................
[CV] ........................... alpha=10, l1_ratio=0.5, total=   0.0s
[...]

[CV] .......................... alpha=100, l1_ratio=0.1, total=   0.0s
[CV] alpha=100, l1_ratio=0.1 .........................................
[CV] .......................... alpha=100, l1_ratio=0.1, total=   0.0s
[CV] alpha=100, l1_ratio=0.1 .........................................
[CV] .......................... alpha=100, l1_ratio=0.1, total=   0.0s
[CV] alpha=100, l1_ratio=0.1 .........................................
[CV] .......................... alpha=100, l1_ratio=0.1, total=   0.0s
[CV] alpha=100, l1_ratio=0.5 .........................................
[CV] .......................... alpha=100, l1_ratio=0.5, total=   0.0s
[CV] alpha=100, l1_ratio=0.5 .........................................
[CV] .......................... alpha=100, l1_ratio=0.5, total=   0.0s
[CV] alpha=100, l1_ratio=0.5 .........................................
[CV] .......................... alpha=100, l1_ratio=0.5, total=   0.0s
[CV] alpha=100, l1_ratio=0.5 .........................................
[...]

[Parallel(n_jobs=1)]: Done 180 out of 180 | elapsed:    0.7s finished


GridSearchCV(cv=5, error_score='raise-deprecating',
             estimator=ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True,
                                  l1_ratio=0.5, max_iter=1000, normalize=False,
                                  positive=False, precompute=False,
                                  random_state=None, selection='cyclic',
                                  tol=0.0001, warm_start=False),
             iid='warn', n_jobs=None,
             param_grid={'alpha': [0.1, 1, 5, 10, 50, 100],
                         'l1_ratio': [0.1, 0.5, 0.7, 0.95, 0.99, 1]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='neg_mean_squared_error', verbose=2)

In [58]:
grid_model.best_estimator_ # Gives the estimator with the best parameter combination

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=1,
           max_iter=1000, normalize=False, positive=False, precompute=False,
           random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [59]:
grid_model.best_params_ # Gives the best parameters as a dictionary

{'alpha': 0.1, 'l1_ratio': 1}

In [60]:
grid_model.cv_results_

{'mean_fit_time': array([0.03178864, 0.00139856, 0.00120053, 0.00120053, 0.00119543,
        0.00119925, 0.01809177, 0.00200248, 0.00179796, 0.00120292,
        0.00119905, 0.00079966, 0.00119929, 0.00119944, 0.00079994,
        0.00119877, 0.00160055, 0.00099931, 0.00119925, 0.00179901,
        0.00179839, 0.00139761, 0.00120072, 0.00119905, 0.0012002 ,
        0.00139961, 0.00099936, 0.0010006 , 0.00119948, 0.00119948,
        0.00139809, 0.0012001 , 0.00139861, 0.00119882, 0.00119987,
        0.00119977]),
 'std_fit_time': array([6.10798696e-02, 4.89690473e-04, 7.48368395e-04, 4.02403231e-04,
        4.02153926e-04, 3.98636406e-04, 2.82078186e-02, 6.25049016e-04,
        7.46878908e-04, 3.97969806e-04, 3.99210119e-04, 3.99828199e-04,
        3.99804183e-04, 3.99971150e-04, 3.99972174e-04, 3.98993531e-04,
        4.89046788e-04, 9.48893964e-07, 3.99709299e-04, 7.47093245e-04,
        3.99232320e-04, 4.88431170e-04, 3.99329965e-04, 3.98969949e-04,
        3.99113186e-04, 4.89414997e-0

In [61]:
pd.DataFrame(grid_model.cv_results_) # gives the above in the form of a dataframe

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_alpha | param_l1_ratio | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.031789 | 0.06107987 | 0.000801 | 0.0004006198 | 0.1 | 0.1 | {'alpha': 0.1, 'l1_ratio': 0.1} | -3.453021 | -1.40519 | -5.789125 | -2.187302 | -4.645576 | -3.496043 | 1.591601 | 6 |
| 1 | 0.001399 | 0.0004896905 | 0.0004 | 0.0004902129 | 0.1 | 0.5 | {'alpha': 0.1, 'l1_ratio': 0.5} | -3.32544 | -1.427522 | -5.59561 | -2.163089 | -4.451679 | -3.392668 | 1.506827 | 5 |
| 2 | 0.001201 | 0.0007483684 | 0.0004 | 0.0004896872 | 0.1 | 0.7 | {'alpha': 0.1, 'l1_ratio': 0.7} | -3.26988 | -1.442432 | -5.502437 | -2.16395 | -4.356738 | -3.347088 | 1.462765 | 4 |
| 3 | 0.001201 | 0.0004024032 | 0.0004 | 0.000489804 | 0.1 | 0.95 | {'alpha': 0.1, 'l1_ratio': 0.95} | -3.213052 | -1.472417 | -5.396258 | -2.177452 | -4.24108 | -3.300052 | 1.406248 | 3 |
| 4 | 0.001195 | 0.0004021539 | 0.0006 | 0.0004897068 | 0.1 | 0.99 | {'alpha': 0.1, 'l1_ratio': 0.99} | -3.208124 | -1.478489 | -5.380242 | -2.181097 | -4.222968 | -3.294184 | 1.396953 | 2 |
| 5 | 0.001199 | 0.0003986364 | 0.000201 | 0.0004010201 | 0.1 | 1.0 | {'alpha': 0.1, 'l1_ratio': 1} | -3.206943 | -1.480065 | -5.376257 | -2.182076 | -4.21846 | -3.29276 | 1.394613 | 1 |
| 6 | 0.018092 | 0.02820782 | 0.001404 | 0.0004863019 | 1.0 | 0.1 | {'alpha': 1, 'l1_ratio': 0.1} | -9.827475 | -5.261525 | -11.875347 | -7.449195 | -8.542329 | -8.591174 | 2.222939 | 12 |
| 7 | 0.002002 | 0.000625049 | 0.000797 | 0.0003983431 | 1.0 | 0.5 | {'alpha': 1, 'l1_ratio': 0.5} | -8.707071 | -4.214228 | -10.879261 | -6.204545 | -7.173031 | -7.435627 | 2.255532 | 11 |
| 8 | 0.001798 | 0.0007468789 | 0.0008 | 0.0004001384 | 1.0 | 0.7 | {'alpha': 1, 'l1_ratio': 0.7} | -7.92087 | -3.549562 | -10.024877 | -5.379553 | -6.324836 | -6.63994 | 2.206213 | 10 |
| 9 | 0.001203 | 0.0003979698 | 0.0006 | 0.0004895617 | 1.0 | 0.95 | {'alpha': 1, 'l1_ratio': 0.95} | -6.729435 | -2.591285 | -8.709842 | -4.156317 | -5.329916 | -5.503359 | 1.102835 | 9 |
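The `rank_test_score` column follows directly from `mean_test_score`: because the scores are negated errors, the highest (least negative) mean ranks first. The short sketch below re-derives the ordering for the alpha=0.1 rows, using values copied from the table above; it is only a hand check, not a replacement for reading `rank_test_score`.

```python
# mean_test_score for the alpha=0.1 rows, keyed by l1_ratio (copied from the table above).
mean_test_score = {
    0.1: -3.496043,
    0.5: -3.392668,
    0.7: -3.347088,
    0.95: -3.300052,
    0.99: -3.294184,
    1: -3.29276,
}

# Higher (closer to zero) is better for a negated error, so sort in descending order.
ranked = sorted(mean_test_score, key=mean_test_score.get, reverse=True)
best_l1_ratio = ranked[0]

print(ranked)
print(best_l1_ratio)
```

The winner, l1_ratio=1 with alpha=0.1, agrees with the `best_params_` output shown earlier.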


In [62]:
y_pred = grid_model.predict(X_test) # predict can be called directly on the grid: it delegates to best_estimator_, which was refit on the whole training set (refit=True)

In [63]:
from sklearn.metrics import mean_squared_error

In [64]:
mean_squared_error(y_test,y_pred) # Gets the error between y_test and y_pred

2.3873426420874737
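Calling `predict` on the fitted grid object works because, with `refit=True`, GridSearchCV refits the best candidate on the whole training set and forwards `predict` to it. The sketch below checks that equivalence on synthetic data; the dataset (`make_regression`) and the smaller grid are assumptions made to keep the example self-contained, so the printed values will differ from the notebook's.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (illustration only).
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

grid = GridSearchCV(
    estimator=ElasticNet(),
    param_grid={'alpha': [0.1, 1, 10], 'l1_ratio': [0.1, 0.5, 1]},
    scoring='neg_mean_squared_error',
    cv=5,
)
grid.fit(X_train, y_train)

# The grid's predictions are those of the refit best estimator.
same = np.allclose(grid.predict(X_test), grid.best_estimator_.predict(X_test))

print(grid.best_params_)
print(same)
print(mean_squared_error(y_test, grid.predict(X_test)))
```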