# Cross validation

See the scikit-learn [cross-validation user guide](https://scikit-learn.org/stable/modules/cross_validation.html) for more details.

## Section 1

## Train and Test Splits Procedure 

This is the standard procedure we usually follow. The cells below go through each step in turn.

0. Clean and adjust data as necessary for X and y
1. Split Data in Train/Test for both X and y
2. Fit/Train Scaler on Training X Data
3. Scale X Train and X Test Data (using the scaler fitted in step 2)
4. Create Model
5. Fit/Train Model on X Train Data
6. Evaluate Model on X Test Data (by creating predictions and comparing them to y_test)
7. Adjust Parameters as Necessary and repeat steps 5 and 6
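The steps above can be condensed into a short loop. This is a minimal sketch, not the notebook's own code: it rebuilds the advertising data used in the cells below, and the candidate alpha values and the `test_mse` dictionary are illustrative choices. As in this section, every candidate is scored on the test set.

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 0: the advertising data used in this notebook
df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']

# Step 1: train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

# Steps 2-3: fit the scaler on the training data only, then scale both sets
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Steps 4-7: create, fit and evaluate one model per candidate alpha (illustrative values)
test_mse = {}
for alpha in [0.1, 1, 10, 100]:
    model = Ridge(alpha=alpha).fit(X_train_s, y_train)
    test_mse[alpha] = mean_squared_error(y_test, model.predict(X_test_s))

print(test_mse)
```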

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#### Clean and adjust the data

In [2]:
tv = np.array([181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249])
radio = np.array([11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27])
newspaper = np.array([58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23])
sales = np.array([13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19])

df = pd.DataFrame({'tv': tv, 'radio': radio, 'newspaper': newspaper, 'sales': sales})
df.head()

|   | tv | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 181 | 11 | 58 | 13 |
| 1 | 9 | 49 | 75 | 7 |
| 2 | 58 | 33 | 24 | 12 |
| 3 | 120 | 20 | 12 | 13 |
| 4 | 9 | 2 | 1 | 5 |


#### Separate features (X) and target (y)

In [3]:
X = df.drop('sales', axis=1)
y = df['sales']

#### Split into training and test sets

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

#### Scale the data

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
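A quick sanity check of steps 2 and 3 (our own addition, reusing the same data and split): the scaler learns each column's mean and standard deviation from the training data only (stored in `scaler.mean_` and `scaler.scale_`). The scaled training columns therefore have mean 0 and standard deviation 1, while the test columns are transformed with those same training statistics and generally will not.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

scaler = StandardScaler().fit(X_train)  # learns per-column mean_ and scale_ from X_train only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Training columns: mean ~0, standard deviation ~1
print(X_train_s.mean(axis=0).round(6), X_train_s.std(axis=0).round(6))
# Test columns: shifted and scaled with the *training* statistics, so generally not exactly 0 and 1
print(X_test_s.mean(axis=0).round(3), X_test_s.std(axis=0).round(3))

# transform() is equivalent to (x - mean_) / scale_
manual_test = (X_test.to_numpy() - scaler.mean_) / scaler.scale_
```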

#### Create model

In [7]:
from sklearn.linear_model import Ridge

In [8]:
# Deliberately poor alpha choice, so that we can experiment with other values in the adjustment phase
model = Ridge(alpha=100)

#### Fit the model

In [9]:
model.fit(X_train, y_train)

In [10]:
y_pred = model.predict(X_test)

#### Evaluate

In [11]:
from sklearn.metrics import mean_squared_error

In [12]:
mean_squared_error(y_test, y_pred)

17.773862887978993
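`mean_squared_error` averages the squared differences between the true and predicted values. A tiny sketch with made-up toy numbers (not from this dataset) checks this against a direct NumPy computation:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy values, for illustration only
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

# Residuals are 0.5, 0, -2, -1 -> squares 0.25, 0, 4, 1 -> 5.25 / 4
manual_mse = np.mean((y_true - y_hat) ** 2)
sk_mse = mean_squared_error(y_true, y_hat)
print(manual_mse, sk_mse)  # 1.3125 1.3125
```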

#### Adjust Parameters and Re-evaluate

In [13]:
# above we tried with alpha=100, let's try this time with alpha=1
model = Ridge(alpha=1)

In [14]:
model.fit(X_train, y_train)

In [15]:
y_pred = model.predict(X_test)

In [16]:
mean_squared_error(y_test, y_pred)

5.5501643549300415

Much better: the MSE dropped from about 17.8 to about 5.6. We could repeat this until satisfied with the performance metrics. (We previously showed that RidgeCV can do this for us, but the purpose of this example is to show a tuning process that works for any model, without relying on a model's built-in CV version.)

The downside of this procedure is that we tune alpha based on the test data, so the chosen value adapts to this specific test split. The reported test error is then no longer a fair estimate of performance on new data; it tends to be optimistic.

Section 2 introduces a better approach.

## Section 2

## Train, Validation and Test Splits Procedure 

Here the final test set is often called a "hold-out" set: you should not adjust parameters based on it, but use it *only* to report the final expected performance. Parameters are tuned on a separate validation (evaluation) set instead.

0. Clean and adjust data as necessary for X and y
1. Split Data in Train/Validation/Test for both X and y
2. Fit/Train Scaler on Training X Data
3. Scale X Train, X Eval and X Test Data (using the scaler fitted in step 2)
4. Create Model
5. Fit/Train Model on X Train Data
6. Evaluate Model on X Evaluation Data (by creating predictions and comparing them to y_eval)
7. Adjust Parameters as Necessary and repeat steps 5 and 6
8. Get final metrics on Test set (not allowed to go back and adjust after this)
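The steps above can be sketched end to end as follows. This is a minimal illustration under our own assumptions: the candidate alphas are arbitrary, and the winner is simply the alpha with the lowest validation MSE. The test set is used exactly once, at the end.

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']

# Step 1: 70% train, then split the remaining 30% in half (evaluation / test)
X_train, X_other, y_train, y_other = train_test_split(X, y, test_size=0.3, random_state=99)
X_eval, X_test, y_eval, y_test = train_test_split(X_other, y_other, test_size=0.5, random_state=99)

# Steps 2-3: the scaler only ever sees the training data
scaler = StandardScaler().fit(X_train)
X_train_s, X_eval_s, X_test_s = (scaler.transform(part) for part in (X_train, X_eval, X_test))

# Steps 4-7: choose alpha using the evaluation set only (candidates are illustrative)
eval_mse = {}
for alpha in [0.1, 1, 10, 100]:
    model = Ridge(alpha=alpha).fit(X_train_s, y_train)
    eval_mse[alpha] = mean_squared_error(y_eval, model.predict(X_eval_s))
best_alpha = min(eval_mse, key=eval_mse.get)

# Step 8: one final evaluation on the untouched test set
final_model = Ridge(alpha=best_alpha).fit(X_train_s, y_train)
test_mse = mean_squared_error(y_test, final_model.predict(X_test_s))
print(len(X_train), len(X_eval), len(X_test))  # 17 4 4
print(best_alpha, test_mse)
```

With only 25 rows, the evaluation and test sets hold just 4 rows each, so both estimates are noisy.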

In [17]:
df = pd.DataFrame({'tv': tv, 'radio': radio, 'newspaper': newspaper, 'sales': sales})
X = df.drop('sales', axis=1)
y = df['sales']
df.head()

|   | tv | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 181 | 11 | 58 | 13 |
| 1 | 9 | 49 | 75 | 7 |
| 2 | 58 | 33 | 24 | 12 |
| 3 | 120 | 20 | 12 | 13 |
| 4 | 9 | 2 | 1 | 5 |


In [18]:
# Split twice. Here we create TRAIN, VALIDATION and TEST splits
from sklearn.model_selection import train_test_split

# 70% of data is training data, set aside other 30%
X_train, X_OTHER, y_train, y_OTHER = train_test_split(X, y, test_size=0.3, random_state=99)

# Remaining 30% is split into evaluation and test sets
# Each is about 15% of the original data (50% of 30%); with 25 rows that is 4 rows each
X_eval, X_test, y_eval, y_test = train_test_split(X_OTHER, y_OTHER, test_size=0.5, random_state=99)

#### Scale

In [19]:
from sklearn.preprocessing import StandardScaler

In [20]:
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_eval = scaler.transform(X_eval)
X_test = scaler.transform(X_test)

#### Create model

In [21]:
from sklearn.linear_model import Ridge

In [22]:
# Deliberately poor alpha choice, so that we can experiment with other values in the adjustment phase
model = Ridge(alpha=100)
model.fit(X_train,y_train)

In [23]:
y_eval_pred = model.predict(X_eval)

#### Evaluate

In [24]:
from sklearn.metrics import mean_squared_error

In [25]:
mean_squared_error(y_eval, y_eval_pred)

33.195286034463614

#### Adjust parameters and Re-evaluate

In [26]:
model = Ridge(alpha=1)

In [27]:
model.fit(X_train, y_train)

In [28]:
y_eval_pred = model.predict(X_eval)

#### Another Evaluation

In [29]:
mean_squared_error(y_eval, y_eval_pred)

3.889194598720733

#### Final Evaluation (Can no longer edit parameters after this)

In [30]:
y_final_test_pred = model.predict(X_test)
mean_squared_error(y_test, y_final_test_pred)

7.2111341111393505

## Section 3

### Cross Validation with cross_val_score

In [31]:
df = pd.DataFrame({'tv': tv, 'radio': radio, 'newspaper': newspaper, 'sales': sales})
X = df.drop('sales', axis=1)
y = df['sales']

In [32]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

In [33]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [34]:
model = Ridge(alpha=100)

In [35]:
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X_train, y_train, scoring='neg_mean_squared_error', cv=5)
scores

array([-34.17245134,  -9.30048809, -28.30425856, -16.79666462,
       -21.75595568])
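Under the hood, for a regressor and `cv=5`, `cross_val_score` splits the data with `KFold(n_splits=5)` (no shuffling), fits a fresh copy of the model on four folds, and scores it on the fifth. The `'neg_mean_squared_error'` scorer returns the MSE with its sign flipped, because scikit-learn scorers follow a "greater is better" convention. The sketch below is our own re-implementation for illustration and should reproduce the same array:

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)
X_train_s = StandardScaler().fit(X_train).transform(X_train)

model = Ridge(alpha=100)

# Manual 5-fold CV: KFold without shuffling, a fresh clone of the model per fold
manual_scores = []
for train_idx, val_idx in KFold(n_splits=5).split(X_train_s):
    fold_model = clone(model).fit(X_train_s[train_idx], y_train.iloc[train_idx])
    fold_mse = mean_squared_error(y_train.iloc[val_idx], fold_model.predict(X_train_s[val_idx]))
    manual_scores.append(-fold_mse)  # the 'neg_' scorer simply flips the sign
manual_scores = np.array(manual_scores)

scores = cross_val_score(model, X_train_s, y_train, scoring='neg_mean_squared_error', cv=5)
print(manual_scores)
print(scores)
```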

In [36]:
# Mean MSE across the 5 folds (sign flipped back to positive)
abs(scores.mean())

22.06596365751537

#### Adjust model based on metrics

In [37]:
model = Ridge(alpha=1)

In [38]:
scores = cross_val_score(model, X_train, y_train, scoring='neg_mean_squared_error', cv=5)
scores

array([-9.57966163e+00, -9.72698049e-01, -3.37151451e+00, -8.54606227e-03,
       -2.27048848e+01])

In [39]:
abs(scores.mean())

7.327461010136988
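Trying each alpha by hand quickly gets tedious. A minimal sketch of the same adjustment as a loop, with an arbitrary list of candidate alphas; this is essentially what `GridSearchCV` automates in Section 5:

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)
X_train_s = StandardScaler().fit(X_train).transform(X_train)

# Mean cross-validated MSE for each candidate alpha (illustrative values)
cv_mse = {}
for alpha in [0.1, 1, 10, 100]:
    scores = cross_val_score(Ridge(alpha=alpha), X_train_s, y_train,
                             scoring='neg_mean_squared_error', cv=5)
    cv_mse[alpha] = -scores.mean()  # flip the sign back to a positive MSE

best_alpha = min(cv_mse, key=cv_mse.get)
print(cv_mse)
print('best alpha:', best_alpha)
```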

#### Final Evaluation (Can no longer edit parameters after this)

In [40]:
model.fit(X_train, y_train)

In [41]:
y_final_test_pred = model.predict(X_test)

In [42]:
mean_squared_error(y_test, y_final_test_pred)

5.5501643549300415
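One caveat with this section: the scaler is fit on all of `X_train` before cross-validation, so every validation fold has already influenced the scaling statistics. A common remedy is to wrap the scaler and the model in a scikit-learn `Pipeline` (here via `make_pipeline`) and pass the unscaled data, so the scaler is refit inside each fold. This is a sketch of that alternative, not what the cells above do, and its cross-validation scores may differ slightly from those shown.

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']
# No manual scaling here: the pipeline takes care of it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

pipe = make_pipeline(StandardScaler(), Ridge(alpha=1))

# In every fold, the scaler is refit on that fold's training portion only
scores = cross_val_score(pipe, X_train, y_train, scoring='neg_mean_squared_error', cv=5)
print(-scores.mean())

# Final fit on the whole training set, then a single evaluation on the test set
pipe.fit(X_train, y_train)
test_mse = mean_squared_error(y_test, pipe.predict(X_test))
print(test_mse)
```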

## Section 4

### Cross Validation with cross_validate

In [43]:
df = pd.DataFrame({'tv': tv, 'radio': radio, 'newspaper': newspaper, 'sales': sales})
X = df.drop('sales', axis=1)
y = df['sales']

In [44]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

In [45]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [46]:
model = Ridge(alpha=100)

In [47]:
from sklearn.model_selection import cross_validate

scores = cross_validate(model, X_train, y_train, scoring=['neg_mean_absolute_error','neg_mean_squared_error','max_error'], cv=5)
scores

{'fit_time': array([0.0010004 , 0.00100064, 0.00100017, 0.0010004 , 0.00100017]),
 'score_time': array([0.00099921, 0.00099969, 0.00099969, 0.00100017, 0.00100136]),
 'test_neg_mean_absolute_error': array([-4.79737317, -2.76182389, -4.89634718, -3.57446258, -3.78106983]),
 'test_neg_mean_squared_error': array([-34.17245134,  -9.30048809, -28.30425856, -16.79666462,
        -21.75595568]),
 'test_max_error': array([-9.78452126, -4.30661694, -7.2130104 , -6.33999698, -6.7831127 ])}
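`cross_validate` returns a dictionary with timing information plus one `test_<scorer>` array per scorer. Setting `return_train_score=True` also records the score on each training portion, which helps spot overfitting when the training error is much lower than the validation error. A sketch reusing this section's setup, which also checks that the `test_neg_mean_squared_error` entry matches `cross_val_score`:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, cross_validate, train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)
X_train_s = StandardScaler().fit(X_train).transform(X_train)

model = Ridge(alpha=100)
results = cross_validate(model, X_train_s, y_train,
                         scoring=['neg_mean_squared_error', 'max_error'],
                         cv=5, return_train_score=True)
print(sorted(results))  # fit_time, score_time, plus test_* and train_* entries per scorer
print(pd.DataFrame(results).mean())

# The test_* scores are the same numbers cross_val_score reports for that scorer
cvs = cross_val_score(model, X_train_s, y_train, scoring='neg_mean_squared_error', cv=5)
print(np.allclose(results['test_neg_mean_squared_error'], cvs))
```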

In [48]:
pd.DataFrame(scores)

|   | fit_time | score_time | test_neg_mean_absolute_error | test_neg_mean_squared_error | test_max_error |
|---|---|---|---|---|---|
| 0 | 0.001 | 0.000999 | -4.797373 | -34.172451 | -9.784521 |
| 1 | 0.001001 | 0.001 | -2.761824 | -9.300488 | -4.306617 |
| 2 | 0.001 | 0.001 | -4.896347 | -28.304259 | -7.21301 |
| 3 | 0.001 | 0.001 | -3.574463 | -16.796665 | -6.339997 |
| 4 | 0.001 | 0.001001 | -3.78107 | -21.755956 | -6.783113 |


In [49]:
pd.DataFrame(scores).mean()

fit_time                         0.001000
score_time                       0.001000
test_neg_mean_absolute_error    -3.962215
test_neg_mean_squared_error    -22.065964
test_max_error                  -6.885452
dtype: float64

#### Adjust model based on metrics

In [50]:
model = Ridge(alpha=1)

In [51]:
scores = cross_validate(model, X_train, y_train, scoring=['neg_mean_absolute_error','neg_mean_squared_error','max_error'], cv=5)

In [52]:
pd.DataFrame(scores).mean()

fit_time                        0.001193
score_time                      0.001010
test_neg_mean_absolute_error   -1.775063
test_neg_mean_squared_error    -7.327461
test_max_error                 -3.423313
dtype: float64

#### Final Evaluation (Can no longer edit parameters after this)

In [53]:
model.fit(X_train, y_train)

In [54]:
y_final_test_pred = model.predict(X_test)

In [55]:
mean_squared_error(y_test, y_final_test_pred)

5.5501643549300415

## Section 5

### Grid Search

We can search through a variety of combinations of hyperparameters with a grid search. While many linear models are quite simple and even come with their own specialized versions that do the search for you (such as RidgeCV and ElasticNetCV), a grid search can be applied to any sklearn model, and we will need it later on for more complex models such as Support Vector Machines.
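As an example of such a specialized version, `ElasticNetCV` tunes `alpha` and `l1_ratio` itself. A minimal sketch on this notebook's data, using the same candidate values as the grid below; its choice is not guaranteed to match a `GridSearchCV` run exactly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)
X_train_s = StandardScaler().fit(X_train).transform(X_train)

alphas = [0.1, 1, 5, 10, 50, 100]
l1_ratios = [.1, .5, .7, .9, .95, .99, 1]

# ElasticNetCV runs its own 5-fold CV over every (l1_ratio, alpha) pair
enet_cv = ElasticNetCV(alphas=alphas, l1_ratio=l1_ratios, cv=5)
enet_cv.fit(X_train_s, y_train)
print(enet_cv.alpha_, enet_cv.l1_ratio_)
```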

In [56]:
df = pd.DataFrame({'tv': tv, 'radio': radio, 'newspaper': newspaper, 'sales': sales})
X = df.drop('sales', axis=1)
y = df['sales']

In [57]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

In [58]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

#### Model

In [59]:
from sklearn.linear_model import ElasticNet

In [60]:
base_elastic_model = ElasticNet()

#### Grid search

A search consists of:

- an estimator (regressor or classifier such as sklearn.svm.SVC());
- a parameter space;
- a method for searching or sampling candidates;
- a cross-validation scheme; and
- a score function.

In [61]:
param_grid = {'alpha':[0.1,1,5,10,50,100], 'l1_ratio':[.1, .5, .7, .9, .95, .99, 1]}
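To see where the "42 candidates" in the log comes from, `ParameterGrid` (a helper in `sklearn.model_selection`) expands the dictionary into every combination: 6 alpha values times 7 l1_ratio values. A short sketch:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100], 'l1_ratio': [.1, .5, .7, .9, .95, .99, 1]}

# Every combination of the listed values: 6 alphas x 7 l1_ratios
candidates = list(ParameterGrid(param_grid))
print(len(candidates))      # 42
print(candidates[0])        # {'alpha': 0.1, 'l1_ratio': 0.1}
print(len(candidates) * 5)  # 210 fits with cv=5
```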

In [62]:
from sklearn.model_selection import GridSearchCV

In [63]:
grid_model = GridSearchCV(estimator=base_elastic_model,
                          param_grid=param_grid,
                          scoring='neg_mean_squared_error',
                          cv=5,
                          verbose=2)

In [64]:
grid_model.fit(X_train, y_train)

Fitting 5 folds for each of 42 candidates, totalling 210 fits
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.1; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.5; total time=   0.0s
[CV] END ............................alpha=0.1, l1_ratio=0.7; total time=   0.0s
...
[CV] END .............................alpha=50, l1_ratio=0.9; total time=   0.0s
[CV] END .............................alpha=50, l1_ratio=0.9; total time=   0.0s
[CV] END .............................alpha=50, l1_ratio=0.9; total time=   0.0s
[CV] END .............................alpha=50, l1_ratio=0.9; total time=   0.0s
[CV] END .............................alpha=50, l1_ratio=0.9; total time=   0.0s
[CV] END ............................alpha=50, l1_ratio=0.95; total time=   0.0s
[CV] END ............................alpha=50, l1_ratio=0.95; total time=   0.0s
[CV] END ............................alpha=50, l1_ratio=0.95; total time=   0.0s
[CV] END ............................alpha=50, l1_ratio=0.95; total time=   0.0s
[CV] END ............................alpha=50, l1_ratio=0.95; total time=   0.0s
[CV] END ............................alpha=50, l1_ratio=0.99; total time=   0.0s
[CV] END ............................alpha=50, l1_ratio=0.99; total time=   0.0s
...

In [65]:
grid_model.best_estimator_

In [66]:
grid_model.best_params_

{'alpha': 0.1, 'l1_ratio': 0.1}
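`best_params_` only reports the winner. The full search history is in `cv_results_`, which converts neatly to a DataFrame. A sketch that repeats the search without `verbose` and lists the best-ranked candidates (`param_alpha`, `mean_test_score` and `rank_test_score` are the column names scikit-learn produces for this grid):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)
X_train_s = StandardScaler().fit(X_train).transform(X_train)

param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100], 'l1_ratio': [.1, .5, .7, .9, .95, .99, 1]}
grid_model = GridSearchCV(ElasticNet(), param_grid, scoring='neg_mean_squared_error', cv=5)
grid_model.fit(X_train_s, y_train)

# One row per candidate; rank 1 has the best mean cross-validated score
results = pd.DataFrame(grid_model.cv_results_)
cols = ['param_alpha', 'param_l1_ratio', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results.sort_values('rank_test_score')[cols].head())

# best_score_ is the mean CV score (negative MSE) of the best candidate
print(grid_model.best_score_)
```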

#### Using Best Model From Grid Search

In [67]:
y_pred = grid_model.predict(X_test)

In [68]:
from sklearn.metrics import mean_squared_error

In [69]:
mean_squared_error(y_test,y_pred)

5.500255800636027
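`grid_model.predict` delegates to `best_estimator_`. With the default `refit=True`, `GridSearchCV` refits the best parameter combination on the whole training set after the search, so the sketch below (our own check) should give the same predictions as fitting a fresh `ElasticNet` with `best_params_` by hand:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'tv': [181,9,58,120,9,200,66,215,24,98,204,195,68,281,69,147,218,237,13,228,62,263,143,240,249],
    'radio': [11,49,33,20,2,3,6,24,35,8,33,48,37,40,21,24,28,5,16,17,13,4,29,17,27],
    'newspaper': [58,75,24,12,1,21,24,4,66,7,46,53,114,56,18,19,53,24,50,26,18,20,13,23,23],
    'sales': [13,7,12,13,5,11,9,17,9,10,19,22,13,24,11,15,18,13,6,16,10,12,15,16,19],
})
X = df.drop('sales', axis=1)
y = df['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100], 'l1_ratio': [.1, .5, .7, .9, .95, .99, 1]}
grid_model = GridSearchCV(ElasticNet(), param_grid, scoring='neg_mean_squared_error', cv=5)
grid_model.fit(X_train_s, y_train)

# Default refit=True: the best combination is refit on the full training set
manual_model = ElasticNet(**grid_model.best_params_).fit(X_train_s, y_train)

same = np.allclose(grid_model.predict(X_test_s), manual_model.predict(X_test_s))
print(same)  # True
```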