___

<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>
___
<center><em>Copyright by Pierian Data Inc.</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Regularization with SciKit-Learn

Previously we created a polynomial feature set and fit a standard linear regression to it. We can be smarter about model choice by using regularization.

Regularization minimizes the RSS (residual sum of squares) *plus* a penalty term. The penalty grows with the size of the coefficients, so models whose coefficients are too large are penalized. Some regularization methods can shrink the coefficients of unhelpful features all the way to zero, in which case the model ignores those features.
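
As a rough sketch of the idea above, the penalized objectives can be written out with NumPy. The function names and the toy numbers are illustrative assumptions, and scikit-learn's estimators may scale these terms differently internally.

```python
import numpy as np

def rss(X, y, coef, intercept=0.0):
    """Residual sum of squares for a linear model."""
    residuals = y - (X @ coef + intercept)
    return float(np.sum(residuals ** 2))

def ridge_objective(X, y, coef, alpha, intercept=0.0):
    """RSS plus an L2 penalty: alpha * sum(coef**2)."""
    return rss(X, y, coef, intercept) + alpha * float(np.sum(coef ** 2))

def lasso_objective(X, y, coef, alpha, intercept=0.0):
    """RSS plus an L1 penalty: alpha * sum(|coef|)."""
    return rss(X, y, coef, intercept) + alpha * float(np.sum(np.abs(coef)))

# Toy data (made up for illustration)
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_toy = np.array([2.0, 3.0, 5.0])
coef_toy = np.array([2.0, 3.0])   # fits the toy data exactly, so the RSS is 0

print(ridge_objective(X_toy, y_toy, coef_toy, alpha=1.0))  # penalty only: 4 + 9
print(lasso_objective(X_toy, y_toy, coef_toy, alpha=1.0))  # penalty only: 2 + 3
```

With a perfect fit the whole objective is the penalty, which shows how large coefficients are charged even when the residuals are zero.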

Let's explore Ridge Regression and Lasso, along with Elastic Net, which combines the two. We'll apply them to the polynomial feature set, since regularization has less to offer on the original X with its small number of features.

## Imports

In [57]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Data and Setup

In [58]:
df = pd.read_csv("Advertising.csv")
X = df.drop('sales',axis=1)
y = df['sales']

### Polynomial Conversion

In [59]:
from sklearn.preprocessing import PolynomialFeatures

Create the converter that will generate all polynomial features up to degree 3 -

In [60]:
polynomial_converter = PolynomialFeatures(degree=3,include_bias=False)

Fit and transform the new features into a usable array -

In [61]:
poly_features = polynomial_converter.fit_transform(X)

In [62]:
poly_features.shape

(200, 19)
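
To see where the 19 columns come from, here is a small standard-library sketch that counts the polynomial terms by hand. It assumes three original feature columns (implied by the shape above) and mirrors `degree=3` with `include_bias=False`; the variable names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

n_features = 3   # assumed number of original feature columns
degree = 3

# Each polynomial term is a multiset of input columns of size 1..degree
terms = [
    combo
    for d in range(1, degree + 1)
    for combo in combinations_with_replacement(range(n_features), d)
]

# Closed form: C(n + d, d) counts every term of degree 0..d,
# so subtract 1 to drop the constant (bias) term
n_terms = comb(n_features + degree, degree) - 1

print(len(terms), n_terms)
```

Both counts agree: 3 linear terms, 6 terms of degree two, and 10 terms of degree three.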

### Train | Test Split

In [63]:
from sklearn.model_selection import train_test_split

Pass the new 'poly_features' array in to apply a train/test split -

In [64]:
X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)

----
----

## Scaling the Data

While the original features in our particular data set all share the same scale (thousands of dollars spent), that typically won't be the case, and the polynomial terms we just created already span several orders of magnitude. Because the penalty in a regularized model adds up the (squared or absolute) coefficients of all features, it's important to standardize the features so each one is penalized on a comparable scale. Review the theory videos for more info, as well as a discussion of why we only **fit** to the training data and then **transform** both sets separately.

Import the standard scaler class -

In [65]:
from sklearn.preprocessing import StandardScaler

In [66]:
# help(StandardScaler)

Create an instance of the scaler -

In [67]:
scaler = StandardScaler()

Fit the scaler to the features from **ONLY the training set** so there is no data leakage from the test set. Fitting computes the per-feature mean and standard deviation that the scaler will use -

In [68]:
scaler.fit(X_train)

StandardScaler()

...which we then apply to both feature sets -

In [69]:
X_train = scaler.transform(X_train)

In [70]:
X_test = scaler.transform(X_test)

Notice the values are all very small now -

In [71]:
X_train[0]

array([ 0.49300171, -0.33994238,  1.61586707,  0.28407363, -0.02568776,
        1.49677566, -0.59023161,  0.41659155,  1.6137853 ,  0.08057172,
       -0.05392229,  1.01524393, -0.36986163,  0.52457967,  1.48737034,
       -0.66096022, -0.16360242,  0.54694754,  1.37075536])

...as opposed to their original values -

In [72]:
poly_features[0]

array([2.30100000e+02, 3.78000000e+01, 6.92000000e+01, 5.29460100e+04,
       8.69778000e+03, 1.59229200e+04, 1.42884000e+03, 2.61576000e+03,
       4.78864000e+03, 1.21828769e+07, 2.00135918e+06, 3.66386389e+06,
       3.28776084e+05, 6.01886376e+05, 1.10186606e+06, 5.40101520e+04,
       9.88757280e+04, 1.81010592e+05, 3.31373888e+05])
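
The fit/transform steps above can be sketched by hand with NumPy to show what the scaler is doing. This is a minimal illustration on synthetic arrays (not the advertising data), and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the train/test feature arrays
train = rng.normal(loc=100.0, scale=25.0, size=(70, 3))
test = rng.normal(loc=100.0, scale=25.0, size=(30, 3))

# "fit": learn the statistics from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)   # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0).round(6))  # approximately 0 for every column
print(train_scaled.std(axis=0).round(6))   # approximately 1 for every column
print(test_scaled.mean(axis=0).round(3))   # close to 0, but not exactly
```

The test set is deliberately scaled with the training statistics, so its columns end up only approximately centered. That is expected and is what keeps test information out of the fitting step.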

## Ridge Regression

Make sure to view the video lectures for a full explanation of Ridge Regression and of choosing an alpha.

In [73]:
from sklearn.linear_model import Ridge

Create a Ridge regression instance with a penalty strength (lambda, called `alpha` in scikit-learn) of 10 -

In [81]:
ridge_model = Ridge(alpha=10)

Pass in the training features to the model...

In [82]:
ridge_model.fit(X_train,y_train)

Ridge(alpha=10)

...then generate a new prediction on the test features -

In [83]:
test_predictions = ridge_model.predict(X_test)

In [84]:
from sklearn.metrics import mean_absolute_error,mean_squared_error

We imported the mean absolute error and mean squared error functions to gauge performance. Pass in the true test values and the test predictions -

In [92]:
MAE = mean_absolute_error(y_test,test_predictions)
MSE = mean_squared_error(y_test,test_predictions)
RMSE = np.sqrt(MSE)

In [93]:
MAE

0.5774404204714177

In [94]:
RMSE

0.894638646131968
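
For reference, the same three metrics can be computed by hand with NumPy. The arrays below are made-up values, not the notebook's predictions.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])   # made-up target values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # made-up predictions

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error, in the same units as y

print(mae, mse, rmse)
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE, which matches the pattern in the results above.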

How did it perform on the training set? (This will be used later on for comparison)

In [95]:
# Training Set Performance
train_predictions = ridge_model.predict(X_train)
MAE = mean_absolute_error(y_train,train_predictions)
MAE

0.5288348183025332
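
To get a feel for what the `alpha` penalty does, here is a minimal NumPy sketch of ridge regression solved in closed form on centered data. It is an illustration on synthetic data, not scikit-learn's actual solver, and `ridge_closed_form` is a hypothetical helper name.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Ridge coefficients via the normal equations on centered data."""
    X_centered = X - X.mean(axis=0)
    y_centered = y - y.mean()
    n_features = X.shape[1]
    # Solve (X'X + alpha * I) w = X'y; the intercept is not penalized
    A = X_centered.T @ X_centered + alpha * np.eye(n_features)
    coef = np.linalg.solve(A, X_centered.T @ y_centered)
    intercept = y.mean() - X.mean(axis=0) @ coef
    return coef, intercept

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=50)

for alpha in (0.0, 10.0, 1000.0):
    coef, _ = ridge_closed_form(X_demo, y_demo, alpha)
    print(alpha, np.round(coef, 3), round(float(np.linalg.norm(coef)), 3))
```

With `alpha=0` this reduces to ordinary least squares, and the coefficient norm shrinks as `alpha` grows.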

### Choosing an alpha value with Cross-Validation

Review the theory video for full details.

Here we will use scikit-learn's `RidgeCV` class, which has cross-validation built in, to determine the best lambda value to use -

In [96]:
from sklearn.linear_model import RidgeCV

In [97]:
help(RidgeCV)

Help on class RidgeCV in module sklearn.linear_model._ridge:

class RidgeCV(sklearn.base.MultiOutputMixin, sklearn.base.RegressorMixin, _BaseRidgeCV)
 |  RidgeCV(alphas=(0.1, 1.0, 10.0), *, fit_intercept=True, normalize=False, scoring=None, cv=None, gcv_mode=None, store_cv_values=False)
 |  
 |  Ridge regression with built-in cross-validation.
 |  
 |  See glossary entry for :term:`cross-validation estimator`.
 |  
 |  By default, it performs Generalized Cross-Validation, which is a form of
 |  efficient Leave-One-Out cross-validation.
 |  
 |  Read more in the :ref:`User Guide <ridge_regression>`.
 |  
 |  Parameters
 |  ----------
 |  alphas : ndarray of shape (n_alphas,), default=(0.1, 1.0, 10.0)
 |      Array of alpha values to try.
 |      Regularization strength; must be a positive float. Regularization
 |      improves the conditioning of the problem and reduces the variance of
 |      the estimates. Larger values specify stronger regularization.
 |      Alpha corresponds to ``1

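Plug in several lambda ('alpha') values to try, and choose a scoring method to determine which one worked best.

Scoring options: https://scikit-learn.org/stable/modules/model_evaluation.html
See all options: `sklearn.metrics.SCORERS.keys()` (in newer scikit-learn releases, where `SCORERS` is no longer available, use `sklearn.metrics.get_scorer_names()`)

Here we use negative MAE (`'neg_mean_absolute_error'`), because all scikit-learn scorers follow the convention that "higher is better".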

In [115]:
ridge_cv_model = RidgeCV(alphas=(0.1, 1.0, 10.0),scoring='neg_mean_absolute_error')

The default cross-validation scheme is an efficient form of leave-one-out. With larger data sets, though, evaluating every candidate alpha this way can take a long time; the `cv` parameter lets you switch to k-fold instead -

In [116]:
# The more alpha options you pass, the longer this will take.
# Fortunately our data set is still pretty small
ridge_cv_model.fit(X_train,y_train)

RidgeCV(alphas=array([ 0.1,  1. , 10. ]), scoring='neg_mean_absolute_error')

Now call 'alpha_' to return the lambda value that worked best (judged here by 'neg_mean_absolute_error') -

In [119]:
ridge_cv_model.alpha_

0.1
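
If leave-one-out is too slow, `RidgeCV` also accepts a `cv` argument. The sketch below uses 5-fold cross-validation on synthetic data; the data and the name `ridge_kfold` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 5))
y_demo = X_demo @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=120)

alphas = (0.1, 1.0, 10.0)
# cv=5 replaces the default leave-one-out scheme with 5-fold cross-validation
ridge_kfold = RidgeCV(alphas=alphas, scoring='neg_mean_absolute_error', cv=5)
ridge_kfold.fit(X_demo, y_demo)

print(ridge_kfold.alpha_)   # the candidate alpha with the best mean CV score
```

K-fold fits far fewer models per candidate alpha than leave-one-out does on a large data set, at the cost of a somewhat noisier score.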

Run the same metric functions again to compare the test values with the new predictions -

In [126]:
test_predictions = ridge_cv_model.predict(X_test)

In [127]:
MAE = mean_absolute_error(y_test,test_predictions)
MSE = mean_squared_error(y_test,test_predictions)
RMSE = np.sqrt(MSE)

In [128]:
MAE

0.42737748843313855

The cross-validated model's test MAE is better than that of the previous Ridge model with `alpha=10` (0.577).

In [129]:
RMSE

0.6180719926906028

In [130]:
# Training Set Performance
train_predictions = ridge_cv_model.predict(X_train)
MAE = mean_absolute_error(y_train,train_predictions)
MAE

0.3094132105668577

In [125]:
ridge_cv_model.coef_

array([ 5.40769392,  0.5885865 ,  0.40390395, -6.18263924,  4.59607939,
       -1.18789654, -1.15200458,  0.57837796, -0.1261586 ,  2.5569777 ,
       -1.38900471,  0.86059434,  0.72219553, -0.26129256,  0.17870787,
        0.44353612, -0.21362436, -0.04622473, -0.06441449])


-----

## Lasso Regression

In [37]:
from sklearn.linear_model import LassoCV

In [38]:
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html
lasso_cv_model = LassoCV(eps=0.1,n_alphas=100,cv=5)

In [39]:
lasso_cv_model.fit(X_train,y_train)

LassoCV(cv=5, eps=0.1)

In [40]:
lasso_cv_model.alpha_

0.4943070909225828
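
The `eps` argument sets the ratio between the smallest and largest alpha on the grid that `LassoCV` searches. The following sketch checks this on synthetic data; the data and the name `lasso_demo` are assumptions, and the grid size is left at its default of 100.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 6))
y_demo = 4.0 * X_demo[:, 0] - 2.0 * X_demo[:, 3] + rng.normal(scale=0.5, size=100)

# eps = alpha_min / alpha_max; the default grid has 100 alphas
lasso_demo = LassoCV(eps=0.1, cv=5).fit(X_demo, y_demo)

grid = lasso_demo.alphas_        # the alpha values that were tried
print(len(grid))                 # number of candidates on the grid
print(grid.min() / grid.max())   # approximately eps
print(lasso_demo.alpha_)         # the alpha chosen by cross-validation
```

A relatively large `eps` such as 0.1 keeps the search among fairly strong penalties, so a smaller `eps` is worth trying if the selected alpha sits at the bottom of the grid.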

In [41]:
test_predictions = lasso_cv_model.predict(X_test)

In [42]:
MAE = mean_absolute_error(y_test,test_predictions)
MSE = mean_squared_error(y_test,test_predictions)
RMSE = np.sqrt(MSE)

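Display the test metrics. Note that the two values shown below (0.309 and 0.618) are identical to the RidgeCV training MAE and test RMSE from the previous section, so they appear to be stale outputs rather than Lasso results; re-run the metrics cell above before comparing models -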

In [131]:
MAE

0.3094132105668577

In [132]:
RMSE

0.6180719926906028

In [133]:
# Training Set Performance
train_predictions = lasso_cv_model.predict(X_train)
MAE = mean_absolute_error(y_train,train_predictions)
MAE

0.6912807140820695

In [46]:
lasso_cv_model.coef_

array([1.002651  , 0.        , 0.        , 0.        , 3.79745279,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        ])
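
The exact zeros above are characteristic of the L1 penalty. In the idealized case of orthonormal features, the lasso solution is a soft-thresholded version of the unpenalized coefficients, while ridge only rescales them. The sketch below illustrates that special case with made-up numbers; it is not how `LassoCV` computes its solution, and the threshold value depends on how the penalty is scaled.

```python
import numpy as np

def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`; values inside the band become 0."""
    return np.sign(values) * np.maximum(np.abs(values) - threshold, 0.0)

ols_coefs = np.array([3.0, -0.5, 0.2, -2.0])   # made-up unpenalized coefficients

lasso_like = soft_threshold(ols_coefs, 1.0)    # L1-style shrinkage
ridge_like = ols_coefs / (1.0 + 1.0)           # L2-style shrinkage

print(lasso_like)   # small entries become exactly 0
print(ridge_like)   # entries shrink but stay nonzero
```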

## Elastic Net

Elastic Net combines the penalties of ridge regression and lasso in an attempt to get the best of both worlds!

In [47]:
from sklearn.linear_model import ElasticNetCV

In [48]:
elastic_model = ElasticNetCV(l1_ratio=[.1, .5, .7,.9, .95, .99, 1],tol=0.01)

In [49]:
elastic_model.fit(X_train,y_train)

ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1], tol=0.01)

In [50]:
elastic_model.l1_ratio_

1.0
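
An `l1_ratio_` of 1.0 means cross-validation preferred the pure L1 (lasso) penalty here. As a rough sketch of how `l1_ratio` blends the two penalties, the function below follows the penalty form given in the scikit-learn documentation; the function name and the coefficients are made up for illustration.

```python
import numpy as np

def elastic_net_penalty(coef, alpha, l1_ratio):
    """Blend of L1 and L2 penalties controlled by l1_ratio."""
    l1 = np.sum(np.abs(coef))
    l2 = np.sum(coef ** 2)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2

coef_demo = np.array([1.0, -2.0, 0.0, 3.0])   # made-up coefficients

print(elastic_net_penalty(coef_demo, alpha=1.0, l1_ratio=1.0))  # pure L1: 1 + 2 + 3
print(elastic_net_penalty(coef_demo, alpha=1.0, l1_ratio=0.0))  # pure L2: 0.5 * (1 + 4 + 9)
print(elastic_net_penalty(coef_demo, alpha=1.0, l1_ratio=0.5))  # an even blend of the two
```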

In [51]:
test_predictions = elastic_model.predict(X_test)

In [52]:
MAE = mean_absolute_error(y_test,test_predictions)
MSE = mean_squared_error(y_test,test_predictions)
RMSE = np.sqrt(MSE)

In [53]:
MAE

0.5663262117569448

In [54]:
RMSE

0.7485546215633724

In [55]:
# Training Set Performance
train_predictions = elastic_model.predict(X_train)
MAE = mean_absolute_error(y_train,train_predictions)
MAE

0.4307582990472369

In [56]:
elastic_model.coef_

array([ 3.78993643,  0.89232919,  0.28765395, -1.01843566,  2.15516144,
       -0.3567547 , -0.271502  ,  0.09741081,  0.        , -1.05563151,
        0.2362506 ,  0.07980911,  1.26170778,  0.01464706,  0.00462336,
       -0.39986069,  0.        ,  0.        , -0.05343757])

-----
---