# Regularization with SciKit-Learn

Previously we created a new polynomial feature set and then applied our standard linear regression on it, but we can be smarter about model choice and utilize regularization.

Regularization minimizes the RSS (residual sum of squares) *plus* a penalty term. The penalty grows with the size of the coefficients, so models with very large coefficients are penalized. Some methods of regularization will actually drive the coefficients of uninformative features to exactly zero, in which case the model ignores those features.

Let's explore two methods of regularization, Ridge Regression and Lasso. We'll combine these with the polynomial feature set, since regularization has much less to offer on the small set of original features in X.
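As a rough guide to the two penalties explored below, the textbook objectives can be written as follows. The symbols are notation introduced here for illustration: \beta_j are the p model coefficients and \lambda is the penalty strength, which scikit-learn exposes as `alpha` (its estimators may scale the RSS term differently, so the values are not directly comparable between methods).

```latex
\text{Ridge:}\quad \min_{\beta}\; \sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
\qquad
\text{Lasso:}\quad \min_{\beta}\; \sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The squared penalty shrinks coefficients smoothly toward zero, while the absolute-value penalty can set them to exactly zero.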

In [53]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [54]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [55]:
with open('/content/drive/MyDrive/UNZIP_FOR_NOTEBOOKS_FINAL/08-Linear-Regression-Models/Advertising.csv') as f:
  df = pd.read_csv(f)
#df = pd.read_csv("Advertising.csv")
X = df.drop('sales',axis=1)
y = df['sales']

### Converting the original feature set (X) to a polynomial feature set

In [56]:
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeCV,Ridge     #Ridge model with Cross validation

In [57]:
from sklearn.metrics import mean_squared_error,mean_absolute_error

In [58]:
poly_features = PolynomialFeatures(degree=3,include_bias=False)

In [59]:
X1 = poly_features.fit_transform(X)

In [60]:
X_train, X_test, y_train, y_test = train_test_split(X1, y, test_size=0.30, random_state=42)

In [61]:
scaler = StandardScaler()   # scaling the features

In [62]:
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
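The scaler is fitted on the training rows only, so no information from the test set leaks into the preprocessing. A small sketch on synthetic data (all names and values here are illustrative) shows the consequence: the training columns end up with mean 0 and standard deviation 1, while the test columns are only approximately standardized.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data, only to illustrate the fit/transform split
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50, scale=10, size=(200, 4))
X_tr, X_te = train_test_split(X_demo, test_size=0.30, random_state=42)

scaler_demo = StandardScaler().fit(X_tr)   # statistics come from training rows only
X_tr_s = scaler_demo.transform(X_tr)
X_te_s = scaler_demo.transform(X_te)       # reuses the training mean and std

print(X_tr_s.mean(axis=0).round(3), X_tr_s.std(axis=0).round(3))
print(X_te_s.mean(axis=0).round(3), X_te_s.std(axis=0).round(3))
```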

### First, let's perform Ridge Regression with a fixed alpha.

In [63]:
ridge = Ridge(alpha=10)

In [64]:
ridge.fit(X_train,y_train)
ridge_pred = ridge.predict(X_test)

In [65]:
ridge_MAE = mean_absolute_error(y_test,ridge_pred)
ridge_MSE = mean_squared_error(y_test,ridge_pred)
ridge_RMSE = np.sqrt(ridge_MSE)

In [66]:
ridge_result = {'MAE': ridge_MAE,'MSE':ridge_MSE,'RMSE':ridge_RMSE}
ridge_result

{'MAE': 0.6296591346758612,
 'MSE': 0.7950089683107221,
 'RMSE': 0.8916327541710892}
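The choice `alpha=10` is arbitrary at this point. As a sketch of what alpha controls, the loop below fits Ridge on synthetic data (the data and coefficient values are made up) and prints the size of the coefficient vector for increasing alpha.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=100)

coef_norms = []
for alpha in [0.1, 1, 10, 100, 1000]:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    coef_norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>6}: ||coef|| = {coef_norms[-1]:.3f}")
```

A larger alpha means a stronger penalty and smaller coefficients, which is why the value is usually chosen by cross-validation rather than by hand.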

### Let's perform RidgeCV: choosing an alpha value with cross-validation

In [67]:
ridgecv = RidgeCV(alphas=(5, 10, 20),scoring='neg_mean_squared_error')
ridgecv.fit(X_train,y_train)

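# scoring='neg_mean_squared_error' is the negative MSE, so all metrics follow the convention "higher is better"
# See all options: sklearn.metrics.get_scorer_names() (sklearn.metrics.SCORERS was removed in scikit-learn 1.3)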

RidgeCV(alphas=array([ 5, 10, 20]), scoring='neg_mean_squared_error')

In [68]:
ridgecv.alpha_

5
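The selected alpha, 5, is the smallest value in the grid `(5, 10, 20)`, which hints that an even smaller alpha might score better. One possible follow-up, sketched here on synthetic data with a hypothetical log-spaced grid, is to search a wider range and check whether the winner still sits on the boundary.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data with the same number of columns as the polynomial feature set
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(140, 19))
y_demo = 3 * X_demo[:, 0] + 2 * X_demo[:, 4] + rng.normal(scale=0.5, size=140)

# Hypothetical wider grid: 30 values between 0.001 and 100
alpha_grid = np.logspace(-3, 2, 30)
ridgecv_demo = RidgeCV(alphas=alpha_grid, scoring="neg_mean_squared_error")
ridgecv_demo.fit(X_demo, y_demo)

on_boundary = bool(
    np.isclose(ridgecv_demo.alpha_, alpha_grid.min())
    or np.isclose(ridgecv_demo.alpha_, alpha_grid.max())
)
print("selected alpha:", ridgecv_demo.alpha_)
print("on grid boundary:", on_boundary)
```

If the selected value is still on the boundary, the grid can be extended further in that direction.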

In [69]:
ridgecv_pred = ridgecv.predict(X_test)

In [70]:
ridgecv_MAE = mean_absolute_error(y_test,ridgecv_pred)
ridgecv_MSE = mean_squared_error(y_test,ridgecv_pred)
ridgecv_RMSE = np.sqrt(ridgecv_MSE)

ridgecv_result = {'MAE': ridgecv_MAE,'MSE':ridgecv_MSE,'RMSE':ridgecv_RMSE}
ridgecv_result

{'MAE': 0.6193232511180445,
 'MSE': 0.6447926793757125,
 'RMSE': 0.8029898376540717}

In [71]:
ridgecv.coef_

array([ 2.67031963e+00,  6.88247438e-01,  4.07438555e-02,  5.02581350e-02,
        2.00032701e+00,  2.89817433e-01,  7.64481861e-02,  1.71653947e-01,
       -4.48869584e-02, -8.88043972e-01,  3.42274205e-01, -5.81276876e-01,
        9.84127236e-01,  5.23451016e-02,  6.90469374e-02, -2.53111696e-01,
       -1.14267826e-03,  1.94496267e-02,  2.18503294e-02])

## Lasso regularization

In [72]:
from sklearn.linear_model import LassoCV   #Lasso with CV

In [73]:
lassocv = LassoCV(eps=0.1,n_alphas=100,cv=5)
lassocv.fit(X_train,y_train)

LassoCV(cv=5, eps=0.1)

In [74]:
lassocv.alpha_

0.4924531806474871

In [75]:
lasso_pred = lassocv.predict(X_test)

In [76]:
lassocv_MAE = mean_absolute_error(y_test,lasso_pred)
lassocv_MSE = mean_squared_error(y_test,lasso_pred)
lassocv_RMSE = np.sqrt(lassocv_MSE)

lassocv_result = {'MAE': lassocv_MAE,'MSE':lassocv_MSE,'RMSE':lassocv_RMSE}
lassocv_result

{'MAE': 0.6811456342837985,
 'MSE': 1.0710443722690077,
 'RMSE': 1.0349127365478732}

In [77]:
lassocv.coef_

array([0.97675148, 0.        , 0.        , 0.        , 3.8148913 ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        ])

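With `eps=0.1`, this model shrinks all but two coefficients to exactly zero. Its test RMSE (about 1.03) is higher than RidgeCV's (about 0.80), which suggests the penalty is too strong at this setting.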

### Now let's adjust the hyperparameters of LassoCV to improve its performance

In [110]:
lasso_cv = LassoCV(eps=0.01, n_alphas=100, max_iter=10000)  # default cv is 5-fold
lasso_cv.fit(X_train,y_train)

LassoCV(eps=0.01, max_iter=10000)

In [111]:
lasso_cv.alpha_

0.04924531806474871
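In `LassoCV`, `eps` is the ratio between the smallest and largest alpha on the automatically generated grid. The two selected values in this notebook, 0.4925 with `eps=0.1` and 0.04925 with `eps=0.01`, both divide by their `eps` to about 4.9245, which suggests that each run picked the smallest alpha its grid allowed. The sketch below, on synthetic data, shows one way to check this through the fitted `alphas_` attribute.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(140, 19))
y_demo = 3 * X_demo[:, 0] + 2 * X_demo[:, 4] + rng.normal(scale=0.5, size=140)

eps = 0.01
lasso_demo = LassoCV(eps=eps, cv=5, max_iter=10000).fit(X_demo, y_demo)

# Ratio of smallest to largest alpha on the fitted grid
grid_ratio = lasso_demo.alphas_.min() / lasso_demo.alphas_.max()
print("alpha_min / alpha_max:", grid_ratio)

# True would mean the search stopped at the lower edge of the grid
at_lower_edge = bool(np.isclose(lasso_demo.alpha_, lasso_demo.alphas_.min()))
print("selected alpha at lower edge:", at_lower_edge)
```

When the selected alpha sits at the lower edge, lowering `eps` further is a reasonable next experiment.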

In [112]:
lasso_cv_pred = lasso_cv.predict(X_test)

In [113]:
lasso_cv_MAE = mean_absolute_error(y_test,lasso_cv_pred)
lasso_cv_MSE = mean_squared_error(y_test,lasso_cv_pred)
lasso_cv_RMSE = np.sqrt(lasso_cv_MSE)

lasso_cv_result = {'MAE': lasso_cv_MAE,'MSE':lasso_cv_MSE,'RMSE':lasso_cv_RMSE}
lasso_cv_result

{'MAE': 0.5541413345881179,
 'MSE': 0.5217385139784726,
 'RMSE': 0.7223146917919312}

In [109]:
lasso_cv.coef_

array([ 5.15048089,  0.4274257 ,  0.29684446, -4.53337994,  3.38937185,
       -0.4288993 ,  0.        ,  0.        ,  0.        ,  1.17891049,
       -0.        ,  0.        ,  0.16706037, -0.        ,  0.        ,
        0.        ,  0.11083672,  0.        ,  0.06155549])

## When passed appropriate parameters, LASSO performed better than Ridge.
### Using LASSO also simplifies the model, because many coefficients are set to exactly zero. The corresponding features can be dropped.

## Performing ElasticNet regularization. It combines the L1 and L2 penalties; ElasticNetCV uses cross-validation to select both the overall penalty strength (alpha) and the mix between L1 and L2 (l1_ratio).
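For reference, the objective that scikit-learn documents for `ElasticNet` can be written as below, where w is the coefficient vector, n the number of samples, \alpha the overall penalty strength and \rho the `l1_ratio`. The symbol names are chosen here for illustration.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  + \alpha\,\rho\,\lVert w \rVert_1
  + \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

With \rho = 1 the L2 term vanishes and the model reduces to Lasso; smaller values of \rho shift weight toward the Ridge-style penalty.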

In [84]:
from sklearn.linear_model import ElasticNetCV

In [85]:
elastic_model = ElasticNetCV(l1_ratio=[.1, .5, .7,.9, .95, .99, 1],tol=0.01)

In [86]:
elastic_model.fit(X_train,y_train)

ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1], tol=0.01)

In [87]:
elastic_model.l1_ratio_

0.95
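A fitted `ElasticNetCV` exposes two selected hyperparameters: `l1_ratio_` (the mix) and `alpha_` (the strength). Looking at both gives a fuller picture than `l1_ratio_` alone. The sketch below uses synthetic data and the same list of candidate ratios as above; `max_iter=10000` is an assumption added to reduce convergence warnings.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(140, 19))
y_demo = 3 * X_demo[:, 0] + 2 * X_demo[:, 4] + rng.normal(scale=0.5, size=140)

ratios = [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]
enet_demo = ElasticNetCV(l1_ratio=ratios, cv=5, max_iter=10000).fit(X_demo, y_demo)

print("l1_ratio_:", enet_demo.l1_ratio_)  # mix between L1 and L2
print("alpha_:   ", enet_demo.alpha_)     # overall penalty strength
```

An `l1_ratio_` close to 1, such as the 0.95 selected above, means the fitted model behaves much like Lasso.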

In [88]:
elastic_pred = elastic_model.predict(X_test)

In [89]:
elasticcv_MAE = mean_absolute_error(y_test,elastic_pred)
elasticcv_MSE = mean_squared_error(y_test,elastic_pred)
elasticcv_RMSE = np.sqrt(elasticcv_MSE)

elasticcv_result = {'MAE': elasticcv_MAE,'MSE':elasticcv_MSE,'RMSE':elasticcv_RMSE}
elasticcv_result

In [90]:
elastic_model.coef_

array([ 3.95312352,  0.98671224,  0.2194859 , -1.01798785,  1.97463372,
       -0.3782983 , -0.12009502,  0.07739924,  0.02861239, -1.10946628,
        0.49812979, -0.        ,  0.95685199, -0.05702842,  0.04842821,
       -0.36288403,  0.1257612 ,  0.00643697,  0.        ])

## Conclusion: the tuned LassoCV (eps=0.01) has the lowest test error of the models reported for this dataset.
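The test metrics printed in the cells above can be collected into one table for a side-by-side view. The values below are copied from those outputs and rounded to four decimals; ElasticNet is left out because its metrics are not shown in the output above.

```python
import pandas as pd

# Test-set metrics copied from the notebook outputs (rounded)
results = pd.DataFrame(
    {
        "MAE": [0.6297, 0.6193, 0.6811, 0.5541],
        "MSE": [0.7950, 0.6448, 1.0710, 0.5217],
        "RMSE": [0.8916, 0.8030, 1.0349, 0.7223],
    },
    index=[
        "Ridge (alpha=10)",
        "RidgeCV",
        "LassoCV (eps=0.1)",
        "LassoCV (eps=0.01)",
    ],
)

# Lowest RMSE first
print(results.sort_values("RMSE"))
best_model = results["RMSE"].idxmin()
print("Lowest test RMSE:", best_model)
```

These numbers come from a single 70/30 split, so the ranking could change with a different `random_state`.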

## Thank you!