# Regularization with SciKit-Learn

Previously we created a polynomial feature set and fit a standard linear regression on it. We can be smarter about model choice by using regularization.

Regularization minimizes the RSS (residual sum of squares) *plus* a penalty term. The penalty grows with the size of the coefficients, so models with very large coefficients are penalized. Some regularization methods (such as Lasso) can shrink the coefficients of non-useful features exactly to zero, in which case the model effectively ignores those features.
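To make "RSS plus a penalty" concrete: Ridge adds an L2 penalty, α·Σw_j², while Lasso adds an L1 penalty, α·Σ|w_j|, and it is the L1 penalty that can push coefficients exactly to zero. (scikit-learn's `Lasso` also scales the RSS term by 1/(2n), so α values are not directly comparable between Ridge and Lasso.) For Ridge without an intercept the minimizer has a closed form, w = (XᵀX + αI)⁻¹Xᵀy. The sketch below checks that formula against `Ridge(fit_intercept=False)` on small synthetic data; the data is an assumption for illustration, not the Advertising set.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Small synthetic regression problem (illustrative data, not Advertising.csv)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

alpha = 10.0

# Closed-form ridge solution without an intercept: (X^T X + alpha*I)^-1 X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_


def ridge_objective(w):
    """RSS plus the L2 penalty term."""
    return np.sum((y - X @ w) ** 2) + alpha * np.sum(w ** 2)


print(w_closed)
print(w_sklearn)
print(ridge_objective(w_closed))
```

Both vectors should agree to numerical precision, and the penalty visibly pulls them toward zero compared with the true weights used to generate `y`.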

Let's explore Ridge Regression and Lasso, and then Elastic Net, which combines the two penalties. We'll apply them to the polynomial feature set, because regularization has little to work with on the original three-column X.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv("Advertising.csv")
X = df.drop('sales',axis=1)
y = df['sales']

### Converting the original feature set (X) to a polynomial feature set

In [80]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, RidgeCV  # RidgeCV = Ridge with built-in cross-validation over alpha

In [17]:
from sklearn.metrics import mean_squared_error,mean_absolute_error

In [5]:
poly_features = PolynomialFeatures(degree=3,include_bias=False)

In [10]:
X1 = poly_features.fit_transform(X)

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X1, y, test_size=0.30, random_state=42)

### First, let's perform Ridge Regression with a fixed alpha.

In [14]:
ridge = Ridge(alpha=10)

In [18]:
ridge.fit(X_train,y_train)
ridge_pred = ridge.predict(X_test)
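The `alpha` passed to `Ridge` sets the penalty strength: larger values shrink the coefficients harder, but Ridge almost never sets them exactly to zero. A minimal sketch on synthetic data (an assumption, not the notebook's data) shows the L2 norm of the fitted coefficients falling as alpha grows.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: 5 features, one of which is irrelevant (true weight 0)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 1.0]) + rng.normal(size=100)

alphas = [0.01, 1, 10, 100, 1000]
norms = []
for a in alphas:
    coef = Ridge(alpha=a).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))  # L2 norm of the fitted coefficients
    print(f"alpha={a:>7}: ||w|| = {norms[-1]:.3f}, exact zeros = {np.sum(coef == 0)}")
```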

In [20]:
ridge_MAE = mean_absolute_error(y_test,ridge_pred)
ridge_MSE = mean_squared_error(y_test,ridge_pred)
ridge_RMSE = np.sqrt(ridge_MSE)

In [21]:
ridge_result = {'MAE': ridge_MAE,'MSE':ridge_MSE,'RMSE':ridge_RMSE}
ridge_result

{'MAE': 0.3974983466770129,
 'MSE': 0.29061113322646465,
 'RMSE': 0.5390836050432851}
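The same MAE/MSE/RMSE block is repeated for every model below. A small helper keeps it in one place; `regression_report` is a hypothetical name introduced here, not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def regression_report(y_true, y_pred):
    """Hypothetical helper: MAE, MSE and RMSE in the dict layout used in this notebook."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # RMSE is in the same units as the target (sales)
    }


report = regression_report([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(report)
```

In the notebook, `regression_report(y_test, ridge_pred)` would reproduce `ridge_result` above.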

Let's perform RidgeCV: choosing an alpha value with cross-validation.

In [55]:
ridgecv = RidgeCV(alphas=(5, 10, 20), scoring='neg_mean_squared_error')
ridgecv.fit(X_train, y_train)

# Negative MSE, so that all scorers follow the convention "higher is better"
# See all options: sklearn.metrics.get_scorer_names()  (older versions: sklearn.metrics.SCORERS.keys())

RidgeCV(alphas=array([ 5, 10, 20]), scoring='neg_mean_squared_error')

In [52]:
ridgecv.alpha_

20
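`RidgeCV` picked 20, the largest value in the grid, so the best alpha may lie above it and a wider, log-spaced grid is a cheap check. To show what the cross-validated search does, the sketch below runs it by hand with `cross_val_score` and compares the result with `RidgeCV(cv=5)` on synthetic data (an assumption). Note that with the default `cv=None` used above, `RidgeCV` uses an efficient leave-one-out scheme rather than k-fold, so it can pick a different alpha than this 5-fold version.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the polynomial training set
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.7]) + rng.normal(scale=0.5, size=120)

# Wider, log-spaced grid instead of (5, 10, 20)
alphas = np.logspace(-2, 3, 11)

# Manual search: mean 5-fold score for each alpha (higher is better)
mean_scores = [
    cross_val_score(Ridge(alpha=a), X, y, cv=5, scoring="neg_mean_squared_error").mean()
    for a in alphas
]
best_manual = alphas[int(np.argmax(mean_scores))]

# Same search delegated to RidgeCV with explicit 5-fold CV
ridgecv_5fold = RidgeCV(alphas=alphas, cv=5, scoring="neg_mean_squared_error").fit(X, y)

print(best_manual, ridgecv_5fold.alpha_)
```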

In [53]:
ridgecv_pred = ridgecv.predict(X_test)

In [54]:
ridgecv_MAE = mean_absolute_error(y_test,ridgecv_pred)
ridgecv_MSE = mean_squared_error(y_test,ridgecv_pred)
ridgecv_RMSE = np.sqrt(ridgecv_MSE)

ridgecv_result = {'MAE': ridgecv_MAE,'MSE':ridgecv_MSE,'RMSE':ridgecv_RMSE}
ridgecv_result

{'MAE': 0.3774046073867227,
 'MSE': 0.2683235043468009,
 'RMSE': 0.5179995215700502}

In [57]:
ridgecv.coef_

array([ 9.35403060e-02,  8.23888743e-03,  2.57052690e-02, -4.45687081e-04,
        1.42849996e-03, -3.37673677e-04,  2.40661193e-04,  2.94923473e-04,
        1.30001062e-04,  7.19246600e-07, -9.28229220e-07,  1.08817244e-06,
       -1.69585860e-07, -3.03284784e-06,  3.76429625e-08, -3.68551409e-06,
        3.46402358e-06, -1.86914251e-06, -1.51407265e-06])

## Lasso regularization

In [59]:
from sklearn.linear_model import LassoCV   #Lasso with CV

In [60]:
lassocv = LassoCV(eps=0.1,n_alphas=100,cv=5)
lassocv.fit(X_train,y_train)

LassoCV(cv=5, eps=0.1)

In [61]:
lassocv.alpha_

2368033.2970681335
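That alpha is enormous, and the reason lies in how `LassoCV` builds its candidate grid. As we understand scikit-learn's behaviour, it first computes the smallest alpha that zeroes every coefficient, α_max = max_j |x̃_jᵀy| / n (x̃_j is feature j after centering), then spaces the default 100 values geometrically from α_max down to `eps`·α_max. Unscaled cubic terms make α_max huge, and `eps=0.1` keeps every candidate within one order of magnitude of it. The sketch reproduces the grid on synthetic data with one deliberately large-scale column; treat the formula as an assumption to verify against your installed version.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
X[:, 2] *= 1e4  # one column on a much larger scale, like an unscaled cubic term
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

eps, n_grid = 0.1, 100  # 100 matches LassoCV's default grid length

# Smallest alpha that sets every coefficient to zero
# (features are centered because fit_intercept=True by default)
Xc = X - X.mean(axis=0)
alpha_max = np.max(np.abs(Xc.T @ y)) / n

# Geometric grid from alpha_max down to eps * alpha_max
my_alphas = np.geomspace(alpha_max, eps * alpha_max, num=n_grid)

lasso = LassoCV(eps=eps, cv=5).fit(X, y)
print(alpha_max, lasso.alphas_[0], lasso.alpha_)
print(lasso.coef_)  # the informative small-scale columns are likely zeroed out
```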

In [63]:
lasso_pred = lassocv.predict(X_test)

In [66]:
lassocv_MAE = mean_absolute_error(y_test,lasso_pred)
lassocv_MSE = mean_squared_error(y_test,lasso_pred)
lassocv_RMSE = np.sqrt(lassocv_MSE)

lassocv_result = {'MAE': lassocv_MAE,'MSE':lassocv_MSE,'RMSE':lassocv_RMSE}
lassocv_result

{'MAE': 3.002747147591353,
 'MSE': 13.561636855420993,
 'RMSE': 3.6826127756554845}

In [67]:
lassocv.coef_

array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 4.14387332e-07, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00])

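Lasso can shrink coefficients exactly to zero, effectively dropping those features from the model. Here, though, it zeroed out 18 of the 19 coefficients, and its test RMSE (about 3.68) is far worse than RidgeCV's (about 0.52). A likely cause is that the polynomial features were not standardized: the cubic terms take very large values, which inflates the automatically generated alpha grid (the selected alpha is about 2.4 million), and `eps=0.1` keeps every candidate within one order of magnitude of the largest alpha.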
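One common remedy, sketched here as an assumption rather than something this notebook did, is to standardize the expanded features before the penalized model so the penalty treats every column on the same scale. The sketch chains `PolynomialFeatures`, `StandardScaler` and the model in a pipeline on synthetic "budget" data (made-up ranges and coefficients), then counts how many coefficients Lasso and Ridge leave exactly at zero.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for three advertising budgets with very different ranges
rng = np.random.default_rng(0)
X = rng.uniform(0, [300, 50, 100], size=(200, 3))
y = 0.05 * X[:, 0] + 0.2 * X[:, 1] + 0.001 * X[:, 0] * X[:, 1] + rng.normal(size=200)


def build(model):
    # Scale AFTER expanding, so the cubic columns are standardized too
    return make_pipeline(PolynomialFeatures(degree=3, include_bias=False), StandardScaler(), model)


lasso = build(LassoCV(cv=5, max_iter=10000)).fit(X, y)
ridge = build(RidgeCV(alphas=np.logspace(-3, 3, 13))).fit(X, y)

lasso_coef = lasso[-1].coef_  # last pipeline step is the fitted model
ridge_coef = ridge[-1].coef_
print("Lasso alpha:", lasso[-1].alpha_)
print("Lasso zero coefficients:", np.sum(lasso_coef == 0), "of", lasso_coef.size)
print("Ridge zero coefficients:", np.sum(ridge_coef == 0), "of", ridge_coef.size)
```

Fitting the scaler inside the pipeline also means that, under cross-validation, it is refit on each training fold rather than on the full dataset.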

## Performing ElasticNet regularization

In [68]:
from sklearn.linear_model import ElasticNetCV

In [71]:
elastic_model = ElasticNetCV(l1_ratio=[.1, .5, .7,.9, .95, .99, 1],tol=0.01)

In [73]:
elastic_model.fit(X_train,y_train)

ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1], tol=0.01)

In [74]:
elastic_model.l1_ratio_

1.0
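In scikit-learn's Elastic Net the penalty is α·l1_ratio·‖w‖₁ + ½·α·(1 − l1_ratio)·‖w‖₂², so `l1_ratio_ = 1.0` means cross-validation settled on a pure L1 (Lasso) penalty. A quick check on synthetic data, with an arbitrary alpha of 0.1, confirms that `ElasticNet(l1_ratio=1.0)` and `Lasso` fit the same coefficients.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data with three irrelevant features (true weight 0)
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 6))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=80)

alpha = 0.1  # arbitrary penalty strength for the comparison
enet = ElasticNet(alpha=alpha, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=alpha).fit(X, y)

print(enet.coef_)
print(lasso.coef_)
```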

In [75]:
elastic_pred = elastic_model.predict(X_test)

In [78]:
elasticcv_MAE = mean_absolute_error(y_test,elastic_pred)
elasticcv_MSE = mean_squared_error(y_test,elastic_pred)
elasticcv_RMSE = np.sqrt(elasticcv_MSE)

elasticcv_result = {'MAE': elasticcv_MAE,'MSE':elasticcv_MSE,'RMSE':elasticcv_RMSE}
elasticcv_result

In [79]:
elastic_model.coef_

array([ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  7.54399185e-08,  3.33425352e-06, -3.07263465e-08,
        1.04403704e-05,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00, -0.00000000e+00])

## Conclusion: RidgeCV gave the best test-set performance of the models tried here.
