# Regularized Linear Regression

Regularized regression models handle correlated independent variables well and help reduce overfitting. The Ridge penalty shrinks the coefficients of correlated predictors towards each other, while the Lasso tends to pick one of a pair of correlated features and discard the other.
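
The contrast described above can be seen on a small synthetic example. This is only a sketch: the data, the names `x1`, `x2`, `X_demo`, `y_demo`, and both alpha values are made up for illustration and have nothing to do with the sales data used later. The two alpha values are not comparable with each other, because scikit-learn scales the Ridge and Lasso objectives differently.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption for this sketch): x2 is correlated with x1,
# but only x1 actually drives the target.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlation with x1 is roughly 0.8
X_demo = np.column_stack([x1, x2])
y_demo = 3 * x1 + rng.normal(scale=0.1, size=n)

# Illustrative penalty strengths, chosen only to make the effect visible
ridge_demo = Ridge(alpha=1000).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)

print('Ridge coefficients:', ridge_demo.coef_)
print('Lasso coefficients:', lasso_demo.coef_)
```

In this setup Ridge should spread the weight over both correlated columns, while Lasso should keep `x1` and set the coefficient of `x2` to zero.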
    
## L2 Regularization or Ridge Regression

To understand Ridge Regression, recall what happens during gradient descent, when the model coefficients are trained. During training, the initial weights are updated repeatedly by stepping against the gradient of the loss, scaled by a learning rate. Ridge regression adds an L2 penalty to the loss, equal to alpha times the sum of the squared weights. The penalty contributes an extra term to every update, and that term shrinks the weights towards zero. This is implemented in scikit-learn as a class called Ridge.
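
The update rule can be written out in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function `ridge_gradient_descent`, the synthetic data, the learning rate and the iteration count are all hypothetical, there is no intercept, and scikit-learn's Ridge chooses its own solver rather than necessarily running a loop like this one.

```python
import numpy as np

def ridge_gradient_descent(X, y, alpha=1.0, lr=0.01, n_iters=5000):
    """Minimise (1/n) * (||Xw - y||^2 + alpha * ||w||^2) by gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iters):
        residual = X @ w - y
        # gradient of the squared-error term plus gradient of the L2 penalty
        grad = (2.0 / n_samples) * (X.T @ residual) + (2.0 * alpha / n_samples) * w
        w -= lr * grad
    return w

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

w_plain = ridge_gradient_descent(X_demo, y_demo, alpha=0.0)   # no penalty
w_ridge = ridge_gradient_descent(X_demo, y_demo, alpha=50.0)  # with penalty

# Closed-form ridge solution for comparison: (X'X + alpha*I)^-1 X'y
w_closed = np.linalg.solve(X_demo.T @ X_demo + 50.0 * np.eye(3), X_demo.T @ y_demo)

print('No penalty:  ', w_plain)
print('With penalty:', w_ridge)
print('Closed form: ', w_closed)
```

The only difference between the two runs is the `alpha * w` term in the gradient, which pulls every weight towards zero at each step.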
    
We will create a new pipeline, this time using Ridge. We set the regularization strength through the alpha parameter. It can be small, like 0.1, or as large as you like, as long as it is not negative. The larger the value of alpha, the less variance your model will exhibit, at the cost of more bias.
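
To get a feel for alpha before fitting the real pipeline, the sketch below sweeps a few values on synthetic data and prints the size of the fitted coefficient vector. The data, the list `alphas` and the names `X_demo` and `y_demo` are assumptions made for this illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem, assumed for illustration
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(80, 5))
true_coef = np.array([4.0, -3.0, 2.0, 0.0, 1.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=80)

alphas = [0.1, 1, 10, 100, 1000]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # Euclidean length of the coefficient vector for this alpha
    norms.append(np.linalg.norm(model.coef_))
    print(f'alpha={alpha:>6}: ||coef|| = {norms[-1]:.3f}')
```

The coefficient norm falls as alpha grows, which is the shrinkage that trades variance for bias.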

In [1]:
#imports
import numpy as np
import pandas as pd
import math

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

import matplotlib.pyplot as plt
import seaborn as sns

from datetime import datetime #to access datetime
from pandas import Series, DataFrame #to work on series & dataframe
from pathlib import Path #to create path to directories and files

sns.set()
%matplotlib inline

import warnings #to ignore the warnings
warnings.filterwarnings('ignore')

In [2]:
#https://pbpython.com/notebook-process.html
today = datetime.today()
train_original = Path.cwd() /'data'/'raw'/'Train_File.csv'
test_original = Path.cwd() /'data'/'raw'/'Test_File.csv'
summary_file_train = Path.cwd() /'data'/'processed'/f'summary_train{today:%b-%d-%Y}.pkl'
summary_file_test = Path.cwd() /'data'/'processed'/f'summary_test{today:%b-%d-%Y}.pkl'

In [3]:
#reading data
train = pd.read_pickle(summary_file_train)
test = pd.read_pickle(summary_file_test)

In [4]:
# drop the identifier columns before modelling
train = train.drop(['Item_Identifier', 'Outlet_Identifier'], axis=1)
test = test.drop(['Item_Identifier', 'Outlet_Identifier'], axis=1)

In [5]:
# separate the predictors (X) from the target (y)
X = train.drop('Item_Outlet_Sales', axis=1)
y = train['Item_Outlet_Sales']

In [6]:
# hold out 30% of the training data as a validation (cv) split
x_train, x_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3)

### Ridge Prediction

In [7]:
# scale the inputs, expand them to degree-2 polynomial terms, then fit Ridge
steps = [
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Ridge(alpha=10, fit_intercept=True))
]

ridge_pipe = Pipeline(steps)
ridge_pipe.fit(x_train, y_train)

# score() returns R^2; the "Test Score" is measured on the held-out x_cv split
print('Training Score: {}'.format(ridge_pipe.score(x_train, y_train)))
print('Test Score: {}'.format(ridge_pipe.score(x_cv, y_cv)))


Training Score: 0.622802706418365
Test Score: 0.5725345939909212


## L1 Regularization or Lasso Regression

By creating a polynomial model, we created additional features. The question we need to ask ourselves is which of these features are relevant to our model, and which are not.

L1 regularization tries to answer this question by driving the coefficients of some features to exactly 0, which removes the least useful features from the model. We will create a pipeline similar to the one above, but using Lasso. You can play around with the value of alpha. Values between 0.1 and 1 are a reasonable place to start, although any positive value is allowed.

In [8]:
steps = [
    ('scalar', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Lasso(alpha=0.3, fit_intercept=True))
]

lasso_pipe = Pipeline(steps)
lasso_pipe.fit(x_train, y_train)

print('Training Score: {}'.format(lasso_pipe.score(x_train, y_train)))
print('Test Score: {}'.format(lasso_pipe.score(x_cv, y_cv)))

Training Score: 0.6227984435265985
Test Score: 0.5728321745528362


In [9]:
# predict on the validation split with the Ridge pipeline
pred2_cv = ridge_pipe.predict(x_cv)

# RMSE is the square root of the mean squared error
mse = mean_squared_error(y_cv, pred2_cv)
rmse = math.sqrt(mse)

print('RMSE: {}'.format(rmse))

RMSE: 1088.6554394193124


In [10]:
# predict on the validation split with the Lasso pipeline
pred3_cv = lasso_pipe.predict(x_cv)

# RMSE is the square root of the mean squared error
mse = mean_squared_error(y_cv, pred3_cv)
rmse = math.sqrt(mse)

print('RMSE: {}'.format(rmse))

RMSE: 1088.276439043809
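
Since the RMSE calculation is repeated for each model, it could be folded into a small helper. This is a sketch only: `compute_rmse` is a hypothetical name, not a scikit-learn function, and the numbers in the demo call are made up.

```python
import numpy as np

def compute_rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values, assumed for illustration
demo_score = compute_rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print('RMSE: {}'.format(demo_score))
```

With the notebook's objects in scope, a call such as `compute_rmse(y_cv, ridge_pipe.predict(x_cv))` would stand in for the three-line calculation in each cell.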
