### To understand how lasso, ridge and elastic net regression work, we will implement all three on the same problem and compare their performance. This will show how regularization can be used to improve a regression model.

### For this implementation, we will use the 50_Startups data, where the task is to predict a company's profit from its spending in different areas. First, we will implement the solution using a linear regression model, and then we will implement the same using Lasso, Ridge and Elastic Net regression. Finally, we will compare the performance of all four models and analyze how they worked.

In [1]:
# Importing libraries and dataset

import numpy as np
import pandas as pd

# Raw string so that the backslashes in the Windows path are not treated as escape sequences
data = pd.read_csv(r'F:\MachineHack\Regression Analysis\Lasso, Ridge & Elastic Net Rgression - Regularization techniques\50_Startups.csv')
data

Unnamed: 0,R&D Spend,Administration,Marketing Spend,State,Profit
0,165349.2,136897.8,471784.1,New York,192261.83
1,162597.7,151377.59,443898.53,California,191792.06
2,153441.51,101145.55,407934.54,Florida,191050.39
3,144372.41,118671.85,383199.62,New York,182901.99
4,142107.34,91391.77,366168.42,Florida,166187.94
5,131876.9,99814.71,362861.36,New York,156991.12
6,134615.46,147198.87,127716.82,California,156122.51
7,130298.13,145530.06,323876.68,Florida,155752.6
8,120542.52,148718.95,311613.29,New York,152211.77
9,123334.88,108679.17,304981.62,California,149759.96


In [2]:
# Defining input and output features

x = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
print(x.shape, y.shape)

(50, 4) (50,)


In [3]:
# We can see that the column 'State' has categorical values.
# Hence we will use One Hot Encoding

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
x = np.array(ct.fit_transform(x))
print(x)

[[0.0 0.0 1.0 165349.2 136897.8 471784.1]
 [1.0 0.0 0.0 162597.7 151377.59 443898.53]
 [0.0 1.0 0.0 153441.51 101145.55 407934.54]
 [0.0 0.0 1.0 144372.41 118671.85 383199.62]
 [0.0 1.0 0.0 142107.34 91391.77 366168.42]
 [0.0 0.0 1.0 131876.9 99814.71 362861.36]
 [1.0 0.0 0.0 134615.46 147198.87 127716.82]
 [0.0 1.0 0.0 130298.13 145530.06 323876.68]
 [0.0 0.0 1.0 120542.52 148718.95 311613.29]
 [1.0 0.0 0.0 123334.88 108679.17 304981.62]
 [0.0 1.0 0.0 101913.08 110594.11 229160.95]
 [1.0 0.0 0.0 100671.96 91790.61 249744.55]
 [0.0 1.0 0.0 93863.75 127320.38 249839.44]
 [1.0 0.0 0.0 91992.39 135495.07 252664.93]
 [0.0 1.0 0.0 119943.24 156547.42 256512.92]
 [0.0 0.0 1.0 114523.61 122616.84 261776.23]
 [1.0 0.0 0.0 78013.11 121597.55 264346.06]
 [0.0 0.0 1.0 94657.16 145077.58 282574.31]
 [0.0 1.0 0.0 91749.16 114175.79 294919.57]
 [0.0 0.0 1.0 86419.7 153514.11 0.0]
 [1.0 0.0 0.0 76253.86 113867.3 298664.47]
 [0.0 0.0 1.0 78389.47 153773.43 299737.29]
 [0.0 1.0 0.0 73994.56 122782.75 3

#### We can see that one-hot encoding the categorical feature 'State' has created three dummy variables, which now occupy the first three columns.
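To see what the encoder does to a single categorical column, here is a small self-contained sketch. The toy `states` array is made up for illustration and is not read from the dataset. `drop="first"` is an optional `OneHotEncoder` setting that the notebook does not use; it is included only to show how one redundant dummy column could be removed.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy column with the three state names (illustrative data, not the real file).
states = np.array([["New York"], ["California"], ["Florida"], ["New York"]])

# Default behaviour: one dummy column per category, categories sorted alphabetically.
encoder = OneHotEncoder()
dummies = encoder.fit_transform(states).toarray()
print(encoder.categories_[0])
print(dummies)

# Optional variant: drop the first category so the dummies are no longer redundant.
reduced = OneHotEncoder(drop="first").fit_transform(states).toarray()
print(reduced.shape)
```

The column order (California, Florida, New York) is consistent with the encoded rows printed above, where the first row (New York) has its 1.0 in the third column.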

In [4]:
# Splitting the dataset into training and testing dataset

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state = 0)

print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(40, 6)
(10, 6)
(40,)
(10,)


In [5]:
# Defining and training a linear regression model

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(x_train, y_train)

LinearRegression()

In [6]:
# Checking the regression coefficients of linear regression model

print(lin_reg.coef_)

[ 8.66383692e+01 -8.72645791e+02  7.86007422e+02  7.73467193e-01
  3.28845975e-02  3.66100259e-02]


In [7]:
# Evaluating the model on the basis of predictions made on the testing data

y_pred = lin_reg.predict(x_test)

from sklearn.metrics import mean_squared_error, r2_score
import math

mse_lin = mean_squared_error(y_test, y_pred)
rmse_lin = math.sqrt(mse_lin)
r2_lin = r2_score(y_test, y_pred)
adj_r2_lin = 1 - ((1-r2_lin)*(x_train.shape[0]-1)/(x_train.shape[0]-x_train.shape[1]-1))

print('MSE = ', mse_lin)
print('RMSE = ', rmse_lin)
print('R-Squared = ', r2_lin)
print('Adjusted R-Squared = ', adj_r2_lin)

MSE =  83502864.03258514
RMSE =  9137.99015279537
R-Squared =  0.9347068473282364
Adjusted R-Squared =  0.9228353650242793
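The adjusted R-squared line in the cell above follows the formula 1 - (1 - R²)(n - 1)/(n - p - 1). As a minimal sketch, the helper below (the name `adjusted_r2` is hypothetical, not part of the notebook) reproduces the reported value from the numbers printed above, with n = 40 training rows and p = 6 features as used in the cell.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# R^2 reported above, with n = 40 (training rows) and p = 6 (features after encoding).
adj = adjusted_r2(0.9347068473282364, n_samples=40, n_features=6)
print(round(adj, 6))  # 0.922835
```

Note that the cell takes n from the training set even though R² is measured on the 10 test rows. Passing `n_samples=10` instead would apply a much heavier penalty, so which n is used affects how the adjusted figure should be read.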


## Lasso Regression

#### The most important hyperparameter in lasso regression is alpha, which controls the strength of the L1 penalty. We need to find the value of alpha that gives the best results. We can do this by trial and error, trying different values of alpha and checking the model's performance each time, or we can use a hyperparameter tuning method to search for the optimal value.

#### Here we will use the grid search cross-validation approach to find the optimal value of alpha for the lasso regression model.
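The sketch below shows one way such a search could be set up. It differs from the notebook in several ways: the data is synthetic, the features are standardized inside a pipeline, and the search is fit on the training split only so that the test rows play no part in choosing alpha. The grid bounds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 5 features, 2 of them informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=80)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling lives inside the pipeline, so each CV fold is scaled on its own training part.
pipe = make_pipeline(StandardScaler(), Lasso(max_iter=10000))
param_grid = {"lasso__alpha": np.logspace(-3, 1, 20)}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)  # only the training split is used for tuning

best_alpha = search.best_params_["lasso__alpha"]
print("Best alpha:", best_alpha)
print("Test R^2:", search.score(X_test, y_test))
```

If the selected alpha sits on the edge of the grid, it is usually worth widening the grid and searching again, since the true optimum may lie outside the range tried.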

In [8]:
# Finding the optimal value of alpha
# (note: the search below is fit on the full dataset, not only on the training split)

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Lasso

lasso = Lasso(random_state=0, max_iter=10000)
alphas = np.logspace(-4, -0.5, 30)
grid1 = GridSearchCV(estimator=lasso, cv=5, scoring='r2', param_grid=dict(alpha=alphas))
grid1.fit(x, y)

print('Best value for alpha = ', grid1.best_estimator_.alpha)

Best value for alpha =  0.31622776601683794


In [9]:
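# Defining and training the Lasso Regressor
# Note: the `normalize` parameter was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older release, where it emits the warning shown below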

lasso_reg = Lasso(alpha=grid1.best_estimator_.alpha, normalize=True)
lasso_reg.fit(x_train, y_train)

FutureWarning: 'normalize' was deprecated in version 1.0 and will be removed in 1.2.
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Lasso())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * np.sqrt(n_samples). 


Lasso(alpha=0.31622776601683794, normalize=True)
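The recipe in the warning text can be sketched as follows. This is an illustration on synthetic data standing in for `x_train` and `y_train`, not a rerun of the notebook. The names `X_demo`, `y_demo` and `lasso_pipe` are made up here, and the alpha rescaling simply follows what the warning says.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for x_train / y_train (an assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=100.0, scale=20.0, size=(40, 6))
y_demo = X_demo @ np.array([0.5, 0.0, 0.0, 2.0, 0.0, 1.0]) + rng.normal(size=40)

original_alpha = 0.31622776601683794  # alpha passed together with normalize=True above
n_samples = X_demo.shape[0]

# Recipe from the warning: scale without centering, and rescale alpha by sqrt(n_samples).
lasso_pipe = make_pipeline(
    StandardScaler(with_mean=False),
    Lasso(alpha=original_alpha * np.sqrt(n_samples), max_iter=10000),
)
lasso_pipe.fit(X_demo, y_demo)

# These coefficients are expressed in the scaled feature space.
print(lasso_pipe.named_steps["lasso"].coef_)
```

Because the scaler is part of the pipeline, `lasso_pipe.predict` applies the same scaling to new data automatically.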

In [10]:
# Printing the coefficients of the lasso regression model

print(lasso_reg.coef_)

[-1.52407525e+02 -1.10851664e+03  5.43180150e+02  7.73508965e-01
  3.27905025e-02  3.65825686e-02]


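#### As we can see in the output, at this small value of alpha the lasso coefficients for the three numeric features are almost identical to those of linear regression, and no coefficient has been driven to zero. The dummy-variable coefficients shift, but the differences between them shrink only slightly.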
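To make the shrinkage effect of the L1 penalty visible, the following is a minimal NumPy sketch of lasso fitted by coordinate descent with soft-thresholding. It is an illustrative implementation under simplifying assumptions (no intercept, synthetic data, a fixed number of sweeps) and is not the solver scikit-learn ships. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0 if it would cross zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's current contribution removed.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_samples
            scale = X[:, j] @ X[:, j] / n_samples
            w[j] = soft_threshold(rho, alpha) / scale
    return w

# Synthetic data: only the first and third features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.1, size=100)

for alpha in [0.0, 0.1, 0.5, 5.0]:
    print(alpha, np.round(lasso_coordinate_descent(X, y, alpha), 3))
```

With `alpha=0.0` the result coincides with ordinary least squares. As alpha grows, the coefficients move towards zero, and for a large enough alpha all of them become exactly zero.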

In [11]:
# Evaluating the model on the basis of predictions made on the testing data

y_pred = lasso_reg.predict(x_test)

mse_lasso = mean_squared_error(y_test, y_pred)
rmse_lasso = math.sqrt(mse_lasso)
r2_lasso = r2_score(y_test, y_pred)
adj_r2_lasso = 1-((1-r2_lasso)*(x_train.shape[0]-1)/(x_train.shape[0]-x_train.shape[1]-1))

print('MSE(Lasso) = ', mse_lasso)
print('RMSE(Lasso) = ', rmse_lasso)
print('R-Squared(Lasso) = ', r2_lasso)
print('Adjusted R-Squared(Lasso) = ', adj_r2_lasso)

MSE(Lasso) =  83446561.62924908
RMSE(Lasso) =  9134.90895571757
R-Squared(Lasso) =  0.9347508717034407
Adjusted R-Squared(Lasso) =  0.9228873938313391


## Ridge Regression

In [12]:
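# Defining and training the ridge regressor (alpha=0.02 is set by hand here, not tuned)
# Note: the `normalize` parameter was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older release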

from sklearn.linear_model import Ridge

ridge_reg = Ridge(alpha=0.02, normalize=True)
ridge_reg.fit(x_train,y_train)

# Checking the regression coefficients of the fitted model

print(ridge_reg.coef_)

[-7.08294081e+01 -1.19321559e+03  1.05814057e+03  7.38658980e-01
  5.01460340e-02  4.49020995e-02]


FutureWarning: 'normalize' was deprecated in version 1.0 and will be removed in 1.2.
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
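For intuition about what the ridge penalty does to the coefficients, here is a minimal NumPy sketch of the closed-form ridge solution on synthetic data. It assumes no intercept, and the function name `ridge_closed_form` is hypothetical. It illustrates the technique and does not reproduce the notebook's model.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y, the ridge solution without an intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=50)

for alpha in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_closed_form(X, y, alpha)
    print(alpha, np.round(w, 3), "L2 norm:", round(float(np.linalg.norm(w)), 3))
```

Unlike lasso, ridge shrinks all coefficients smoothly towards zero as alpha grows, but it does not set any of them exactly to zero.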


In [16]:
# Evaluating the model on the basis of predictions made on the testing data

y_pred = ridge_reg.predict(x_test)

mse_ridge = mean_squared_error(y_test, y_pred)
rmse_ridge = math.sqrt(mse_ridge)
r2_ridge = r2_score(y_test, y_pred)
adj_r2_ridge = 1-((1-r2_ridge)*(x_train.shape[0]-1)/(x_train.shape[0]-x_train.shape[1]-1))

print('MSE(Ridge) = ', mse_ridge)
print('RMSE(Ridge) = ', rmse_ridge)
print('R-Squared(Ridge) = ', r2_ridge)
print('Adjusted R-Squared(Ridge) = ', adj_r2_ridge)

MSE(Ridge) =  96455376.14174965
RMSE(Ridge) =  9821.169794976036
R-Squared(Ridge) =  0.9245789270416158
Adjusted R-Squared(Ridge) =  0.9108660046855459


## ElasticNet Regression

In [19]:
# Defining and training the elastic net regressor

from sklearn.linear_model import ElasticNet

# `normalize=False` was the default, so it is omitted here (the parameter was removed in scikit-learn 1.2)
en_reg = ElasticNet(alpha=grid1.best_estimator_.alpha, l1_ratio=0.5)
en_reg.fit(x_train,y_train)

# Checking the regression coefficients of the fitted model

print(en_reg.coef_)

[ 3.60238631e+01 -5.40647541e+02  5.03623674e+02  7.75424124e-01
  3.16010555e-02  3.58982077e-02]
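The role of `l1_ratio` can be illustrated by computing the elastic net penalty term directly. The sketch below follows the penalty as written in scikit-learn's `ElasticNet` documentation, `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`. The helper name and the example weights are made up for illustration.

```python
import numpy as np

def elastic_net_penalty(w, alpha, l1_ratio):
    """Penalty added to the squared-error loss by elastic net."""
    l1 = np.sum(np.abs(w))      # lasso-style term
    l2_squared = np.sum(w ** 2)  # ridge-style term
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2_squared

w = np.array([1.0, -2.0, 0.0])  # illustrative coefficients

for l1_ratio in [1.0, 0.5, 0.0]:
    print(l1_ratio, elastic_net_penalty(w, alpha=0.1, l1_ratio=l1_ratio))
```

With `l1_ratio=1.0` only the L1 (lasso) term remains, with `l1_ratio=0.0` only the squared L2 (ridge-style) term remains, and `l1_ratio=0.5`, as used in the cell above, weights the two parts equally.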




In [20]:
# Evaluating the model on the basis of predictions made on the testing data

y_pred = en_reg.predict(x_test)

mse_en = mean_squared_error(y_test, y_pred)
rmse_en = math.sqrt(mse_en)
r2_en = r2_score(y_test, y_pred)
adj_r2_en = 1-((1-r2_en)*(x_train.shape[0]-1)/(x_train.shape[0]-x_train.shape[1]-1))

print('MSE(EN) = ', mse_en)
print('RMSE(EN) = ', rmse_en)
print('R-Squared(EN) = ', r2_en)
print('Adjusted R-Squared(EN) = ', adj_r2_en)

MSE(EN) =  81084480.20025307
RMSE(EN) =  9004.692121347241
R-Squared(EN) =  0.93659784719528
Adjusted R-Squared(EN) =  0.9250701830489674


#### Now we will compare all the models based on their respective metrics

In [21]:
pd.DataFrame(data = {'Regressor':['Lin', 'Lasso', 'Ridge', 'EN'],
                     'MSE':[mse_lin, mse_lasso, mse_ridge, mse_en],
                     'RMSE':[rmse_lin, rmse_lasso, rmse_ridge, rmse_en],
                     'R2':[r2_lin, r2_lasso, r2_ridge, r2_en],
                     'Adj R2':[adj_r2_lin, adj_r2_lasso, adj_r2_ridge, adj_r2_en]
                     })

Unnamed: 0,Regressor,MSE,RMSE,R2,Adj R2
0,Lin,83502860.0,9137.990153,0.934707,0.922835
1,Lasso,83446560.0,9134.908956,0.934751,0.922887
2,Ridge,96455380.0,9821.169795,0.924579,0.910866
3,EN,81084480.0,9004.692121,0.936598,0.92507
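To read the table more easily, the reported RMSE values can be ranked and expressed relative to the plain linear model. The snippet below only re-uses the four RMSE figures printed above; the variable names are illustrative.

```python
# RMSE values copied from the comparison table above.
rmse = {"Lin": 9137.990153, "Lasso": 9134.908956, "Ridge": 9821.169795, "EN": 9004.692121}

baseline = rmse["Lin"]
ranking = sorted(rmse, key=rmse.get)  # lowest RMSE first
changes = {name: 100 * (value - baseline) / baseline for name, value in rmse.items()}

for name in ranking:
    print(f"{name:<6} RMSE = {rmse[name]:.2f}   change vs Lin: {changes[name]:+.2f}%")
```

On these numbers elastic net has the lowest RMSE and ridge the highest, but the gap between elastic net and linear regression is under 2%. Since the test set has only 10 rows, differences of this size should be read with caution.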
