# Ridge, Lasso, and ElasticNet Regression

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

In [2]:
df = pd.read_csv('../SDataset/Advertising.csv')
df.head()

,Unnamed: 0,TV,radio,newspaper,sales
0,1,230.1,37.8,69.2,22.1
1,2,44.5,39.3,45.1,10.4
2,3,17.2,45.9,69.3,9.3
3,4,151.5,41.3,58.5,18.5
4,5,180.8,10.8,58.4,12.9


In [3]:
df.drop(['Unnamed: 0'], axis = 1, inplace = True)
df

,TV,radio,newspaper,sales
0,230.1,37.8,69.2,22.1
1,44.5,39.3,45.1,10.4
2,17.2,45.9,69.3,9.3
3,151.5,41.3,58.5,18.5
4,180.8,10.8,58.4,12.9
...,...,...,...,...
195,38.2,3.7,13.8,7.6
196,94.2,4.9,8.1,9.7
197,177.0,9.3,6.4,12.8
198,283.6,42.0,66.2,25.5


In [4]:
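# Hold out 25% of the rows as a test set; random_state fixes the shuffle so the split is reproducible
X_train, X_test, Y_train, Y_test = train_test_split(
    df[['TV', 'radio', 'newspaper']], df.sales, test_size=0.25, random_state=42
)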


## Ridge Regression
Ridge regression adds an L2 penalty to ordinary least squares. It is most useful when the predictors are multicollinear, because correlated features make the least-squares coefficients unstable.
By adding a small amount of bias to the regression estimates, ridge regression reduces their standard errors.

The main idea is to fit a line that has some bias with respect to the training data. In return for that small amount of bias, a significant drop in variance is achieved.

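Loss equation: $ \large \sum_i{(\hat Y_i - Y_i)^2} + \lambda \sum_j{\beta_j^2} $

 - $\lambda $ controls the strength of the penalty term (the `alpha` parameter in scikit-learn)
 - $ \hat Y_i $ is the predicted value for observation $i$
 - $ Y_i $ is the actual value for observation $i$
 - $ \beta_j $ is the coefficient of feature $j$; the intercept is not penalized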
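
To make the effect of the penalty concrete, here is a minimal sketch of the ridge solution in closed form, using NumPy only. The helper name `ridge_closed_form` and the synthetic, nearly collinear data are assumptions made for illustration; they are not part of this notebook or of scikit-learn. The data are centered first so that the intercept stays unpenalized.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimise sum((y - b0 - X @ beta)**2) + lam * sum(beta**2).

    The intercept b0 is left unpenalised by centering X and y first.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Normal equations with lam added on the diagonal
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

# Synthetic data (illustrative only): x2 is almost a copy of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)

norms = []
for lam in (0.0, 1.0, 100.0):
    beta, intercept = ridge_closed_form(X, y, lam)
    norms.append(np.linalg.norm(beta))
    print(f"lambda={lam:6.1f}  coefficients={beta}  norm={norms[-1]:.4f}")
```

The coefficient norm shrinks as $\lambda$ grows, which is the bias-for-variance trade described above: with `lam = 0` the two collinear coefficients can take large offsetting values, while a larger `lam` pulls them towards a smaller, more stable pair.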

In [5]:
from sklearn.linear_model import Ridge
Ridge().get_params()

{'alpha': 1.0,
 'copy_X': True,
 'fit_intercept': True,
 'max_iter': None,
 'normalize': False,
 'random_state': None,
 'solver': 'auto',
 'tol': 0.001}

In [6]:
rr = Ridge(alpha = 0.3)
rr.fit(X_train, Y_train)

Ridge(alpha=0.3)

In [7]:
rr.coef_

array([0.04543356, 0.19145451, 0.00256863])

In [8]:
rr.intercept_

2.778334360513526

In [9]:
rpred = rr.predict(X_test)
rpred

array([16.38347793, 20.92431281, 21.61495156, 10.49068077, 22.17683816,
       13.02666961, 21.10304993,  7.31814754, 13.56735239, 15.12238909,
        8.92492346,  6.49927956, 14.30126956,  8.77231064,  9.58669603,
       12.09488727,  8.5962126 , 16.2533704 , 10.16948858, 18.85751537,
       19.57990962, 13.15878916, 12.25099318, 21.35141633,  7.69608868,
        5.64690759, 20.79776311, 11.90951965,  9.0658316 ,  8.37293311,
       12.40819628,  9.89416016, 21.42704025, 12.14235961, 18.28779163,
       20.18111168, 13.99297071, 20.8998824 , 10.93139945,  4.3872552 ,
        9.58216243, 12.61704056,  9.9385153 ,  8.06817796, 13.45501316,
        5.25772518,  9.15401612, 14.09552139,  8.71033641, 11.55101215])

In [10]:
mean_squared_error(Y_test, rpred)

2.8800156243370565

In [11]:
r2_score(Y_test, rpred)

0.8935166317120741

## Lasso Regression

Lasso regression adds an L1 penalty. It is similar to ridge, but it also performs feature selection:
coefficients of features that contribute little to the prediction are shrunk towards zero and can become exactly zero.

Lasso therefore tends to drop variables that are not needed from the equation, whereas ridge tends
to do better when all variables contribute to the response.

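Loss equation: $ \large \sum_i{(\hat Y_i - Y_i)^2} + \lambda \sum_j{|\beta_j|} $

 - $\lambda $ controls the strength of the penalty term (scikit-learn's `alpha`, which it applies to a loss scaled by $1/(2n)$)
 - $ \hat Y_i $ is the predicted value for observation $i$
 - $ Y_i $ is the actual value for observation $i$
 - $ \beta_j $ is the coefficient of feature $j$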
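
Why an L1 penalty can zero out a coefficient while an L2 penalty cannot is easiest to see with a single feature and no intercept. The sketch below is plain Python; the function names and the toy data are hypothetical, chosen only for illustration. For one feature, the lasso minimizer is a soft-thresholded version of the least-squares solution, whereas the ridge minimizer only rescales it.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_1d(x, y, lam):
    """Minimise sum((y_i - b * x_i)**2) + lam * |b| over a single coefficient b."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam / 2) / sxx

def ridge_1d(x, y, lam):
    """Minimise sum((y_i - b * x_i)**2) + lam * b**2 over a single coefficient b."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

# Toy data (illustrative only)
x = [1.0, 2.0, 3.0, 4.0]
y_strong = [2.1, 3.9, 6.2, 7.8]     # roughly y = 2x
y_weak = [0.1, -0.1, 0.05, -0.05]   # hardly related to x
lam = 2.0

print("strong feature:", lasso_1d(x, y_strong, lam), ridge_1d(x, y_strong, lam))
print("weak feature:  ", lasso_1d(x, y_weak, lam), ridge_1d(x, y_weak, lam))
```

For the weakly related target, the lasso coefficient is exactly zero because the correlation term falls inside the threshold, while the ridge coefficient is small but nonzero. With several correlated features the lasso has no such closed form, and solvers typically apply a thresholding step like this one coordinate at a time.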

In [12]:
from sklearn.linear_model import Lasso
Lasso().get_params()

{'alpha': 1.0,
 'copy_X': True,
 'fit_intercept': True,
 'max_iter': 1000,
 'normalize': False,
 'positive': False,
 'precompute': False,
 'random_state': None,
 'selection': 'cyclic',
 'tol': 0.0001,
 'warm_start': False}

In [13]:
lreg = Lasso(alpha = 0.9)
lreg.fit(X_train, Y_train)

Lasso(alpha=0.9)

In [14]:
lpred = lreg.predict(X_test)
lpred

array([16.32867031, 20.81116094, 21.5704462 , 10.44093814, 22.11443168,
       12.96876421, 21.03069428,  7.3827303 , 13.64992098, 15.12251188,
        8.96536588,  6.57045356, 14.29964133,  8.78867798,  9.66594118,
       12.14168436,  8.58151712, 16.26865389, 10.18545843, 18.85831012,
       19.4917581 , 13.02507385, 12.22565328, 21.24901218,  7.78704401,
        5.74066115, 20.72116659, 11.96518471,  9.11751823,  8.40674695,
       12.48627015,  9.92646857, 21.29949328, 12.01756832, 18.30108846,
       20.16507432, 13.94757871, 20.83602142, 10.99402962,  4.48565506,
        9.6484498 , 12.69140074,  9.93253718,  8.128277  , 13.5379696 ,
        5.35589071,  9.17558788, 14.10782999,  8.77148427, 11.56126673])

In [15]:
mean_squared_error(Y_test, lpred)

2.83868029685398

In [16]:
r2_score(Y_test, lpred)

0.8950449306777075

## ElasticNet Regression
ElasticNet regression combines the strengths of lasso and ridge regression by applying both the L1 and the L2 penalty.

If you are not sure whether to use lasso or ridge, ElasticNet is a reasonable default.

Loss objective: $ \large \sum_i{(\hat Y_i - Y_i)^2} + \lambda _1 \sum_j{|\beta_j|} + \lambda _2 \sum_j{\beta_j^2}$
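
scikit-learn does not take $\lambda_1$ and $\lambda_2$ directly; it exposes `alpha` and `l1_ratio` instead. The sketch below shows how the two parametrizations relate, based on the objective given in the scikit-learn documentation, where the squared-error term is scaled by $1/(2n)$. The helper functions are hypothetical and written in plain Python for illustration; they are not scikit-learn functions.

```python
def elasticnet_penalty_weights(alpha, l1_ratio):
    """Return the (L1 weight, L2 weight) implied by alpha and l1_ratio."""
    l1_weight = alpha * l1_ratio
    l2_weight = 0.5 * alpha * (1.0 - l1_ratio)
    return l1_weight, l2_weight

def elasticnet_objective(residuals, coefs, alpha, l1_ratio):
    """Evaluate the elastic net objective for given residuals and coefficients."""
    n = len(residuals)
    l1_weight, l2_weight = elasticnet_penalty_weights(alpha, l1_ratio)
    data_term = sum(r * r for r in residuals) / (2 * n)
    l1_term = l1_weight * sum(abs(b) for b in coefs)
    l2_term = l2_weight * sum(b * b for b in coefs)
    return data_term + l1_term + l2_term

# The defaults alpha=1.0, l1_ratio=0.5 split the penalty between both terms
print(elasticnet_penalty_weights(1.0, 0.5))
# l1_ratio=1.0 leaves only the L1 term, i.e. a lasso-style penalty
print(elasticnet_penalty_weights(1.0, 1.0))
```

Setting `l1_ratio` closer to 1 makes the model behave more like lasso, and closer to 0 more like ridge, while `alpha` scales the overall penalty strength.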


In [17]:
from sklearn.linear_model import ElasticNet
ElasticNet().get_params()

{'alpha': 1.0,
 'copy_X': True,
 'fit_intercept': True,
 'l1_ratio': 0.5,
 'max_iter': 1000,
 'normalize': False,
 'positive': False,
 'precompute': False,
 'random_state': None,
 'selection': 'cyclic',
 'tol': 0.0001,
 'warm_start': False}

In [18]:
enreg = ElasticNet(alpha = 1.0, l1_ratio = 0.5)
enreg.fit(X_train, Y_train)

ElasticNet()

In [19]:
enpred = enreg.predict(X_test)
enpred

array([16.35200777, 20.85238609, 21.5895651 , 10.45831418, 22.12575308,
       12.99173893, 21.05222371,  7.3583322 , 13.62093858, 15.12309966,
        8.94301061,  6.54760041, 14.31775323,  8.77534083,  9.64089449,
       12.1293744 ,  8.58720222, 16.25977339, 10.18022294, 18.85334385,
       19.53244812, 13.08920983, 12.22599612, 21.29368413,  7.74971666,
        5.70850214, 20.74592053, 11.9422075 ,  9.09976894,  8.38602117,
       12.46077458,  9.91195517, 21.34722472, 12.07086319, 18.30087744,
       20.16342115, 13.95302739, 20.86566578, 10.96720609,  4.45153272,
        9.6258873 , 12.66220686,  9.93420318,  8.10536013, 13.50965215,
        5.31989363,  9.17113098, 14.10062529,  8.75372831, 11.55274678])

In [20]:
mean_squared_error(Y_test, enpred)

2.8543424465527103

In [21]:
r2_score(Y_test, enpred)

0.8944658510225635