# Penalized linear regression

In this notebook, you will learn how to use three penalized linear regressions: Lasso (L1), Ridge (L2) and Elastic Net (L1 + L2).

Adding these penalties to the cost function helps you train less complex models and avoid overfitting.

# Package imports

In [None]:
# Import the dataset for our regression example
# (load_boston was removed in scikit-learn 1.2; this cell requires an older version)
from sklearn.datasets import load_boston

# Import the class used to standardize the data
from sklearn.preprocessing import StandardScaler

# Import the train_test_split function, which randomly splits our data
# into a training set and a test set
from sklearn.model_selection import train_test_split

# Import the linear regression estimators
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.linear_model import ElasticNet

# Import the performance metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Import the matplotlib package to create graphics
import matplotlib.pyplot as plt

# Import numpy to work with vectors, matrices and tensors
import numpy as np

# Data import

In [None]:
# Data for our regression example
boston = load_boston()
X_reg = boston.data
y_reg = boston.target

Use the scikit-learn function *train_test_split* to randomly split your dataset into a training set and a test set.

Use a random_state of 123 and use 10% of your dataset for the test set.

Feel free to use the [doc](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).

In [None]:
# Use the function train_test_split to create your train and test set
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, 
                                                                    test_size=0.10, 
                                                                    random_state=123)
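
As a quick sanity check of the split sizes, here is a minimal sketch on a placeholder array with the same shape as the Boston data (506 rows, 13 columns). The names `X_demo` and `y_demo` are made up for illustration and are not part of the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the Boston dataset (506 x 13).
X_demo = np.arange(506 * 13, dtype=float).reshape(506, 13)
y_demo = np.arange(506, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.10, random_state=123
)
print(X_tr.shape, X_te.shape)

# Reusing the same random_state reproduces exactly the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.10, random_state=123
)
print(np.array_equal(X_te, X_te2))
```

With a test fraction of 10%, the test set size is rounded up, so the two parts always add up to the full dataset.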

# Step 1 : Data standardization

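Before fitting a penalized linear model, it is essential to standardize the features: the penalty acts on the size of the coefficients, so features on different scales would be penalized unevenly.

Standardization also makes the coefficients comparable with one another and helps the solver converge.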

Feel free to use the [doc](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).

In [None]:
# Initialize the StandardScaler
scaler = StandardScaler()

# Fit the StandardScaler on the training set
scaler.fit(X_train_reg)

# Standardize the training set
X_train_reg_norm = scaler.transform(X_train_reg)

# Standardize the test set (using the statistics of the training set)
X_test_reg_norm = scaler.transform(X_test_reg)

In [None]:
print('Mean of the training set : '+str(X_train_reg_norm.mean(axis=0)))
print('Standard deviation of the training set : '+str(X_train_reg_norm.std(axis=0)))

print('Mean of the testing set : '+str(X_test_reg_norm.mean(axis=0)))
print('Standard deviation of the testing set : '+str(X_test_reg_norm.std(axis=0)))

Mean of the training set : [-6.47223423e-17  1.68363492e-17  5.99764438e-16 -9.00378673e-17
 -5.52281054e-15 -4.49359719e-15 -7.98628563e-16 -9.09650865e-16
 -1.07362227e-17  3.31846882e-17  1.65064543e-14  1.13037022e-14
  1.83174599e-15]
Standard deviation of the training set : [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Mean of the testing set : [ 0.06747655  0.05717168 -0.14001179  0.41277236 -0.12235192  0.2219691
  0.05579181 -0.00310857 -0.03264025 -0.06104133 -0.06444473 -0.1090823
 -0.25945322]
Standard deviation of the testing set : [1.10921879 1.02009199 0.89698995 1.53927663 0.79001036 1.06378901
 0.93794752 0.87199757 1.00028353 0.98200915 1.08832968 1.10249559
 0.90211127]


Expected answers:

Mean of the training set : [-6.47223423e-17  1.68363492e-17  5.99764438e-16 -9.00378673e-17
 -5.52281054e-15 -4.49359719e-15 -7.98628563e-16 -9.09650865e-16
 -1.07362227e-17  3.31846882e-17  1.65064543e-14  1.13037022e-14
  1.83174599e-15]


Standard deviation of the training set : [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


Mean of the testing set : [ 0.06747655  0.05717168 -0.14001179  0.41277236 -0.12235192  0.2219691
  0.05579181 -0.00310857 -0.03264025 -0.06104133 -0.06444473 -0.1090823
 -0.25945322]

 
Standard deviation of the testing set : [1.10921879 1.02009199 0.89698995 1.53927663 0.79001036 1.06378901
 0.93794752 0.87199757 1.00028353 0.98200915 1.08832968 1.10249559
 0.90211127]
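
The test-set means and standard deviations are not exactly 0 and 1 because the scaler only knows the training statistics. The following minimal sketch, on synthetic data (the arrays `X_train_demo` and `X_test_demo` are invented for illustration), reproduces by hand what `StandardScaler.transform` does.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(123)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

scaler_demo = StandardScaler().fit(X_train_demo)

# Per-column mean and standard deviation of the TRAINING set only.
mu = X_train_demo.mean(axis=0)
sigma = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Manual standardization of the test set with the training statistics.
manual_test = (X_test_demo - mu) / sigma
print(np.allclose(manual_test, scaler_demo.transform(X_test_demo)))

# Training columns end up with mean ~0; test columns only approximately.
print(scaler_demo.transform(X_train_demo).mean(axis=0))
print(manual_test.mean(axis=0))
```

Fitting the scaler on the training set alone keeps information about the test set out of the model.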

# Step 2 : Model initialization

In the case of ordinary linear regression, there is no hyperparameter to choose.

It is therefore sufficient to initialize the estimator.

Feel free to use the [doc](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html).

In [None]:
reg = LinearRegression()

In the case of lasso regression, you have to choose a value for alpha.

Alpha will control the regularization of the model.

$ J(w) =  \frac{1}{2m}[\sum^m_{i=1}(\hat{y}^{(i)}-y^{(i)})^2+\alpha\sum^n_{j=1}|w_j|]$ 

For this example initialize the regression with an alpha of 0.2 and a random_state of 123.

Feel free to use the [doc](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso).

In [None]:
lasso = Lasso(alpha=0.2, random_state=123)

In the case of ridge regression, you have to choose a value for alpha.

Alpha will control the regularization of the model.

$ J(w) =  \frac{1}{2m}[\sum^m_{i=1}(\hat{y}^{(i)}-y^{(i)})^2+\alpha\sum^n_{j=1}w_j^2]$ 

For this example, initialize the regression with an alpha of 0.5 and a random_state of 123.

Feel free to use the [doc](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge).

In [None]:
ridge = Ridge(alpha=0.5, random_state=123)
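
Unlike the Lasso, the ridge cost has a closed-form minimizer. Assuming standardized features (column means equal to 0) and an unpenalized intercept, it is $w = (X^TX + \alpha I)^{-1}X^T(y - \bar{y})$; the constant factor $\frac{1}{2m}$ does not change the minimizer. The sketch below checks this against scikit-learn's `Ridge` on synthetic data; all `*_demo` names are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(123)
X_raw = rng.normal(size=(100, 4))
y_demo = X_raw @ np.array([1.5, -2.0, 0.0, 0.7]) + rng.normal(scale=0.3, size=100)

X_std = StandardScaler().fit_transform(X_raw)
alpha = 0.5

# Closed-form ridge solution on centered data:
# (X^T X + alpha * I) w = X^T (y - mean(y))
A = X_std.T @ X_std + alpha * np.eye(X_std.shape[1])
w_closed = np.linalg.solve(A, X_std.T @ (y_demo - y_demo.mean()))

ridge_demo = Ridge(alpha=alpha).fit(X_std, y_demo)
print(w_closed)
print(ridge_demo.coef_)
```

Both printed vectors should agree up to numerical precision, and the fitted intercept is simply the mean of the target.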

In the case of elastic net regression, you have to choose a value for alpha and for the ratio (`l1_ratio` in scikit-learn).

Alpha will control the overall strength of the regularization.

The ratio is the mixing parameter between ridge (ratio=0) and lasso (ratio=1).

$ J(w) =  \frac{1}{2m}[\sum^m_{i=1}(\hat{y}^{(i)}-y^{(i)})^2+\alpha[\frac{1-ratio}{2}\sum^n_{j=1}w_j^2 + ratio\sum^n_{j=1}|w_j|]]$ 

For this example initialize the regression with an alpha of 0.15, the l1_ratio at 0.5 and a random_state of 123.

Feel free to use the [doc](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html#sklearn.linear_model.ElasticNet).

In [None]:
elasticnet = ElasticNet(alpha=0.15, l1_ratio=0.5, random_state=123)
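
To make the cost function concrete, here is a minimal numpy sketch that evaluates $J(w)$ exactly as it is written in this notebook, with the penalty inside the $\frac{1}{2m}$ bracket. The function name `penalized_cost`, the intercept argument `b` and the tiny dataset are assumptions made for illustration. Note that scikit-learn's documented objectives scale the penalty differently, so alpha values are not directly interchangeable between this formula and the library.

```python
import numpy as np

def penalized_cost(w, b, X, y, alpha, ratio):
    """Elastic net cost as written in this notebook (hypothetical helper)."""
    m = X.shape[0]
    residuals = X @ w + b - y
    squared_error = np.sum(residuals ** 2)
    l2_term = (1.0 - ratio) / 2.0 * np.sum(w ** 2)  # ridge-like part
    l1_term = ratio * np.sum(np.abs(w))             # lasso-like part
    return (squared_error + alpha * (l1_term + l2_term)) / (2.0 * m)

X_demo = np.array([[1.0, 2.0], [0.0, -1.0], [2.0, 0.5]])
y_demo = np.array([1.0, 0.0, 2.0])
w_demo = np.array([0.5, -0.5])

# ratio=1 keeps only the L1 term, ratio=0 keeps only the (halved) L2 term.
cost_l1_only = penalized_cost(w_demo, 0.0, X_demo, y_demo, alpha=0.2, ratio=1.0)
cost_l2_only = penalized_cost(w_demo, 0.0, X_demo, y_demo, alpha=0.2, ratio=0.0)
print(cost_l1_only, cost_l2_only)
```

For this toy example the squared error is 4.0625, the L1 norm of `w_demo` is 1.0 and its squared L2 norm is 0.5, which gives costs of 4.2625/6 and 4.1125/6 respectively.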

# Step 3 : Model training

You must train the four models.

In [None]:
# Classic linear regression
reg.fit(X_train_reg_norm, y_train_reg)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [None]:
# Lasso regression
lasso.fit(X_train_reg_norm, y_train_reg)

Lasso(alpha=0.2, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False)

In [None]:
# Ridge regression
ridge.fit(X_train_reg_norm, y_train_reg)

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=123, solver='auto', tol=0.001)

In [None]:
# ElasticNet regression
elasticnet.fit(X_train_reg_norm, y_train_reg)

ElasticNet(alpha=0.15, copy_X=True, fit_intercept=True, l1_ratio=0.5,
           max_iter=1000, normalize=False, positive=False, precompute=False,
           random_state=123, selection='cyclic', tol=0.0001, warm_start=False)
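
The four fits above follow the same `fit(X, y)` pattern, so they can also be written as a loop. This is only a sketch on synthetic data; the dictionary `models` and the arrays `X_demo` and `y_demo` are illustrative names, not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(123)
X_demo = rng.normal(size=(120, 5))
y_demo = X_demo @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.2, size=120)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.2, random_state=123),
    "ridge": Ridge(alpha=0.5, random_state=123),
    "elasticnet": ElasticNet(alpha=0.15, l1_ratio=0.5, random_state=123),
}

for name, model in models.items():
    model.fit(X_demo, y_demo)  # fit() returns the estimator itself
    # After fitting, the learned weights live in coef_ and intercept_.
    print(name, np.round(model.coef_, 3), round(float(model.intercept_), 3))
```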

# Step 4 : Model validation

Your models are now trained. Use each of the four models to predict house prices on the training and test sets.

In [None]:
# Classic linear regression
x_train_reg_prediction = reg.predict(X_train_reg_norm)

x_test_reg_prediction = reg.predict(X_test_reg_norm)

In [None]:
# Lasso regression
x_train_lasso_prediction = lasso.predict(X_train_reg_norm)

x_test_lasso_prediction = lasso.predict(X_test_reg_norm)

In [None]:
# Ridge regression
x_train_ridge_prediction = ridge.predict(X_train_reg_norm)

x_test_ridge_prediction = ridge.predict(X_test_reg_norm)

In [None]:
# ElasticNet regression
x_train_elasticnet_prediction = elasticnet.predict(X_train_reg_norm)

x_test_elasticnet_prediction = elasticnet.predict(X_test_reg_norm)

Compute the MAE (mean absolute error) of each model on the training and test sets.

In [None]:
# Classic linear regression (arguments in scikit-learn's y_true, y_pred order)
mae_train_reg = mean_absolute_error(y_train_reg, x_train_reg_prediction)

mae_test_reg = mean_absolute_error(y_test_reg, x_test_reg_prediction)

print('MAE for the training set : '+str(mae_train_reg))

print('MAE for the testing set : '+str(mae_test_reg))

MAE for the training set : 3.103406389414602
MAE for the testing set : 4.256125350322445


In [None]:
# Lasso regression (arguments in scikit-learn's y_true, y_pred order)
mae_train_lasso = mean_absolute_error(y_train_reg, x_train_lasso_prediction)

mae_test_lasso = mean_absolute_error(y_test_reg, x_test_lasso_prediction)

print('MAE for the training set : '+str(mae_train_lasso))

print('MAE for the testing set : '+str(mae_test_lasso))

MAE for the training set : 3.17216122871174
MAE for the testing set : 4.434191124080261


In [None]:
# Ridge regression (arguments in scikit-learn's y_true, y_pred order)
mae_train_ridge = mean_absolute_error(y_train_reg, x_train_ridge_prediction)

mae_test_ridge = mean_absolute_error(y_test_reg, x_test_ridge_prediction)

print('MAE for the training set : '+str(mae_train_ridge))

print('MAE for the testing set : '+str(mae_test_ridge))

MAE for the training set : 3.1013876623290284
MAE for the testing set : 4.256555902029828


In [None]:
# ElasticNet regression (arguments in scikit-learn's y_true, y_pred order)
mae_train_elasticnet = mean_absolute_error(y_train_reg, x_train_elasticnet_prediction)

mae_test_elasticnet = mean_absolute_error(y_test_reg, x_test_elasticnet_prediction)

print('MAE for the training set : '+str(mae_train_elasticnet))

print('MAE for the testing set : '+str(mae_test_elasticnet))

MAE for the training set : 3.1009648123909748
MAE for the testing set : 4.35444084676284
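
The notebook also imports `mean_squared_error` and `r2_score`. As a reference for what these three metrics compute, here is a small numpy sketch; the helper names `mae`, `mse` and `r2` and the four-point example are assumptions made for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean squared error: average of (y - y_hat)^2."""
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

print(mae(y_true_demo, y_pred_demo))  # 0.5
print(mse(y_true_demo, y_pred_demo))  # 0.375
print(r2(y_true_demo, y_pred_demo))
```

MAE is expressed in the units of the target, which makes it easy to read here; MSE penalizes large errors more strongly, and R² compares the model with a constant prediction equal to the mean.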


# Step 5 : Impact of the regularization term on the coefficients

Impact of the regularization term for the Lasso regression.

In [None]:
for alpha_value in [0.1, 0.2, 0.5, 1, 10]:
    lasso = Lasso(alpha=alpha_value, random_state=123)
    lasso.fit(X_train_reg_norm, y_train_reg)
    print('Alpha = '+str(alpha_value))
    print(lasso.coef_)

Alpha = 0.1
[-0.57732108  0.7262442  -0.          0.28556171 -1.41633702  3.04322547
 -0.         -2.25946766  1.10951869 -0.98632096 -1.83179253  0.70493876
 -3.47844933]
Alpha = 0.2
[-0.32336373  0.38964117 -0.          0.23411727 -0.99833996  3.17187108
 -0.         -1.56593581  0.         -0.10583449 -1.7239464   0.61369434
 -3.46106444]
Alpha = 0.5
[-0.10560102  0.         -0.          0.         -0.          3.18644561
 -0.         -0.05536803 -0.         -0.14228215 -1.55262974  0.47985314
 -3.32699066]
Alpha = 1
[-0.          0.         -0.          0.         -0.          2.85961944
 -0.         -0.         -0.         -0.05715977 -1.27401359  0.11691698
 -3.33991387]
Alpha = 10
[-0.  0. -0.  0. -0.  0. -0.  0. -0. -0. -0.  0. -0.]
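
The all-zero vector for the largest alpha is expected. With scikit-learn's Lasso objective and a fitted intercept, every coefficient is exactly zero as soon as alpha exceeds $\max_j |x_j^T (y - \bar{y})| / m$, where $x_j$ is the centered $j$-th feature column. The sketch below illustrates this threshold on synthetic data; `alpha_max` and the `*_demo` arrays are illustrative names.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(123)
X_demo = rng.normal(size=(150, 6))
y_demo = X_demo @ np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=150)

# Smallest alpha for which the lasso solution is entirely zero
# (the intercept is fitted, so X and y are centered first).
X_c = X_demo - X_demo.mean(axis=0)
y_c = y_demo - y_demo.mean()
alpha_max = np.max(np.abs(X_c.T @ y_c)) / X_demo.shape[0]

nonzero_counts = {}
for alpha in [0.01 * alpha_max, 0.5 * alpha_max, 1.01 * alpha_max]:
    coef = Lasso(alpha=alpha, random_state=123).fit(X_demo, y_demo).coef_
    nonzero_counts[alpha] = int(np.count_nonzero(coef))
    print(f"alpha={alpha:.3f}  non-zero coefficients: {nonzero_counts[alpha]}")
```

This is why the Lasso can be used for feature selection: increasing alpha removes features from the model one after another.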


Impact of the regularization term for the Ridge regression.

In [None]:
for alpha_value in [0.1, 1, 10, 10000000000]:
    ridge = Ridge(alpha=alpha_value, random_state=123)
    ridge.fit(X_train_reg_norm, y_train_reg)
    print('Alpha = '+str(alpha_value))
    print(ridge.coef_)

Alpha = 0.1
[-0.86096973  1.07489356  0.25609599  0.32320217 -1.86365283  2.93479885
 -0.15489242 -2.95786166  2.53031838 -2.20168119 -1.97211399  0.8217588
 -3.46014621]
Alpha = 1
[-0.85428794  1.06103908  0.23205917  0.3259221  -1.83860673  2.94120332
 -0.15956384 -2.93305968  2.46359052 -2.13793714 -1.96459897  0.82073673
 -3.44791885]
Alpha = 10
[-0.80101052  0.94744629  0.0514288   0.34558421 -1.62487365  2.98664712
 -0.19336543 -2.70510686  1.95270339 -1.6664614  -1.90044139  0.81142697
 -3.33723333]
Alpha = 10000000000
[-1.65574988e-07  1.53262664e-07 -2.09539460e-07  4.12357919e-08
 -1.85921869e-07  2.90461860e-07 -1.64579869e-07  1.15554267e-07
 -1.68318600e-07 -2.05763699e-07 -2.07052818e-07  1.34836309e-07
 -3.02501216e-07]
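
In contrast with the Lasso, the ridge penalty shrinks the coefficients towards zero without setting them exactly to zero, even for very large alpha. A minimal sketch on synthetic data (the `*_demo` arrays and the lists `norms` and `zero_counts` are illustrative names):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(123)
X_demo = rng.normal(size=(150, 6))
y_demo = X_demo @ np.array([3.0, 0.5, -2.0, 0.2, -0.1, 1.0]) + rng.normal(scale=0.5, size=150)

alphas = [0.1, 1, 10, 1e3, 1e6]
norms = []
zero_counts = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X_demo, y_demo).coef_
    norms.append(float(np.linalg.norm(coef)))         # overall size of the weights
    zero_counts.append(int(np.sum(coef == 0.0)))      # coefficients that are exactly 0
    print(f"alpha={alpha:<9g} ||w||_2={norms[-1]:.6f}  exact zeros: {zero_counts[-1]}")
```

The norm of the coefficient vector decreases as alpha grows, while the count of exact zeros stays at 0.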


Impact of the regularization term and the l1_ratio for the elasticnet regression.

In [None]:
for alpha_value, ratio_value in zip([0.5, 0.5, 1, 10, 100, 100], [0, 1, 0.1, 0.5, 1, 0]):
    elasticnet = ElasticNet(alpha=alpha_value, l1_ratio=ratio_value, random_state=123)
    elasticnet.fit(X_train_reg_norm, y_train_reg)
    print('Alpha = '+str(alpha_value))
    print('ratio_values = '+str(ratio_value))
    print(elasticnet.coef_)

Alpha = 0.5
ratio_values = 0
[-0.57569046  0.46837038 -0.47608925  0.35791987 -0.56525103  2.53187418
 -0.26269753 -0.91620615  0.16392595 -0.55216893 -1.36954511  0.62887315
 -2.20465361]
Alpha = 0.5
ratio_values = 1
[-0.10560102  0.         -0.          0.         -0.          3.18644561
 -0.         -0.05536803 -0.         -0.14228215 -1.55262974  0.47985314
 -3.32699066]
Alpha = 1
ratio_values = 0.1
[-0.50483818  0.37488773 -0.48159798  0.27301025 -0.43004159  2.09735359
 -0.23625205 -0.39407955 -0.03786916 -0.51102397 -1.14757158  0.50945119
 -1.79696873]
Alpha = 10
ratio_values = 0.5
[-0.          0.         -0.          0.         -0.          0.20465026
 -0.          0.         -0.         -0.         -0.          0.
 -0.2537799 ]
Alpha = 100
ratio_values = 1
[-0.  0. -0.  0. -0.  0. -0.  0. -0. -0. -0.  0. -0.]
Alpha = 100
ratio_values = 0
[-0.03431393  0.03161897 -0.04322456  0.00892123 -0.03816381  0.06179619
 -0.03368931  0.02293727 -0.03440968 -0.04244174 -0.04363212  0.02

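
With a ratio of 0 the elastic net has no L1 term left, and such fits may emit a `ConvergenceWarning` from the coordinate descent solver. Comparing scikit-learn's documented objectives suggests that `ElasticNet(alpha=a, l1_ratio=0)` minimizes the same function as `Ridge(alpha=a * m)`, where `m` is the number of training samples. The sketch below checks this on synthetic data; the `*_demo` arrays and the names `enet_l2` and `ridge_equiv` are assumptions made for illustration.

```python
import warnings

import numpy as np
from sklearn.linear_model import ElasticNet, Ridge

rng = np.random.default_rng(123)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.3, size=200)
m = X_demo.shape[0]
alpha = 0.5

with warnings.catch_warnings():
    # Pure-L2 elastic net fits may emit a ConvergenceWarning; silence it for the demo.
    warnings.simplefilter("ignore")
    enet_l2 = ElasticNet(alpha=alpha, l1_ratio=0, max_iter=10000).fit(X_demo, y_demo)

# Matching the two objectives gives a ridge alpha of m * alpha.
ridge_equiv = Ridge(alpha=m * alpha).fit(X_demo, y_demo)

print(np.round(enet_l2.coef_, 4))
print(np.round(ridge_equiv.coef_, 4))
```

This scaling by the number of samples is also why the elastic net with ratio 0 shrinks the coefficients much more than `Ridge` does for the same nominal alpha.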
