# Ridge (L2) / Lasso (L1) / ElasticNet Regression

______

## Environment Set-Up

### Load relevant Python Packages

In [28]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

### Data Import

In [2]:
# the data has been saved as a pickle (.pkl) file
path = './data/df_model.pkl'
df_model = pd.read_pickle(path)
df_model.head(2)

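| Unnamed: 0 | dist_km | elv_m | elapsed_time | moving_time | start_time | day_of_week | speed_km/h | pace_min/km | pace_min/100m | power_W | ... | h/r_zone3 | h/r_zone4 | h/r_zone5 | year | month | day | dayofyear | hour | minute | second |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90.8 | 797.0 | 12497 | 12014 | 51327 | Sunday | 27.2 | 132.3 | 13.23 | 186.0 | ... | 912 | 0 | 0 | 2021 | 3 | 28 | 87 | 14 | 15 | 27 |
| 1 | 95.54 | 769.0 | 18614 | 16282 | 45154 | Saturday | 21.1 | 170.4 | 17.04 | 94.0 | ... | 1341 | 276 | 0 | 2021 | 3 | 27 | 86 | 12 | 32 | 34 |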


### Setting Up Training & Test Dataframes

The dataframe is split into a training set (80%) and a test set (20%).

In [7]:
# define features and target
X = df_model.drop('power_W', axis=1)
y = df_model.power_W

In [8]:
# train-test-split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=105)

In [9]:
# Let's check the shape of our dataframes
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(90132, 38)
(90132,)
(22534, 38)
(22534,)
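The shapes above follow from how `train_test_split` turns a fractional `test_size` into row counts. As far as we know, scikit-learn rounds the test share up (`ceil`) and gives the remaining rows to the training set. The short check below reproduces the printed shapes under that assumption; the total of 112,666 rows is simply the sum of the two printed row counts.

```python
import math

# Row counts printed by the shape check above
n_train_printed = 90132
n_test_printed = 22534
n_samples = n_train_printed + n_test_printed  # 112666 rows in df_model

test_size = 0.2
# Assumed rule: test rows = ceil(test_size * n_samples), train rows = the rest
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_samples, n_train, n_test)  # 112666 90132 22534
```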


General information: Ridge and Lasso regression add a penalty on the size of the coefficients to ordinary linear regression. This reduces model complexity and can prevent the overfitting that plain linear regression may suffer from.
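For reference, these are the objectives as written in the scikit-learn documentation for `Ridge` and `Lasso` (intercept omitted, since it is not penalized; $n$ is the number of training samples). Note the different scaling of the squared-error term: `Ridge` uses the plain sum of squares, while `Lasso` divides it by $2n$, so the same numeric `alpha` does not correspond to the same penalty strength in both models.

```latex
\begin{aligned}
\text{OLS:}\quad   & \min_{w}\; \lVert y - Xw \rVert_2^2 \\
\text{Ridge:}\quad & \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso:}\quad & \min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{aligned}
```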

## Ridge Regression (L2 regularization)

A commonly used alternative to ordinary linear regression is ridge regression. It is also a linear model and uses the same prediction formula as ordinary least squares. However, ridge regression additionally tries to keep the magnitude of the coefficients as small as possible: all entries of the coefficient vector w should be close to zero. In other words, each feature should have as little effect on the outcome as possible while the model still predicts well. This is achieved by adding the squared L2 norm of w, weighted by the parameter alpha, to the least-squares loss; a larger alpha means stronger shrinkage.
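A minimal NumPy sketch of the ridge idea, assuming centred data so the intercept can be handled separately. It solves the closed-form system $(X^\top X + \alpha I)\,w = X^\top y$ for the coefficients and shows that the size of $w$ shrinks as `alpha` grows. The function `ridge_closed_form` and the toy data are illustrative only, not the notebook's code or scikit-learn's solver.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Ridge coefficients and intercept via the normal equations (illustrative)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # centre features so the intercept is not penalized
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ w
    return w, intercept

# Toy data: 50 samples, 3 features, known coefficients plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([4.0, -2.0, 1.0]) + 5.0 + rng.normal(scale=0.5, size=50)

for alpha in [0.0, 10.0, 1000.0]:
    w, b = ridge_closed_form(X, y, alpha)
    print(f"alpha={alpha:>7}: |w| = {np.linalg.norm(w):.3f}")
```

With `alpha=0` the result is ordinary least squares; every increase of `alpha` pulls the coefficient norm further towards zero.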

Performing L2 regularization with different alpha values:

In [10]:
# initialize and train model with (default value) alpha = 1 
ridge = Ridge(alpha=1)
ridge.fit(X_train, y_train)

Ridge(alpha=1)

In [11]:
# predict on test-set
y_pred_ridge = ridge.predict(X_test)

In [12]:
# R-squared scores for train and test set
train_score_ridge = ridge.score(X_train, y_train)
test_score_ridge = ridge.score(X_test, y_test)
print("Train score: {:.2f}".format(train_score_ridge))
print("Test score: {:.2f}".format(test_score_ridge))

Train score: 0.65
Test score: 0.68
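The `score` method used above returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. To make that definition concrete, `r_squared` below is a hypothetical NumPy re-implementation, not part of scikit-learn. Read this way, a test score of 0.68 means the model explains roughly 68 % of the variance of `power_W` in the test set.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination, as returned by a regressor's .score()."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]))          # perfect fit -> 1.0
print(r_squared([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean -> 0.0
```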


In [13]:
# MSE of test set
print("MSE:", round(mean_squared_error(y_test, y_pred_ridge), 3))

MSE: 632.35
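The value printed above is the mean squared error, measured in squared watts. Taking its square root gives the RMSE, which is in the same unit as `power_W` and therefore easier to interpret. A quick check with the printed value:

```python
import math

mse_ridge = 632.35  # MSE of the alpha=1 ridge model, copied from the output above
rmse_ridge = math.sqrt(mse_ridge)
print(f"RMSE: {rmse_ridge:.2f} W")  # RMSE: 25.15 W
```

So the ridge model's predictions are off by roughly 25 W on a typical test ride.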


In [14]:
# initialize and train model with alpha = 10 
ridge_10 = Ridge(alpha=10)
ridge_10.fit(X_train, y_train)

Ridge(alpha=10)

In [15]:
# predict on test-set
y_pred_ridge_10 = ridge_10.predict(X_test)

In [16]:
# R-squared scores for train and test set
train_score_ridge_10 = ridge_10.score(X_train, y_train)
test_score_ridge_10 = ridge_10.score(X_test, y_test)
print("Train score: {:.2f}".format(train_score_ridge_10))
print("Test score: {:.2f}".format(test_score_ridge_10))

Train score: 0.65
Test score: 0.68


In [18]:
# MSE of test set
print("MSE:", round(mean_squared_error(y_test, y_pred_ridge_10), 3))

MSE: 632.336


In [19]:
# initialize and train model with alpha = 0.1
ridge_01 = Ridge(alpha=0.1)
ridge_01.fit(X_train, y_train)

Ridge(alpha=0.1)

In [20]:
# predict on test-set
y_pred_ridge_01 = ridge_01.predict(X_test)

In [21]:
# R-squared scores for train and test set
train_score_ridge_01 = ridge_01.score(X_train, y_train)
test_score_ridge_01 = ridge_01.score(X_test, y_test)
print("Train score: {:.2f}".format(train_score_ridge_01))
print("Test score: {:.2f}".format(test_score_ridge_01))

Train score: 0.65
Test score: 0.68


In [23]:
# MSE of test set
print("MSE:", round(mean_squared_error(y_test, y_pred_ridge_01), 3))

MSE: 632.352
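All three ridge models give practically the same results (MSE 632.35, 632.336 and 632.352). One plausible reason, offered as an assumption rather than a diagnosis: `Ridge` weighs $\alpha\lVert w\rVert_2^2$ against an unscaled sum of squared errors over roughly 90,000 training rows, so values between 0.1 and 10 are tiny next to $X^\top X$ and barely move the coefficients (how much they move also depends on the scale of each feature). The toy sketch below uses synthetic standard-normal features; the data and the helper `ridge_coef` are illustrative only.

```python
import numpy as np

def ridge_coef(X, y, alpha):
    """Ridge coefficients without intercept: (X^T X + alpha I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(42)
n_samples = 10_000
X = rng.normal(size=(n_samples, 4))
w_true = np.array([3.0, -2.0, 1.0, 0.5])
y = X @ w_true + rng.normal(size=n_samples)

# alpha in the notebook's range barely matters; only a huge alpha shrinks w visibly
for alpha in [0.1, 1.0, 10.0, 100_000.0]:
    w = ridge_coef(X, y, alpha)
    print(f"alpha={alpha:>9}: w = {np.round(w, 3)}")
```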


## Lasso Regression (L1 regularization)

An alternative to ridge is lasso regression. Like ridge regression, lasso restricts the coefficients to be close to zero, but it uses the L1 norm of w (the sum of absolute values) as its penalty instead of the squared L2 norm. As a consequence, some coefficients become exactly zero, which means the corresponding features are ignored entirely by the model. This can be seen as automatic feature selection: it often makes models easier to interpret and can reveal the most important features.
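Why the L1 penalty produces exact zeros can be seen from its soft-thresholding operator $S(z, \lambda) = \operatorname{sign}(z)\max(|z| - \lambda, 0)$: every value whose magnitude is at most $\lambda$ is set to exactly zero, and the rest are shrunk by $\lambda$. For an orthonormal design, the lasso solution is just the soft-thresholded least-squares coefficients (up to how the penalty is scaled); this is a textbook simplification, not a description of the notebook's data. A minimal NumPy sketch with hypothetical coefficients:

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink each entry of z towards zero by lam; entries with |z| <= lam become 0."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

ols_coef = np.array([3.0, -0.5, 0.2, -2.0])  # hypothetical least-squares coefficients
print(soft_threshold(ols_coef, lam=1.0))     # the two small coefficients become zero
```

By contrast, the ridge penalty only rescales coefficients and never sets them exactly to zero.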

Performing L1 regularization with different alpha values:

In [24]:
# initialize and train model with (default value) alpha = 1.0
lasso = Lasso(alpha=1, max_iter=1_000_000)  # max_iter expects an integer (10e5 is a float)
lasso.fit(X_train, y_train)

# predict on test-set
y_pred_lasso = lasso.predict(X_test)

# R-squared scores for train and test set
train_score_lasso = lasso.score(X_train, y_train)
test_score_lasso = lasso.score(X_test, y_test)
print("Train score: {:.2f}".format(train_score_lasso))
print("Test score: {:.2f}".format(test_score_lasso))

# MSE of test set
print("MSE:", round(mean_squared_error(y_test, y_pred_lasso), 3))

# number of features used
coeff_used = np.sum(lasso.coef_!=0)
print("# features: ", coeff_used)

Train score: 0.65
Test score: 0.68
MSE: 634.891
# features:  24
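With `alpha=1` the lasso keeps 24 of the 38 features. To see which ones it dropped, the zero entries of `coef_` can be matched to the column names. The helper `split_features` below is a hypothetical sketch shown on toy values; in the notebook it could be called as `split_features(lasso.coef_, X_train.columns)`.

```python
import numpy as np

def split_features(coef, columns, tol=0.0):
    """Return (kept, dropped) feature names based on lasso coefficients.

    A feature counts as dropped when |coef| <= tol (tol=0.0 means exactly zero).
    """
    coef = np.asarray(coef, dtype=float)
    columns = list(columns)
    kept = [c for c, w in zip(columns, coef) if abs(w) > tol]
    dropped = [c for c, w in zip(columns, coef) if abs(w) <= tol]
    return kept, dropped

# Toy coefficients for illustration only (not the notebook's fitted values)
kept, dropped = split_features([0.8, 0.0, -1.2, 0.0], ["dist_km", "elv_m", "speed_km/h", "day"])
print("kept:", kept)
print("dropped:", dropped)
```

`len(kept)` matches the `np.sum(lasso.coef_ != 0)` count used in the cells.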


In [25]:
# initialize and train model with alpha 0.01
# Increase max_iter as well; otherwise the solver may stop before it converges and raise a ConvergenceWarning.
lasso_01 = Lasso(alpha=0.01, max_iter=1000000)
lasso_01.fit(X_train, y_train)

# predict on test-set
y_pred_lasso_01 = lasso_01.predict(X_test)

# R-squared scores for train and test set
train_score_lasso_01 = lasso_01.score(X_train, y_train)
test_score_lasso_01 = lasso_01.score(X_test, y_test)
print("Train score: {:.2f}".format(train_score_lasso_01))
print("Test score: {:.2f}".format(test_score_lasso_01))

# MSE of test set
print("MSE:", round(mean_squared_error(y_test, y_pred_lasso_01), 3))

# number of features used
coeff_used = np.sum(lasso_01.coef_!=0)
print("# features: ", coeff_used)

Train score: 0.65
Test score: 0.68
MSE: 632.294
# features:  34
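scikit-learn fits `Lasso` and `ElasticNet` by coordinate descent; `max_iter` caps the number of iterations, and if the solver has not converged within that budget it emits a `ConvergenceWarning`. The sketch below is a simplified coordinate-descent lasso for the objective $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$, without intercept and with a plain coefficient-change stopping rule (scikit-learn additionally checks a duality gap). The names `lasso_cd` and `soft_threshold` and the toy data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, lam):
    """Scalar soft-thresholding: shrink z towards zero by lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, alpha, max_iter=1000, tol=1e-6):
    """Coordinate descent for (1/(2n))||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    residual = y.copy()                  # y - X @ w with w = 0
    col_sq = (X ** 2).sum(axis=0) / n    # (1/n) * ||X_j||^2 for each column
    for it in range(max_iter):
        max_change = 0.0
        for j in range(p):
            w_old = w[j]
            # correlation of column j with the partial residual (w_j added back)
            rho = X[:, j] @ residual / n + col_sq[j] * w_old
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            if w[j] != w_old:
                residual -= X[:, j] * (w[j] - w_old)  # keep residual in sync
                max_change = max(max_change, abs(w[j] - w_old))
        if max_change < tol:
            return w, it + 1             # converged
    return w, max_iter                   # hit the iteration cap

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + rng.normal(scale=0.1, size=200)

w, n_iter = lasso_cd(X, y, alpha=0.1)
print("coefficients:", np.round(w, 3), "after", n_iter, "passes")
```

The irrelevant features end up with coefficients of exactly zero. Smaller `alpha` values generally need more passes to converge, which fits the larger `max_iter` used for the small-alpha models.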


In [26]:
# initialize and train model with alpha 0.0001
lasso_0001 = Lasso(alpha=0.0001, max_iter=1000000)
lasso_0001.fit(X_train, y_train)

# predict on test-set
y_pred_lasso_0001 = lasso_0001.predict(X_test)

# R-squared scores for train and test set
train_score_lasso_0001 = lasso_0001.score(X_train, y_train)
test_score_lasso_0001 = lasso_0001.score(X_test, y_test)
print("Train score: {:.2f}".format(train_score_lasso_0001))
print("Test score: {:.2f}".format(test_score_lasso_0001))

# MSE of test set
print("MSE:", round(mean_squared_error(y_test, y_pred_lasso_0001), 3))

# number of features used
coeff_used = np.sum(lasso_0001.coef_!=0)
print("# features: ", coeff_used)

Train score: 0.65
Test score: 0.68
MSE: 632.35
# features:  36


## ElasticNet Regression

ElasticNet regression combines the penalties of lasso and ridge. This combination often works best in practice, at the price of having two parameters to tune. In scikit-learn these are `alpha`, the overall strength of the penalty, and `l1_ratio`, the share of the penalty that comes from the L1 term (the remainder is the L2 term).
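The objective behind `alpha` and `l1_ratio` can be written out directly. The helper `elastic_net_objective` below is hypothetical and only evaluates the loss as it is documented for `sklearn.linear_model.ElasticNet` (intercept omitted); it does not fit anything. It shows that `l1_ratio=1` leaves only the lasso penalty and `l1_ratio=0` only a ridge-style penalty.

```python
import numpy as np

def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """ElasticNet objective (intercept omitted), following scikit-learn's documented form:
    1/(2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
    + 0.5 * alpha * (1 - l1_ratio) * ||w||^2
    """
    n = X.shape[0]
    data_fit = np.sum((y - X @ w) ** 2) / (2 * n)
    l1 = alpha * l1_ratio * np.sum(np.abs(w))
    l2 = 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2)
    return data_fit + l1 + l2

# Tiny toy problem: the data-fit term alone equals 3.0 here
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, -1.0])

for ratio in [0.0, 0.6, 1.0]:
    print(ratio, elastic_net_objective(X, y, w, alpha=1.0, l1_ratio=ratio))
```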

In [31]:
# initialize and train model with alpha = 1.0 (the default) and l1_ratio = 0.6 (default: 0.5)
elastic = ElasticNet(alpha=1, l1_ratio=0.6, max_iter=1_000_000)
elastic.fit(X_train, y_train)

# predict on test-set
y_pred_elastic = elastic.predict(X_test)

# R-squared scores for train and test set
train_score_elastic = elastic.score(X_train, y_train)
test_score_elastic = elastic.score(X_test, y_test)
print("Train score: {:.2f}".format(train_score_elastic))
print("Test score: {:.2f}".format(test_score_elastic))

# MSE of test set
print("MSE:", round(mean_squared_error(y_test, y_pred_elastic)))

# number of features used
coeff_used = np.sum(elastic.coef_!=0)
print("# features: ", coeff_used)

Train score: 0.65
Test score: 0.68
MSE: 636
# features:  24


----

## Conclusion

You have now seen three ways to regularize a linear regression model in order to curb overfitting and improve prediction. In practice, ridge regression is usually the first choice. If there is a large number of features and you want a more interpretable model, lasso regression makes sense, since it eliminates some of the features entirely.
ElasticNet combines both penalties.
All in all, looking at the outcome for the three model types, the results are not satisfying: every variant reaches a test R² of only about 0.68 and a test MSE of roughly 632 to 636, almost independently of alpha.
For this reason, we continue with: [CyPer_TensorFlow](CyPer_TensorFlow.ipynb).