# Lasso and Ridge regression

Linear regression is a simple but powerful algorithm that is widely used for predictive modeling. However, it can overfit, especially on complex datasets with many features or with strongly correlated features.
Lasso (Least Absolute Shrinkage and Selection Operator) and Ridge regression are both variants of linear regression that aim to improve model performance and prevent overfitting, particularly on datasets with a large number of features. They belong to the family of regularized regression techniques, which add a penalty term on the size of the coefficients to the linear regression cost function.
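In symbols, the three cost functions differ only in the penalty term. Here we have $n$ training examples $(x_i, y_i)$, coefficients $\beta_1, \dots, \beta_p$, an intercept $\beta_0$ that is not penalized, and a regularization parameter $\lambda \ge 0$:

$$
\begin{aligned}
\text{OLS:}\quad & \min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\big(y_i - \beta_0 - x_i^\top\beta\big)^2 \\
\text{Lasso:}\quad & \min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\big(y_i - \beta_0 - x_i^\top\beta\big)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \\
\text{Ridge:}\quad & \min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\big(y_i - \beta_0 - x_i^\top\beta\big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2
\end{aligned}
$$

Setting $\lambda = 0$ recovers ordinary least squares, and a larger $\lambda$ means stronger shrinkage. scikit-learn calls $\lambda$ `alpha`. Its `Lasso` also multiplies the squared-error term by $1/(2n)$ while `Ridge` does not, so the same numeric value of `alpha` does not mean the same amount of regularization in the two classes.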

### Boston Housing Price

#### Libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error

#### Reading the data

In [2]:
colnames = ["CRIM", "ZN", "Indus", "CHAS", "NOS", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]
# The raw file has no header row and its columns are separated by whitespace
data = pd.read_csv("/Users/maralhajizadeh/Downloads/housing.csv", sep=r"\s+", header=None, names=colnames)
data.head(10)

|   | CRIM | ZN | Indus | CHAS | NOS | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |
| 5 | 0.02985 | 0.0 | 2.18 | 0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3 | 222.0 | 18.7 | 394.12 | 5.21 | 28.7 |
| 6 | 0.08829 | 12.5 | 7.87 | 0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5 | 311.0 | 15.2 | 395.6 | 12.43 | 22.9 |
| 7 | 0.14455 | 12.5 | 7.87 | 0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5 | 311.0 | 15.2 | 396.9 | 19.15 | 27.1 |
| 8 | 0.21124 | 12.5 | 7.87 | 0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5 | 311.0 | 15.2 | 386.63 | 29.93 | 16.5 |
| 9 | 0.17004 | 12.5 | 7.87 | 0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5 | 311.0 | 15.2 | 386.71 | 17.1 | 18.9 |


In [3]:
print(data.shape)

(506, 14)


#### Separating target and features

In [4]:
target = ["MEDV"]
features = ["CRIM", "ZN", "Indus", "CHAS", "NOS", "RM", "AGE","DIS","RAD","TAX", "PTRATIO","B","LSTAT"]

y = data[target]
X = data[features]

#### Train test split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(404, 13)
(102, 13)
(404, 1)
(102, 1)


#### Linear regression

In [6]:
linear_regression = LinearRegression()
linear_regression.fit(X_train, y_train)

y_prediction = linear_regression.predict(X_test)

In [7]:
# mean_squared_error expects (y_true, y_pred)
mse_linear = mean_squared_error(y_test, y_prediction)
print(mse_linear)

20.86929218377082


In [8]:
print(linear_regression.coef_)

[[-1.30799852e-01  4.94030235e-02  1.09535045e-03  2.70536624e+00
  -1.59570504e+01  3.41397332e+00  1.11887670e-03 -1.49308124e+00
   3.64422378e-01 -1.31718155e-02 -9.52369666e-01  1.17492092e-02
  -5.94076089e-01]]


#### Lasso Regression

Lasso regression minimizes the sum of squared errors between the predicted and actual values plus a penalty proportional to the sum of the absolute values of the coefficients (their L1 norm), scaled by the regularization parameter λ. Unlike a squared penalty, this absolute-value penalty can push a coefficient all the way to exactly zero, so as λ grows the coefficients of less important features drop out of the model and Lasso effectively performs feature selection. This makes Lasso valuable for datasets with many features: it helps identify the most significant ones and yields simpler, more interpretable models.
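To see where the exact zeros come from, here is a minimal from-scratch sketch of coordinate descent, the algorithm scikit-learn's `Lasso` uses, run on synthetic data. It assumes no intercept, which is reasonable here because the synthetic features and target are generated with mean zero. It minimizes scikit-learn's form of the objective, (1/(2n))·‖y − Xw‖² + λ‖w‖₁. The helpers `soft_threshold` and `lasso_coordinate_descent` are hypothetical names for illustration. Each coordinate update is a soft-thresholding step: any coefficient whose correlation with the partial residual is at most λ is set to exactly 0.

```python
import numpy as np
from sklearn.linear_model import Lasso


def soft_threshold(z, t):
    """Shrink z toward zero by t; any |z| <= t becomes exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=1000, tol=1e-10):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 with no intercept.

    Hypothetical helper for illustration only (alpha plays the role of lambda).
    """
    n, p = X.shape
    w = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n  # (1/n) * ||x_j||^2 for each feature
    residual = y - X @ w                  # y - Xw, kept up to date below
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual y - sum_{k != j} x_k w_k: add feature j back in
            residual += X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            new_wj = soft_threshold(rho, alpha) / col_scale[j]
            residual -= X[:, j] * new_wj
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:  # stop once no coefficient moves noticeably
            break
    return w


# Synthetic data: only the first two of five features influence y
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_w + 0.5 * rng.standard_normal(200)

w_cd = lasso_coordinate_descent(X, y, alpha=0.1)
w_sklearn = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_
print("coordinate descent:", np.round(w_cd, 3))
print("sklearn Lasso:     ", np.round(w_sklearn, 3))
```

The two printed vectors should agree closely. With a large enough λ (for example 100 on this data), every update is thresholded and all coefficients stay at exactly 0.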

In [9]:
lambda_values = [0.000001, 0.0001, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

for lambda_val in lambda_values:
    # scikit-learn calls the regularization strength `alpha`
    lasso_reg = Lasso(alpha=lambda_val)
    lasso_reg.fit(X_train, y_train)
    y_lasso_pred = lasso_reg.predict(X_test)
    mse_lasso_reg = mean_squared_error(y_test, y_lasso_pred)
    print(f"Lasso MSE with Lambda={lambda_val} is {mse_lasso_reg}")

Lasso MSE with Lambda=1e-06 is 20.869316466359194
Lasso MSE with Lambda=0.0001 is 20.871726598779265
Lasso MSE with Lambda=0.001 is 20.89369835693941
Lasso MSE with Lambda=0.005 is 21.00345613086737
Lasso MSE with Lambda=0.01 is 21.167936856245273
Lasso MSE with Lambda=0.05 is 23.100190101810377
Lasso MSE with Lambda=0.1 is 23.40636423156822
Lasso MSE with Lambda=0.2 is 24.00621891011563
Lasso MSE with Lambda=0.3 is 24.375391018990012
Lasso MSE with Lambda=0.4 is 24.85064790987889
Lasso MSE with Lambda=0.5 is 25.40742550777958
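The loop above chooses among λ values by looking at the test-set MSE, so the test set no longer gives an unbiased estimate of performance. A common alternative is to pick λ by k-fold cross-validation on the training set only. It also helps to standardize the features first: the penalty treats all coefficients alike, so features on very different scales (such as TAX and NOS here) are penalized unevenly. The sketch below does both with `GridSearchCV` and a `StandardScaler` pipeline. It runs on a synthetic stand-in from `make_regression`, not the housing data, so the data and its settings are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the housing data (506 rows, 13 features)
X, y = make_regression(n_samples=506, n_features=13, n_informative=5,
                       noise=10.0, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5)

lambda_values = [0.000001, 0.0001, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

# Scaling inside the pipeline means each CV fold fits the scaler on its own training part
pipeline = make_pipeline(StandardScaler(), Lasso(max_iter=10000))
search = GridSearchCV(pipeline, {"lasso__alpha": lambda_values},
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)  # lambda is chosen from the training set only

best_lambda = search.best_params_["lasso__alpha"]
mse_test = mean_squared_error(y_test, search.predict(X_test))
print(f"lambda chosen by 5-fold CV: {best_lambda}")
print(f"test MSE of the refit model: {mse_test:.3f}")
```

The test set is touched only once, after λ has been fixed. `make_pipeline` names the Lasso step `lasso`, which is why the grid key is `lasso__alpha`.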


In [10]:
# Coefficients of the last model fitted in the loop (lambda = 0.5)
print(lasso_reg.coef_)

[-0.10423955  0.0555089  -0.00434685  0.         -0.          1.98853933
  0.00834505 -0.94912292  0.33753916 -0.01617382 -0.81471466  0.0108382
 -0.7345398 ]
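The raw `coef_` array is hard to read. In the output above, the fourth and fifth entries (CHAS and NOS) are exactly zero, so λ = 0.5 has dropped those two features. Pairing the coefficients with column names makes the selection explicit, and with the notebook's own objects this is `pd.Series(lasso_reg.coef_, index=features)`. The sketch below is self-contained, so it uses synthetic data from `make_regression` with hypothetical column names `f0` to `f12`. `zeroed_features` is a hypothetical helper.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 13 features with hypothetical names, only 4 of them informative
X, y = make_regression(n_samples=300, n_features=13, n_informative=4,
                       noise=5.0, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]


def zeroed_features(X, y, names, lambda_val):
    """Return the names of features whose Lasso coefficient is exactly zero."""
    coefs = pd.Series(Lasso(alpha=lambda_val).fit(X, y).coef_, index=names)
    return coefs[coefs == 0].index.tolist()


for lambda_val in [0.01, 1.0, 10.0, 10000.0]:
    dropped = zeroed_features(X, y, feature_names, lambda_val)
    print(f"lambda={lambda_val}: {len(dropped)} of 13 coefficients are zero {dropped}")
```

With a very large λ, every coefficient is zero and the model predicts just the intercept (the mean of `y`).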


#### Ridge Regression

Ridge regression, on the other hand, adds a penalty proportional to the sum of the squared coefficients (their squared L2 norm), scaled by the regularization parameter λ. This penalty shrinks all coefficients toward zero, which reduces overfitting. Unlike Lasso, however, it almost never sets a coefficient exactly to zero, so every feature stays in the model. Ridge regression is particularly useful under multicollinearity, a situation where two or more predictor variables are highly correlated. In that case the ordinary least squares estimates become unstable: small changes in the data can cause large swings in the coefficients. The Ridge penalty keeps the coefficients small and spreads the weight across the correlated features, which makes the estimates more stable.
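Because the Ridge penalty is quadratic, the problem has a closed-form solution, w = (XᵀX + λI)⁻¹Xᵀy (no intercept). The sketch below uses hypothetical synthetic data with two nearly identical columns to illustrate the multicollinearity point. It compares that formula with scikit-learn's `Ridge` and with ordinary least squares. Adding λI to XᵀX makes the linear system well-conditioned, and the Ridge coefficient vector always has a smaller norm than the OLS one. `ridge_closed_form` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_closed_form(X, y, alpha):
    """Solve min_w ||y - Xw||^2 + alpha * ||w||^2 with no intercept.

    Hypothetical helper: solves the linear system (X^T X + alpha * I) w = X^T y.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


# Two almost perfectly correlated predictors; the true coefficients are (1, 1)
rng = np.random.default_rng(42)
x1 = rng.standard_normal(100)
x2 = x1 + 0.01 * rng.standard_normal(100)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.standard_normal(100)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
w_ridge = ridge_closed_form(X, y, alpha=1.0)
w_sklearn = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_

print("OLS:            ", np.round(w_ols, 3))
print("Ridge (numpy):  ", np.round(w_ridge, 3))
print("Ridge (sklearn):", np.round(w_sklearn, 3))
```

The OLS coefficients on such nearly collinear columns tend to be noisy, although their sum stays close to 2. The Ridge solution keeps both coefficients close to 1.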

In [11]:
lambda_values = [0.000001, 0.0001, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

for lambda_val in lambda_values:
    # scikit-learn calls the regularization strength `alpha`
    ridge_reg = Ridge(alpha=lambda_val)
    ridge_reg.fit(X_train, y_train)
    y_ridge_pred = ridge_reg.predict(X_test)
    mse_ridge_reg = mean_squared_error(y_test, y_ridge_pred)
    print(f"Ridge MSE with Lambda={lambda_val} is {mse_ridge_reg}")


In [12]:
# Coefficients of the last Ridge model fitted in the loop (lambda = 0.5)
print(ridge_reg.coef_)

[[-1.29041139e-01  5.05043549e-02 -1.91019826e-02  2.57832650e+00
  -1.10800075e+01  3.42613786e+00 -2.28696986e-03 -1.41932773e+00
   3.54363863e-01 -1.36443753e-02 -9.04455056e-01  1.19076825e-02
  -6.03367738e-01]]
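Unlike the Lasso output, none of the Ridge coefficients above is exactly zero. The sketch below makes the contrast explicit on synthetic data from `make_regression` (not the housing data). As α grows, the Ridge coefficient vector shrinks steadily but keeps every entry non-zero, while Lasso sets coefficients to exactly zero and, for a large enough α, all of them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 8 features, 3 of them informative
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=1)

alphas = [0.1, 10.0, 1000.0, 100000.0]
ridge_norms, ridge_zeros, lasso_zeros = [], [], []
for alpha in alphas:
    ridge_coef = Ridge(alpha=alpha).fit(X, y).coef_
    lasso_coef = Lasso(alpha=alpha).fit(X, y).coef_
    ridge_norms.append(np.linalg.norm(ridge_coef))
    ridge_zeros.append(int(np.sum(ridge_coef == 0)))
    lasso_zeros.append(int(np.sum(lasso_coef == 0)))
    print(f"alpha={alpha:g}: Ridge ||w||={ridge_norms[-1]:.3f}, "
          f"zero coefficients: Ridge {ridge_zeros[-1]}, Lasso {lasso_zeros[-1]}")
```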


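Both Lasso and Ridge regression are powerful techniques for addressing overfitting. Their penalty shrinks the coefficients, trading a small increase in bias for lower variance, which can make predictions more robust on new data. The strength of that trade-off is set by λ, which has to be tuned. On this particular train/test split, none of the Lasso λ values tried improved on the plain linear regression test MSE of about 20.87, and the error grew steadily as λ increased. Regularization tends to pay off most when the risk of overfitting is high, for example when there are many features relative to the number of observations or when the features are strongly correlated.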