The initial steps remain the same: we import the necessary libraries, read the dataset, separate the dependent and independent variables, and split the data into training and test sets.

In [155]:
# import the necessary libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV
from sklearn.metrics import r2_score


In [156]:
# read the dataset
df = pd.read_csv("Energy_Efficiency_Overfit_Dataset_Updated.csv")

In [157]:
# separate the dependent and independent variables
X = df.drop('Energy_Efficiency_Rating', axis = 1)
y = df['Energy_Efficiency_Rating']

In [158]:
# split the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

<p style = 'color:green'><b>Run all the cells above before you begin</b></p>

# L2 Regularization on Original Dataset

We know that L2 regularization shrinks the coefficients towards zero. To check this, we can compare the coefficients from standard linear regression with the coefficients obtained after applying L2 regularization. Let's begin by fitting a linear regression on the original dataset and getting the coefficient value for each feature.

In [161]:
# fit the linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

LinearRegression()

In [162]:
# Get the coefficients of the features before L2 regularization
coefficients_before_l2 = pd.Series(linear_model.coef_, index=X_train.columns)
coefficients_before_l2

Wall_Area              0.302253
Roof_Area              0.241509
Window_Area            0.259234
Overall_Height        -0.103891
Outdoor_Temperature   -0.080775
Humidity              -0.066741
Noise_Feature_1       -1.157413
Noise_Feature_2        0.456279
Noise_Feature_3       -1.231271
Noise_Feature_4       -0.009997
Noise_Feature_5        0.815266
Noise_Feature_6        1.790095
Noise_Feature_7       -0.289109
Noise_Feature_8        4.026383
Noise_Feature_9        0.855612
Noise_Feature_10      -0.977680
Orientation_East       0.919734
Orientation_North     -0.817762
Orientation_South      1.497093
Orientation_West      -1.599065
Glazing_Type_Type_A    0.264858
Glazing_Type_Type_B    0.139014
Glazing_Type_Type_C   -0.403872
dtype: float64
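
Before moving to the regularized fit, it may help to write down the two objectives being compared. The notation below is an assumption, since the notebook does not define symbols: $X$ is the feature matrix, $y$ the target, $w$ the coefficient vector, and $\alpha \ge 0$ the penalty strength. Under that notation, ordinary least squares and ridge regression solve:

$$
\hat{w}_{\text{OLS}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2
\qquad\qquad
\hat{w}_{\text{ridge}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \, \lVert w \rVert_2^2
$$

With $\alpha = 0$ the two objectives coincide; a larger $\alpha$ puts more weight on keeping $\lVert w \rVert_2^2$ small. In scikit-learn's Ridge, the intercept is not included in the penalty.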

Now, let us get the coefficients with ridge regularization. Just as we did for Lasso, we first create an array of candidate alpha values using NumPy's linspace, then fit RidgeCV with 10-fold cross-validation to find the model with the best value of alpha.

In [173]:
# Create an array of 20 equally spaced values from 0 to 10
alphas = np.linspace(0, 10, 20)
alphas

array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])

In [174]:
# Initialize RidgeCV to find the best alpha for L2 regularization
ridge_cv = RidgeCV(alphas=alphas, cv=10, scoring='r2')
ridge_cv.fit(X_train, y_train)

RidgeCV(alphas=array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ]),
        cv=10, scoring='r2')

We can see the candidate alpha values above. Let's get the best alpha using the alpha_ attribute.

In [175]:
# Find the best alpha value
best_alpha_ridge = ridge_cv.alpha_
best_alpha_ridge

10.0
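
For intuition about how this value is chosen, here is a rough sketch of the selection that RidgeCV performs when `cv=10`: fit a Ridge model for each candidate alpha, average its 10-fold validation R-squared, and keep the alpha with the highest average. The data comes from `make_regression` as a synthetic stand-in, and the names `X_demo`, `y_demo`, `alphas_demo`, `mean_scores` and `best_alpha_demo` are made up for this illustration; none of them come from the notebook's dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; not the energy-efficiency dataset)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, noise=15.0, random_state=0
)

# Candidate penalties; kept strictly positive in this sketch
alphas_demo = np.linspace(0.1, 10, 20)

# Mean 10-fold validation R-squared for each candidate alpha
mean_scores = [
    cross_val_score(Ridge(alpha=a), X_demo, y_demo, cv=10, scoring="r2").mean()
    for a in alphas_demo
]

# The alpha with the highest mean validation score is the one selected
best_alpha_demo = alphas_demo[int(np.argmax(mean_scores))]
print(best_alpha_demo)
```

Because the winner is always one of the candidates, the search can only be as good as the grid it is given.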

Since the best alpha is at the upper end of the range we tried, the optimum may lie beyond it. Let's rebuild the model with an expanded range of alphas, from 10 to 30.

In [176]:
# Create an array of 20 equally spaced values from 10 to 30
alphas = np.linspace(10, 30, 20)
# Initialize RidgeCV to find the best alpha for L2 regularization
ridge_cv = RidgeCV(alphas=alphas, cv=10, scoring='r2')
ridge_cv.fit(X_train, y_train)
ridge_cv.alpha_

24.736842105263158
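
The check that prompted this second search (is the selected alpha sitting on an edge of the grid?) can be written as a small helper. This is only a sketch: `alpha_on_grid_edge` is a hypothetical function, not part of scikit-learn, and the two grids are rebuilt in plain Python to mirror the `np.linspace` calls above.

```python
import math

def alpha_on_grid_edge(best_alpha, alphas, rel_tol=1e-9):
    """Return True if best_alpha equals the smallest or largest candidate."""
    lo, hi = min(alphas), max(alphas)
    return (math.isclose(best_alpha, lo, rel_tol=rel_tol)
            or math.isclose(best_alpha, hi, rel_tol=rel_tol))

# Plain-Python equivalents of np.linspace(0, 10, 20) and np.linspace(10, 30, 20)
first_grid = [10 * i / 19 for i in range(20)]
second_grid = [10 + 20 * i / 19 for i in range(20)]

print(alpha_on_grid_edge(10.0, first_grid))                 # True
print(alpha_on_grid_edge(24.736842105263158, second_grid))  # False
```

A result of True suggests widening or shifting the grid; False means the selected value is bracketed by worse-scoring neighbours on both sides.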

Now we can observe that the best alpha is about 24.7, which lies inside the range we specified rather than at its edge. The model with the best alpha is automatically saved in the ridge_cv instance. Let's use it to get the coefficients of the model.

In [177]:
# get the coefficients after L2 regularization
coefficients_after_l2 = pd.Series(ridge_cv.coef_, index=X_train.columns)

In [179]:
# Compare the coefficients before and after L2 regularization
coefficients_comparison = pd.DataFrame({
    'Before L2 Regularization': coefficients_before_l2,
    'After L2 Regularization': coefficients_after_l2
})

coefficients_comparison

                     Before L2 Regularization  After L2 Regularization
Wall_Area                            0.302253                 0.300379
Roof_Area                            0.241509                 0.244928
Window_Area                          0.259234                 0.259251
Overall_Height                      -0.103891                -0.059810
Outdoor_Temperature                 -0.080775                -0.090001
Humidity                            -0.066741                -0.066190
Noise_Feature_1                     -1.157413                -0.426444
Noise_Feature_2                      0.456279                 0.211002
Noise_Feature_3                     -1.231271                -0.358951
Noise_Feature_4                     -0.009997                 0.359567


As you can observe, most of the coefficients have decreased in magnitude after applying L2 regularization, which is consistent with the expected behavior of ridge regression: it penalizes the size of the coefficients, shrinking them towards zero without setting them exactly to zero. A few coefficients, such as Noise_Feature_4, move the other way, because the penalty acts on the coefficient vector as a whole rather than on each coefficient separately.
The coefficients of some Noise_Feature variables show significant shrinkage, for example Noise_Feature_6 and Noise_Feature_8. This suggests that these features may have been contributing to overfitting in the __non-regularized model__.
Similarly, the coefficients of the Orientation and Glazing_Type features have also reduced.
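
The "towards zero, but never exactly zero" behavior is easiest to see in the simplest possible case. The sketch below assumes a single feature and no intercept, where the ridge slope has the closed form Sxy / (Sxx + alpha). The data and the helper `ridge_slope` are made up for illustration and are unrelated to the notebook's dataset.

```python
def ridge_slope(x, y, alpha):
    """Ridge slope for one feature and no intercept: Sxy / (Sxx + alpha)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)

# Toy data, roughly y = 2x (hypothetical values)
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0]
y_toy = [2.1, 3.9, 6.2, 7.8, 10.1]

# alpha = 0 gives the ordinary least-squares slope
penalties = [0.0, 1.0, 10.0, 100.0]
slopes = [ridge_slope(x_toy, y_toy, a) for a in penalties]

for a, s in zip(penalties, slopes):
    print(f"alpha={a:>6}: slope={s:.4f}")
```

The slope starts at about 2.0 for alpha = 0 and gets smaller with every increase in alpha, but the denominator only grows, so the slope never reaches zero for any finite alpha. This is the key contrast with L1 regularization, which can set coefficients exactly to zero.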


Now comes the last step: calculating the R-squared score on both the training and test sets to evaluate the performance of the model with ridge regularization.

In [181]:
# R-squared scores for the Ridge model
r2_train_ridge = ridge_cv.score(X_train, y_train)
r2_test_ridge = ridge_cv.score(X_test, y_test)


r2_train_ridge, r2_test_ridge

(0.9447358107549229, 0.8721256088178196)

Here we can see that the Ridge model achieved an R-squared value of about 0.94 on the training set, slightly higher than the 0.93 achieved with L1 regularization. On the test set, the Ridge model achieved an R-squared value of approximately 0.87, which is similar to the L1 model. The wider gap between training and test scores (about 0.07 here) means the Ridge model overfits slightly more than the L1 model.
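
For reference, the R-squared value returned by `score` is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal plain-Python sketch of that definition follows; the function `r_squared` and the toy values are hypothetical and are not taken from the notebook.

```python
def r_squared(y_true, y_pred):
    """R-squared = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy targets and predictions (hypothetical values)
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.5, 5.5, 7.0, 9.5]

print(round(r_squared(y_true_demo, y_pred_demo), 4))
```

A value of 1 means perfect predictions, and 0 means the model does no better than predicting the mean, which is why a drop from training to test R-squared is read as a sign of overfitting.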

