# L1 and L2 regularization

Regularization is used to address overfitting. A model overfits when it learns both the signal and the noise in its training data: it performs very well on the training set but poorly on the test or validation set.

<img src="https://miro.medium.com/max/1125/1*_7OPgojau8hkiPUiHoGK_w.png"/>
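Here is a minimal sketch that makes overfitting concrete. It uses synthetic data, not the housing dataset used later: a degree-15 polynomial is fitted to 20 noisy samples of a sine curve. The data generator, the polynomial degree and the noise level are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic data (assumption): sin(2*pi*x) on [0, 1] plus Gaussian noise."""
    x = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(scale=0.2, size=n)
    return x, y

X_train, y_train = make_data(20)
X_test, y_test = make_data(200)

# 16 polynomial coefficients for only 20 points: enough freedom to chase the noise.
overfit_model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit_model.fit(X_train, y_train)

train_r2 = overfit_model.score(X_train, y_train)
test_r2 = overfit_model.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The training R² should come out noticeably higher than the test R², which is the gap between training and test performance described above.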

Common ways to prevent overfitting include cross-validation, using fewer features, regularization, and pruning (a technique specific to decision trees), among others.

The idea behind regularization is to add a penalty term to the loss function that grows as the model becomes more complex, which discourages the model from fitting the noise.
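For a linear model, the regularized objective can be written roughly as follows. The symbols are introduced here for illustration: $\mathbf{w}$ is the coefficient vector, $\lambda \ge 0$ controls the strength of the penalty, and $\Omega$ measures model complexity.

```latex
J(\mathbf{w}) = \underbrace{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mathbf{x}_i^\top \mathbf{w}\right)^2}_{\text{data fit}} + \lambda\,\Omega(\mathbf{w}),
\qquad
\Omega_{L1}(\mathbf{w}) = \sum_{j=1}^{p} |w_j|,
\qquad
\Omega_{L2}(\mathbf{w}) = \sum_{j=1}^{p} w_j^2
```

Setting $\lambda = 0$ recovers ordinary least squares, and larger $\lambda$ pushes the coefficients toward zero; the intercept is usually left out of $\Omega$. scikit-learn scales the terms slightly differently: `Lasso` minimizes $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$, while `Ridge` minimizes $\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2$, so the same `alpha` value does not mean the same penalty strength in both models.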

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS7l4SjTqpmGb6fS8-yc12sdVZ4tBSBB8_Wchtq-6qu7RiK71wdPZQ4Z97ZjosYipkdE6U&usqp=CAU"/> 


For linear regression, adding an L1 penalty gives **Lasso Regression**, and adding an L2 penalty gives **Ridge Regression**.

<img src="https://miro.medium.com/max/550/1*-LydhQEDyg-4yy5hGEj5wA.png"/>

The difference between the two lies in the penalty added to the loss function:
   - L1 uses the **absolute value** of the coefficients as the penalty.
   - L2 uses the **squared magnitude** of the coefficients as the penalty.
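The two penalties can be compared numerically. This is a minimal NumPy sketch; `l1_penalty`, `l2_penalty` and `penalized_mse` are hypothetical helpers written for illustration (not part of any library), and the weights and data are made up.

```python
import numpy as np

def l1_penalty(w):
    """L1 penalty (Lasso): sum of absolute coefficient values."""
    return np.sum(np.abs(w))

def l2_penalty(w):
    """L2 penalty (Ridge): sum of squared coefficient values."""
    return np.sum(w ** 2)

def penalized_mse(X, y, w, lam, penalty):
    """Mean squared error of the linear model X @ w, plus lam times the chosen penalty."""
    residuals = y - X @ w
    return np.mean(residuals ** 2) + lam * penalty(w)

# Made-up coefficients: one large, one medium, one zero, one small.
w = np.array([3.0, -0.5, 0.0, 0.1])
print("L1 penalty:", l1_penalty(w))
print("L2 penalty:", l2_penalty(w))

# Made-up data, only to show the penalty being added to the data-fit term.
X = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.5, 1.0, 1.0, 0.0]])
y = np.array([2.0, 1.0])
for name, pen in [("L1", l1_penalty), ("L2", l2_penalty)]:
    print(name, "penalized MSE:", penalized_mse(X, y, w, lam=0.1, penalty=pen))
```

L2 weighs the large coefficient far more heavily (3.0 contributes 9.0, versus 3.0 under L1), while L1 still charges the small coefficient 0.1 at full value (versus 0.01 under L2). That constant pressure on small coefficients is one common intuition for why L1 tends to push them all the way to zero.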
    
According to <a href="https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c">Anuja Nagpal</a>:
"Traditional methods like cross-validation, stepwise regression to handle overfitting and perform feature selection work well with a small set of features but these techniques are a great alternative when we are dealing with a large set of features."

L1 regularization is useful for feature selection because it can shrink the coefficients of unimportant features to exactly zero, effectively removing them from the model. This is especially helpful when the number of features is large.
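A minimal sketch of this behaviour with scikit-learn, on synthetic data from `make_regression` in which only 5 of the 20 features are informative. The dataset sizes, noise level and `alpha=1.0` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): 20 features, only 5 of which influence the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso sets some coefficients exactly to zero; Ridge only shrinks them.
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))
print("Lasso zero coefficients:", lasso_zeros)
print("Ridge zero coefficients:", ridge_zeros)
print("Feature indices kept by Lasso:", np.flatnonzero(lasso.coef_))
```

Ridge keeps all 20 features with smaller weights, while Lasso drops some of them entirely; how many survive depends on `alpha`.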

In [240]:
import pandas as pd

In [241]:
df = pd.read_csv('sample_data/housing.csv')

In [242]:
df.shape

(20640, 10)

In [243]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           20640 non-null  float64
 1   latitude            20640 non-null  float64
 2   housing_median_age  20640 non-null  float64
 3   total_rooms         20640 non-null  float64
 4   total_bedrooms      20433 non-null  float64
 5   population          20640 non-null  float64
 6   households          20640 non-null  float64
 7   median_income       20640 non-null  float64
 8   median_house_value  20640 non-null  float64
 9   ocean_proximity     20640 non-null  object 
dtypes: float64(9), object(1)
memory usage: 1.6+ MB


In [244]:
df.isna().sum()

longitude               0
latitude                0
housing_median_age      0
total_rooms             0
total_bedrooms        207
population              0
households              0
median_income           0
median_house_value      0
ocean_proximity         0
dtype: int64

In [245]:
df.head()

|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |


In [246]:
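# Features: everything except the target and the categorical ocean_proximity column
X = df.drop(['median_house_value', 'ocean_proximity'], axis=1)
y = df['median_house_value']  # target: median house value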

In [247]:
from sklearn.model_selection import train_test_split

In [248]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) 
X_train.head()

|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income |
|---|---|---|---|---|---|---|---|---|
| 12069 | -117.55 | 33.83 | 6.0 | 502.0 | 76.0 | 228.0 | 65.0 | 4.2386 |
| 15925 | -122.44 | 37.73 | 52.0 | 2381.0 | 492.0 | 1485.0 | 447.0 | 4.3898 |
| 11162 | -118.0 | 33.83 | 26.0 | 1718.0 | 385.0 | 1022.0 | 368.0 | 3.9333 |
| 4904 | -118.26 | 34.01 | 38.0 | 697.0 | 208.0 | 749.0 | 206.0 | 1.4653 |
| 4683 | -118.36 | 34.08 | 52.0 | 2373.0 | 601.0 | 1135.0 | 576.0 | 3.1765 |


In [249]:
from sklearn.impute import SimpleImputer
import numpy as np
# Fill missing values (e.g. the NaNs in total_bedrooms) with the column mean
si = SimpleImputer(strategy='mean', missing_values=np.nan)

In [250]:
# Learn the column means from the training set and fill its missing values
X_train = pd.DataFrame(si.fit_transform(X_train), columns=X_train.columns)
X_train.head()

|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income |
|---|---|---|---|---|---|---|---|---|
| 0 | -117.55 | 33.83 | 6.0 | 502.0 | 76.0 | 228.0 | 65.0 | 4.2386 |
| 1 | -122.44 | 37.73 | 52.0 | 2381.0 | 492.0 | 1485.0 | 447.0 | 4.3898 |
| 2 | -118.0 | 33.83 | 26.0 | 1718.0 | 385.0 | 1022.0 | 368.0 | 3.9333 |
| 3 | -118.26 | 34.01 | 38.0 | 697.0 | 208.0 | 749.0 | 206.0 | 1.4653 |
| 4 | -118.36 | 34.08 | 52.0 | 2373.0 | 601.0 | 1135.0 | 576.0 | 3.1765 |


In [251]:
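# Fill the test set with the means learned from the training set (transform, not fit_transform),
# so no test-set statistics leak into preprocessing
X_test = pd.DataFrame(si.transform(X_test), columns=X_test.columns)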
X_test.head()

|   | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income |
|---|---|---|---|---|---|---|---|---|
| 0 | -117.05 | 32.58 | 22.0 | 2101.0 | 399.0 | 1551.0 | 371.0 | 4.1518 |
| 1 | -117.97 | 33.92 | 32.0 | 2620.0 | 398.0 | 1296.0 | 429.0 | 5.7796 |
| 2 | -121.84 | 38.65 | 29.0 | 3167.0 | 548.0 | 1554.0 | 534.0 | 4.3487 |
| 3 | -115.6 | 33.2 | 37.0 | 709.0 | 187.0 | 390.0 | 142.0 | 2.4511 |
| 4 | -122.43 | 37.79 | 25.0 | 1637.0 | 394.0 | 649.0 | 379.0 | 5.0049 |
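The imputer should learn its column means from the training data only and then reuse them on the test data; otherwise information from the test set leaks into preprocessing. A scikit-learn `Pipeline` enforces this automatically, because `fit` only ever sees the training data. A minimal sketch on hypothetical toy data (the shapes, coefficients and missing positions are made up to stand in for the housing features):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 3 features, linear target, 10 values missing in column 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
X[rng.choice(100, size=10, replace=False), 1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# fit() learns the imputation means from X_train only; score()/predict() reuse
# those training means when filling NaNs in X_test.
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))

imputer = model.named_steps["simpleimputer"]
print("column means learned from the training set:", imputer.statistics_)
```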


In [252]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16512 entries, 0 to 16511
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           16512 non-null  float64
 1   latitude            16512 non-null  float64
 2   housing_median_age  16512 non-null  float64
 3   total_rooms         16512 non-null  float64
 4   total_bedrooms      16512 non-null  float64
 5   population          16512 non-null  float64
 6   households          16512 non-null  float64
 7   median_income       16512 non-null  float64
dtypes: float64(8)
memory usage: 1.0 MB


In [253]:
X_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4128 entries, 0 to 4127
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           4128 non-null   float64
 1   latitude            4128 non-null   float64
 2   housing_median_age  4128 non-null   float64
 3   total_rooms         4128 non-null   float64
 4   total_bedrooms      4128 non-null   float64
 5   population          4128 non-null   float64
 6   households          4128 non-null   float64
 7   median_income       4128 non-null   float64
dtypes: float64(8)
memory usage: 258.1 KB


In [255]:
from sklearn.linear_model import LinearRegression
# Baseline: ordinary least squares, no regularization
regular_model = LinearRegression().fit(X_train, y_train)

In [256]:
regular_model.score(X_test, y_test) # Coefficient of determination R^2 on the test set

0.6261171986364505

In [257]:
regular_model.score(X_train, y_train) # Coefficient of determination R^2 on the training set

0.6378897814491684

# Lasso Regression (L1 regularization)

In [258]:
from sklearn.linear_model import Lasso

In [264]:
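# max_iter=200 and tol=0.1 are looser than Lasso's defaults (max_iter=1000, tol=1e-4),
# so coordinate descent may stop before fully converging
lasso_reg = Lasso(alpha=50, max_iter=200, tol=0.1).fit(X_train, y_train)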

In [265]:
lasso_reg.score(X_test, y_test) # Coefficient of determination R^2 on the test set

0.6261092075030315

In [266]:
lasso_reg.score(X_train, y_train) # Coefficient of determination R^2 on the training set

0.6378884795119291
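The Lasso scores above are nearly identical to those of plain linear regression, and the baseline's training and test R² are already close, so there is little overfitting for the penalty to correct. The penalty is also likely weak here: it acts on raw coefficient sizes, the features are unscaled, and the target is in the hundreds of thousands, so `alpha=50` barely changes the fit. A common practice is to standardize the features first and try several `alpha` values. This is a minimal sketch on synthetic data from `make_regression`, not on the housing data; the data sizes and the `alpha` grid are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 10 features, 4 of them informative.
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=5.0, random_state=1)
# Put one feature on a much larger scale, mimicking columns measured in different units.
X[:, 0] *= 1000.0

nonzero_counts = {}
for alpha in [0.01, 1.0, 10.0, 1000.0]:
    # StandardScaler gives every feature mean 0 and variance 1 before the penalty is applied,
    # so the penalty treats all coefficients on the same footing.
    model = make_pipeline(StandardScaler(), Lasso(alpha=alpha))
    model.fit(X, y)
    coef = model.named_steps["lasso"].coef_
    nonzero_counts[alpha] = int(np.count_nonzero(coef))
    print(f"alpha={alpha:>7}: {nonzero_counts[alpha]} non-zero coefficients")
```

As `alpha` grows, more coefficients are driven to zero; at a large enough `alpha` every coefficient is zero and the model predicts only the mean of `y`.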

# Ridge Regression (L2 regularization)

In [267]:
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=50, max_iter=200, tol=0.1).fit(X_train, y_train)

In [268]:
ridge_reg.score(X_test, y_test) # Coefficient of determination R^2 on the test set

0.6260912437441071

In [269]:
ridge_reg.score(X_train, y_train) # Coefficient of determination R^2 on the training set
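Both regularized models here use a hand-picked `alpha=50`. Two things are worth seeing on a controlled example: Ridge shrinks the overall size of the coefficients as `alpha` grows without zeroing any of them, and scikit-learn's `RidgeCV` (and, for Lasso, `LassoCV`) can choose `alpha` by cross-validation instead. A minimal sketch on synthetic data from `make_regression`; the data sizes, noise level and `alpha` grid are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 8 features, 5 informative.
X, y = make_regression(n_samples=300, n_features=8, n_informative=5,
                       noise=20.0, random_state=2)

# Ridge keeps every coefficient non-zero but shrinks their L2 norm as alpha grows.
alphas = [0.1, 10.0, 1000.0]
norms = []
for alpha in alphas:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X, y)
    norms.append(np.linalg.norm(model.named_steps["ridge"].coef_))
    print(f"alpha={alpha:>7}: ||w||_2 = {norms[-1]:.2f}")

# RidgeCV picks alpha from a grid; with the default cv=None it uses efficient leave-one-out CV.
grid = np.logspace(-3, 3, 13)
cv_model = make_pipeline(StandardScaler(), RidgeCV(alphas=grid)).fit(X, y)
print("alpha chosen by RidgeCV:", cv_model.named_steps["ridgecv"].alpha_)
```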