# **L1 and L2 Regularization**

![Graphs for Underfit, Overfit, and Balanced Fit](resources/Graphs.png)

Let's first look at the **_Underfit_** graph shown in the image.  
  
Suppose we're trying to predict the number of matches won based on a player's age. Usually, as a player ages, the number of matches won goes down. For a distribution like this, we can build a simple Linear Regression model whose equation might look like:  

> match won = θ₀ + θ₁ * age
>
> Where θ₀ and θ₁ are just constants. This is a regular and simple linear equation.

But we can see that the line does not describe all the datapoints accurately. It is the best fit that a straight line can give, yet some of the datapoints lie far away from it, so it is not a very accurate representation of our data distribution.  

Next, we can build a model that looks like the second graph, labelled **_Overfit_**. Here we're fitting a curve that passes almost exactly through every datapoint. In that case our equation might look like this:  

> match won = θ₀ + θ₁ * age + θ₂ * age² + θ₃ * age³ + θ₄ * age⁴
>
> It's a higher order polynomial equation.

But the issue is that the equation is really complicated, and the resulting curve zigzags so that it passes through all the datapoints. It does not generalize the distribution well.  

What might be better is a line like the one in the third graph, labelled **_Balanced Fit_**. It strikes a balance between the two cases discussed above. The equation might look like:  

> match won = θ₀ + θ₁ * age + θ₂ * age²

The line is a smooth curve and it generalizes the data well. If more datapoints come in, it will be able to make solid predictions for them as well. 
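
As a rough illustration of the three cases, the sketch below fits polynomials of degree 1, 2, and 4 to a small synthetic dataset. The data, the noise level, and the rescaling of age are assumptions made up for this example; they are not taken from the graphs above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: matches won follows a curved trend over age, plus noise.
age = np.linspace(18, 40, 12)
matches_won = 30 - 0.15 * (age - 26) ** 2 + rng.normal(0, 1.5, size=age.size)

# Rescale age to [-1, 1] so the higher powers stay numerically well behaved.
x = (age - age.mean()) / (age.max() - age.min()) * 2

train_mse = {}
for degree in (1, 2, 4):
    coeffs = np.polyfit(x, matches_won, deg=degree)  # least-squares polynomial fit
    predicted = np.polyval(coeffs, x)
    train_mse[degree] = float(np.mean((matches_won - predicted) ** 2))

for degree, mse in train_mse.items():
    print(f"degree {degree}: training MSE = {mse:.3f}")
```

The training error can only go down as the degree goes up, which is why a low training error on its own does not tell us whether a model is balanced or overfit; we need unseen data for that.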

## How do we reduce overfitting?

![Reducing Overfitting](resources/reduce_overfitting.png)

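In the overfit equation above, one way to reduce overfitting is to push the higher-order coefficients θ₃ and θ₄ close to zero, so that the curve behaves more like the balanced fit. To see how this can be done, recall that in Linear Regression we train the model by minimizing the _Mean Square Error_ (_MSE_). The equation of _Mean Square Error_ is as follows:  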

> $$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_{\text{predicted}})^2 $$
>
> Here $$ y_{\text{predicted}} $$ is actually $$ h_θ (x_i) $$ where $$ h_θ (x_i) = θ_0 + θ_1 x_i + θ_2 x_i^2 + θ_3 x_i^3 + θ_4 x_i^4 $$
>
> It is a higher-order polynomial equation, and $$ x_i $$ is the feature value of the i-th datapoint. In our case it is the age of a person.
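
To make the formula concrete, here is a minimal sketch that evaluates a polynomial hypothesis and its MSE with NumPy. The function names `hypothesis` and `mse` and the three toy datapoints are assumptions chosen for illustration.

```python
import numpy as np

def hypothesis(theta, x):
    """Evaluate theta[0] + theta[1]*x + theta[2]*x**2 + ... for every x."""
    theta = np.asarray(theta, dtype=float)
    # Columns of the design matrix are x**0, x**1, x**2, ...
    design = np.vander(np.asarray(x, dtype=float), N=len(theta), increasing=True)
    return design @ theta

def mse(theta, x, y):
    """Mean of the squared differences between targets and predictions."""
    return float(np.mean((np.asarray(y, dtype=float) - hypothesis(theta, x)) ** 2))

# Hypothetical toy data generated by y = 1 + x + x**2.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 7.0])

print(mse([1.0, 1.0, 1.0], x, y))  # the exact coefficients
print(mse([1.0, 2.0, 0.0], x, y))  # a straight line through the first two points
```

With the exact coefficients the error is zero; the straight line misses only the last point by 2, so its MSE is 4/3.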

This _Mean Square Error_ is used during training, and we want to minimize its value with each iteration. So what if we add another term to this equation, like the following:  

> $$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_{\text{predicted}})^2 + λ \sum_{j=1}^{m} (θ_j)^2 $$
>
> Where **λ** is a free parameter that acts like a tuning knob, **m** is the number of **θ** parameters, and we are squaring each of the **θ** parameters.
>
> Now if any **θ** gets bigger, the newly added term gets bigger, which increases the value of the error.

We've added a penalty for the model if it tries to make the values of θ large. This penalty makes sure that the θ values don't grow too high.  
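
The effect of the penalty can be checked numerically. The following is a minimal sketch, not a training routine: the function name `l2_regularized_loss` and the toy data are assumptions, and the intercept is left out to keep the example short.

```python
import numpy as np

def l2_regularized_loss(theta, X, y, lam):
    """MSE plus lam * sum(theta_j ** 2); intercept omitted for simplicity."""
    theta = np.asarray(theta, dtype=float)
    mse = np.mean((y - X @ theta) ** 2)
    penalty = lam * np.sum(theta ** 2)
    return float(mse + penalty)

# Hypothetical toy data where y = 2 * x exactly.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

print(l2_regularized_loss([2.0], X, y, lam=0.0))  # perfect fit, no penalty
print(l2_regularized_loss([2.0], X, y, lam=0.1))  # same fit, penalty 0.1 * 2**2
print(l2_regularized_loss([4.0], X, y, lam=0.1))  # larger theta: worse fit and larger penalty
```

Doubling θ quadruples the penalty term, so the optimizer is only rewarded for large coefficients when they reduce the data error by more than they cost.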

This final equation is called **L2 Regularization** because the penalty uses the square of each θ value (the squared L2 norm of the parameters).  

On the other hand,  

**L1 Regularization** uses the absolute value of each θ, like the following:  

> $$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_{\text{predicted}})^2 + λ \sum_{j=1}^{m} |θ_j| $$
>
> The penalty sum runs over all **m** of the **θ** parameters.
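
One practical difference between the two penalties shows up even with a single parameter. The sketch below uses a simplified one-dimensional problem (an assumption for illustration, not the full regression loss): minimize `0.5 * (theta - z)**2` plus the penalty, where `z` is the unpenalized solution. Both minimizers have closed forms; the function names `l1_shrink` and `l2_shrink` are hypothetical.

```python
import numpy as np

def l1_shrink(z, lam):
    """argmin over theta of 0.5*(theta - z)**2 + lam*|theta| (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def l2_shrink(z, lam):
    """argmin over theta of 0.5*(theta - z)**2 + lam*theta**2."""
    return z / (1.0 + 2.0 * lam)

z = np.array([3.0, 0.5, -0.2])  # hypothetical unpenalized coefficients
lam = 1.0

print("L1:", l1_shrink(z, lam))
print("L2:", l2_shrink(z, lam))
```

The L1 penalty sets the two small coefficients exactly to zero, while the L2 penalty scales every coefficient down without zeroing any of them. This is why L1 regularization is often used when we also want to drop unhelpful features.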

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
import warnings
warnings.filterwarnings('ignore')

In [7]:
df = pd.read_csv('resources/Melbourne_housing.csv')
df.tail()

| | Suburb | Address | Rooms | Type | Price | Method | SellerG | Date | Distance | Postcode | ... | Bathroom | Car | Landsize | BuildingArea | YearBuilt | CouncilArea | Lattitude | Longtitude | Regionname | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34852 | Yarraville | 13 Burns St | 4 | h | 1480000.0 | PI | Jas | 24/02/2018 | 6.3 | 3013.0 | ... | 1.0 | 3.0 | 593.0 | NaN | NaN | Maribyrnong City Council | -37.81053 | 144.88467 | Western Metropolitan | 6543.0 |
| 34853 | Yarraville | 29A Murray St | 2 | h | 888000.0 | SP | Sweeney | 24/02/2018 | 6.3 | 3013.0 | ... | 2.0 | 1.0 | 98.0 | 104.0 | 2018.0 | Maribyrnong City Council | -37.81551 | 144.88826 | Western Metropolitan | 6543.0 |
| 34854 | Yarraville | 147A Severn St | 2 | t | 705000.0 | S | Jas | 24/02/2018 | 6.3 | 3013.0 | ... | 1.0 | 2.0 | 220.0 | 120.0 | 2000.0 | Maribyrnong City Council | -37.82286 | 144.87856 | Western Metropolitan | 6543.0 |
| 34855 | Yarraville | 12/37 Stephen St | 3 | h | 1140000.0 | SP | hockingstuart | 24/02/2018 | 6.3 | 3013.0 | ... | NaN | NaN | NaN | NaN | NaN | Maribyrnong City Council | NaN | NaN | Western Metropolitan | 6543.0 |
| 34856 | Yarraville | 3 Tarrengower St | 2 | h | 1020000.0 | PI | RW | 24/02/2018 | 6.3 | 3013.0 | ... | 1.0 | 0.0 | 250.0 | 103.0 | 1930.0 | Maribyrnong City Council | -37.8181 | 144.89351 | Western Metropolitan | 6543.0 |


In [8]:
df.nunique()

Suburb             351
Address          34009
Rooms               12
Type                 3
Price             2871
Method               9
SellerG            388
Date                78
Distance           215
Postcode           211
Bedroom2            15
Bathroom            11
Car                 15
Landsize          1684
BuildingArea       740
YearBuilt          160
CouncilArea         33
Lattitude        13402
Longtitude       14524
Regionname           8
Propertycount      342
dtype: int64

In [10]:
df.shape

(34857, 21)

In [11]:
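# Keep only the feature columns we need, plus the target column (Price).
cols_to_use = ['Suburb', 'Rooms', 'Type', 'Method', 'SellerG', 'Regionname', 'Propertycount', 'Distance', 'CouncilArea',
               'Bedroom2', 'Bathroom', 'Car', 'Landsize', 'BuildingArea', 'Price']
# .copy() gives an independent frame, so later column assignments don't act on a slice of the original.
df = df[cols_to_use].copy()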

In [12]:
df.head()

| | Suburb | Rooms | Type | Method | SellerG | Regionname | Propertycount | Distance | CouncilArea | Bedroom2 | Bathroom | Car | Landsize | BuildingArea | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbotsford | 2 | h | SS | Jellis | Northern Metropolitan | 4019.0 | 2.5 | Yarra City Council | 2.0 | 1.0 | 1.0 | 126.0 | NaN | NaN |
| 1 | Abbotsford | 2 | h | S | Biggin | Northern Metropolitan | 4019.0 | 2.5 | Yarra City Council | 2.0 | 1.0 | 1.0 | 202.0 | NaN | 1480000.0 |
| 2 | Abbotsford | 2 | h | S | Biggin | Northern Metropolitan | 4019.0 | 2.5 | Yarra City Council | 2.0 | 1.0 | 0.0 | 156.0 | 79.0 | 1035000.0 |
| 3 | Abbotsford | 3 | u | VB | Rounds | Northern Metropolitan | 4019.0 | 2.5 | Yarra City Council | 3.0 | 2.0 | 1.0 | 0.0 | NaN | NaN |
| 4 | Abbotsford | 3 | h | SP | Biggin | Northern Metropolitan | 4019.0 | 2.5 | Yarra City Council | 3.0 | 2.0 | 0.0 | 134.0 | 150.0 | 1465000.0 |


In [13]:
df.isna().sum()

Suburb               0
Rooms                0
Type                 0
Method               0
SellerG              0
Regionname           3
Propertycount        3
Distance             1
CouncilArea          3
Bedroom2          8217
Bathroom          8226
Car               8728
Landsize         11810
BuildingArea     21115
Price             7610
dtype: int64

In [14]:
# For these columns, treat a missing value as 0.
cols_to_fill_zero = ['Propertycount', 'Distance', 'Bedroom2', 'Bathroom', 'Car']
df[cols_to_fill_zero] = df[cols_to_fill_zero].fillna(0)

In [19]:
# Fill missing land and building areas with the mean of each column.
df['Landsize'] = df['Landsize'].fillna(df['Landsize'].mean())
df['BuildingArea'] = df['BuildingArea'].fillna(df['BuildingArea'].mean())

In [21]:
# Drop the rows that still have missing values (for example, rows without a Price).
df.dropna(inplace=True)
df.isna().sum()

Suburb           0
Rooms            0
Type             0
Method           0
SellerG          0
Regionname       0
Propertycount    0
Distance         0
CouncilArea      0
Bedroom2         0
Bathroom         0
Car              0
Landsize         0
BuildingArea     0
Price            0
dtype: int64

In [23]:
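# One-hot encode the categorical columns; drop_first removes one redundant dummy column per category.
df = pd.get_dummies(df, drop_first=True)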
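
To see what `pd.get_dummies` with `drop_first=True` does, here is a small sketch on a hypothetical three-row-type frame (the `toy` data is made up for illustration and is unrelated to the housing file).

```python
import pandas as pd

# Hypothetical toy frame with one categorical and one numeric column.
toy = pd.DataFrame({"Type": ["h", "u", "t", "h"], "Rooms": [2, 3, 2, 4]})

encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
print(encoded)
```

The categories are `h`, `t`, and `u`; the first one (`h`) is dropped, so a row with both `Type_t` and `Type_u` set to false is a type `h` row. Numeric columns such as `Rooms` pass through unchanged.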

In [24]:
df.head()

| | Rooms | Propertycount | Distance | Bedroom2 | Bathroom | Car | Landsize | BuildingArea | Price | Suburb_Aberfeldie | ... | CouncilArea_Moorabool Shire Council | CouncilArea_Moreland City Council | CouncilArea_Nillumbik Shire Council | CouncilArea_Port Phillip City Council | CouncilArea_Stonnington City Council | CouncilArea_Whitehorse City Council | CouncilArea_Whittlesea City Council | CouncilArea_Wyndham City Council | CouncilArea_Yarra City Council | CouncilArea_Yarra Ranges Shire Council |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 4019.0 | 2.5 | 2.0 | 1.0 | 1.0 | 202.0 | 160.2564 | 1480000.0 | False | ... | False | False | False | False | False | False | False | False | True | False |
| 2 | 2 | 4019.0 | 2.5 | 2.0 | 1.0 | 0.0 | 156.0 | 79.0 | 1035000.0 | False | ... | False | False | False | False | False | False | False | False | True | False |
| 4 | 3 | 4019.0 | 2.5 | 3.0 | 2.0 | 0.0 | 134.0 | 150.0 | 1465000.0 | False | ... | False | False | False | False | False | False | False | False | True | False |
| 5 | 3 | 4019.0 | 2.5 | 3.0 | 2.0 | 1.0 | 94.0 | 160.2564 | 850000.0 | False | ... | False | False | False | False | False | False | False | False | True | False |
| 6 | 4 | 4019.0 | 2.5 | 3.0 | 1.0 | 2.0 | 120.0 | 142.0 | 1600000.0 | False | ... | False | False | False | False | False | False | False | False | True | False |


In [26]:
# Features are every column except Price; Price is the target.
X = df.drop('Price', axis=1)
y = df['Price']

In [30]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 2)

In [31]:
# Baseline: plain linear regression with no regularization.
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| fit_intercept | True |
| copy_X | True |
| tol | 1e-06 |
| n_jobs | None |
| positive | False |


In [32]:
reg.score(X_test, y_test)

0.13853683161574926

In [33]:
reg.score(X_train, y_train)

0.6827792395792722
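
The `score` method of a scikit-learn regressor returns the coefficient of determination, R². A large gap between the training score and the test score, as in the two cells above, is the numerical signature of overfitting. The sketch below computes R² by hand; the function name `r2_score_manual` and the toy values are assumptions for illustration.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = [3.0, 5.0, 7.0, 9.0]  # hypothetical targets

print(r2_score_manual(y_true, y_true))         # perfect predictions
print(r2_score_manual(y_true, [6.0] * 4))      # always predicting the mean
```

Perfect predictions give 1.0, and a model that always predicts the mean gives 0.0, so a test score near 0.14 means the unregularized model is only slightly better than predicting the average price.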

In [41]:
from sklearn.linear_model import Lasso  # L1 regularization
# alpha is the regularization strength (the λ from the equations above).
lesso_reg = Lasso(alpha = 50, max_iter = 100, tol = 0.1)
lesso_reg.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| alpha | 50 |
| fit_intercept | True |
| precompute | False |
| copy_X | True |
| max_iter | 100 |
| tol | 0.1 |
| warm_start | False |
| positive | False |
| random_state | None |
| selection | 'cyclic' |


In [42]:
lesso_reg.score(X_test, y_test)

0.6636111369404487

In [43]:
lesso_reg.score(X_train, y_train)

0.6766985624766824

In [44]:
from sklearn.linear_model import Ridge  # L2 regularization
# alpha is the regularization strength (the λ from the equations above).
ridge_reg = Ridge(alpha = 50, max_iter = 100, tol = 0.1)

In [45]:
ridge_reg.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| alpha | 50 |
| fit_intercept | True |
| copy_X | True |
| max_iter | 100 |
| tol | 0.1 |
| solver | 'auto' |
| positive | False |
| random_state | None |


In [46]:
ridge_reg.score(X_test, y_test)

0.6670848945194976
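
The value `alpha = 50` used above is one choice among many. As a hedged sketch of how the regularization strength could be explored, the code below sweeps a few `alpha` values for `Ridge` on synthetic stand-in data. The dataset from `make_regression`, the list of alphas, and the variable names are assumptions for illustration; they are not the Melbourne housing data or the author's tuning procedure.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: many features, only a few of them informative.
X_demo, y_demo = make_regression(n_samples=200, n_features=50, n_informative=10,
                                 noise=20.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=2)

scores = {}
for alpha in (0.1, 1, 10, 50, 100):
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    # Store (training R^2, test R^2) for each regularization strength.
    scores[alpha] = (model.score(X_tr, y_tr), model.score(X_te, y_te))

for alpha, (train_r2, test_r2) in scores.items():
    print(f"alpha={alpha:>5}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

The training score can only fall as `alpha` grows, because the penalty pulls the fit away from the training data. The useful value is the one where the test score (or, better, a cross-validated score) is highest.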