<h1 style = "color:red; border-bottom: 4px solid gold; 
           text-align: center;
           padding-bottom: 5px;">Regularization Tutorial</h1>

## 🌟 Introduction to Regularization

**Regularization** is a method used in machine learning and statistics to prevent a model from **overfitting** — that is, fitting the training data too closely and performing poorly on unseen data.

---

### 🔹 Intuitive Idea

When a model learns, it tries to minimize the **error** (or loss) between predictions and actual values.  
However, if it focuses too much on the training data, it may start memorizing noise rather than learning real patterns.

Regularization helps by adding a **penalty** to the model’s loss function, discouraging large weights and, with them, overly complex models.

---

### 🔹 The Penalization Term

The modified loss function looks like this:

$$
\text{New Loss} = \text{Original Loss (error)} + \lambda \times \text{Penalty}
$$

- **λ (lambda)** controls how strong the penalty is (scikit-learn calls it `alpha`):
  - If λ = 0 → no regularization (risk of overfitting)
  - If λ is large → strong penalty (simpler model; if λ is too large, the model underfits)
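A minimal NumPy sketch of the formula above, assuming mean squared error as the original loss. `regularized_loss` is a hypothetical helper (not from any library), and `lam` stands in for λ because `lambda` is a Python keyword. The same weights are scored under several λ values: the error term stays fixed while the penalty term grows.

```python
import numpy as np

def regularized_loss(X, y, w, lam, penalty="l2"):
    """Hypothetical helper: MSE plus lam times an L1 or L2 penalty on w."""
    residuals = y - X @ w
    mse = np.mean(residuals ** 2)          # original loss (error)
    if penalty == "l1":
        pen = np.sum(np.abs(w))            # sum of |w_i|
    else:
        pen = np.sum(w ** 2)               # sum of w_i^2
    return mse + lam * pen

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 2.0])
w = np.array([1.0, 2.0])                   # predictions are [1, 2, 3]

# The MSE part is always 1/3; the L2 penalty adds 5 * lam on top of it
for lam in [0.0, 0.1, 1.0]:
    print(lam, regularized_loss(X, y, w, lam))
```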

---

### 🔸 Common Types of Regularization

1. **L1 Regularization (Lasso)**  
   Adds the absolute values of weights:  
   $$
   \text{Penalty} = \sum |w_i|
   $$  
   → pushes some weights exactly to **zero**, effectively performing **feature selection**.

2. **L2 Regularization (Ridge)**  
   Adds the squares of weights:  
   $$
   \text{Penalty} = \sum w_i^2
   $$  
   → shrinks weights toward zero but keeps all features.

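3. **Elastic Net**  
   Combines both L1 and L2 penalties, with a mixing weight $\rho \in [0, 1]$ (scikit-learn calls its mixing parameter `l1_ratio`):  
   $$
   \text{Penalty} = \rho \sum |w_i| + (1 - \rho) \sum w_i^2
   $$  
   → can set some weights exactly to zero (from the L1 part) while shrinking the rest (from the L2 part).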
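To see the L1/L2 difference in practice, here is a small sketch on synthetic data from scikit-learn's `make_regression`, where only 3 of the 10 features actually influence the target. The `alpha` values (scikit-learn's name for λ) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge

# Synthetic data: 10 features, only 3 of which are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("OLS   coefs:", np.round(ols.coef_, 2))
print("Ridge coefs:", np.round(ridge.coef_, 2))   # shrunk, but none exactly 0
print("Lasso coefs:", np.round(lasso.coef_, 2))   # uninformative ones are typically exactly 0

print("Exact zeros -> Ridge:", np.sum(ridge.coef_ == 0),
      "Lasso:", np.sum(lasso.coef_ == 0))
```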

---

### 🔹 Intuition

Regularization is like telling the model:  
> “Fit the data well, but keep the coefficients small — don’t overcomplicate things.”

---


In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV


In [2]:


url = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
df = pd.read_csv(url)


## Linear Regression

In [3]:
# medv (median home value, in $1000s) is the target; all other columns are features
X = df.drop(["medv"], axis=1)
y = df.medv

In [4]:
lin_reg = LinearRegression()
# scikit-learn returns the *negative* MSE (higher is better), one value per fold
neg_mse = cross_val_score(lin_reg, X, y, scoring="neg_mean_squared_error", cv=5)
np.mean(neg_mse)

-37.13180746769902
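scikit-learn scorers follow a "higher is better" convention, so `scoring="neg_mean_squared_error"` returns the negated MSE; that is why the mean above is negative. A small sketch (on synthetic data, as an assumption) of turning the fold scores back into MSE and RMSE:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=1)

neg_mse = cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=5)
mse = -neg_mse                    # flip the sign back: one MSE per fold
rmse = np.sqrt(mse)               # same units as the target

print("MSE per fold:", np.round(mse, 2))
print("Mean MSE:", mse.mean(), " Mean RMSE:", rmse.mean())
```

Applied to the Boston result above, a mean MSE of about 37.13 corresponds to √37.13 ≈ 6.09, i.e. errors on the order of $6,000, since `medv` is measured in thousands of dollars.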

## Ridge

In [5]:
ridge = Ridge()
parameters = {"alpha": [1e-15,1e-10,1e-5,1e-2,1e-1,1,2,3,5,10,20,25,30,50,75,100,1000,1000]}
ridge_regressor = GridSearchCV(ridge,parameters,scoring = "neg_mean_squared_error",cv =5)
ridge_regressor.fit(X,y)
pd.DataFrame(ridge_regressor.cv_results_)[["params","mean_test_score","rank_test_score"]]



| | params | mean_test_score | rank_test_score |
|---|---|---|---|
| 0 | {'alpha': 1e-15} | -37.131807 | 18 |
| 1 | {'alpha': 1e-10} | -37.131807 | 17 |
| 2 | {'alpha': 1e-05} | -37.131758 | 16 |
| 3 | {'alpha': 0.01} | -37.083135 | 15 |
| 4 | {'alpha': 0.1} | -36.707045 | 14 |
| 5 | {'alpha': 1} | -35.266731 | 13 |
| 6 | {'alpha': 2} | -34.785797 | 12 |
| 7 | {'alpha': 3} | -34.507178 | 11 |
| 8 | {'alpha': 5} | -34.112014 | 10 |
| 9 | {'alpha': 10} | -33.395224 | 9 |


In [6]:
ridge_regressor.best_params_

{'alpha': 100}

In [7]:
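# Refit Ridge on all the data with the alpha chosen by the grid search (100)
ridge_model = Ridge(alpha=ridge_regressor.best_params_["alpha"])
ridge_model.fit(X, y)
ridge_model.score(X, y)  # R² on the training data itself (in-sample, so optimistic)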

0.717864397707179
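Two points worth knowing here, shown in a hedged sketch on synthetic data. First, Ridge and Lasso penalize all coefficients with the same λ, so features on very different scales (as in the Boston data) are penalized unevenly; standardizing inside a `Pipeline` is a common remedy, and it keeps the scaler fitted within each CV fold. Second, with the default `refit=True`, `GridSearchCV` already refits the best model on the full data and exposes it as `best_estimator_`, so there is no need to rebuild and refit the Ridge model by hand. The step name `"ridge"` is our own choice; it is what produces the `ridge__alpha` parameter key.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=8, noise=15.0, random_state=2)
X[:, 0] *= 1000.0                       # put one feature on a much larger scale

pipe = Pipeline([
    ("scaler", StandardScaler()),       # refitted on the training part of each CV fold
    ("ridge", Ridge()),
])
param_grid = {"ridge__alpha": np.logspace(-3, 3, 13)}   # log-spaced grid

search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

best_model = search.best_estimator_     # already refit on all of X, y
print(search.best_params_)
print("In-sample R²:", best_model.score(X, y))
```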

## Lasso

In [8]:
lasso =Lasso()
lasso_model = GridSearchCV(lasso,parameters, scoring="neg_mean_squared_error", cv=5)
lasso_model.fit(X,y)
print(lasso_model.best_params_)
print(lasso_model.best_score_)

(Output trimmed: scikit-learn printed several warnings pointing at `cd_fast.enet_coordinate_descent`, most likely `ConvergenceWarning`s caused by the near-zero `alpha` values in the grid.)


{'alpha': 0.1}
-34.830432318202675
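As an alternative to `GridSearchCV`, scikit-learn provides `LassoCV` (and `RidgeCV`), which search over `alphas` with built-in cross-validation and store the chosen value in `alpha_`. A sketch on synthetic data; the grid and `max_iter` are illustrative assumptions. It deliberately skips values like `1e-15`, which leave the model effectively unregularized and can make coordinate descent slow to converge.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=10.0, random_state=3)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10_000)
lasso_cv.fit(X, y)

print("Chosen alpha:", lasso_cv.alpha_)
print("Non-zero coefficients:", np.sum(lasso_cv.coef_ != 0), "of", X.shape[1])
```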


In [9]:
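# Note: alpha=100 is not the grid-search optimum printed above ({'alpha': 0.1});
# such a strong penalty most likely explains the much lower in-sample R² below
lasso_model = Lasso(alpha=100)
lasso_model.fit(X, y)
lasso_model.score(X, y)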

0.22497922550751603

## Melbourne Housing

---

In [10]:
data =pd.read_csv("Melbourne_housing_FULL.csv")
data.head()

| | Suburb | Address | Rooms | Type | Price | Method | SellerG | Date | Distance | Postcode | ... | Bathroom | Car | Landsize | BuildingArea | YearBuilt | CouncilArea | Lattitude | Longtitude | Regionname | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbotsford | 68 Studley St | 2 | h | NaN | SS | Jellis | 3/09/2016 | 2.5 | 3067.0 | ... | 1.0 | 1.0 | 126.0 | NaN | NaN | Yarra City Council | -37.8014 | 144.9958 | Northern Metropolitan | 4019.0 |
| 1 | Abbotsford | 85 Turner St | 2 | h | 1480000.0 | S | Biggin | 3/12/2016 | 2.5 | 3067.0 | ... | 1.0 | 1.0 | 202.0 | NaN | NaN | Yarra City Council | -37.7996 | 144.9984 | Northern Metropolitan | 4019.0 |
| 2 | Abbotsford | 25 Bloomburg St | 2 | h | 1035000.0 | S | Biggin | 4/02/2016 | 2.5 | 3067.0 | ... | 1.0 | 0.0 | 156.0 | 79.0 | 1900.0 | Yarra City Council | -37.8079 | 144.9934 | Northern Metropolitan | 4019.0 |
| 3 | Abbotsford | 18/659 Victoria St | 3 | u | NaN | VB | Rounds | 4/02/2016 | 2.5 | 3067.0 | ... | 2.0 | 1.0 | 0.0 | NaN | NaN | Yarra City Council | -37.8114 | 145.0116 | Northern Metropolitan | 4019.0 |
| 4 | Abbotsford | 5 Charles St | 3 | h | 1465000.0 | SP | Biggin | 4/03/2017 | 2.5 | 3067.0 | ... | 2.0 | 0.0 | 134.0 | 150.0 | 1900.0 | Yarra City Council | -37.8093 | 144.9944 | Northern Metropolitan | 4019.0 |


In [11]:
# Keep only the columns that are plausibly useful for predicting Price
cols_to_use = ["Suburb", "Type", "Rooms", "Method", "SellerG", "Regionname", "Propertycount",
               "Distance", "CouncilArea", "Bedroom2", "Bathroom", "Car", "Landsize", "BuildingArea", "Price"]
# Explicit copy, so the fillna assignments below modify an independent DataFrame
data = data[cols_to_use].copy()

In [12]:
data.isna().sum()

Suburb               0
Type                 0
Rooms                0
Method               0
SellerG              0
Regionname           3
Propertycount        3
Distance             1
CouncilArea          3
Bedroom2          8217
Bathroom          8226
Car               8728
Landsize         11810
BuildingArea     21115
Price             7610
dtype: int64

### Handling Missing Values
---


In [13]:
# For some features, a missing value can reasonably be filled with 0, treating it as its own category:
# e.g. 0 in Propertycount or Bedroom2 marks the value as unknown,
# and 0 in Car means the house has no car parking
cols_to_fill_zero = ['Propertycount', 'Distance', 'Bedroom2', 'Bathroom', 'Car']
data[cols_to_fill_zero] = data[cols_to_fill_zero].fillna(0)

# Impute the remaining continuous features with their mean: a quick, simple choice,
# since the focus here is on reducing overfitting with Lasso and Ridge regression
data['Landsize'] = data['Landsize'].fillna(data.Landsize.mean())
data['BuildingArea'] = data['BuildingArea'].fillna(data.BuildingArea.mean())
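The mean imputation above is computed on the full dataset, before the train/test split, so a little information from the test rows leaks into training. A hedged sketch of the same two strategies (fill with 0, fill with the mean) using scikit-learn's `SimpleImputer` inside a `ColumnTransformer`, fitted on the training rows only. The toy DataFrames are made up for illustration and are not the Melbourne data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Made-up toy data with missing values (not the Melbourne dataset)
train = pd.DataFrame({"Car": [1.0, np.nan, 2.0],
                      "Landsize": [100.0, 300.0, np.nan]})
test = pd.DataFrame({"Car": [np.nan], "Landsize": [np.nan]})

imputer = ColumnTransformer([
    ("zero", SimpleImputer(strategy="constant", fill_value=0), ["Car"]),
    ("mean", SimpleImputer(strategy="mean"), ["Landsize"]),
])

imputer.fit(train)                   # the Landsize mean is learned from training rows only
print(imputer.transform(test))       # Car -> 0, Landsize -> training mean (200.0)
```

In a real workflow the imputer would typically be the first step of a `Pipeline`, so that cross-validation refits it on each training fold as well.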

In [14]:
data.isna().sum()

Suburb              0
Type                0
Rooms               0
Method              0
SellerG             0
Regionname          3
Propertycount       0
Distance            0
CouncilArea         3
Bedroom2            0
Bathroom            0
Car                 0
Landsize            0
BuildingArea        0
Price            7610
dtype: int64

In [15]:
# Drop the remaining rows with missing values (Price, Regionname, CouncilArea)
data.dropna(inplace=True)

### Data Preparation
---

In [16]:
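# One-hot encode the categorical columns, then split into train and test sets.
# Note: .astype('int') casts every column, truncating float features such as Distance
data = pd.get_dummies(data, drop_first=True).astype('int')
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(["Price"], axis=1), data["Price"], test_size=0.3, random_state=2
)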
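Since pandas 2.0, `get_dummies` returns boolean columns by default, which is presumably why the code casts to `int`. Passing `dtype=int` to `get_dummies` instead converts only the new dummy columns and leaves float features untouched. A toy comparison on a made-up two-column frame:

```python
import pandas as pd

# Made-up toy frame: one categorical and one float column
df = pd.DataFrame({"Type": ["h", "u", "h"], "Distance": [2.5, 3.0, 1.5]})

cast_all = pd.get_dummies(df, drop_first=True).astype("int")   # truncates floats too
dummies_only = pd.get_dummies(df, drop_first=True, dtype=int)  # floats untouched

print(cast_all["Distance"].tolist())      # [2, 3, 1]
print(dummies_only["Distance"].tolist())  # [2.5, 3.0, 1.5]
print(dummies_only["Type_u"].tolist())    # [0, 1, 0]
```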

### Linear Regression
---

In [17]:
lin_model = LinearRegression()
lin_model.fit(X_train,y_train)

print(f"Linear Regression for Train R²: {lin_model.score(X_train,y_train):.10f}")

print(f"Linear Regression for Test R²: {lin_model.score(X_test,y_test):.10f}")

Linear Regression for Train R²: 0.6825862950
Linear Regression for Test R²: 0.1341316859


### Ridge Regression
---

In [18]:
ridge = Ridge()
ridge_parameters = {"alpha": list(np.linspace(4.1, 4.2, 50))}
ridge_reg = GridSearchCV(ridge, ridge_parameters, scoring="neg_mean_squared_error", cv=5)
ridge_reg.fit(X_train, y_train)

In [19]:
#pd.DataFrame(ridge_reg.cv_results_)[["params","rank_test_score"]]
ridge_reg.best_params_

{'alpha': 4.1}
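The grid above spans only 4.1 to 4.2, and the best value sits on its lower edge, which suggests the optimum may lie outside the searched range. A common approach is a coarse, log-spaced search first, then a finer one around the winner. A sketch using `RidgeCV` on synthetic data; the ranges and grid sizes are arbitrary illustrations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=300, n_features=20, noise=20.0, random_state=4)

# Step 1: coarse search over several orders of magnitude
coarse = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)

# Step 2: finer linear grid around the coarse winner
center = coarse.alpha_
fine = RidgeCV(alphas=np.linspace(center / 3, center * 3, 25)).fit(X, y)

print("Coarse alpha:", coarse.alpha_, " Fine alpha:", fine.alpha_)
```

With its default `cv=None`, `RidgeCV` uses an efficient leave-one-out scheme, so both searches stay cheap even with many candidate values.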

In [23]:
ridge_model = Ridge(alpha=4.1)
ridge_model.fit(X_train, y_train)
print(f"Ridge Regression for Train R²: {ridge_model.score(X_train, y_train):.10f}")
print(f"Ridge Regression for Test R²: {ridge_model.score(X_test, y_test):.10f}")

Ridge Regression for Train R²: 0.6772133390
Ridge Regression for Test R²: 0.6741324308


### Lasso Regression
---

In [None]:
lasso = Lasso()
lasso_parameters = {"alpha": [3, 5, 10]}
lasso_reg = GridSearchCV(lasso, lasso_parameters, scoring="neg_mean_squared_error", cv=5)
lasso_reg.fit(X_train, y_train)

lasso_reg.best_estimator_

In [24]:
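# Default alpha=1.0: the grid search above was never executed (In [None]), so its result is not used here.
# On this unscaled data the penalty is very weak, and the scores below are close to plain linear regression.
lasso_model = Lasso()
lasso_model.fit(X_train, y_train)

print(f"Lasso Regression for Train R²: {lasso_model.score(X_train, y_train):.10f}")
print(f"Lasso Regression for Test R²: {lasso_model.score(X_test, y_test):.10f}")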

Lasso Regression for Train R²: 0.6825702746
Lasso Regression for Test R²: 0.1527488696
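The train/test R² comparisons in this section repeat the same two lines per model. A hedged sketch of a hypothetical helper, `report_r2`, that fits several models on one split and reports both scores side by side, so the train/test gap (the overfitting signal) is easy to compare. It uses synthetic data with more features than training rows, standing in for the wide dummy-encoded Melbourne data; the `alpha` values are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

def report_r2(models, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit each model and return {name: (train R², test R²)}."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    return results

# Few samples, many features: a setting where plain least squares tends to overfit
X, y = make_regression(n_samples=80, n_features=60, n_informative=5,
                       noise=10.0, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2)

models = {"Linear": LinearRegression(), "Ridge": Ridge(alpha=10.0), "Lasso": Lasso(alpha=1.0)}
results = report_r2(models, X_train, X_test, y_train, y_test)
for name, (train_r2, test_r2) in results.items():
    print(f"{name:6s}  train R²: {train_r2:.4f}  test R²: {test_r2:.4f}")
```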


