# L2 Regularization (Ridge Regression)

L2 regularization adds a penalty proportional to the **sum of the squared weights**
to the loss function.

It is mainly used to:

- Prevent overfitting
- Reduce model variance
- Stabilize weight estimates
- Handle multicollinearity

---

## 1. Problem Setup

Let:

$$
X \in \mathbb{R}^{n \times d}
$$

be the feature matrix, where:

- $n$ = number of samples  
- $d$ = number of features  

Let:

$$
y \in \mathbb{R}^{n}
$$

be the target vector.

We want to learn:

$$
w \in \mathbb{R}^{d}
$$

---

## 2. Linear Regression (Without Regularization)

The ordinary least squares objective:

$$
J(w) = \frac{1}{n} \| y - Xw \|_2^2
$$

This minimizes training error only.

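Problems:

- It can overfit, especially when $d$ is large relative to $n$
- The coefficient estimates have high variance
- It is unstable when features are correlated: $X^T X$ becomes ill-conditioned (and singular when $d > n$ or features are perfectly collinear)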
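The instability with correlated features can be seen in a small synthetic sketch. The data, noise levels and seeds are arbitrary assumptions. Two nearly identical columns make $X^T X$ badly conditioned. OLS is then fitted to two targets that differ only in their noise draw. The sum of the two coefficients stays close to 2, because only that sum is well determined by the data. The individual coefficients are free to swing between fits.

```python
import numpy as np

# Hypothetical toy data: x2 is a copy of x1 plus tiny noise (near-collinear features)
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# X^T X is close to singular, so its condition number is huge
cond = np.linalg.cond(X.T @ X)
print(f"cond(X^T X) = {cond:.2e}")

# Fit OLS on two targets sharing the signal x1 + x2 but with different noise draws
fits = []
for seed in (1, 2):
    noise = 0.1 * np.random.default_rng(seed).normal(size=n)
    w_ols, *_ = np.linalg.lstsq(X, x1 + x2 + noise, rcond=None)
    fits.append(w_ols)
    print(f"noise seed {seed}: w_ols = {np.round(w_ols, 2)}, sum = {w_ols.sum():.3f}")
```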

---

## 3. L2 Regularized Objective (Ridge)

L2 adds a squared weight penalty, scaled by a hyperparameter $\lambda \ge 0$ (with $\lambda = 0$ recovering ordinary least squares):

$$
J(w) = \frac{1}{n} \| y - Xw \|_2^2 + \lambda \| w \|_2^2
$$

---

## 4. L2 Norm Definition

The L2 norm is $\| w \|_2 = \sqrt{\sum_{j=1}^{d} w_j^2}$, so its square is:

$$
\| w \|_2^2 = \sum_{j=1}^{d} w_j^2
$$

Writing $x_i^T$ for the $i$-th row of $X$, the objective becomes:

$$
J(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i^T w)^2
+ \lambda \sum_{j=1}^{d} w_j^2
$$
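The matrix form and the explicit-sum form describe the same objective. The sketch below writes both directly from the formulas so they can be compared on random data. The function names and data are illustrative assumptions, not part of any library.

```python
import numpy as np

def ridge_objective_matrix(X, y, w, lam):
    """J(w) = (1/n) ||y - Xw||_2^2 + lam * ||w||_2^2, in matrix form."""
    n = X.shape[0]
    residual = y - X @ w
    return residual @ residual / n + lam * (w @ w)

def ridge_objective_sum(X, y, w, lam):
    """Same objective with explicit sums over samples i and features j."""
    n, d = X.shape
    data_term = sum((y[i] - X[i] @ w) ** 2 for i in range(n)) / n
    penalty = lam * sum(w[j] ** 2 for j in range(d))
    return data_term + penalty

# Hypothetical random example (shapes and values are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

print(ridge_objective_matrix(X, y, w, lam=0.5))
print(ridge_objective_sum(X, y, w, lam=0.5))
```

Both calls should print the same value up to floating-point rounding.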

---

## 5. Understanding Each Term

### (1) Data Fitting Term

$$
\frac{1}{n} \| y - Xw \|_2^2
$$

Ensures the model fits the data.

---

### (2) Regularization Term

$$
\lambda \| w \|_2^2
$$

Penalizes large weights.

Effect:

- Shrinks coefficients
- Reduces variance
- Can improve generalization, when the drop in variance outweighs the added bias
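To see the shrinking effect of the penalty, plain gradient descent can be used on $J(w)$. This solver is swapped in purely for illustration. The penalty contributes $2\lambda w$ to the gradient, so every step multiplies the current weights by $(1 - 2\eta\lambda)$, where $\eta$ is the learning rate; this is often called weight decay. The learning rate, step count and synthetic data below are assumptions.

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, lr=0.05, n_steps=2000):
    """Minimize J(w) = (1/n)||y - Xw||_2^2 + lam*||w||_2^2 with plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        grad_data = -(2.0 / n) * X.T @ (y - X @ w)  # gradient of the data-fitting term
        # The penalty's gradient is 2*lam*w, so each step also scales w by (1 - 2*lr*lam)
        w = (1 - 2 * lr * lam) * w - lr * grad_data
    return w

# Hypothetical data (sizes, true weights and noise level are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([4.0, -3.0, 2.0]) + 0.5 * rng.normal(size=100)

w_no_penalty = ridge_gradient_descent(X, y, lam=0.0)
w_penalty = ridge_gradient_descent(X, y, lam=0.5)
print("lambda = 0.0:", np.round(w_no_penalty, 3), "norm:", round(float(np.linalg.norm(w_no_penalty)), 3))
print("lambda = 0.5:", np.round(w_penalty, 3), "norm:", round(float(np.linalg.norm(w_penalty)), 3))
```

With $\lambda = 0$ the iterates converge to the OLS solution. With $\lambda > 0$ they converge to a solution with a smaller norm.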

---

## 6. Closed Form Solution (Very Important)

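Unlike L1, which has no closed-form solution for a general $X$, L2 does. Setting the gradient of $J(w)$ to zero gives:

$$
\nabla_w J(w) = -\frac{2}{n} X^T (y - Xw) + 2\lambda w = 0
\quad \Longrightarrow \quad
(X^T X + n\lambda I)\, w = X^T y
$$

so

$$
w = (X^T X + n\lambda I)^{-1} X^T y
$$

Where:

- $I$ = $d \times d$ identity matrix  
- For $\lambda > 0$, the $n\lambda I$ term makes the matrix positive definite, so it is invertible even when $X^T X$ is singular (e.g. $d > n$ or perfectly collinear features)  

The factor $n$ comes from the $\frac{1}{n}$ in $J(w)$. Texts that write the data term without $\frac{1}{n}$ get $(X^T X + \lambda I)^{-1} X^T y$ instead.

This makes Ridge numerically stable: adding $n\lambda$ to every eigenvalue of $X^T X$ reduces its condition number.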
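A minimal NumPy sketch of the closed form follows. It assumes the $\frac{1}{n}$-scaled objective used in this notebook, whose minimizer solves $(X^T X + n\lambda I)\,w = X^T y$. It calls `np.linalg.solve` rather than forming an explicit inverse. It then checks that the gradient vanishes at the result, using a hypothetical example with more features than samples, where $X^T X$ is singular.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + n*lam*I) w = X^T y, the minimizer of (1/n)||y - Xw||^2 + lam*||w||^2."""
    n, d = X.shape
    A = X.T @ X + n * lam * np.eye(d)
    # Solving the linear system is cheaper and more accurate than computing the inverse
    return np.linalg.solve(A, X.T @ y)

def ridge_gradient(X, y, w, lam):
    """Gradient of J(w): -(2/n) X^T (y - Xw) + 2*lam*w."""
    n = X.shape[0]
    return -(2.0 / n) * X.T @ (y - X @ w) + 2 * lam * w

# Hypothetical example with d > n, so X^T X alone is singular
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 25))
y = rng.normal(size=10)

print("rank of X^T X:", np.linalg.matrix_rank(X.T @ X), "of", X.shape[1])
w = ridge_closed_form(X, y, lam=0.1)
print("max |gradient| at solution:", np.abs(ridge_gradient(X, y, w, lam=0.1)).max())
```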

---

## 7. Geometric Interpretation

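Equivalent constraint form (for each $\lambda > 0$, the budget $t = \| w_\lambda \|_2^2$, where $w_\lambda$ is the ridge solution, gives the same solution):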

$$
\min_w \| y - Xw \|_2^2
\quad \text{subject to} \quad
\| w \|_2^2 \le t
$$

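The L2 constraint region is a **disk (a ball in higher dimensions)**, bounded by a circle (sphere).

Because this boundary is smooth, the elliptical contours of the squared-error term usually touch it at a point where no coordinate is zero:

- Weights shrink
- But rarely become exactly zero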
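The link between the penalized and constrained forms can be checked numerically. The sketch below uses a hypothetical helper, `ridge_for_budget`, with arbitrary synthetic data and budget. It bisects on $\lambda$ until the ridge solution's squared norm equals a given budget $t$. This works because $\| w_\lambda \|_2^2$ decreases monotonically as $\lambda$ grows. It also assumes $t$ is below the OLS squared norm, so the constraint is active.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimizer of (1/n)||y - Xw||^2 + lam*||w||^2 (redefined so this block runs alone)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

def ridge_for_budget(X, y, t, lam_hi=1e6, n_iter=200):
    """Bisect on lam until ||w_lam||_2^2 matches the budget t.

    Assumes 0 < t < ||w_OLS||_2^2, so the constraint is active.
    """
    lam_lo = 0.0
    for _ in range(n_iter):
        lam_mid = 0.5 * (lam_lo + lam_hi)
        w = ridge_closed_form(X, y, lam_mid)
        if w @ w > t:
            lam_lo = lam_mid  # still outside the budget: more shrinkage needed
        else:
            lam_hi = lam_mid  # inside the budget: try less shrinkage
    return lam_hi, ridge_closed_form(X, y, lam_hi)

# Hypothetical data and budget (a quarter of the OLS squared norm)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

w_ols = ridge_closed_form(X, y, 0.0)
t = 0.25 * (w_ols @ w_ols)
lam, w_t = ridge_for_budget(X, y, t)
print(f"lambda = {lam:.4f}, ||w||^2 = {w_t @ w_t:.6f}, budget t = {t:.6f}")
```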

---

## 8. Effect of Increasing λ

As $\lambda$ increases:

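- The weights shrink smoothly toward zero (the norm $\| w \|_2$ decreases monotonically, although an individual weight can occasionally grow or change sign along the way)
- Weights generally do not reach exactly zero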
- Model becomes more biased
- Variance decreases

This illustrates the **bias–variance tradeoff**.
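A short sketch of a ridge path on synthetic data (the true weights, noise and $\lambda$ grid are arbitrary assumptions) makes these effects visible. As $\lambda$ grows, the coefficient norm shrinks steadily, yet no coefficient becomes exactly zero. That includes the last one, whose true value is 0.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimizer of (1/n)||y - Xw||^2 + lam*||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

# Hypothetical data: the last feature is irrelevant (true weight 0)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(size=100)

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
path = np.array([ridge_closed_form(X, y, lam) for lam in lambdas])
for lam, w in zip(lambdas, path):
    print(f"lambda={lam:<5} w={np.round(w, 3)}  ||w||_2={np.linalg.norm(w):.3f}")
```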

---

## 9. L1 vs L2 Comparison

| Property | L1 | L2 |
|-----------|------|------|
| Penalty | $\sum_j \lvert w_j \rvert$ | $\sum_j w_j^2$ |
| Sparsity | Yes | No |
| Feature Selection | Yes | No |
| Closed Form | No (in general) | Yes |
| Geometry | Diamond | Circle |
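The sparsity row is easiest to see when $X$ has orthonormal columns ($X^T X = I$), where both penalties have simple solutions in terms of $z = X^T y$, the OLS estimate. Ridge rescales: $w = z / (1 + n\lambda)$. Lasso soft-thresholds at $n\lambda/2$. The Lasso formula assumes the same $\frac{1}{n}$-scaled data term, $\frac{1}{n}\|y - Xw\|_2^2 + \lambda \sum_j \lvert w_j \rvert$. The design, true weights and $\lambda$ are hypothetical.

```python
import numpy as np

def ridge_orthonormal(z, lam, n):
    """Ridge solution when X^T X = I: uniform rescaling of z = X^T y."""
    return z / (1 + n * lam)

def lasso_orthonormal(z, lam, n):
    """Lasso solution of (1/n)||y - Xw||^2 + lam*sum|w_j| when X^T X = I: soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - n * lam / 2, 0.0)

# Hypothetical design: X has orthonormal columns, built from a QR factorization
rng = np.random.default_rng(0)
n = 100
X, _ = np.linalg.qr(rng.normal(size=(n, 4)))
y = X @ np.array([3.0, 1.0, 0.2, -0.05]) + 0.01 * rng.normal(size=n)
z = X.T @ y  # OLS solution, since X^T X = I

lam = 0.01  # so n*lam = 1 and the lasso threshold n*lam/2 = 0.5
print("OLS:  ", np.round(z, 3))
print("Ridge:", np.round(ridge_orthonormal(z, lam, n), 3))
print("Lasso:", np.round(lasso_orthonormal(z, lam, n), 3))
```

Ridge halves every coefficient but keeps all of them. Lasso sets the two small ones exactly to zero.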

---

## 10. When to Use L2

Use L2 when:

- Many features are useful
- You don't want feature selection
- Features are correlated
- You want stable, smooth shrinkage
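Ridge's behaviour with correlated features can be sketched with an extreme case: one column is an exact copy of another. OLS then has no unique solution. Ridge still does, and by symmetry it splits the weight evenly between the two copies instead of picking one. The data and $\lambda$ below are arbitrary assumptions.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimizer of (1/n)||y - Xw||^2 + lam*||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

# Hypothetical data: column 1 duplicates column 0, so X^T X is singular
rng = np.random.default_rng(0)
x = rng.normal(size=80)
other = rng.normal(size=80)
X = np.column_stack([x, x, other])
y = 2.0 * x + 1.0 * other + 0.1 * rng.normal(size=80)

w = ridge_closed_form(X, y, lam=0.01)
print(np.round(w, 3))  # the first two weights are equal and sum to roughly 2
```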


```python
# =============================
# 📌 CELL 1: Import Libraries
# =============================

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

print("Libraries Imported Successfully")
```

```python
# =============================
# 📌 CELL 2: Load Dataset
# =============================

diabetes = load_diabetes()

X = diabetes.data
y = diabetes.target
feature_names = diabetes.feature_names

df = pd.DataFrame(X, columns=feature_names)
df["target"] = y

print("Dataset Shape:", df.shape)

df.head()
```


```python
# =============================
# 📌 CELL 3: Summary Statistics
# =============================

df.describe()
```


```python
# =============================
# 📌 CELL 4: Target Distribution
# =============================

plt.figure()
plt.hist(df["target"], bins=20)
plt.title("Distribution of Target Variable")
plt.xlabel("Target Value")
plt.ylabel("Frequency")
plt.show()
```


```python
# =============================
# 📌 CELL 5: Correlation with Target
# =============================

correlation = df.corr()["target"].sort_values(ascending=False)

correlation
```


```python
# =============================
# 📌 CELL 6: Train-Test Split
# =============================

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training Shape:", X_train.shape)
print("Testing Shape:", X_test.shape)
```
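A possible next cell is sketched below. It assumes the notebook goes on to fit the imported `Ridge` model, and it repeats the load and split so it runs on its own. The `alpha` grid and the `lam` value are arbitrary assumptions.

Some context for reading the results. The `load_diabetes` features are already mean-centered and scaled, which matters because the L2 penalty is not scale-invariant. Also, scikit-learn's `Ridge` minimizes $\|y - Xw\|_2^2 + \alpha \|w\|_2^2$, without the $\frac{1}{n}$ factor used above, so `alpha` plays the role of $n\lambda$. The last lines check this against the closed form $(X^T X + n\lambda I)^{-1} X^T y$, with `fit_intercept=False` so both solve the same problem.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same data and split as above, repeated so this cell runs on its own
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
n, d = X_train.shape

# Fit Ridge for a few alpha values; larger alpha means stronger shrinkage
alphas = [0.01, 0.1, 1.0, 10.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    coef_norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:<6} test MSE={mse:8.2f}  ||w||_2={coef_norms[-1]:8.2f}")

# Check that alpha = n * lambda reproduces the closed form (no intercept on either side)
lam = 0.1
w_closed = np.linalg.solve(X_train.T @ X_train + n * lam * np.eye(d), X_train.T @ y_train)
w_sklearn = Ridge(alpha=n * lam, fit_intercept=False).fit(X_train, y_train).coef_
print("max |difference|:", np.max(np.abs(w_closed - w_sklearn)))
```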
