
# Bias–Variance Tradeoff and Regularization (with Formulas)

This notebook explains bias, variance, and regularization **with mathematical intuition**.



## Bias and Variance (Mathematical View)

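Let the true function be $f(x)$, let observations be $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and let $\hat{f}(x)$ be the prediction of a model fitted to a randomly drawn training set.

At a fixed point $x$, the expected squared error (averaged over training sets and noise) can be decomposed as: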

$$
\mathbb{E}[(y - \hat{f}(x))^2] =
\underbrace{\text{Bias}^2}_{\text{Error from wrong assumptions}} +
\underbrace{\text{Variance}}_{\text{Sensitivity to data}} +
\underbrace{\sigma^2}_{\text{Irreducible noise}}
$$
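
A short derivation sketch may help show where the three terms come from. It assumes $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and noise at the evaluation point independent of the training set. The shorthand $\bar{f} = \mathbb{E}[\hat{f}(x)]$ is introduced here only for brevity, and the argument $x$ is dropped.

$$
\begin{aligned}
\mathbb{E}[(y - \hat{f})^2]
&= \mathbb{E}[(f - \hat{f})^2] + 2\,\mathbb{E}[\varepsilon]\,\mathbb{E}[f - \hat{f}] + \mathbb{E}[\varepsilon^2]
= \mathbb{E}[(f - \hat{f})^2] + \sigma^2 \\
\mathbb{E}[(f - \hat{f})^2]
&= (f - \bar{f})^2 + 2\,(f - \bar{f})\,\mathbb{E}[\bar{f} - \hat{f}] + \mathbb{E}[(\bar{f} - \hat{f})^2]
= (\bar{f} - f)^2 + \mathbb{E}[(\hat{f} - \bar{f})^2]
\end{aligned}
$$

Both cross terms vanish: the first because $\mathbb{E}[\varepsilon] = 0$, the second because $\mathbb{E}[\bar{f} - \hat{f}] = 0$. The two remaining terms on the last line are the squared bias and the variance.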

### Bias
$$
\text{Bias} = \mathbb{E}[\hat{f}(x)] - f(x)
$$

### Variance
$$
\text{Variance} = \mathbb{E}[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2]
$$
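
Both quantities can be estimated by simulation: draw many training sets, refit the model each time, and compare the spread and the average of the predictions with the true function. The following is a minimal sketch under stated assumptions, not a definitive procedure. It assumes $f(x) = \sin(2\pi x)$ with Gaussian noise of standard deviation 0.2 (the same setup this notebook uses for its polynomial fits). The function name `estimate_bias_variance`, the evaluation grid, and the number of repeats are illustrative choices, and degree 9 stands in as the flexible model to keep the least-squares fit numerically well behaved.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def true_f(x):
    # Assumed ground-truth function for the simulation
    return np.sin(2 * np.pi * x)

def estimate_bias_variance(degree, n_train=20, n_repeats=200, noise_sd=0.2, seed=0):
    """Monte Carlo estimate of squared bias and variance, averaged over a grid."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.05, 0.95, 50)          # points where predictions are compared
    preds = np.empty((n_repeats, x_grid.size))
    for r in range(n_repeats):
        # Draw a fresh training set and fit a polynomial by least squares
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise_sd, n_train)
        coefs = P.polyfit(x, y, degree)
        preds[r] = P.polyval(x_grid, coefs)
    mean_pred = preds.mean(axis=0)                # estimate of E[f_hat(x)]
    bias_sq = np.mean((mean_pred - true_f(x_grid)) ** 2)
    variance = np.mean(preds.var(axis=0))         # estimate of E[(f_hat - E[f_hat])^2]
    return bias_sq, variance

results = {d: estimate_bias_variance(d) for d in (1, 4, 9)}
for d, (b, v) in results.items():
    print(f"degree {d}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Using the same seed for every degree means each model sees the same sequence of training sets, so differences between rows reflect model flexibility rather than sampling luck.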


```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Noisy samples from sin(2*pi*x) on [0, 1]
np.random.seed(0)
X = np.sort(np.random.rand(20, 1), axis=0)
y = np.sin(2 * np.pi * X).ravel() + np.random.normal(0, 0.2, 20)

# Dense grid used to draw each fitted curve
X_test = np.linspace(0, 1, 200).reshape(-1, 1)

# Degree 1 underfits (high bias); degree 15 overfits (high variance)
degrees = [1, 4, 15]

plt.figure()
for d in degrees:
    model = make_pipeline(PolynomialFeatures(d), LinearRegression())
    model.fit(X, y)
    y_pred = model.predict(X_test)
    plt.plot(X_test, y_pred, label=f"Degree {d}")

plt.scatter(X, y, color="black", label="Training data")
plt.ylim(-2, 2)  # keep the degree-15 oscillations from dominating the axis
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.title("Bias–Variance Tradeoff")
plt.show()
```



## Regularization

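Regularization adds a penalty on the size of the coefficients $w_j$ to the loss function, with the strength of the penalty set by $\lambda \ge 0$. The unregularized baseline is ordinary least squares.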

### Ordinary Least Squares
$$
\mathcal{L}_{OLS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
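
As a small illustration, the OLS loss can be evaluated and minimized directly with NumPy. This is only a sketch on synthetic data: the design matrix, the coefficients in `w_true`, and the helper `ols_loss` are assumptions made for the example, and no intercept is fitted.

```python
import numpy as np

def ols_loss(w, X, y):
    # Sum of squared residuals, matching the OLS loss above
    return np.sum((y - X @ w) ** 2)

# Synthetic linear data (illustrative values)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, 50)

# Least-squares solution: the coefficient vector that minimizes ols_loss
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated coefficients:", np.round(w_hat, 3))
print("loss at w_hat :", ols_loss(w_hat, X, y))
print("loss at w_true:", ols_loss(w_true, X, y))
```

The fitted coefficients achieve a training loss no larger than the true coefficients do, which is a first hint of why an unpenalized fit can chase noise.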



### L2 Regularization (Ridge)

$$
\mathcal{L}_{Ridge} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} w_j^2
$$

- Shrinks coefficients toward zero, but typically not exactly to zero
- Reduces variance at the cost of some added bias
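
The shrinkage can be seen from the closed-form minimizer of the Ridge loss, $w = (X^\top X + \lambda I)^{-1} X^\top y$. The sketch below assumes no intercept term and uses synthetic data; the helper `ridge_fit` and the chosen $\lambda$ values are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form minimizer of sum((y - Xw)^2) + lam * sum(w^2), no intercept
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data (illustrative values)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(0, 0.5, 30)

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda = {lam:6.1f}  ->  ||w|| = {norm:.3f}")
```

The coefficient norm falls as $\lambda$ grows, while individual coefficients generally stay nonzero.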



### L1 Regularization (Lasso)

$$
\mathcal{L}_{Lasso} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |w_j|
$$

- Produces sparse models: for large enough $\lambda$, some coefficients become exactly zero
- Thereby performs feature selection
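
A quick way to see the sparsity is to count exact zeros as $\lambda$ grows. The sketch below uses scikit-learn's `Lasso`, whose objective divides the squared error by $2n$, so the $\lambda$ in the formula above is assumed to correspond to `alpha = lam / (2 * n)`. The synthetic data, the $\lambda$ values, and the no-intercept choice are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first 3 of 10 features matter (illustrative values)
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + rng.normal(0, 0.5, n)

# For the loss above, every coefficient is zero once lam >= 2 * max|X^T y|
lam_max = 2 * np.max(np.abs(X.T @ y))

zero_counts = []
for lam in (1.0, 60.0, 1.01 * lam_max):
    # Convert the document's lambda to scikit-learn's alpha scaling
    model = Lasso(alpha=lam / (2 * n), fit_intercept=False).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0.0))
    zero_counts.append(n_zero)
    print(f"lambda = {lam:8.2f}  ->  {n_zero} of {p} coefficients are exactly zero")
```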



### Elastic Net (L1 + L2)

$$
\mathcal{L}_{EN} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
+ \lambda \left( \alpha \sum_{j=1}^{p} |w_j| + (1-\alpha) \sum_{j=1}^{p} w_j^2 \right)
$$

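- Combines the sparsity of L1 with the stability of L2; $\alpha \in [0, 1]$ sets the mix, with $\alpha = 1$ giving Lasso and $\alpha = 0$ giving Ridge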
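
Library parameterizations often differ from the formula above. As a hedged sketch, scikit-learn's `ElasticNet` minimizes $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + a\,\rho\,\lVert w\rVert_1 + \tfrac{1}{2}\,a\,(1-\rho)\,\lVert w\rVert_2^2$, where $a$ is its `alpha` and $\rho$ its `l1_ratio`. Matching terms with the loss above gives $a = \lambda(2-\alpha)/(2n)$ and $\rho = \alpha/(2-\alpha)$. The code checks this mapping numerically on synthetic data; `en_loss` and all numeric values are illustrative, and no intercept is fitted.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def en_loss(w, X, y, lam, alpha):
    # Elastic Net loss exactly as written in the formula above
    resid = y - X @ w
    return resid @ resid + lam * (alpha * np.abs(w).sum() + (1 - alpha) * (w @ w))

# Synthetic data (illustrative values)
rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]) + rng.normal(0, 0.5, n)

lam, alpha = 20.0, 0.5
sk_alpha = lam * (2 - alpha) / (2 * n)   # scikit-learn's overall strength
sk_l1_ratio = alpha / (2 - alpha)        # scikit-learn's L1 share

model = ElasticNet(alpha=sk_alpha, l1_ratio=sk_l1_ratio, fit_intercept=False,
                   tol=1e-10, max_iter=10000).fit(X, y)
w_hat = model.coef_

# If the mapping is right, w_hat minimizes en_loss, so nearby points score worse
base = en_loss(w_hat, X, y, lam, alpha)
perturbed = [en_loss(w_hat + 0.01 * rng.normal(size=p), X, y, lam, alpha)
             for _ in range(20)]
print("loss at solution:", base)
print("smallest perturbed loss:", min(perturbed))
```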



## Effect on Bias and Variance

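| Method | Bias | Variance |
|--------|------|----------|
| No regularization | Low | High |
| Ridge | Slight ↑ | ↓ |
| Lasso | ↑ | ↓↓ |
| Elastic Net | ↑ | ↓ |

Arrows show the typical change relative to the unregularized fit. These are rules of thumb rather than guarantees: the size of each effect grows with $\lambda$ and depends on the data, and Elastic Net usually falls between Ridge and Lasso depending on $\alpha$.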
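
The trend in the table can be checked by simulation for the Ridge case. The sketch below is one possible setup, not a general result: it assumes $f(x) = \sin(2\pi x)$ with noise of standard deviation 0.2, degree-9 polynomial features, and an unpenalized intercept. The function name `ridge_poly_bias_variance` and the $\lambda$ values are illustrative.

```python
import numpy as np

def ridge_poly_bias_variance(lam, degree=9, n_train=20, n_repeats=300,
                             noise_sd=0.2, seed=0):
    """Monte Carlo squared bias and variance of a ridge-penalized polynomial fit."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.05, 0.95, 50)
    grid_feats = np.vander(x_grid, degree + 1, increasing=True)
    penalty = lam * np.eye(degree + 1)
    penalty[0, 0] = 0.0                      # assumption: intercept is not penalized
    preds = np.empty((n_repeats, x_grid.size))
    for r in range(n_repeats):
        # Fresh training set, then the closed-form ridge solution
        x = rng.uniform(0, 1, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise_sd, n_train)
        feats = np.vander(x, degree + 1, increasing=True)
        w = np.linalg.solve(feats.T @ feats + penalty, feats.T @ y)
        preds[r] = grid_feats @ w
    bias_sq = np.mean((preds.mean(axis=0) - np.sin(2 * np.pi * x_grid)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

lams = (1e-6, 1e-3, 1.0)
tradeoff = {lam: ridge_poly_bias_variance(lam) for lam in lams}
for lam, (b, v) in tradeoff.items():
    print(f"lambda = {lam:g}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Because every $\lambda$ reuses the same seed, all three fits see identical training sets, which isolates the effect of the penalty.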
