# Lesson 10 - Bias/Variance, Regularization, Model Selection


## Objectives
- Visualize bias/variance tradeoffs with polynomial regression.
- Implement L2 and L1 regularization and observe their effects.
- Use a validation set for model selection.


## From the notes

**Bias/variance**
- Expected squared error at a test point = bias$^2$ + variance + irreducible noise.
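
The decomposition can be checked numerically. The sketch below is a minimal Monte Carlo estimate, assuming the sine target and noise level (0.2) used in the Data section of this lesson. The helper names `fit_poly` and `bias_variance`, the trial count, and the seed are illustrative choices, not part of the notes.

```python
import numpy as np

def fit_poly(x, y, degree):
    # Least-squares polynomial fit; columns are x**0 ... x**degree.
    X = np.vander(x, degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ theta

def bias_variance(degree, n=40, sigma=0.2, trials=300, seed=0):
    # Hypothetical helper: refit on many fresh noisy samples of the same target.
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n)
    f = np.sin(2 * np.pi * x)                      # noise-free target
    preds = np.empty((trials, n))
    for t in range(trials):
        y = f + sigma * rng.standard_normal(n)     # fresh noisy training set
        preds[t] = fit_poly(x, y, degree)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f) ** 2)        # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))          # variance, averaged over x
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b2, var) in results.items():
    print(f"degree {d}: bias^2={b2:.4f}, variance={var:.4f}")
```

Under these assumptions, degree 1 should show the largest bias term and degree 9 the largest variance term, while the noise term stays at $\sigma^2 = 0.04$ regardless of degree.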

**Regularization**
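- L2: $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)})^2 + \frac{\lambda}{2} \sum_{j=1}^n \theta_j^2$. The intercept $\theta_0$ is not penalized.
- L1 instead adds $\lambda \sum_{j=1}^n |\theta_j|$ to the squared-error term, which tends to drive some coefficients exactly to zero.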
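
As a rough illustration, the two costs can be written directly from the formulas above. This sketch assumes a linear hypothesis $h_\theta(x) = \theta^T x$ with the intercept in column 0, and assumes the L1 sum also starts at $j=1$. The function names `l2_cost`, `l1_cost`, and `l2_minimizer` are hypothetical. Setting the gradient of the L2 cost to zero gives $(X^T X + m\lambda D)\theta = X^T y$, where $D$ is the identity with $D_{00} = 0$.

```python
import numpy as np

def l2_cost(theta, X, y, lam):
    # (1 / 2m) * sum of squared residuals + (lam / 2) * sum_{j>=1} theta_j^2
    m = len(y)
    resid = X @ theta - y
    return resid @ resid / (2 * m) + 0.5 * lam * np.sum(theta[1:] ** 2)

def l1_cost(theta, X, y, lam):
    # (1 / 2m) * sum of squared residuals + lam * sum_{j>=1} |theta_j|
    m = len(y)
    resid = X @ theta - y
    return resid @ resid / (2 * m) + lam * np.sum(np.abs(theta[1:]))

def l2_minimizer(X, y, lam):
    # Solve (X^T X + m * lam * D) theta = X^T y, with D[0, 0] = 0
    # so the intercept is left unpenalized.
    m, p = X.shape
    D = np.eye(p)
    D[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + m * lam * D, X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(20)
X = np.vander(x, 4, increasing=True)   # cubic features, column 0 is the intercept
lam = 0.1

theta_l2 = l2_minimizer(X, y, lam)
# Gradient of l2_cost at the closed-form solution; it should be numerically zero.
grad = X.T @ (X @ theta_l2 - y) / len(y) + lam * np.r_[0.0, theta_l2[1:]]
print("gradient norm at minimizer:", np.linalg.norm(grad))
print("L2 cost:", l2_cost(theta_l2, X, y, lam))
print("L1 cost at the same theta:", l1_cost(theta_l2, X, y, lam))
```

The L1 cost has no closed-form minimizer in general because the absolute value is not differentiable at zero, which is why it is usually handled with iterative methods.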

_TODO: Validate formulas in the CS229 main notes PDF._


## Intuition
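High-capacity models (for example, high-degree polynomials) reduce bias but increase variance, because they can fit the noise in a particular training sample. Regularization shrinks parameters toward zero, trading a small increase in bias for lower variance and, often, better generalization.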
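
The variance claim can be illustrated with a small simulation. The sketch below refits a degree-9 polynomial on many noisy samples with and without an L2 penalty and compares how much the predictions vary. It assumes the sine target and noise level used in this lesson; the helper `ridge_predict`, the penalty value `1e-3`, and the trial count are illustrative assumptions.

```python
import numpy as np

def ridge_predict(x, y, degree, lam):
    # Minimizes ||X theta - y||^2 + lam * sum_{j>=1} theta_j^2 by solving an
    # augmented least-squares system, which avoids forming X^T X.
    # The intercept column is left unpenalized (an assumption of this sketch).
    X = np.vander(x, degree + 1, increasing=True)
    P = np.sqrt(lam) * np.eye(degree + 1)
    P[0, 0] = 0.0
    A = np.vstack([X, P])
    b = np.concatenate([y, np.zeros(degree + 1)])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return X @ theta

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
f = np.sin(2 * np.pi * x)              # noise-free target
trials = 200
preds = {0.0: [], 1e-3: []}            # keys are the penalty strengths compared
for _ in range(trials):
    y = f + 0.2 * rng.standard_normal(x.size)   # same noisy sample for both fits
    for lam in preds:
        preds[lam].append(ridge_predict(x, y, degree=9, lam=lam))

# Variance of the fitted curve across training sets, averaged over x.
variance = {lam: float(np.mean(np.var(p, axis=0))) for lam, p in preds.items()}
print(variance)
```

With these settings the penalized fit should vary less from sample to sample than the unpenalized one.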


## Data
We generate noisy samples from a sine curve and fit polynomials of varying degree.


In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

def make_data(n=40):
    x = np.linspace(0, 1, n)
    y = np.sin(2 * np.pi * x) + 0.2 * np.random.randn(n)
    return x, y

x, y = make_data()

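def poly_features(x, degree):
    # Columns are x**0, x**1, ..., x**degree (column 0 is the intercept).
    return np.vstack([x**d for d in range(degree+1)]).T

def ridge_fit(X, y, lam=0.0):
    # Closed-form ridge solution. The intercept (column 0) is left unpenalized,
    # matching the sum over j >= 1 in the L2 cost. lam plays the role of m * lambda.
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.pinv(X.T @ X + penalty) @ X.T @ y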

degrees = [1, 3, 9]
preds = {}
for d in degrees:
    X = poly_features(x, d)
    theta = ridge_fit(X, y, lam=1e-2)
    preds[d] = X @ theta


## Experiments


In [None]:
# Validation split
idx = np.random.permutation(len(x))
split = int(0.7 * len(x))
train_idx, val_idx = idx[:split], idx[split:]
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

def val_mse(deg, lam):
    Xtr = poly_features(x_train, deg)
    Xval = poly_features(x_val, deg)
    theta = ridge_fit(Xtr, y_train, lam)
    return np.mean((Xval @ theta - y_val) ** 2)

# Grid search: keep the (degree, lambda) pair with the lowest validation MSE.
errors = {(d, lam): val_mse(d, lam) for d in [1, 3, 5, 9] for lam in [0.0, 0.01, 0.1]}
best = min(errors, key=errors.get)
best, errors[best]


## Visualizations


In [None]:
plt.figure(figsize=(6,4))
plt.scatter(x, y, alpha=0.6, label="data")
for d in degrees:
    plt.plot(x, preds[d], label=f"degree {d}")
plt.title("Bias/variance via polynomial degree")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

# L2 shrinkage illustration: norm of the non-intercept coefficients as lambda grows
lam_vals = np.linspace(0, 1, 50)
l2_norms = []
for lam in lam_vals:
    theta = ridge_fit(poly_features(x, 5), y, lam)
    l2_norms.append(np.linalg.norm(theta[1:]))
plt.figure(figsize=(6,4))
plt.plot(lam_vals, l2_norms)
plt.title("L2 shrinkage vs lambda")
plt.xlabel("lambda")
plt.ylabel("||theta||")
plt.show()


## Takeaways
- Changing model complexity makes the bias/variance tradeoff visible: low degrees underfit, while high degrees can chase the noise.
- Regularization controls variance by shrinking parameter magnitude.


## Explain it in an interview
- Explain why regularization can improve test performance.
- Describe how you would select hyperparameters in practice.
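
One common answer to the hyperparameter question is k-fold cross-validation, which averages the validation error over several splits instead of relying on a single one. The sketch below is one possible version under stated assumptions: the helpers `ridge_lstsq` and `kfold_mse`, the choice of 5 folds, and the seeds are all illustrative, and the grid mirrors the one used in the Experiments cell.

```python
import numpy as np

def ridge_lstsq(X, y, lam):
    # Ridge via augmented least squares; the intercept (column 0) is unpenalized.
    p = X.shape[1]
    P = np.sqrt(lam) * np.eye(p)
    P[0, 0] = 0.0
    A = np.vstack([X, P])
    b = np.concatenate([y, np.zeros(p)])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta

def kfold_mse(x, y, degree, lam, k=5, seed=0):
    # The same seed is used for every grid point, so all candidates see the same folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errs = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        Xtr = np.vander(x[train], degree + 1, increasing=True)
        Xval = np.vander(x[val], degree + 1, increasing=True)
        theta = ridge_lstsq(Xtr, y[train], lam)
        errs.append(np.mean((Xval @ theta - y[val]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(40)

grid = [(d, lam) for d in (1, 3, 5, 9) for lam in (0.0, 0.01, 0.1)]
cv_errors = {g: kfold_mse(x, y, *g) for g in grid}
best = min(cv_errors, key=cv_errors.get)
print("best (degree, lambda):", best, "CV MSE:", cv_errors[best])
```

In practice the selected pair would then be refit on all training data and evaluated once on a held-out test set that played no part in the selection.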


## Exercises
- Implement L1-regularized regression via gradient descent.
- Plot training vs validation error for multiple degrees.
- Explain what happens as lambda → ∞.
