# Lesson 10: Regularization (L2/L1)

## Objectives

- Implement ridge regression and lasso (coordinate descent).
- Visualize coefficient shrinkage as \(\lambda\) varies.
- Interpret L1 vs L2 geometry.

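## From the notes

Regularized objective:

\[
J(\theta) = \frac{1}{2m}\|X\theta - y\|^2 + \lambda \|\theta\|_p.
\]

L2 uses \(p=2\), L1 uses \(p=1\). For ridge, the penalty is conventionally the squared norm \(\|\theta\|_2^2\), which is what gives the closed-form solution implemented below. The intercept \(\theta_0\) is left unpenalized in both implementations.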
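
To make the objective concrete, here is a minimal sketch that evaluates \(J(\theta)\) exactly as written above, with the unsquared \(p\)-norm. The function name `regularized_cost` and the `*_demo` arrays are illustrative, and excluding the intercept from the penalty is an assumption chosen to match the ridge and lasso cells in this lesson.

```python
import numpy as np

def regularized_cost(X, y, theta, lam, p=2):
    """Evaluate 1/(2m) * ||X theta - y||^2 + lam * ||theta[1:]||_p.

    Assumption: column 0 of X is the intercept and is not penalized.
    """
    m = X.shape[0]
    residual = X @ theta - y
    data_term = (residual @ residual) / (2 * m)
    penalty = np.linalg.norm(theta[1:], ord=p)
    return data_term + lam * penalty

# Small hand-checkable example (hypothetical numbers)
X_demo = np.array([[1.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0]])
y_demo = np.array([1.0, -4.0])
theta_demo = np.array([0.0, 3.0, -4.0])

cost_l1 = regularized_cost(X_demo, y_demo, theta_demo, lam=0.1, p=1)
cost_l2 = regularized_cost(X_demo, y_demo, theta_demo, lam=0.1, p=2)
print(cost_l1, cost_l2)
```

Both calls share the same data term; only the penalty differs, since the L1 norm of `[3, -4]` is 7 while its L2 norm is 5.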

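## Intuition

L2 shrinks coefficients smoothly toward zero but rarely makes them exactly zero. L1 encourages sparsity: its constraint set has corners on the coordinate axes, so the loss contours tend to touch it at points where some coefficients are exactly zero.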
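
The difference is easiest to see in one dimension. The sketch below assumes a single coefficient with unpenalized estimate `z` and compares the minimizers of \(\tfrac12(t-z)^2 + \tfrac{\lambda}{2}t^2\) (L2) and \(\tfrac12(t-z)^2 + \lambda|t|\) (L1). The helper names `shrink_l2` and `shrink_l1` are illustrative and not part of the lesson's code.

```python
import numpy as np

def shrink_l2(z, lam):
    # Minimizer of 0.5*(t - z)**2 + 0.5*lam*t**2: proportional shrinkage
    return z / (1.0 + lam)

def shrink_l1(z, lam):
    # Minimizer of 0.5*(t - z)**2 + lam*|t|: soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([-2.0, -0.3, 0.0, 0.3, 2.0])
lam = 0.5

l2_shrunk = shrink_l2(z, lam)
l1_shrunk = shrink_l1(z, lam)

print("L2:", l2_shrunk)
print("L1:", l1_shrunk)
print("nonzero after L2:", np.count_nonzero(l2_shrunk))
print("nonzero after L1:", np.count_nonzero(l1_shrunk))
```

L2 rescales every entry by the same factor, so small values stay small but nonzero. L1 subtracts a fixed amount and sets any entry with magnitude at most `lam` exactly to zero.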

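## Data

We simulate a linear dataset with three Gaussian features and additive noise to observe shrinkage. The features in the cell below are drawn independently with `np.random.randn`, so any correlation between them is only sampling noise.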
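
Correlation between features can be introduced by sampling from a multivariate normal with off-diagonal covariance. The sketch below is a hypothetical variant, not the cell used in this lesson; the names `rho_corr`, `m_corr`, and `X_corr`, the seed, and the choice of 0.9 are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # separate generator; does not touch np.random.seed
m_corr = 100
rho_corr = 0.9  # assumed population correlation between features 0 and 1

# Unit variances; only features 0 and 1 are correlated
cov = np.array([
    [1.0, rho_corr, 0.0],
    [rho_corr, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
X_corr = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=m_corr)

# Sample correlation matrix (columns are variables)
sample_corr = np.corrcoef(X_corr, rowvar=False)
print(np.round(sample_corr, 2))
```

Swapping `X_corr` in for the feature matrix is one way to see how ridge and lasso treat nearly redundant features differently.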

In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

In [None]:
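# Synthetic data: m samples, 3 Gaussian features, known coefficients
m = 100
X_raw = np.random.randn(m, 3)
true_theta = np.array([2.0, -1.5, 0.5])
y = X_raw @ true_theta + np.random.normal(0, 0.5, size=m)

# Prepend a column of ones for the intercept
X = np.c_[np.ones(m), X_raw]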

## Implementation: ridge regression

In [None]:
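def ridge_theta(X, y, lam):
    """Closed-form ridge solution with an unpenalized intercept (column 0).

    Solves (X^T X + lam * I') theta = X^T y, i.e. it minimizes
    ||X theta - y||^2 + lam * ||theta[1:]||^2 (no 1/(2m) scaling).
    """
    n = X.shape[1]
    reg = lam * np.eye(n)
    reg[0, 0] = 0  # do not regularize the intercept
    return np.linalg.pinv(X.T @ X + reg) @ X.T @ y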
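
One way to sanity-check a closed-form ridge solution is to confirm that the gradient of the penalized objective vanishes at it. The sketch below assumes the unscaled objective \(\tfrac12\|X\theta - y\|^2 + \tfrac{\lambda}{2}\|\theta_{1:}\|^2\) with an unpenalized intercept, and uses `np.linalg.solve` rather than the pseudo-inverse. The names `ridge_solve`, `ridge_gradient`, `X_chk`, and `y_chk` are hypothetical, and the data is generated only for this check.

```python
import numpy as np

def ridge_solve(X, y, lam):
    # Normal equations with the intercept (column 0) left unpenalized
    n = X.shape[1]
    reg = lam * np.eye(n)
    reg[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def ridge_gradient(X, y, theta, lam):
    # Gradient of 0.5*||X theta - y||^2 + 0.5*lam*||theta[1:]||^2
    penalty = lam * theta
    penalty[0] = 0.0
    return X.T @ (X @ theta - y) + penalty

rng = np.random.default_rng(0)
X_chk = np.c_[np.ones(50), rng.standard_normal((50, 3))]
y_chk = X_chk @ np.array([1.0, 2.0, -1.5, 0.5]) + rng.normal(0, 0.5, size=50)

theta_chk = ridge_solve(X_chk, y_chk, lam=1.0)
grad_chk = ridge_gradient(X_chk, y_chk, theta_chk, lam=1.0)
print("max |gradient|:", np.abs(grad_chk).max())

# The norm of the penalized coefficients should shrink as lam grows
norms = [np.linalg.norm(ridge_solve(X_chk, y_chk, lam)[1:])
         for lam in (0.0, 1.0, 10.0, 100.0)]
print(norms)
```

The gradient should be zero up to floating-point error, and the coefficient norm should decrease as `lam` increases.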

## Implementation: lasso (coordinate descent)

In [None]:
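def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrink rho toward zero by lam."""
    if rho < -lam:
        return rho + lam
    if rho > lam:
        return rho - lam
    return 0.0


def lasso_cd(X, y, lam, num_iters=100):
    """Lasso via cyclic coordinate descent; the intercept (column 0) is unpenalized."""
    m, n = X.shape
    theta = np.zeros(n)
    # Mean squared value of each column; equals 1 only for standardized features
    col_scale = (X ** 2).sum(axis=0) / m
    for _ in range(num_iters):
        for j in range(n):
            # Partial residual that excludes feature j's current contribution
            residual = y - X @ theta + theta[j] * X[:, j]
            rho = (X[:, j] @ residual) / m
            if j == 0:
                theta[j] = rho / col_scale[j]
            else:
                theta[j] = soft_threshold(rho, lam) / col_scale[j]
    return theta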
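
A coordinate-descent lasso solution can be checked against the optimality (KKT) conditions of \(\frac{1}{2m}\|X\theta - y\|^2 + \lambda\|\theta_{1:}\|_1\): the data-term gradient must equal \(-\lambda\,\mathrm{sign}(\theta_j)\) for nonzero penalized coefficients and have magnitude at most \(\lambda\) for zero ones. The sketch below is self-contained and assumes each update is divided by the column's mean squared value, which is needed when features are not standardized. The names `soft`, `lasso_cd_scaled`, `X_kkt`, and `y_kkt`, and the data itself, are hypothetical.

```python
import numpy as np

def soft(rho, lam):
    # Soft-thresholding for a scalar
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd_scaled(X, y, lam, num_iters=500):
    m, n = X.shape
    theta = np.zeros(n)
    col_scale = (X ** 2).sum(axis=0) / m  # per-column mean squared value
    for _ in range(num_iters):
        for j in range(n):
            residual = y - X @ theta + theta[j] * X[:, j]
            rho = (X[:, j] @ residual) / m
            if j == 0:
                theta[j] = rho / col_scale[j]             # intercept: no penalty
            else:
                theta[j] = soft(rho, lam) / col_scale[j]
    return theta

rng = np.random.default_rng(0)
m_kkt = 200
X_kkt = np.c_[np.ones(m_kkt), rng.standard_normal((m_kkt, 3))]
# The second feature has a true coefficient of zero
y_kkt = X_kkt @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(0, 0.1, size=m_kkt)

lam_kkt = 0.2
theta_kkt = lasso_cd_scaled(X_kkt, y_kkt, lam_kkt)
grad_kkt = X_kkt.T @ (X_kkt @ theta_kkt - y_kkt) / m_kkt  # data-term gradient

print("theta:", theta_kkt)
print("data-term gradient:", grad_kkt)
```

In this setup the irrelevant feature is expected to end at exactly zero, while the two active coefficients are pulled toward zero and their gradient entries sit at \(\mp\lambda\).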

## Experiments

In [None]:
lambdas = [0.0, 0.1, 0.5, 1.0]
coefs_ridge = []
coefs_lasso = []
for lam in lambdas:
    coefs_ridge.append(ridge_theta(X, y, lam))
    coefs_lasso.append(lasso_cd(X, y, lam))

## Visualizations

In [None]:
# Ridge: coefficient paths as lambda grows
plt.figure(figsize=(6, 4))
for i in range(X.shape[1]):
    plt.plot(lambdas, [c[i] for c in coefs_ridge], label=f"theta{i}")
plt.xlabel("lambda")
plt.ylabel("ridge coefficients")
plt.title("Ridge shrinkage")
plt.legend()
plt.show()

# Lasso: coefficient paths as lambda grows
plt.figure(figsize=(6, 4))
for i in range(X.shape[1]):
    plt.plot(lambdas, [c[i] for c in coefs_lasso], label=f"theta{i}")
plt.xlabel("lambda")
plt.ylabel("lasso coefficients")
plt.title("Lasso shrinkage")
plt.legend()
plt.show()

## Takeaways

- L2 regularization reduces variance by shrinking weights.
- L1 regularization can produce sparse models.

## Explain it in an interview

- Compare the geometry of L1 vs L2 constraints.
- Discuss when sparsity is beneficial.

## Exercises

1. Plot contours of the least-squares loss with L1/L2 balls.
2. Standardize features and re-run lasso.
3. Implement elastic net and compare coefficients.