# Notebook 02: Regularization (Ridge & Lasso)

## The Art of Restraint

Overfitting is the enemy of generalization. Regularization tames it by adding a penalty on the size of the model's coefficients to the training loss. Ridge shrinks coefficients smoothly toward zero; Lasso can set some of them to exactly zero, performing automatic feature selection.

---

## Why Regularize?

### The Overfitting Problem

When a model has too many parameters relative to the data, it can memorize the training set, noise included, instead of learning generalizable patterns. The symptom is low training error paired with much higher test error.
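A minimal sketch of the problem. The dataset is a synthetic assumption of this example (20 noisy samples of `sin(3x)`), not the notebook's data: an unregularized degree-15 polynomial has nearly as many coefficients as there are points, so it fits the training set almost perfectly while scoring worse on fresh samples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic data (assumption): a smooth signal plus Gaussian noise."""
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.3, size=n)
    return x, y

X_small, y_small = make_data(20)   # tiny training set
X_new, y_new = make_data(200)      # fresh samples from the same process

# 15 polynomial features + intercept for only 20 training points, no penalty
overfit = make_pipeline(
    PolynomialFeatures(degree=15, include_bias=False),
    StandardScaler(),
    LinearRegression(),
)
overfit.fit(X_small, y_small)

train_r2 = r2_score(y_small, overfit.predict(X_small))
test_r2 = r2_score(y_new, overfit.predict(X_new))
print(f"train R² = {train_r2:.3f}, test R² = {test_r2:.3f}")
```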

### The Bias-Variance Tradeoff

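Regularization deliberately introduces a controlled amount of **bias** (coefficients are pulled toward zero) in exchange for lower **variance** (the fitted model changes less from one training sample to the next). The regularization strength `alpha` sets the trade: larger values mean more bias and less variance.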
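The tradeoff can be seen directly with Ridge. In this sketch the synthetic data and the alpha grid are illustrative assumptions: the same degree-12 polynomial is fit with increasing `alpha`, and training error rises (more bias) while the coefficient norm falls (a more constrained, lower-variance model).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
x_bv = rng.uniform(-1, 1, size=(30, 1))                        # synthetic inputs (assumption)
y_bv = np.sin(3 * x_bv[:, 0]) + rng.normal(scale=0.3, size=30)

bv_alphas = [1e-4, 1e-2, 1.0, 100.0]
train_mse, coef_norm = [], []
for alpha in bv_alphas:
    model = make_pipeline(
        PolynomialFeatures(degree=12, include_bias=False),
        StandardScaler(),
        Ridge(alpha=alpha),
    )
    model.fit(x_bv, y_bv)
    train_mse.append(mean_squared_error(y_bv, model.predict(x_bv)))
    coef_norm.append(np.linalg.norm(model[-1].coef_))  # L2 norm of the Ridge weights

for a, mse, norm in zip(bv_alphas, train_mse, coef_norm):
    print(f"alpha={a:>8g}  train MSE={mse:.4f}  ||w||={norm:.2f}")
```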

## Ridge vs Lasso

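- **Ridge (L2)**: Penalizes the sum of squared coefficients. Shrinks all coefficients toward zero but typically keeps every one of them nonzero.
- **Lasso (L1)**: Penalizes the sum of absolute coefficient values. Can set coefficients to exactly zero, removing those features from the model.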
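For reference, these are the two objectives as scikit-learn's `Ridge` and `Lasso` documentation states them, with $n$ samples, design matrix $X$, and an unpenalized intercept:

$$
\text{Ridge:}\quad \hat{w} = \arg\min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2, \qquad \lVert w \rVert_2^2 = \sum_j w_j^2
$$

$$
\text{Lasso:}\quad \hat{w} = \arg\min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1, \qquad \lVert w \rVert_1 = \sum_j \lvert w_j \rvert
$$

Because only the Lasso loss carries the $\frac{1}{2n}$ factor, the same numeric `alpha` means a different penalty strength in the two models, which is one reason they are searched over different alpha ranges.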

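We use cross-validation to pick the regularization strength, `alpha`: each candidate value is scored on held-out folds, and the value with the lowest validation error is selected.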
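A hand-rolled version of that search, as a sketch on synthetic `make_regression` data (an assumption, not the diabetes set). Every candidate alpha is scored with 5-fold cross-validation, and the scaler sits inside the pipeline so it is refit on each training fold and never sees the validation fold. `RidgeCV` and `LassoCV` automate this loop.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_cv, y_cv = make_regression(n_samples=200, n_features=30, n_informative=10,
                             noise=20.0, random_state=0)

cv_alphas = np.logspace(-3, 3, 13)
folds = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = []
for alpha in cv_alphas:
    pipe = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    # scikit-learn scorers are "higher is better", so MSE comes back negated
    scores = cross_val_score(pipe, X_cv, y_cv, cv=folds,
                             scoring="neg_mean_squared_error")
    cv_mse.append(-scores.mean())

best_cv_alpha = cv_alphas[int(np.argmin(cv_mse))]
print(f"best alpha = {best_cv_alpha:g}, mean CV MSE = {min(cv_mse):.1f}")
```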

## Setup and Imports

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

import sys
from pathlib import Path

# Make the project root importable so `src.utils` resolves when running from notebooks/
project_root = Path().resolve().parent if Path().resolve().name == 'notebooks' else Path().resolve()
sys.path.insert(0, str(project_root))

from src.utils import set_seed

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

set_seed(42)
print("✓ Imports successful!")
```

## Step 1: Load and Prepare Data

```python
# Load the diabetes dataset as pandas objects (442 samples, 10 features)
data = load_diabetes(as_frame=True)
X = data.data
y = data.target

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
```
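The diabetes features arrive preprocessed: scikit-learn documents that each column is mean-centered and scaled so that its sum of squares is 1. A quick check, as a sketch (the tolerances are deliberately loose):

```python
import numpy as np
from sklearn.datasets import load_diabetes

X_check = load_diabetes(as_frame=True).data

col_means = X_check.mean(axis=0)         # should be ~0 for every feature
col_sumsq = (X_check ** 2).sum(axis=0)   # should be ~1 for every feature
print(f"largest |mean|: {col_means.abs().max():.2e}")
print(col_sumsq.round(3))
```

Because all ten columns already share one scale, `StandardScaler` here multiplies every feature by (approximately) the same constant. It still belongs in the pipeline so the code stays correct on unscaled data, and so the selected alphas refer to standardized features.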

## Step 2: Fit RidgeCV with Cross-Validation

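`RidgeCV` selects the best alpha from a list of candidates using cross-validation. With the default `cv=None` it uses an efficient form of leave-one-out cross-validation, and the chosen value is stored in the fitted model's `alpha_` attribute.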

```python
# === TODO (you code this) ===
# Pipeline: StandardScaler + RidgeCV over alphas from 1e-3 to 1e3 (log-spaced)
# Hints:
#   - Create a Pipeline with StandardScaler and RidgeCV
#   - Use alphas=np.logspace(-3, 3, 100) for RidgeCV
#   - Fit on X_train, y_train
#   - Predict on X_test and compute RMSE and R²
# Acceptance: print best alpha, test RMSE, and R²
```
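For comparison after attempting the exercise, here is a sketch of the same pattern on synthetic data from `make_regression` (an assumption, so it does not reproduce the diabetes results). The fitted `RidgeCV` step is reached through `named_steps`, and RMSE is taken as the square root of `mean_squared_error`, which works across scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); not the diabetes dataset
X_syn, y_syn = make_regression(n_samples=300, n_features=20, noise=15.0, random_state=1)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=1
)

demo_alphas = np.logspace(-3, 3, 100)
demo_ridge = Pipeline([
    ("scaler", StandardScaler()),            # the penalty is scale-dependent, so standardize first
    ("ridge", RidgeCV(alphas=demo_alphas)),  # default cv=None: efficient leave-one-out
])
demo_ridge.fit(Xs_train, ys_train)

ys_pred = demo_ridge.predict(Xs_test)
demo_rmse = np.sqrt(mean_squared_error(ys_test, ys_pred))
demo_r2 = r2_score(ys_test, ys_pred)
demo_best_alpha = demo_ridge.named_steps["ridge"].alpha_
print(f"best alpha = {demo_best_alpha:.4g}, RMSE = {demo_rmse:.2f}, R² = {demo_r2:.3f}")
```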

## Step 3: Fit LassoCV with Cross-Validation

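`LassoCV` selects alpha by k-fold cross-validation (5 folds by default in recent scikit-learn versions) and stores it in `alpha_`. Because the L1 penalty drives some coefficients to exactly zero, the fitted model also performs feature selection.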
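Why large alphas remove features: under the Lasso objective scikit-learn uses, every coefficient is exactly zero once $\alpha \ge \alpha_{\max} = \max_j \lvert x_j^\top y \rvert / n$, computed on centered $X$ and $y$. The sketch below checks this on synthetic data (an assumption): just above $\alpha_{\max}$ the model keeps no features, and at half that value at least one feature enters.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X_syn, y_syn = make_regression(n_samples=100, n_features=10, n_informative=3,
                               noise=5.0, random_state=0)
n_samples = X_syn.shape[0]

# With fit_intercept=True, the penalized fit effectively works on centered data
Xc = X_syn - X_syn.mean(axis=0)
yc = y_syn - y_syn.mean()

# Smallest alpha at which the all-zero coefficient vector is optimal
alpha_max = np.max(np.abs(Xc.T @ yc)) / n_samples

above = Lasso(alpha=1.01 * alpha_max).fit(X_syn, y_syn)
below = Lasso(alpha=0.5 * alpha_max).fit(X_syn, y_syn)
print("nonzero coefficients just above alpha_max:", np.count_nonzero(above.coef_))
print("nonzero coefficients at alpha_max / 2:   ", np.count_nonzero(below.coef_))
```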

```python
# === TODO (you code this) ===
# Pipeline: StandardScaler + LassoCV. Compare metrics with Ridge.
# Hints:
#   - Create a Pipeline with StandardScaler and LassoCV
#   - Use alphas=np.logspace(-3, 1, 100) for LassoCV (a narrower range than Ridge)
#   - Fit and evaluate on the test set
#   - Create a comparison table
# Acceptance: table with RMSE, MAE, and R² for both models; 2-sentence comparison
```
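One way to build the comparison table, as a sketch: `regression_metrics` is a hypothetical helper, and the toy arrays are illustrative values, not results from this notebook. With the real models you would pass `y_test` and each pipeline's test-set predictions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for one model (hypothetical helper)."""
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }

# Toy predictions (illustrative values only)
y_true = np.array([1.0, 2.0, 3.0])
preds = {
    "model_a": np.array([1.0, 2.0, 4.0]),
    "model_b": np.array([1.5, 2.0, 2.5]),
}

# One row per model, one column per metric
metrics_table = pd.DataFrame(
    {name: regression_metrics(y_true, p) for name, p in preds.items()}
).T
print(metrics_table.round(3))
```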

## Step 4: Visualize Coefficient Comparison

Compare how Ridge and Lasso treat coefficients differently.

```python
# === TODO (you code this) ===
# Plot coefficient magnitudes side by side for Ridge vs Lasso.
# Hints:
#   - Extract coefficients from both models
#   - Create side-by-side bar plot
#   - Note which coefficients are zero in Lasso
#   - Save to images/02_ridge_lasso_coefficients.png
# Acceptance: Figure with clear legend; note which coefficients go to zero with Lasso
```
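A sketch of the side-by-side plot: `plot_coef_comparison` is a hypothetical helper, and the coefficient values in the example call are made up for illustration. With the fitted pipelines you would pass `X.columns`, the `coef_` arrays of the RidgeCV and LassoCV steps, and `path="images/02_ridge_lasso_coefficients.png"`.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_coef_comparison(feature_names, ridge_coef, lasso_coef, path=None):
    """Grouped bar chart of Ridge vs Lasso coefficients (hypothetical helper)."""
    x = np.arange(len(feature_names))
    width = 0.4
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar(x - width / 2, ridge_coef, width, label="Ridge")
    ax.bar(x + width / 2, lasso_coef, width, label="Lasso")
    # Label the features Lasso dropped (coefficient exactly zero)
    for i in np.flatnonzero(np.asarray(lasso_coef) == 0):
        ax.annotate("0", (x[i] + width / 2, 0), ha="center", va="bottom")
    ax.axhline(0, color="black", linewidth=0.8)
    ax.set_xticks(x)
    ax.set_xticklabels(feature_names, rotation=45, ha="right")
    ax.set_ylabel("Coefficient (standardized features)")
    ax.legend()
    fig.tight_layout()
    if path is not None:
        fig.savefig(path, dpi=150)
    return fig, ax

# Illustrative values only
names = ["f0", "f1", "f2", "f3"]
fig, ax = plot_coef_comparison(names, [3.0, -1.2, 0.4, 0.1], [2.8, -0.9, 0.0, 0.0])
```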

## Summary

In this notebook, we've learned about regularization:

- **Ridge** shrinks coefficients smoothly and stabilizes estimates when features are highly correlated (multicollinearity)
- **Lasso** can zero out coefficients and so performs automatic feature selection
- Cross-validation helps us find the optimal regularization strength
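To see the multicollinearity point concretely, here is a sketch with two nearly identical synthetic features (an illustrative assumption). Ordinary least squares can split the weight between them erratically, while Ridge, with `alpha=1.0` chosen arbitrarily here, shares it almost evenly and keeps the total near the true effect of 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # almost an exact copy of x1
X_col = np.column_stack([x1, x2])
y_col = 3 * x1 + rng.normal(scale=1.0, size=n)  # only x1 truly matters

ols = LinearRegression().fit(X_col, y_col)
ridge = Ridge(alpha=1.0).fit(X_col, y_col)

print("OLS coefficients:  ", ols.coef_.round(2))
print("Ridge coefficients:", ridge.coef_.round(2))
```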

**Next**: Notebook 03 will explore multicollinearity and PCA.