# Week 5: Linear Models - OLS, Ridge, Lasso, ElasticNet

## 🎯 Learning Objectives

By the end of this week, you will understand:
- **Ordinary Least Squares (OLS)**: The foundation of regression
- **Ridge Regression (L2)**: Handling multicollinearity
- **Lasso Regression (L1)**: Feature selection through regularization
- **ElasticNet**: Combining L1 and L2 penalties
- **Finance Applications**: Factor models, risk attribution

---

## Why Linear Models in Finance?

Linear models are the **workhorses** of quantitative finance:
- Simple, interpretable, fast
- Foundation for factor models (Fama-French)
- Risk attribution and decomposition
- Baseline for comparing complex models

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)
print("✅ Libraries loaded!")
print("📚 Week 5: Linear Models Theory")
```

---

## Part 1: Ordinary Least Squares (OLS)

### The Problem

Given data $(X, y)$, find weights $\beta$ that minimize:

$$\min_{\beta} ||y - X\beta||_2^2 = \min_{\beta} \sum_{i=1}^{n} (y_i - x_i^T\beta)^2$$

### Closed-Form Solution

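When $X^TX$ is invertible (i.e., $X$ has full column rank; include a column of ones in $X$ to estimate an intercept):

$$\hat{\beta} = (X^TX)^{-1}X^Ty$$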
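A minimal NumPy sketch of the closed-form solution on simulated data (the coefficients, noise level and seed are arbitrary), checked against scikit-learn's `LinearRegression`. It solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, which evaluates the same formula more stably.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_obs, n_feat = 300, 3
X_raw = rng.standard_normal((n_obs, n_feat))
beta_true = np.array([0.5, -1.0, 2.0])
y_obs = 0.3 + X_raw @ beta_true + 0.1 * rng.standard_normal(n_obs)

# Prepend a column of ones so the first coefficient is the intercept
X_design = np.column_stack([np.ones(n_obs), X_raw])

# Solve the normal equations (X^T X) beta = X^T y without forming (X^T X)^{-1}
beta_hat = np.linalg.solve(X_design.T @ X_design, X_design.T @ y_obs)

# scikit-learn estimates the intercept separately from the slopes
sk_fit = LinearRegression().fit(X_raw, y_obs)
print("closed form:", np.round(beta_hat, 4))
print("sklearn:    ", np.round(np.r_[sk_fit.intercept_, sk_fit.coef_], 4))
```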

### ü§î Simple Explanation

OLS finds the line (or hyperplane) that minimizes the sum of squared errors. Think of it as finding the "best fit" line through your data points.

### Finance Application: Factor Model

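$$R_t = \alpha + \beta_1 \cdot MKT_t + \beta_2 \cdot SMB_t + \beta_3 \cdot HML_t + \epsilon_t$$

where $R_t$ is the stock's return on day $t$ (an excess return over the risk-free rate in the Fama-French setup), $MKT_t$, $SMB_t$ and $HML_t$ are the market, size and value factor returns, and $\alpha$ is the average return the factors do not explain.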

```python
# Generate synthetic factor returns
n_days = 252 * 5  # 5 years of trading days

# Market factor (MKT): daily market returns
mkt = np.random.normal(0.0004, 0.01, n_days)

# Size factor (SMB)
smb = np.random.normal(0.0001, 0.005, n_days)

# Value factor (HML)
hml = np.random.normal(0.0001, 0.005, n_days)

# Stock return = alpha + factor exposures + idiosyncratic noise
true_alpha = 0.0002
true_betas = [1.2, 0.3, -0.2]  # MKT, SMB, HML exposures

stock_return = (true_alpha +
                true_betas[0] * mkt +
                true_betas[1] * smb +
                true_betas[2] * hml +
                np.random.normal(0, 0.008, n_days))

# Fit OLS (LinearRegression estimates the intercept, i.e. alpha, separately)
X = np.column_stack([mkt, smb, hml])
model = LinearRegression()
model.fit(X, stock_return)

print("OLS Factor Model Results")
print("="*50)
print(f"Alpha (daily): {model.intercept_:.6f} (true: {true_alpha})")
print(f"Alpha (annual): {model.intercept_ * 252:.2%}")
print(f"\nBeta Exposures:")
print(f"  MKT: {model.coef_[0]:.3f} (true: {true_betas[0]})")
print(f"  SMB: {model.coef_[1]:.3f} (true: {true_betas[1]})")
print(f"  HML: {model.coef_[2]:.3f} (true: {true_betas[2]})")
```
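A single fit gives only a point estimate of alpha. This is a minimal sketch of how to judge its precision, assuming classical (homoskedastic) OLS standard errors $\hat\sigma^2 (Z^TZ)^{-1}$, where $Z$ is the factor matrix with a column of ones. To stay self-contained it re-simulates data with the same parameters as the cell above, using its own random generator, so the numbers will differ.

```python
import numpy as np

rng = np.random.default_rng(7)
n_obs = 252 * 5
factors = np.column_stack([
    rng.normal(0.0004, 0.01, n_obs),   # MKT
    rng.normal(0.0001, 0.005, n_obs),  # SMB
    rng.normal(0.0001, 0.005, n_obs),  # HML
])
ret = 0.0002 + factors @ np.array([1.2, 0.3, -0.2]) + rng.normal(0, 0.008, n_obs)

# Design matrix with a column of ones, so coef[0] is alpha
Z = np.column_stack([np.ones(n_obs), factors])
coef, *_ = np.linalg.lstsq(Z, ret, rcond=None)

# Residual variance with a degrees-of-freedom correction
resid = ret - Z @ coef
dof = n_obs - Z.shape[1]
sigma2 = resid @ resid / dof

# Classical covariance of the estimates: sigma^2 (Z^T Z)^{-1}
cov = sigma2 * np.linalg.inv(Z.T @ Z)
se = np.sqrt(np.diag(cov))
t_alpha = coef[0] / se[0]
print(f"alpha = {coef[0]:.6f}, SE = {se[0]:.6f}, t = {t_alpha:.2f}")
```

With residual volatility 0.008 over 1,260 days, the standard error of daily alpha is roughly $0.008/\sqrt{1260} \approx 0.00023$, about the size of the true alpha of 0.0002, so the t-statistic for the true alpha is only about 0.9 on average: five years of daily data may not separate this alpha from zero. Real return residuals are often heteroskedastic and autocorrelated, in which case robust (e.g., Newey-West) standard errors are the usual choice.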

---

## Part 2: Ridge Regression (L2 Regularization)

### The Problem with OLS

When features are highly correlated (multicollinearity), $X^TX$ is nearly singular (ill-conditioned), so computing $(X^TX)^{-1}$ amplifies noise in $y$ and the coefficient estimates have large variance.
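One way to quantify "unstable" is the condition number of $X^TX$, the ratio of its largest to smallest eigenvalue, which bounds how much relative errors in $X^Ty$ can be amplified in $\hat\beta$. A minimal sketch comparing a nearly collinear design (built like the multicollinearity example in this part) with an independent one; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 500
x1 = rng.standard_normal(n_obs)
x2_corr = x1 + 0.1 * rng.standard_normal(n_obs)  # nearly collinear with x1
x2_indep = rng.standard_normal(n_obs)             # unrelated to x1
x3 = rng.standard_normal(n_obs)

X_corr = np.column_stack([x1, x2_corr, x3])
X_indep = np.column_stack([x1, x2_indep, x3])

# np.linalg.cond defaults to the 2-norm: largest / smallest singular value,
# which for the symmetric PSD matrix X^T X is the eigenvalue ratio
cond_corr = np.linalg.cond(X_corr.T @ X_corr)
cond_indep = np.linalg.cond(X_indep.T @ X_indep)
print(f"cond(X^T X), correlated:  {cond_corr:,.0f}")
print(f"cond(X^T X), independent: {cond_indep:,.1f}")
```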

### Ridge Solution

Add an L2 penalty that shrinks the coefficients toward zero:

$$\min_{\beta} ||y - X\beta||_2^2 + \lambda ||\beta||_2^2$$

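Closed-form: $\hat{\beta}_{ridge} = (X^TX + \lambda I)^{-1}X^Ty$. Because $X^TX + \lambda I$ is invertible for any $\lambda > 0$, this solution exists even when OLS's does not. In scikit-learn, $\lambda$ is the `alpha` argument of `Ridge`, which minimizes exactly this objective (the intercept is fit separately and not penalized).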
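A minimal sketch that evaluates the ridge closed form with NumPy and checks it against scikit-learn's `Ridge`. It uses `fit_intercept=False` so both solve exactly the same problem with no unpenalized intercept; $\lambda = 5$ and the simulated data are arbitrary. It also confirms that the ridge coefficient vector is shorter than the OLS one.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_obs, n_feat, lam = 200, 4, 5.0
X_r = rng.standard_normal((n_obs, n_feat))
y_r = X_r @ np.array([1.0, -2.0, 0.0, 0.5]) + rng.standard_normal(n_obs)

# (X^T X + lambda I) beta = X^T y, solved without an explicit inverse
beta_closed = np.linalg.solve(X_r.T @ X_r + lam * np.eye(n_feat), X_r.T @ y_r)
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X_r, y_r).coef_

# OLS on the same data (lambda = 0) for comparison
beta_ols = np.linalg.solve(X_r.T @ X_r, X_r.T @ y_r)

print("ridge closed form:", np.round(beta_closed, 4))
print("sklearn Ridge:    ", np.round(beta_sklearn, 4))
print(f"||beta|| ridge = {np.linalg.norm(beta_closed):.4f}, OLS = {np.linalg.norm(beta_ols):.4f}")
```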

### ü§î Simple Explanation

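Ridge adds a "penalty" for large coefficients. It's like saying "I want a good fit, but I also want small, stable coefficients." This reduces the variance of the estimates, and with it overfitting, when features are correlated. Because the penalty depends on each feature's scale, features are usually standardized before fitting (the example below uses `StandardScaler`).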

### Finance Application

- Many financial factors are correlated (momentum vs. reversal)
- Ridge stabilizes factor loadings
- Often more robust out-of-sample than plain OLS

```python
# Create correlated features (simulating multicollinearity)
n = 500
X1 = np.random.randn(n)
X2 = X1 + np.random.randn(n) * 0.1  # Highly correlated with X1
X3 = np.random.randn(n)

true_beta = [1, 1, 0.5]
y = true_beta[0]*X1 + true_beta[1]*X2 + true_beta[2]*X3 + np.random.randn(n) * 0.5

X = np.column_stack([X1, X2, X3])
# Standardize so the Ridge penalty treats all features equally; each feature
# already has std close to 1, so scaled coefficients stay comparable to true_beta
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Compare OLS vs Ridge (sklearn's `alpha` is the lambda in the Ridge objective)
ols = LinearRegression().fit(X_scaled, y)
ridge = Ridge(alpha=1.0).fit(X_scaled, y)

print("Multicollinearity Example")
print("="*50)
print(f"Correlation X1-X2: {np.corrcoef(X1, X2)[0,1]:.3f}")
print(f"\nCoefficients (true: {true_beta}):")
print(f"  OLS:   [{ols.coef_[0]:.2f}, {ols.coef_[1]:.2f}, {ols.coef_[2]:.2f}]")
print(f"  Ridge: [{ridge.coef_[0]:.2f}, {ridge.coef_[1]:.2f}, {ridge.coef_[2]:.2f}]")
print("\n⚠️ With X1 and X2 nearly collinear, OLS's split of their effect varies a lot across samples; Ridge stabilizes it.")
```
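The single fit above is one draw; instability is about how much the estimates move from sample to sample. This sketch repeats the same data-generating process 200 times with fresh noise (trial count and seed are arbitrary) and compares the standard deviation of each coefficient under OLS and `Ridge(alpha=1.0)`. Features are left unscaled here because each already has a standard deviation close to 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n_obs, n_trials = 500, 200
ols_coefs, ridge_coefs = [], []
for _ in range(n_trials):
    # Same data-generating process as the cell above, with fresh noise each trial
    x1 = rng.standard_normal(n_obs)
    x2 = x1 + 0.1 * rng.standard_normal(n_obs)  # nearly collinear with x1
    x3 = rng.standard_normal(n_obs)
    X_sim = np.column_stack([x1, x2, x3])
    y_sim = x1 + x2 + 0.5 * x3 + 0.5 * rng.standard_normal(n_obs)
    ols_coefs.append(LinearRegression().fit(X_sim, y_sim).coef_)
    ridge_coefs.append(Ridge(alpha=1.0).fit(X_sim, y_sim).coef_)

# Spread of each coefficient across the simulated samples
ols_sd = np.std(ols_coefs, axis=0)
ridge_sd = np.std(ridge_coefs, axis=0)
print("Coefficient SD across trials [X1, X2, X3]")
print("  OLS:  ", np.round(ols_sd, 3))
print("  Ridge:", np.round(ridge_sd, 3))
```

Ridge's gain is concentrated in the X1/X2 coefficients: the penalty mostly shrinks the poorly determined direction $\beta_1 - \beta_2$, while the well-determined sum $\beta_1 + \beta_2$ and the coefficient on the independent X3 are barely affected.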

---

## Part 3: Lasso Regression (L1 Regularization)

### The Lasso Objective

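$$\min_{\beta} ||y - X\beta||_2^2 + \lambda ||\beta||_1$$

scikit-learn's `Lasso` scales the squared-error term differently: it minimizes $\frac{1}{2n}||y - X\beta||_2^2 + \alpha||\beta||_1$, so its `alpha` corresponds to $\lambda / (2n)$ in the objective above.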

### Key Property: Sparsity

The L1 penalty can drive coefficients **exactly to zero** → automatic feature selection!

### ü§î Simple Explanation

Lasso shrinks coefficients like Ridge does, but the L1 penalty can push small ones all the way to zero. If a feature isn't important enough to overcome the penalty, Lasso sets its coefficient to exactly zero, effectively removing that feature.
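Because the L1 penalty is not differentiable at zero, Lasso is usually solved iteratively rather than with a closed-form formula; scikit-learn's `Lasso` uses coordinate descent. The sketch below is a simplified, unoptimized version of that idea (no intercept, a fixed number of sweeps instead of a convergence check), written against scikit-learn's scaling $\frac{1}{2n}||y - X\beta||_2^2 + \alpha||\beta||_1$. Each coordinate update is a soft-threshold, which is where the exact zeros come from: a coefficient whose score $x_j^T r / n$ against the partial residual $r$ is at most $\alpha$ in absolute value is set to 0. The helper names `soft_threshold` and `lasso_cd` are illustrative, not library functions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(a, t):
    """Shrink a toward zero by t; return exactly 0 when |a| <= t."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=500):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + alpha*||b||_1, no intercept."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) x_j^T x_j for each column
    residual = y.astype(float).copy()   # y - X @ beta, with beta = 0 initially
    for _ in range(n_sweeps):
        for j in range(p):
            residual += X[:, j] * beta[j]          # partial residual without feature j
            rho = X[:, j] @ residual / n           # score of feature j against it
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * beta[j]          # add the updated feature back
    return beta

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 10))
beta_demo = np.zeros(10)
beta_demo[:3] = [2.0, -1.5, 1.0]
y_demo = X_demo @ beta_demo + 0.5 * rng.standard_normal(200)

beta_cd = lasso_cd(X_demo, y_demo, alpha=0.1)
beta_sk = Lasso(alpha=0.1, fit_intercept=False, tol=1e-10, max_iter=100_000).fit(X_demo, y_demo).coef_
print("coordinate descent:", np.round(beta_cd, 3))
print("sklearn Lasso:     ", np.round(beta_sk, 3))
```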

### Finance Application

- Select which factors actually matter
- Sparse portfolios (fewer positions)
- Interpretable models

```python
# Create data with many features, only some relevant
n, p = 200, 50  # 200 samples, 50 features
X = np.random.randn(n, p)

# Only first 5 features matter
true_beta = np.zeros(p)
true_beta[:5] = [2, -1.5, 1, -0.5, 0.8]

y = X @ true_beta + np.random.randn(n) * 0.5

# Fit Lasso (sklearn's alpha = lambda / (2n) in the objective above)
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

selected = np.flatnonzero(lasso.coef_)  # indices of non-zero coefficients

print("Lasso Feature Selection")
print("="*50)
print("True non-zero features: 5")
print(f"Lasso non-zero features: {selected.size}")
print(f"\nTrue coefficients (first 5): {true_beta[:5]}")
print(f"Lasso coefficients (first 5): {np.round(lasso.coef_[:5], 2)}")
print(f"\nTrue features recovered: {np.sum(selected < 5)}/5")
print(f"Irrelevant features selected (false positives): {np.sum(selected >= 5)}")
```
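To see how sparsity depends on the penalty, this sketch refits Lasso over a few `alpha` values on data generated like the cell above (with its own random generator, so values differ; the `alpha` grid is arbitrary). It also computes the smallest `alpha` at which every coefficient is zero, $\alpha_{max} = \max_j |x_j^T y| / n$ on centered data, which follows from the optimality conditions of scikit-learn's Lasso objective.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_obs, n_feat = 200, 50
X_s = rng.standard_normal((n_obs, n_feat))
beta_s = np.zeros(n_feat)
beta_s[:5] = [2, -1.5, 1, -0.5, 0.8]
y_s = X_s @ beta_s + 0.5 * rng.standard_normal(n_obs)

# Lasso(fit_intercept=True) centers X and y; at or above this alpha,
# the all-zero solution satisfies the optimality conditions
Xc = X_s - X_s.mean(axis=0)
yc = y_s - y_s.mean()
alpha_max = np.max(np.abs(Xc.T @ yc)) / n_obs

counts = {}
for a in [0.01, 0.1, 0.5, 1.0, 1.01 * alpha_max]:
    counts[a] = np.count_nonzero(Lasso(alpha=a).fit(X_s, y_s).coef_)
    print(f"alpha = {a:6.3f}: {counts[a]:2d} non-zero coefficients")
```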

---

## Part 4: ElasticNet (L1 + L2)

### Best of Both Worlds

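$$\min_{\beta} ||y - X\beta||_2^2 + \lambda_1 ||\beta||_1 + \lambda_2 ||\beta||_2^2$$

scikit-learn's `ElasticNet` instead uses one strength `alpha` and a mixing weight $\rho$ = `l1_ratio`, minimizing $\frac{1}{2n}||y - X\beta||_2^2 + \alpha\rho\,||\beta||_1 + \frac{\alpha(1-\rho)}{2}||\beta||_2^2$. Setting `l1_ratio=1` gives the Lasso.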

### When to Use

- Correlated features (Ridge helps)
- Want sparsity (Lasso helps)
- Many features with groups of correlated ones
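A minimal sketch of the "groups of correlated features" case, using two near-duplicate factors. Lasso often puts most or all of the weight on one of them (which one depends on the noise), while the L2 part of ElasticNet pulls the two coefficients toward each other. The seed, noise levels and penalties are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(3)
n_obs = 300
f1 = rng.standard_normal(n_obs)
f2 = f1 + 0.01 * rng.standard_normal(n_obs)  # near-duplicate of f1
f3 = rng.standard_normal(n_obs)              # independent factor
X_g = np.column_stack([f1, f2, f3])
y_g = f1 + f2 + 0.5 * f3 + 0.5 * rng.standard_normal(n_obs)

# Same penalties as the comparison cell; l1_ratio=0.5 splits alpha between L1 and L2
lasso_coef = Lasso(alpha=0.1).fit(X_g, y_g).coef_
enet_coef = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_g, y_g).coef_
print("Lasso      [f1, f2, f3]:", np.round(lasso_coef, 2))
print("ElasticNet [f1, f2, f3]:", np.round(enet_coef, 2))
```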

### Finance Application

Factor models with correlated factors where you want to select the most important ones.

```python
# Compare all methods on the sparse dataset from Part 3
# (X, y: 200 samples, 50 features, only the first 5 relevant)
models = {
    'OLS': LinearRegression(),
    'Ridge': Ridge(alpha=1.0),
    'Lasso': Lasso(alpha=0.1),
    'ElasticNet': ElasticNet(alpha=0.1, l1_ratio=0.5)
}

print("Model Comparison (Cross-Validation R²)")
print("="*50)
for name, model in models.items():
    # cv=5 on a regressor uses unshuffled 5-fold KFold
    scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    print(f"{name:12} R² = {scores.mean():.3f} ± {scores.std():.3f}")
```
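The comparison above uses hand-picked penalties. A common alternative is to choose them by cross-validation; this is a minimal sketch with scikit-learn's `LassoCV` and `ElasticNetCV`. For financial time series, shuffled or plain k-fold splits let the model train on observations that come after the validation period, so the sketch uses `TimeSeriesSplit`, whose validation folds always follow their training data. The simulated data here are i.i.d., so the time ordering is only illustrative, and the `l1_ratio` grid is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
n_obs, n_feat = 500, 30
# Simulated features are already on a unit scale; standardize real factors first
X_t = rng.standard_normal((n_obs, n_feat))
beta_t = np.zeros(n_feat)
beta_t[:4] = [0.8, -0.6, 0.4, 0.3]
y_t = X_t @ beta_t + rng.standard_normal(n_obs)

# Forward-chaining splits: every validation fold comes after its training rows
tscv = TimeSeriesSplit(n_splits=5)

# Each CV estimator fits a grid of alphas per fold, keeps the one with the lowest
# mean validation MSE, then refits on all rows
lasso_cv = LassoCV(cv=tscv).fit(X_t, y_t)
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=tscv).fit(X_t, y_t)

print(f"LassoCV:      alpha = {lasso_cv.alpha_:.4f}, "
      f"non-zero coefficients = {np.count_nonzero(lasso_cv.coef_)}")
print(f"ElasticNetCV: alpha = {enet_cv.alpha_:.4f}, l1_ratio = {enet_cv.l1_ratio_}")
```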

---

## Interview Questions

### Conceptual
1. When would you choose Ridge over Lasso?
2. What happens to Lasso coefficients as λ increases?
3. How do you interpret a negative beta in a factor model?

### Technical
1. Derive the Ridge regression closed-form solution
2. Why doesn't Lasso have a closed-form solution?
3. How do you select the regularization parameter?

### Finance-Specific
1. Your factor model has 50 factors. How do you reduce it?
2. How would you test if alpha is statistically significant?
3. What's the difference between realized and predicted beta?

---

## Key Takeaways

| Model | Penalty | Sparsity | When to Use |
|-------|---------|----------|-------------|
| OLS | None | No | Simple problems, no collinearity |
| Ridge | L2 | No | Multicollinearity, all features matter |
| Lasso | L1 | Yes | Feature selection needed |
| ElasticNet | L1+L2 | Yes | Correlated features + sparsity |