# Week 6: Factor Models

---

## Table of Contents
1. Capital Asset Pricing Model (CAPM)
2. Fama-French Three-Factor Model
3. Fama-French Five-Factor Model
4. Arbitrage Pricing Theory (APT)
5. Factor Investing in Practice

---

In [1]:
# Standard imports and data loading
import numpy as np
import pandas as pd
import yfinance as yf
from datetime import datetime, timedelta

# Standard 5 equities for analysis
tickers = ['AAPL', 'MSFT', 'GOOGL', 'JPM', 'GS']

# Fetch 5 years of data
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)

print("üì• Downloading market data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False, auto_adjust=True)
prices = data['Close'].dropna()
returns = prices.pct_change().dropna()
print(f"✅ Loaded {len(prices)} days of data for {len(tickers)} tickers")
print(f"📅 Date range: {prices.index[0].strftime('%Y-%m-%d')} to {prices.index[-1].strftime('%Y-%m-%d')}")
print(prices.tail())

📥 Downloading market data...
✅ Loaded 1255 days of data for 5 tickers
📅 Date range: 2021-01-25 to 2026-01-22
Ticker            AAPL       GOOGL          GS         JPM        MSFT
Date                                                                  
2026-01-15  258.209991  332.779999  975.859985  309.260010  456.660004
2026-01-16  255.529999  330.000000  962.000000  312.470001  459.859985
2026-01-20  246.699997  322.000000  943.369995  302.739990  454.519989
2026-01-21  247.649994  328.380005  953.010010  302.040009  444.109985
2026-01-22  249.759995  331.829987  967.833984  306.290009  449.984985


## 1. Capital Asset Pricing Model (CAPM)

### The Big Idea

Sharpe (1964) and Lintner (1965) CAPM: A stock's expected return depends only on its **market risk** (beta), not its total risk, because firm-specific risk can be diversified away.

### The Formula

$$E[r_i] = r_f + \beta_i (E[r_m] - r_f)$$

Its empirical counterpart is a time-series regression in **excess return** form (CAPM predicts $\alpha_i = 0$):

$$r_i - r_f = \alpha_i + \beta_i (r_m - r_f) + \epsilon_i$$

Where:
- $r_i$ = return on asset $i$
- $r_f$ = risk-free rate
- $r_m$ = market return
- $\beta_i$ = sensitivity to market (systematic risk)
- $\alpha_i$ = abnormal return (zero if CAPM holds)
- $(E[r_m] - r_f)$ = market risk premium
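A quick worked example of the first formula. The inputs (a 2% risk-free rate, an 8% expected market return, and betas of 1.3 and 0.7) are illustrative assumptions, not estimates, and `capm_expected_return` is a hypothetical helper name.

```python
def capm_expected_return(rf: float, beta: float, expected_market: float) -> float:
    """CAPM: E[r_i] = r_f + beta_i * (E[r_m] - r_f), all as annual decimals."""
    market_risk_premium = expected_market - rf
    return rf + beta * market_risk_premium


# Illustrative (assumed) inputs: 2% risk-free rate, 8% expected market return
aggressive = capm_expected_return(rf=0.02, beta=1.3, expected_market=0.08)
defensive = capm_expected_return(rf=0.02, beta=0.7, expected_market=0.08)
print(f"beta 1.3: {aggressive:.2%}")  # 2% + 1.3 * 6% = 9.80%
print(f"beta 0.7: {defensive:.2%}")   # 2% + 0.7 * 6% = 6.20%
```

The market risk premium (6% here) is shared by every asset; only beta differs, which is why CAPM ranks expected returns purely by beta.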

### Beta

$$\beta_i = \frac{Cov(r_i, r_m)}{Var(r_m)}$$

**Interpretation**:
- $\beta = 1$: Moves one-for-one with the market on average
- $\beta > 1$: Amplifies market moves (aggressive)
- $\beta < 1$: Dampens market moves (defensive)
- $\beta < 0$: Moves opposite to market (rare)
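The covariance-over-variance formula gives the same number as the OLS slope of the stock's returns on the market's. The sketch below checks this on simulated data; the true beta of 0.8, the volatilities, and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
market = rng.normal(0.0004, 0.01, n)                      # simulated daily market returns
stock = 0.0001 + 0.8 * market + rng.normal(0, 0.006, n)   # defensive stock, true beta 0.8

# Beta as covariance over variance; use ddof=1 in both terms so the scaling cancels
beta_cov = np.cov(stock, market, ddof=1)[0, 1] / np.var(market, ddof=1)

# Least-squares line of stock on market; polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(market, stock, deg=1)

print(f"Cov/Var beta: {beta_cov:.4f}")
print(f"OLS slope:    {slope:.4f}")
```

Both lines should print the same value (up to floating-point rounding), close to but not exactly 0.8 because of the idiosyncratic noise.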

In [8]:
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

np.random.seed(42)

# Simulate market and stock returns
n_days = 252 * 3  # 3 years
rf = 0.02 / 252   # Daily risk-free rate (2% annual); not used below because we simulate excess returns directly

# Market returns
market_excess = np.random.normal(0.0003, 0.012, n_days)

# Stock with known beta and alpha
true_beta = 1.3
true_alpha = 0.0001  # Daily alpha of 0.01%
noise = np.random.normal(0, 0.008, n_days)

stock_excess = true_alpha + true_beta * market_excess + noise

# Estimate CAPM using regression
model = LinearRegression()
model.fit(market_excess.reshape(-1, 1), stock_excess)

est_alpha = model.intercept_
est_beta = model.coef_[0]

print("CAPM Regression: Stock Excess Return vs Market Excess Return")
print("="*60)
print(f"\nTrue parameters:     α = {true_alpha*252:.2%}/year, β = {true_beta:.2f}")
print(f"Estimated parameters: α = {est_alpha*252:.2%}/year, β = {est_beta:.2f}")

# Calculate R-squared
predicted = est_alpha + est_beta * market_excess
ss_res = np.sum((stock_excess - predicted)**2)
ss_tot = np.sum((stock_excess - stock_excess.mean())**2)
r_squared = 1 - ss_res / ss_tot

print(f"\nR² = {r_squared:.4f}")
print(f"→ {r_squared*100:.1f}% of stock variance explained by market")

CAPM Regression: Stock Excess Return vs Market Excess Return

True parameters:     α = 2.52%/year, β = 1.30
Estimated parameters: α = 26.03%/year, β = 1.30

R² = 0.7936
→ 79.4% of stock variance explained by market


### Alpha and Performance Attribution

**Alpha ($\alpha$)**: Return not explained by market exposure

$$\alpha = r_i - [r_f + \beta_i(r_m - r_f)]$$

- $\alpha > 0$: Manager/stock outperformed (skill or luck?)
- $\alpha < 0$: Underperformed
- $\alpha = 0$: Returned exactly what CAPM predicts
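Applying the formula above to one period of realized returns gives Jensen's alpha. The figures below are hypothetical annual numbers chosen only to make the arithmetic easy to follow.

```python
def realized_alpha(r_i: float, r_f: float, r_m: float, beta: float) -> float:
    """Jensen's alpha: realized return minus the CAPM-predicted return."""
    return r_i - (r_f + beta * (r_m - r_f))


# Hypothetical year: fund earned 12%, risk-free 2%, market 10%, fund beta 1.2
alpha = realized_alpha(r_i=0.12, r_f=0.02, r_m=0.10, beta=1.2)
print(f"Alpha: {alpha:+.2%}")  # 12% - (2% + 1.2 * 8%) = +0.40%
```

The fund beat the market by 2 percentage points, but most of that is explained by its higher beta; only 0.40% is left as alpha.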

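**Testing significance**: Is alpha statistically different from zero? Alpha estimates are noisy: in the simulation above the true alpha is 2.52%/year but the estimate is 26.03%/year, so a standard error is needed before drawing conclusions.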
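Under the hood, the t-statistic is the alpha estimate divided by its standard error. This NumPy sketch computes classical (homoskedastic) OLS standard errors, which should match the default `nonrobust` output of statsmodels `OLS`. `ols_alpha_tstat` is a hypothetical helper, and the simulated inputs (true alpha of zero, beta of 1.1) are assumptions. Real return data often call for robust (for example HAC) standard errors, which statsmodels exposes through the `cov_type` argument of `fit`.

```python
import numpy as np


def ols_alpha_tstat(y: np.ndarray, x: np.ndarray):
    """Regress y on [1, x]; return (alpha, se_alpha, t_alpha) with classical OLS errors."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)           # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance matrix of the coefficients
    se_alpha = np.sqrt(cov[0, 0])
    return coef[0], se_alpha, coef[0] / se_alpha


rng = np.random.default_rng(1)
mkt = rng.normal(0.0003, 0.012, 756)
y = 0.0 + 1.1 * mkt + rng.normal(0, 0.008, 756)   # true alpha is zero here
alpha, se, t = ols_alpha_tstat(y, mkt)
print(f"alpha = {alpha:.6f}, se = {se:.6f}, t = {t:.2f}")
```

A |t| above roughly 2 is the usual rough cutoff for significance at the 5% level.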

In [9]:
import statsmodels.api as sm

# Proper statistical test for alpha
X = sm.add_constant(market_excess)  # Add intercept
capm_model = sm.OLS(stock_excess, X).fit()

print("CAPM Statistical Analysis")
print("="*60)
print(f"\n{'Parameter':<10} | {'Estimate':>10} | {'Std Error':>10} | {'t-stat':>8} | {'p-value':>8}")
print("-"*60)
print(f"{'Alpha':<10} | {capm_model.params[0]*252:>10.4%} | {capm_model.bse[0]*252:>10.4%} | {capm_model.tvalues[0]:>8.2f} | {capm_model.pvalues[0]:>8.4f}")
print(f"{'Beta':<10} | {capm_model.params[1]:>10.2f} | {capm_model.bse[1]:>10.4f} | {capm_model.tvalues[1]:>8.2f} | {capm_model.pvalues[1]:>8.4f}")

if capm_model.pvalues[0] < 0.05:
    print("\n✓ Alpha is statistically significant at 5% level!")
else:
    print("\n✗ Alpha is NOT statistically significant")

CAPM Statistical Analysis

Parameter  |   Estimate |  Std Error |   t-stat |  p-value
------------------------------------------------------------
Alpha      |   26.0271% |    7.2243% |     3.60 |   0.0003
Beta       |       1.30 |     0.0242 |    53.85 |   0.0000

✓ Alpha is statistically significant at 5% level!


---

## 2. Fama-French Three-Factor Model

### Motivation

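CAPM beta leaves part of the cross-section of average returns unexplained. Fama & French (1993) proposed two additional factors:

1. **SMB** (Small Minus Big): Small stocks have historically earned higher average returns than large stocks
2. **HML** (High Minus Low): Value (high book-to-market) stocks have historically outperformed growth (low book-to-market) stocks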

### The Model

$$r_i - r_f = \alpha_i + \beta_{i,M}(r_m - r_f) + \beta_{i,SMB} \cdot SMB + \beta_{i,HML} \cdot HML + \epsilon_i$$

### Factor Definitions

**SMB (Size)**:
$$SMB = \frac{1}{3}(\text{Small Value} + \text{Small Neutral} + \text{Small Growth}) - \frac{1}{3}(\text{Big Value} + \text{Big Neutral} + \text{Big Growth})$$

**HML (Value)**:
$$HML = \frac{1}{2}(\text{Small Value} + \text{Big Value}) - \frac{1}{2}(\text{Small Growth} + \text{Big Growth})$$
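The two definitions above translate directly into code. The sketch assumes the six portfolio returns (from the 2x3 sort on size and book-to-market) are already available; the dictionary keys and the return values are made up for illustration, and `smb_hml` is a hypothetical helper.

```python
def smb_hml(p: dict) -> tuple:
    """Compute SMB and HML from six size/value portfolio returns.

    Hypothetical keys: 'SV', 'SN', 'SG' = small value/neutral/growth,
    'BV', 'BN', 'BG' = big value/neutral/growth.
    """
    smb = (p['SV'] + p['SN'] + p['SG']) / 3 - (p['BV'] + p['BN'] + p['BG']) / 3
    hml = (p['SV'] + p['BV']) / 2 - (p['SG'] + p['BG']) / 2
    return smb, hml


# Made-up monthly returns for the six portfolios
portfolios = {'SV': 0.020, 'SN': 0.015, 'SG': 0.010,
              'BV': 0.012, 'BN': 0.010, 'BG': 0.008}
smb, hml = smb_hml(portfolios)
print(f"SMB = {smb:.4f}, HML = {hml:.4f}")  # SMB = 0.0050, HML = 0.0070
```

Averaging across the other sort dimension is what keeps SMB roughly neutral to value and HML roughly neutral to size.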

In [10]:
# Simulate Fama-French factors
np.random.seed(42)

# Factor returns (drawn independently for simplicity; real factors are correlated)
mkt_rf = np.random.normal(0.0003, 0.012, n_days)
smb = np.random.normal(0.0001, 0.006, n_days)     # Size premium
hml = np.random.normal(0.0001, 0.005, n_days)     # Value premium

# Stock with exposure to all factors
true_betas = {'MKT': 1.2, 'SMB': 0.4, 'HML': -0.3}  # Growth stock (negative HML)
true_alpha_ff = 0.00005  # Small daily alpha

stock_return = (true_alpha_ff + 
                true_betas['MKT'] * mkt_rf + 
                true_betas['SMB'] * smb + 
                true_betas['HML'] * hml + 
                np.random.normal(0, 0.006, n_days))

# Create DataFrame for regression
ff_data = pd.DataFrame({
    'stock_excess': stock_return,
    'MKT_RF': mkt_rf,
    'SMB': smb,
    'HML': hml
})

# Run FF3 regression
X_ff = sm.add_constant(ff_data[['MKT_RF', 'SMB', 'HML']])
ff3_model = sm.OLS(ff_data['stock_excess'], X_ff).fit()

print("Fama-French Three-Factor Model")
print("="*65)
print(f"\n{'Factor':<10} | {'True β':>8} | {'Est. β':>8} | {'t-stat':>8} | {'Signif.':>8}")
print("-"*65)
# Label-based access ('const' is the name sm.add_constant gives the intercept);
# positional Series[int] indexing is deprecated in pandas
print(f"{'Alpha':<10} | {true_alpha_ff*252:>7.2%} | {ff3_model.params['const']*252:>7.2%} | {ff3_model.tvalues['const']:>8.2f} | {'Yes' if ff3_model.pvalues['const']<0.05 else 'No':>8}")
for factor in ['MKT_RF', 'SMB', 'HML']:
    true_val = true_betas[factor.replace('_RF', '')]  # 'MKT_RF' -> 'MKT'
    print(f"{factor:<10} | {true_val:>8.2f} | {ff3_model.params[factor]:>8.2f} | {ff3_model.tvalues[factor]:>8.2f} | {'Yes' if ff3_model.pvalues[factor]<0.05 else 'No':>8}")

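# Note: the CAPM R² above comes from a different simulated stock, so this comparison is only indicative
print(f"\nR² = {ff3_model.rsquared:.4f} (vs CAPM: ~{r_squared:.4f})")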

Fama-French Three-Factor Model

Factor     |   True β |   Est. β |   t-stat |  Signif.
-----------------------------------------------------------------
Alpha      |   1.26% |   4.19% |     0.76 |       No
MKT_RF     |     1.20 |     1.19 |    65.22 |      Yes
SMB        |     0.40 |     0.37 |    10.20 |      Yes
HML        |    -0.30 |    -0.31 |    -7.04 |      Yes

R² = 0.8543 (vs CAPM: ~0.7936)




### Interpreting Factor Loadings

| Factor | Positive β | Negative β |
|--------|------------|------------|
| SMB | Small cap tilt | Large cap tilt |
| HML | Value tilt | Growth tilt |
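A loading only signals a tilt if it is estimated precisely. This hypothetical helper reads the table above mechanically, treating |t| below 1.96 (roughly the 5% two-sided cutoff) as no clear tilt; the threshold is an assumption you may want to change. The example inputs are the SMB and HML estimates from the FF3 output above.

```python
def describe_tilts(loadings: dict, tstats: dict, t_crit: float = 1.96) -> dict:
    """Map SMB/HML loadings to tilt labels, ignoring statistically weak loadings."""
    labels = {'SMB': ('small-cap tilt', 'large-cap tilt'),
              'HML': ('value tilt', 'growth tilt')}
    out = {}
    for factor, (pos, neg) in labels.items():
        if abs(tstats[factor]) < t_crit:
            out[factor] = 'no clear tilt'
        else:
            out[factor] = pos if loadings[factor] > 0 else neg
    return out


# SMB 0.37 (t = 10.20) and HML -0.31 (t = -7.04), as estimated above
print(describe_tilts({'SMB': 0.37, 'HML': -0.31}, {'SMB': 10.20, 'HML': -7.04}))
# {'SMB': 'small-cap tilt', 'HML': 'growth tilt'}
```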

---

## 3. Fama-French Five-Factor Model

### Additional Factors (2015)

Fama & French added two more factors:

3. **RMW** (Robust Minus Weak): Profitability premium
4. **CMA** (Conservative Minus Aggressive): Investment premium

### The Model

$$r_i - r_f = \alpha + \beta_M(r_m - r_f) + \beta_S \cdot SMB + \beta_H \cdot HML + \beta_R \cdot RMW + \beta_C \cdot CMA + \epsilon$$

### Factor Interpretations

| Factor | Long | Short | Intuition |
|--------|------|-------|----------|
| MKT | Market | Risk-free | Market risk premium |
| SMB | Small caps | Large caps | Size effect |
| HML | High B/M | Low B/M | Value effect |
| RMW | High profit | Low profit | Quality premium |
| CMA | Low investment | High investment | Conservative firms outperform |
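Setting alpha to zero and taking expectations, the model says a stock's expected excess return is the sum of its loadings times the factor premia. The sketch below plugs in the true loadings and daily factor means used in this notebook's simulations, annualizing by multiplying by 252 (a simple approximation that ignores compounding).

```python
import numpy as np

factors = ['MKT', 'SMB', 'HML', 'RMW', 'CMA']
# Loadings and daily factor means taken from this notebook's simulations
betas = np.array([1.1, 0.3, 0.2, 0.4, 0.1])
premia_daily = np.array([0.0003, 0.0001, 0.0001, 0.00008, 0.00005])

# E[r_i - r_f] = sum_k beta_k * E[f_k] when alpha is zero
contrib = betas * premia_daily
expected_excess_daily = contrib.sum()
for f, c in zip(factors, contrib):
    print(f"{f}: {c * 252:.2%}/yr")
print(f"Total: {expected_excess_daily * 252:.2%}/yr")  # Total: 10.51%/yr
```

The breakdown doubles as a simple return attribution: here the market factor supplies most of the expected premium.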

In [11]:
# Add profitability and investment factors
rmw = np.random.normal(0.00008, 0.004, n_days)  # Profitability
cma = np.random.normal(0.00005, 0.003, n_days)  # Investment

# Stock with all 5 factor exposures
true_betas_5f = {'MKT': 1.1, 'SMB': 0.3, 'HML': 0.2, 'RMW': 0.4, 'CMA': 0.1}
true_alpha_5f = 0.00003

stock_return_5f = (true_alpha_5f +
                   true_betas_5f['MKT'] * mkt_rf +
                   true_betas_5f['SMB'] * smb +
                   true_betas_5f['HML'] * hml +
                   true_betas_5f['RMW'] * rmw +
                   true_betas_5f['CMA'] * cma +
                   np.random.normal(0, 0.005, n_days))

# Five-factor regression
ff5_data = pd.DataFrame({
    'stock': stock_return_5f,
    'MKT': mkt_rf, 'SMB': smb, 'HML': hml, 'RMW': rmw, 'CMA': cma
})

X_ff5 = sm.add_constant(ff5_data[['MKT', 'SMB', 'HML', 'RMW', 'CMA']])
ff5_model = sm.OLS(ff5_data['stock'], X_ff5).fit()

print("Fama-French Five-Factor Model")
print("="*50)
print(f"\n{'Factor':<10} | {'True β':>8} | {'Est. β':>8}")
print("-"*35)
# Label-based access avoids deprecated positional Series indexing
print(f"{'Alpha(ann)':<10} | {true_alpha_5f*252:>7.2%} | {ff5_model.params['const']*252:>7.2%}")
for factor in ['MKT', 'SMB', 'HML', 'RMW', 'CMA']:
    print(f"{factor:<10} | {true_betas_5f[factor]:>8.2f} | {ff5_model.params[factor]:>8.2f}")

print(f"\nR² = {ff5_model.rsquared:.4f}")

Fama-French Five-Factor Model

Factor     |   True β |   Est. β
-----------------------------------
Alpha(ann) |   0.76% |   2.56%
MKT        |     1.10 |     1.10
SMB        |     0.30 |     0.31
HML        |     0.20 |     0.19
RMW        |     0.40 |     0.42
CMA        |     0.10 |     0.19

R² = 0.8784




---

## 4. Arbitrage Pricing Theory (APT)

### Overview

Ross (1976): Returns are driven by multiple **systematic factors**, not just market.

### The Model

$$r_i = E[r_i] + \beta_{i,1}F_1 + \beta_{i,2}F_2 + ... + \beta_{i,k}F_k + \epsilon_i$$

Where:
- $F_j$ = unexpected change in factor $j$
- $\beta_{i,j}$ = sensitivity to factor $j$

### Key Difference from CAPM

| CAPM | APT |
|------|-----|
| One factor (market) | Multiple factors |
| Equilibrium model | No-arbitrage model |
| Factor specified by theory (the market) | Factors left unspecified; chosen empirically |

### Common Macroeconomic Factors

1. **GDP growth** surprise
2. **Inflation** surprise
3. **Interest rate** changes
4. **Credit spreads**
5. **Oil prices**

In [12]:
from sklearn.decomposition import PCA

# Simulate returns for multiple stocks
np.random.seed(42)
n_stocks = 20
n_days = 252

# Hidden factors (unknown to us)
factor1 = np.random.normal(0, 0.01, n_days)  # e.g., economic growth
factor2 = np.random.normal(0, 0.008, n_days)  # e.g., interest rates

# Generate stock returns with random exposures to factors
returns_matrix = np.zeros((n_days, n_stocks))
true_loadings_f1 = np.random.uniform(0.5, 1.5, n_stocks)
true_loadings_f2 = np.random.uniform(-0.5, 0.5, n_stocks)

for i in range(n_stocks):
    returns_matrix[:, i] = (0.0002 +  # Common mean
                            true_loadings_f1[i] * factor1 +
                            true_loadings_f2[i] * factor2 +
                            np.random.normal(0, 0.005, n_days))  # Idiosyncratic

# Use PCA to discover factors
pca = PCA(n_components=5)
pca.fit(returns_matrix)

print("APT: Factor Discovery with PCA")
print("="*50)
print(f"\nAnalyzing {n_stocks} stocks over {n_days} days")
print(f"\nVariance explained by each principal component:")
for i, var in enumerate(pca.explained_variance_ratio_[:5]):
    bars = "█" * int(var * 50)
    print(f"  PC{i+1}: {var:>6.1%} {bars}")

print(f"\nTotal variance explained by first 2 PCs: {sum(pca.explained_variance_ratio_[:2]):.1%}")
print("\n✓ PCs that stand out above the flat tail of small components suggest how many hidden factors drive returns")

APT: Factor Discovery with PCA

Analyzing 20 stocks over 252 days

Variance explained by each principal component:
  PC1:  77.5% ██████████████████████████████████████
  PC2:   4.2% ██
  PC3:   1.6% 
  PC4:   1.5% 
  PC5:   1.4% 

Total variance explained by first 2 PCs: 81.7%

✓ PCs that stand out above the flat tail of small components suggest how many hidden factors drive returns
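scikit-learn's `PCA` centers the data, which makes it equivalent to an eigendecomposition of the sample covariance matrix, so the explained-variance ratios can be reproduced with NumPy alone. This sketch uses its own one-factor simulation (seed, loadings, and volatilities are arbitrary assumptions) rather than `returns_matrix`, so its numbers will differ from the output above.

```python
import numpy as np


def explained_variance_ratio(returns: np.ndarray) -> np.ndarray:
    """Covariance-matrix eigenvalues as shares of total variance, largest first."""
    cov = np.cov(returns, rowvar=False)       # columns are stocks
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh returns ascending order
    return eigvals / eigvals.sum()


rng = np.random.default_rng(7)
T, N = 252, 20
f = rng.normal(0, 0.01, T)                    # one hidden factor
loadings = rng.uniform(0.5, 1.5, N)
R = np.outer(f, loadings) + rng.normal(0, 0.005, (T, N))

ratios = explained_variance_ratio(R)
print(np.round(ratios[:3], 3))
```

With one true factor, the first ratio should dominate and the rest should form a flat tail of roughly equal, small values.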


---

## 5. Factor Investing in Practice

### Popular Factors ("Factor Zoo")

| Factor | Description | Academic Support |
|--------|-------------|------------------|
| Value | Low price vs fundamentals | Strong (Fama-French) |
| Size | Small caps | Moderate (weakening) |
| Momentum | Past winners continue | Strong (Jegadeesh-Titman) |
| Quality | High profitability | Strong |
| Low Volatility | Lower risk stocks | Moderate |
| Dividend Yield | High dividends | Weak |

### Building a Factor Portfolio

**Long-Short**: Long the top decile (or quintile), short the bottom one
- Isolates the factor from the overall market direction
- Roughly market neutral (β ≈ 0) if the two legs have similar betas

**Long-Only**: Tilt toward factor
- More practical for most investors
- Still has market exposure
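The long-short recipe can be written as a small function. `long_short_return` is a hypothetical helper that equal-weights each leg (an assumption; many published factors are value-weighted), and the toy signal and returns are made up so the result is easy to check.

```python
import numpy as np


def long_short_return(signal, next_returns, frac: float = 0.1) -> float:
    """Equal-weighted long top `frac` / short bottom `frac` by signal; return the L-S return."""
    signal = np.asarray(signal)
    next_returns = np.asarray(next_returns)
    k = max(1, int(len(signal) * frac))       # stocks per leg
    order = np.argsort(signal)                # ascending by signal
    losers, winners = order[:k], order[-k:]
    return next_returns[winners].mean() - next_returns[losers].mean()


# Toy example: 10 stocks, signal = rank, next-period returns rise with the signal
signal = np.arange(10)
next_ret = np.linspace(-0.02, 0.025, 10)
print(f"L-S return: {long_short_return(signal, next_ret, frac=0.2):+.2%}")  # prints "L-S return: +4.00%"
```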

In [13]:
# Momentum factor construction example
np.random.seed(42)

# Simulate 50 stocks with 12-month history
n_stocks = 50
n_months = 12

# Generate i.i.d. random returns (no momentum exists in this history)
monthly_returns = np.random.normal(0.01, 0.05, (n_months, n_stocks))

# Momentum signal: past returns, which we will assume persist
momentum_signal = monthly_returns[:-1].sum(axis=0)  # 12-1 signal: first 11 months, skipping the most recent; summing approximates compounding

# Sort stocks by momentum
ranking = np.argsort(momentum_signal)
bottom_quintile = ranking[:10]  # Losers
top_quintile = ranking[-10:]    # Winners

# Construct long-short portfolio
# Next month return (for backtesting)
next_month = np.random.normal(0.01, 0.05, n_stocks)

# Inject a momentum effect by construction, so the positive L-S result below is built in
next_month += 0.3 * (momentum_signal - momentum_signal.mean()) / momentum_signal.std() * 0.02

long_return = next_month[top_quintile].mean()
short_return = next_month[bottom_quintile].mean()
momentum_factor_return = long_return - short_return

print("Momentum Factor Construction")
print("="*50)
print(f"\n1. Rank all {n_stocks} stocks by 12-1 month return (most recent month skipped)")
print(f"2. Long top 20% (winners): avg 12-1m return = {momentum_signal[top_quintile].mean()*100:.1f}%")
print(f"3. Short bottom 20% (losers): avg 12-1m return = {momentum_signal[bottom_quintile].mean()*100:.1f}%")
print(f"\nNext Month Performance:")
print(f"  Winners: {long_return*100:+.2f}%")
print(f"  Losers: {short_return*100:+.2f}%")
print(f"  Momentum Factor (L-S): {momentum_factor_return*100:+.2f}%")

Momentum Factor Construction

1. Rank all 50 stocks by 12-1 month return (most recent month skipped)
2. Long top 20% (winners): avg 12-1m return = 33.8%
3. Short bottom 20% (losers): avg 12-1m return = -8.9%

Next Month Performance:
  Winners: +3.20%
  Losers: -1.02%
  Momentum Factor (L-S): +4.22%


---

## Summary: Week 6 Key Formulas

| Model | Formula |
|-------|--------|
| CAPM | $E[r_i] = r_f + \beta_i(E[r_m] - r_f)$ |
| Beta | $\beta_i = \frac{Cov(r_i, r_m)}{Var(r_m)}$ |
| FF3 | $r_i - r_f = \alpha + \beta_M MKT + \beta_S SMB + \beta_H HML + \epsilon$ |
| FF5 | Add $\beta_R RMW + \beta_C CMA$ |
| APT | $r_i = E[r_i] + \sum_j \beta_{i,j} F_j + \epsilon_i$ |

### Key Takeaways

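1. **CAPM**: A single factor (market beta) should explain expected returns
2. **Fama-French**: Size and value (plus profitability and investment in FF5) capture return patterns CAPM misses
3. **APT**: Multiple factors; the theory does not name them, so they are chosen empirically
4. **Factor investing**: Systematic exposure to return drivers
5. **Alpha**: Return not explained by factors; it may reflect skill, luck, or a missing factor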

---

*Next Week: Advanced Volatility Modeling*

## 🔴 PROS & CONS: THEORY

### ✅ PROS (Advantages)

| Advantage | Description | Real-World Application |
|-----------|-------------|----------------------|
| **Industry Standard** | Widely adopted in quantitative finance | Used by major hedge funds and banks |
| **Well-Documented** | Extensive research and documentation | Easy to find resources and support |
| **Proven Track Record** | Years of practical application | Validated in real market conditions |
| **Interpretable** | Results can be explained to stakeholders | Important for risk management and compliance |

### ❌ CONS (Limitations)

| Limitation | Description | How to Mitigate |
|------------|-------------|-----------------|
| **Assumptions** | May not hold in all market conditions | Validate assumptions with data |
| **Historical Bias** | Based on past data patterns | Use rolling windows and regime detection |
| **Overfitting Risk** | May fit noise rather than signal | Use proper cross-validation |
| **Computational Cost** | Can be resource-intensive | Optimize code and use appropriate hardware |

### 🎯 Real-World Usage

**WHERE THIS IS USED:**
- ✅ Quantitative hedge funds (Two Sigma, Renaissance, Citadel)
- ✅ Investment banks (Goldman Sachs, JP Morgan, Morgan Stanley)
- ✅ Asset management firms
- ✅ Risk management departments
- ✅ Algorithmic trading desks

**NOT JUST THEORY:**
Factor models like these underpin the risk models and performance attribution used across the industry; the simulated examples in this notebook are teaching sketches, not production code.