# Group Members:
1. Raymond Chen
2. Shreya Enaganti
3. Asad Javed
4. Mohammed Rhazi

# EXERCISE: Smart Beta and Factor Investing

# Part 2: The Factors
Data
- Use the data found in data/factor_pricing_data_monthly.xlsx.

Factors: Monthly excess return data for the overall equity market.

- The column header for the market factor is MKT rather than MKT-RF, but it is indeed already in excess return form.

- The sheet also contains data on five additional factors.

- All factor data is already provided as excess returns.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load the Excel sheet (check which sheet name contains factor data)
df = pd.read_excel("data/factor_pricing_data_monthly.xlsx", sheet_name="factors (excess returns)")
df.head()


|   | Date | MKT | SMB | HML | RMW | CMA | UMD |
|---|---|---:|---:|---:|---:|---:|---:|
| 0 | 1980-01-31 | 0.055 | 0.0188 | 0.0185 | -0.0184 | 0.0189 | 0.0745 |
| 1 | 1980-02-29 | -0.0123 | -0.0162 | 0.0059 | -0.0095 | 0.0292 | 0.0789 |
| 2 | 1980-03-31 | -0.1289 | -0.0697 | -0.0096 | 0.0182 | -0.0105 | -0.0958 |
| 3 | 1980-04-30 | 0.0396 | 0.0105 | 0.0103 | -0.0218 | 0.0034 | -0.0048 |
| 4 | 1980-05-31 | 0.0526 | 0.02 | 0.0038 | 0.0043 | -0.0063 | -0.0118 |


In [3]:
# Prepare the data
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Select the six factor columns (all already in excess-return form)
factors = ['MKT', 'SMB', 'HML', 'RMW', 'CMA', 'UMD']
factors_df = df[factors]
factors_df

| Date | MKT | SMB | HML | RMW | CMA | UMD |
|---|---:|---:|---:|---:|---:|---:|
| 1980-01-31 | 0.0550 | 0.0188 | 0.0185 | -0.0184 | 0.0189 | 0.0745 |
| 1980-02-29 | -0.0123 | -0.0162 | 0.0059 | -0.0095 | 0.0292 | 0.0789 |
| 1980-03-31 | -0.1289 | -0.0697 | -0.0096 | 0.0182 | -0.0105 | -0.0958 |
| 1980-04-30 | 0.0396 | 0.0105 | 0.0103 | -0.0218 | 0.0034 | -0.0048 |
| 1980-05-31 | 0.0526 | 0.0200 | 0.0038 | 0.0043 | -0.0063 | -0.0118 |
| ... | ... | ... | ... | ... | ... | ... |
| 2025-04-30 | -0.0084 | -0.0186 | -0.0340 | -0.0285 | -0.0267 | 0.0497 |
| 2025-05-31 | 0.0606 | -0.0072 | -0.0288 | 0.0126 | 0.0251 | 0.0221 |
| 2025-06-30 | 0.0486 | -0.0002 | -0.0160 | -0.0319 | 0.0145 | -0.0264 |
| 2025-07-31 | 0.0198 | -0.0015 | -0.0127 | -0.0029 | -0.0207 | -0.0096 |


## 1.
Analyze the factors, similar to how you analyzed the three Fama-French factors in Homework 4.

You now have three additional factors, so let’s compare their univariate statistics.

- mean

- volatility

- Sharpe

In [4]:
# Calculate statistics
stats = pd.DataFrame({
    'Mean (monthly)': factors_df.mean(),
    'Volatility (monthly)': factors_df.std(),
    'Sharpe (monthly)': factors_df.mean() / factors_df.std(),
})

# Optional annualization
annualized = pd.DataFrame({
    'Mean (annual)': factors_df.mean() * 12,
    'Volatility (annual)': factors_df.std() * np.sqrt(12),
    'Sharpe (annual)': (factors_df.mean() * 12) / (factors_df.std() * np.sqrt(12))
})

print("Monthly Statistics:")
display(stats.round(4))
print("\nAnnualized Statistics:")
display(annualized.round(4))


Monthly Statistics:

|   | Mean (monthly) | Volatility (monthly) | Sharpe (monthly) |
|---|---:|---:|---:|
| MKT | 0.0073 | 0.0451 | 0.1619 |
| SMB | 0.0005 | 0.0292 | 0.0174 |
| HML | 0.0022 | 0.0314 | 0.0691 |
| RMW | 0.0037 | 0.0239 | 0.1533 |
| CMA | 0.0024 | 0.0209 | 0.1127 |
| UMD | 0.005 | 0.0443 | 0.1135 |

Annualized Statistics:

|   | Mean (annual) | Volatility (annual) | Sharpe (annual) |
|---|---:|---:|---:|
| MKT | 0.0876 | 0.1561 | 0.5607 |
| SMB | 0.0061 | 0.1013 | 0.0604 |
| HML | 0.026 | 0.1088 | 0.2392 |
| RMW | 0.044 | 0.0829 | 0.5311 |
| CMA | 0.0283 | 0.0725 | 0.3903 |
| UMD | 0.0603 | 0.1534 | 0.3933 |


- The Market factor (MKT) has the highest mean excess return (8.8% annualized) and the highest Sharpe ratio (0.56), consistent with a sizable equity risk premium.
- Among the style factors, Momentum (UMD) has the highest mean return (6.0% annualized) but also the highest volatility (15.3%, close to the market's).
- Profitability (RMW) and Investment (CMA) have the lowest volatilities (8.3% and 7.3% annualized); RMW's Sharpe ratio (0.53) is nearly as high as the market's, while CMA's is moderate (0.39).
- Size (SMB) has the lowest mean return and Sharpe ratio (0.06), and Value (HML) is also weak over the full sample (Sharpe 0.24).
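The annualized figures follow from the monthly ones by scaling the mean by 12 and the volatility by √12, which treats monthly returns as i.i.d. A minimal sketch of that step, using the rounded MKT inputs from the tables above (`annualize` is a hypothetical helper, not part of the notebook):

```python
import math

def annualize(mean_monthly: float, vol_monthly: float, periods: int = 12) -> dict:
    """Annualize a monthly mean and volatility, assuming i.i.d. returns."""
    mean_annual = mean_monthly * periods            # means scale linearly with the horizon
    vol_annual = vol_monthly * math.sqrt(periods)   # variances add, so volatility scales with sqrt(horizon)
    return {"mean": mean_annual, "vol": vol_annual, "sharpe": mean_annual / vol_annual}

# Rounded monthly MKT inputs from the tables above
mkt = annualize(0.007296, 0.0451)
print({k: round(v, 4) for k, v in mkt.items()})
```

Because of this scaling, the annual Sharpe ratio is simply the monthly one times √12 (for MKT, 0.1619 × √12 ≈ 0.56).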

## 2.
Based on the factor statistics above, answer the following.

Does each factor have a positive risk premium (positive expected excess return)?

How have the factors performed since the time of the case (2015 to present)?

In [5]:
# Check sign of mean monthly excess returns
factors_df.mean().sort_values(ascending=False)

MKT    0.007296
UMD    0.005026
RMW    0.003671
CMA    0.002357
HML    0.002170
SMB    0.000510
dtype: float64

Each factor’s mean excess return is positive, implying that all six factors delivered a positive risk premium on average during the sample period — though the magnitude varies substantially across factors.

- MKT (Market) has the largest risk premium (~0.73% per month), as expected for broad equity exposure.

- UMD (Momentum), RMW (Profitability), and CMA (Investment) also show clearly positive premia, consistent with well-documented factor effects.

- HML (Value) and SMB (Size) have small but still positive mean excess returns, indicating weaker risk compensation but not negative performance in this dataset.
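A positive sample mean is not the same as a reliably positive premium. Under an i.i.d. assumption, the t-statistic of a factor's mean, mean / (std / √T), equals its monthly Sharpe ratio times √T. The sketch below applies this to the monthly Sharpe ratios in the table above; T = 547 is an assumption (January 1980 to July 2025 with no missing months), not a number printed by the notebook.

```python
import math

# Monthly Sharpe ratios from the monthly statistics table
monthly_sharpe = {"MKT": 0.1619, "SMB": 0.0174, "HML": 0.0691,
                  "RMW": 0.1533, "CMA": 0.1127, "UMD": 0.1135}
T = 547  # assumed sample length: Jan 1980 to Jul 2025, no gaps

# t = mean / (std / sqrt(T)) = Sharpe * sqrt(T), assuming i.i.d. monthly returns
t_stats = {name: s * math.sqrt(T) for name, s in monthly_sharpe.items()}
for name, t in sorted(t_stats.items(), key=lambda kv: -kv[1]):
    flag = "significant at ~5%" if t > 1.96 else "not significant"
    print(f"{name}: t = {t:.2f} ({flag})")
```

Under these assumptions, MKT, RMW, CMA, and UMD have t-statistics above 2, while HML (about 1.6) and SMB (about 0.4) do not, so the evidence for positive value and size premia in this sample is weak.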

In [7]:
# Analyze performance since 2015
recent = factors_df[factors_df.index >= "2015-01-01"]

recent_stats = pd.DataFrame({
    'Mean (annual)': recent.mean() * 12,
    'Volatility (annual)': recent.std() * np.sqrt(12),
    'Sharpe (annual)': (recent.mean() * 12) / (recent.std() * np.sqrt(12))
})

display(recent_stats.round(4))


|   | Mean (annual) | Volatility (annual) | Sharpe (annual) |
|---|---:|---:|---:|
| MKT | 0.1179 | 0.1574 | 0.7491 |
| SMB | -0.0238 | 0.1032 | -0.2305 |
| HML | -0.0163 | 0.1299 | -0.1255 |
| RMW | 0.04 | 0.0726 | 0.5509 |
| CMA | -0.0091 | 0.0821 | -0.1114 |
| UMD | 0.0201 | 0.1374 | 0.1464 |


- Since 2015, only the Market (Sharpe 0.75) and Profitability (Sharpe 0.55) factors have delivered strong positive risk premia, while Size, Value, and Investment have had negative average returns.

- Momentum remains positive but much weaker than over the full sample (Sharpe 0.15 versus 0.39).

- This pattern is consistent with the dominance of large, profitable growth firms and the relative weakness of the traditional value and size premia in the post-2015 period.
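The before/after comparison can be wrapped in a small helper so that other split dates are easy to try. This is a sketch: `subperiod_sharpe` is a hypothetical name, and the demo uses synthetic random returns rather than the case data. With the real data one would call `subperiod_sharpe(factors_df, "2015-01-01")`.

```python
import numpy as np
import pandas as pd

def subperiod_sharpe(returns: pd.DataFrame, split: str) -> pd.DataFrame:
    """Annualized Sharpe ratio of each column before `split` and from `split` onward."""
    def ann_sharpe(x: pd.DataFrame) -> pd.Series:
        return x.mean() / x.std() * np.sqrt(12)   # monthly Sharpe scaled by sqrt(12)
    before = returns[returns.index < split]
    after = returns[returns.index >= split]
    return pd.DataFrame({f"before {split}": ann_sharpe(before),
                         f"from {split}": ann_sharpe(after)})

# Synthetic monthly excess returns (NOT the case data), only to show the mechanics
rng = np.random.default_rng(0)
dates = pd.date_range("1980-01-31", periods=547, freq=pd.offsets.MonthEnd())
demo = pd.DataFrame(rng.normal(0.005, 0.04, size=(len(dates), 2)),
                    index=dates, columns=["F1", "F2"])
res = subperiod_sharpe(demo, "2015-01-01")
print(res.round(2))
```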

## 3.
Report the correlation matrix across the six factors.

Does the construction method succeed in keeping correlations small?

Fama and French say that HML is somewhat redundant in their 5-factor model. Does this seem to be the case?

In [9]:
# Correlation matrix
corr = factors_df.corr()
corr.round(2)

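|   | MKT | SMB | HML | RMW | CMA | UMD |
|---|---:|---:|---:|---:|---:|---:|
| MKT | 1.0 | 0.23 | -0.21 | -0.25 | -0.35 | -0.18 |
| SMB | 0.23 | 1.0 | -0.02 | -0.41 | -0.05 | -0.06 |
| HML | -0.21 | -0.02 | 1.0 | 0.22 | 0.68 | -0.22 |
| RMW | -0.25 | -0.41 | 0.22 | 1.0 | 0.14 | 0.08 |
| CMA | -0.35 | -0.05 | 0.68 | 0.14 | 1.0 | 0.0 |
| UMD | -0.18 | -0.06 | -0.22 | 0.08 | 0.0 | 1.0 |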


- Overall, the factor construction succeeds in keeping correlations reasonably small: 12 of the 15 pairwise correlations are 0.25 or less in absolute value, so most factors capture distinct (though not independent) risk dimensions with limited multicollinearity.

- The main exceptions are HML and CMA (0.68), SMB and RMW (−0.41), and MKT and CMA (−0.35).

- HML's strong overlap with CMA (and its weaker overlap with RMW, 0.22) is consistent with Fama and French’s observation that the Value factor becomes largely redundant in the 5-factor specification.
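A correlation of 0.68 is suggestive, but the redundancy claim is usually made with a spanning regression: regress HML on a constant and the other factors, and check whether the intercept is near zero (its premium is explained by the others) and the R² is high. A sketch with synthetic data built so that "hml" loads on "cma"; `spanning_regression` is a hypothetical helper. With the real data one could pass `factors_df["HML"].to_numpy()` and `factors_df.drop(columns="HML").to_numpy()`.

```python
import numpy as np

def spanning_regression(y: np.ndarray, X: np.ndarray):
    """Regress y on a constant and the columns of X; return (alpha, betas, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()        # valid because the intercept makes resid mean zero
    return coef[0], coef[1:], r2

# Synthetic factors (NOT the case data); "hml" is spanned by "cma" and "rmw" by construction
rng = np.random.default_rng(1)
T = 547
mkt, smb, rmw, cma, umd = rng.normal(0.004, 0.03, size=(5, T))
hml = 0.9 * cma + 0.2 * rmw + rng.normal(0.0, 0.01, T)   # zero true intercept

alpha, betas, r2 = spanning_regression(hml, np.column_stack([mkt, smb, rmw, cma, umd]))
print(f"alpha = {alpha:.5f}, R^2 = {r2:.2f}, betas = {np.round(betas, 2)}")
```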

## 4.
Report the tangency weights for a portfolio of these 6 factors.

Which factors seem most important? And Least?

Are the factors with low mean returns still useful?

Re-do the tangency portfolio, but this time only include MKT, SMB, HML, and UMD. Which factors get high/low tangency weights now?

What do you conclude about the importance or unimportance of these styles?
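For reference, the tangency weights computed in this section follow the standard mean-variance formula, where $\mu$ is the vector of mean excess returns and $\Sigma$ their covariance matrix (the code computes $\Sigma^{-1}\mu$ with `inv_cov @ mu` and normalizes it to sum to one):

$$
\mathbf{w}^{\text{tan}} = \frac{\Sigma^{-1}\mu}{\mathbf{1}^{\top}\Sigma^{-1}\mu},
\qquad
\text{SR}^{\text{tan}} = \sqrt{\mu^{\top}\Sigma^{-1}\mu}
$$

The second expression, the maximum attainable Sharpe ratio, holds when $\mathbf{1}^{\top}\Sigma^{-1}\mu > 0$; comparing it across factor sets is one way to gauge how much a group of factors adds.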

In [10]:
mu = factors_df.mean().values       # mean monthly excess returns
cov = factors_df.cov().values       # covariance matrix

In [11]:
# compute tangency weights (sum to 1)
inv_cov = np.linalg.inv(cov)
w = inv_cov @ mu
w = w / w.sum()

In [12]:
# Report tangency weights
tangency_weights = pd.Series(w, index=factors_df.columns)
tangency_weights

MKT    0.218650
SMB    0.066849
HML   -0.021212
RMW    0.301829
CMA    0.321431
UMD    0.112453
dtype: float64

More Important Factors


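- CMA (Investment) and RMW (Profitability) receive the largest weights (0.32 and 0.30), so they contribute most to the tangency (maximum-Sharpe) portfolio. They combine solid standalone Sharpe ratios (0.39 and 0.53 annualized) with low volatility and negative correlations with MKT (−0.35 and −0.25), which makes them good diversifiers of market risk. CMA's weight also shows that low-mean factors can still be useful: its mean return is among the lowest (0.24% per month), yet it gets the largest weight.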

- MKT and UMD also receive meaningful positive weights, showing they remain essential risk premia sources.

Less Important Factors

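- SMB has only a small positive weight (0.07), consistent with its near-zero mean return (the size effect is weak in this sample).

- HML has a small negative weight (−0.02). Its full-sample mean is positive, but its high correlation with CMA (0.68) means its premium is largely spanned by the other factors, so the optimizer takes a slight short position rather than a long one.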
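The HML result is a general feature of tangency portfolios: a factor with a positive mean can receive a negative weight when it is highly correlated with a factor that has a better Sharpe ratio. A two-factor sketch with made-up numbers (not estimates from the data) shows this; `tangency_weights` is a hypothetical helper that uses `np.linalg.solve` instead of forming the inverse explicitly.

```python
import numpy as np

def tangency_weights(mu: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Tangency (max-Sharpe) weights, normalized to sum to one."""
    raw = np.linalg.solve(cov, mu)   # Sigma^{-1} mu without computing the inverse
    return raw / raw.sum()

# Hypothetical example: B has a positive mean but is highly correlated with A
mu = np.array([0.004, 0.002])        # monthly mean excess returns
vol = np.array([0.02, 0.03])
rho = 0.7
cov = np.array([[vol[0] ** 2, rho * vol[0] * vol[1]],
                [rho * vol[0] * vol[1], vol[1] ** 2]])

w = tangency_weights(mu, cov)
sr_tan = (w @ mu) / np.sqrt(w @ cov @ w)
print("weights:", w.round(3))        # B gets a negative weight
print("Sharpe A alone:", mu[0] / vol[0], " tangency:", round(sr_tan, 4))
```

Shorting B hedges part of A's risk, and B's own premium is too small to compensate for the overlap, so the combined portfolio beats A alone.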

In [13]:
# 4. Subset Analysis
subset = ['MKT', 'SMB', 'HML', 'UMD']
mu_sub = factors_df[subset].mean().values
cov_sub = factors_df[subset].cov().values

In [15]:
# compute tangency weights (sum to 1)
inv_cov_sub = np.linalg.inv(cov_sub)
w_sub = inv_cov_sub @ mu_sub
w_sub = w_sub / w_sub.sum()

pd.Series(w_sub, index=subset)

MKT    0.376514
SMB   -0.051198
HML    0.365321
UMD    0.309363
dtype: float64

High-weight factors:

- MKT, HML, and UMD dominate the tangency portfolio.

- MKT provides the core market risk premium.

- HML (Value) regains relevance here because, without RMW and CMA, it captures part of the same firm-fundamental effects those omitted factors previously explained.

- UMD adds diversification: it has a solid standalone premium (0.50% per month) and is negatively correlated with both MKT (−0.18) and HML (−0.22).

Low-weight (unimportant) factor:

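- SMB (Size) receives a small negative weight (−0.05), showing that small-cap exposure adds little once MKT, HML, and UMD are held. Note that this tangency portfolio uses the full 1980 to 2025 sample, not only the post-2015 period.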

Conclusions

- When the full 6-factor set is used, CMA and RMW absorb much of HML’s premium, making Value redundant (weight −0.02).

- Once those two are removed, HML’s weight increases, indicating it still represents an underlying dimension of return that matters if Profitability and Investment factors are unavailable.

- Momentum (UMD) remains consistently important, reinforcing its robustness as a style factor.

- Size (SMB) is unimportant in both specifications, consistent with its near-zero full-sample premium (0.05% per month) and its negative premium since 2015.

# Part 3: Testing Modern LPMs
Consider the following factor models:

CAPM: MKT

Fama-French 3F: MKT, SMB, HML

Fama-French 5F: MKT, SMB, HML, RMW, CMA

AQR: MKT, HML, RMW, UMD

# 1

Test the AQR 4-Factor Model using the time-series test. (We are not doing the cross-sectional regression tests.)

For each regression, report the estimated α and r-squared.
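For each industry portfolio $i$, the time-series test estimates the regression below, where $\mathbf{f}_t$ stacks the model's factor excess returns; if the model prices the portfolio correctly, $\alpha^{i} = 0$:

$$
\tilde{r}^{i}_{t} = \alpha^{i} + \left(\boldsymbol{\beta}^{i}\right)^{\top}\mathbf{f}_{t} + \epsilon^{i}_{t}, \qquad t = 1,\dots,T
$$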



In [6]:
import pandas as pd, statsmodels.api as sm, numpy as np

# === LOAD DATA ===
f = pd.read_excel("data/factor_pricing_data_monthly.xlsx",
                  sheet_name="factors (excess returns)",
                  parse_dates=["Date"]).set_index("Date")

p = pd.read_excel("data/factor_pricing_data_monthly.xlsx",
                  sheet_name="portfolios (excess returns)",
                  parse_dates=["Date"]).set_index("Date")

models = {
    "CAPM": ["MKT"],
    "FF3":  ["MKT","SMB","HML"],
    "FF5":  ["MKT","SMB","HML","RMW","CMA"],
    "AQR":  ["MKT","HML","RMW","UMD"]
}

def ts_reg(Y, X):
    X = sm.add_constant(X)
    r = sm.OLS(Y, X, missing="drop").fit()
    return r.params["const"], r.rsquared

def run_model(facs):
    al, R2 = [], []
    for c in p.columns:
        a, r2 = ts_reg(p[c], f[facs])
        al.append(a); R2.append(r2)
    return pd.DataFrame({"alpha": al, "R2": R2}, index=p.columns)

# === Q1: AQR model ===
aqr = run_model(models["AQR"])
aqr.round(4)


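|   | alpha | R2 |
|---|---:|---:|
| Agric | 0.001 | 0.3421 |
| Food | 0.0001 | 0.4551 |
| Soda | 0.0013 | 0.3025 |
| Beer | 0.0008 | 0.4148 |
| Smoke | 0.0034 | 0.2654 |
| Toys | -0.0028 | 0.5102 |
| Fun | 0.0033 | 0.6072 |
| Books | -0.0031 | 0.6889 |
| Hshld | -0.0011 | 0.5547 |
| Clths | -0.0019 | 0.619 |
| ... | ... | ... |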




Overview.

Each of the 49 industry portfolios was regressed on the AQR 4-Factor model (MKT, HML, RMW, UMD).
The table reports each industry’s estimated α (alpha) and R² from the time-series regressions.


Key observations.


- Model fit (R²): The AQR factors explain a large portion of return variation for most industries, with R² typically between 0.45 and 0.75 and peaking around 0.85 (e.g., Business Services, Banks, Finance). A few industries (e.g., Gold, Coal) show weaker explanatory power (R² < 0.25), consistent with idiosyncratic commodity exposure.
- Alpha estimates: Most alphas are small in magnitude (|α| between roughly 0.000 and 0.004 per month). Positive alphas (e.g., Softw, Chips, Hardw, Smoke, Fun) mean the industry earned more than the model predicts (underpricing relative to the model), while negative alphas (e.g., PerSv, RlEst, Books, Chems) mean it earned less (overpricing).


Interpretation.
On average, the AQR model fits the industry returns well, with only a few sectors deviating materially. The generally small alphas and fairly high R² values are consistent with using the model as a pricing benchmark for diversified portfolios, although small individual alphas are not a formal joint test that all alphas are zero.

Bottom line.
The AQR 4-Factor model delivers a strong time-series fit across industries and leaves mostly small alphas, although a few industries (e.g., Smoke, Fun, Books) have alphas of about ±0.3% per month, roughly 4% per year.

# 2. Mean Absolute Error (MAE) Comparison


In [7]:
# === Q2: Compare MAE(|alpha|) across all models ===
res = {m: run_model(v) for m, v in models.items()}     # run all models
mae = {m: abs(r.alpha).mean() for m, r in res.items()} # compute mean |alpha|
pd.Series(mae, name="MAE(|alpha|)").round(6)


CAPM    0.001748
FF3     0.002030
FF5     0.002614
AQR     0.002051
Name: MAE(|alpha|), dtype: float64



**Results**
- MAE CAPM = 0.00175
- MAE FF3 = 0.00203
- MAE FF5 = 0.00261
- MAE AQR = 0.00205

**Interpretation**
A smaller mean |α| means smaller pricing errors. If a pricing model fully captures expected returns, the estimated alphas should be close to zero, leaving little unexplained average return.

**Comparison**
The CAPM delivers the lowest MAE, i.e., the smallest average pricing errors across industries. The FF3 and AQR models perform similarly, while the FF5 model has the largest MAE. Adding factors therefore does not lower these time-series pricing errors, which suggests that the industries' loadings on the added factors do not line up with their average returns.

**Takeaway**
All MAE values are small in magnitude (≈0.17–0.26% per month, roughly 2–3% per year), so none of the models leaves very large pricing errors on average. By this metric the CAPM performs best, and the additional factors do not reduce time-series pricing errors for these industry portfolios.
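Why can adding factors raise the MAE? OLS with an intercept passes through the sample means, so each time-series alpha equals the asset's average excess return minus the premium implied by its betas and the factors' sample means, $\hat\alpha^i = \bar r^i - (\hat\beta^i)^\top \bar f$. A factor with a large sample mean therefore increases |α| for industries whose betas on it are not matched by their average returns. The sketch below checks the identity on synthetic data (not the case data).

```python
import numpy as np

# Synthetic data (NOT the case data): one asset, three factors, T months
rng = np.random.default_rng(2)
T, K = 547, 3
F = rng.normal(0.004, 0.03, size=(T, K))                               # factor excess returns
r = 0.002 + F @ np.array([1.0, 0.3, -0.2]) + rng.normal(0.0, 0.02, T)  # asset excess returns

# Time-series regression with an intercept
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
alpha_ols, beta = coef[0], coef[1:]

# OLS with an intercept fits through the means, so alpha = mean(r) - beta' mean(f)
alpha_identity = r.mean() - beta @ F.mean(axis=0)
print(f"alpha (OLS) = {alpha_ols:.6f}, alpha (identity) = {alpha_identity:.6f}")
```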

# 3.


Does any particular factor seem especially important or unimportant for pricing? Do you think Fama and French should use the Momentum Factor?



Q3: Factor Importance and Momentum

Evidence from MAE comparison.

The CAPM achieves the smallest MAE (0.00175), followed closely by the FF3 (0.00203) and AQR (0.00205) models.

The FF5 model has the highest MAE (0.00261), so adding Profitability (RMW) and Investment (CMA) to FF3 increases rather than reduces pricing errors.

The AQR model, which drops SMB and CMA from FF5 and adds Momentum (UMD), performs about as well as FF3 and clearly better than FF5, suggesting that momentum carries pricing information comparable to the traditional Fama–French factors.

Interpretation.

The market factor (MKT) does most of the pricing work: no multifactor model improves on the CAPM's MAE.
Momentum (UMD) appears economically relevant, since the model that includes it prices these industries better than FF5.
Profitability and Investment factors add little incremental value in this dataset.

Conclusion.

Given that the four-factor AQR model prices these industries more accurately than FF5 (MAE 0.00205 vs 0.00261) while using one fewer factor, it would be reasonable for Fama and French to incorporate a Momentum factor into their framework.

# 4.

This does not matter for pricing, but report the average (across the industry-portfolio estimations) of the time-series regression r-squared statistics.

Do this for each of the three models you tested.
Do these models lead to high time-series r-squared stats? That is, would these factors be good in a Linear Factor Decomposition of the assets?


In [8]:
# === Q4: Average R² across models ===
avg_R2 = {m: r.R2.mean() for m, r in res.items()}
pd.Series(avg_R2, name="Average R²").round(4)


CAPM    0.5226
FF3     0.5679
FF5     0.5918
AQR     0.5719
Name: Average R², dtype: float64



**Results**
- $\overline{R}^{2}_{\text{CAPM}} = 0.523$
- $\overline{R}^{2}_{\text{FF3}} = 0.568$
- $\overline{R}^{2}_{\text{FF5}} = 0.592$
- $\overline{R}^{2}_{\text{AQR}} = 0.572$

**Interpretation**
- CAPM explains about **52%** of monthly return variation; FF3 increases this to **57%**.
- FF5 gives the **highest average $R^2$ (~59%)**, indicating modest additional fit from **RMW** and **CMA**.
- AQR’s average $R^2$ (~57%) is on par with FF3, so replacing **SMB** with **RMW** and **UMD (momentum)** leaves explanatory power roughly unchanged.

**Conclusion**
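All models have **moderately high time-series $R^2$**, so they are reasonable candidates for a **linear factor decomposition**, although roughly 40% of a typical industry's return variance remains idiosyncratic. Incremental gains beyond FF3 are present but modest.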
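The link between $R^2$ and a linear factor decomposition can be made explicit: with an intercept, an asset's variance splits exactly into a systematic part $\beta^\top \Sigma_f \beta$ and a residual part, and $R^2$ is the systematic share. A sketch on synthetic data (not the case data):

```python
import numpy as np

# Synthetic data (NOT the case data): one asset driven by three factors plus noise
rng = np.random.default_rng(3)
T = 547
F = rng.normal(0.0, 0.03, size=(T, 3))
r = F @ np.array([1.1, 0.4, 0.2]) + rng.normal(0.0, 0.025, T)

X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
beta, resid = coef[1:], r - X @ coef

sys_var = beta @ np.cov(F, rowvar=False) @ beta   # beta' Sigma_f beta (sample covariance, ddof=1)
idio_var = resid.var(ddof=1)                       # residual (idiosyncratic) variance
r2 = sys_var / r.var(ddof=1)                       # systematic share of total variance
print(f"R^2 = {r2:.3f}, idiosyncratic share = {idio_var / r.var(ddof=1):.3f}")
```

The two shares add up to one because OLS residuals are orthogonal to the fitted values, which is what makes the $R^2$ reported above a variance decomposition.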


# 5.

We tested three models using the time-series tests (focusing on the time-series alphas.) Re-test these models, but this time use the cross-sectional test.

Report the time-series premia of the factors (just their sample averages) and compare them to the cross-sectionally estimated premia of the factors. Do they differ substantially?
Report the MAE of the cross-sectional regression residuals for each of the four models. How do they compare to the MAE of the time-series alphas?

In [9]:
# === Q5 (minimal): cross-sectional premia + residual MAE ==

def cs(model):
    facs = models[model]
    # betas from time-series (drop alpha/const)
    B = pd.DataFrame(
        [sm.OLS(p[c], sm.add_constant(f[facs]), missing='drop').fit().params.drop('const')
         for c in p.columns],
        index=p.columns, columns=facs
    )
    Rbar = p.mean().values                                    # mean excess returns (assets)
    lam  = np.linalg.lstsq(B.values, Rbar, rcond=None)[0]     # CS premia (no intercept)
    mae  = np.mean(np.abs(Rbar - B.values @ lam))             # CS residual MAE
    return pd.Series(lam, index=facs), mae, f[facs].mean()    # (CS premia, MAE, TS premia)

out = {m: cs(m) for m in ["CAPM","FF3","FF5","AQR"]}

for m, (lam, mae, ts) in out.items():
    print(f"\n{m}  |  MAE(cs) = {mae:.6f}")
    display(pd.DataFrame({"TS_premia": ts, "CS_premia": lam}).round(6))



CAPM  |  MAE(cs) = 0.001727

|   | TS_premia | CS_premia |
|---|---:|---:|
| MKT | 0.007296 | 0.007136 |

FF3  |  MAE(cs) = 0.001244

|   | TS_premia | CS_premia |
|---|---:|---:|
| MKT | 0.007296 | 0.008466 |
| SMB | 0.00051 | -0.005169 |
| HML | 0.00217 | -0.001329 |

FF5  |  MAE(cs) = 0.001101

|   | TS_premia | CS_premia |
|---|---:|---:|
| MKT | 0.007296 | 0.008043 |
| SMB | 0.00051 | -0.004496 |
| HML | 0.00217 | -0.002491 |
| RMW | 0.003671 | 0.002402 |
| CMA | 0.002357 | -0.000718 |

AQR  |  MAE(cs) = 0.001419

|   | TS_premia | CS_premia |
|---|---:|---:|
| MKT | 0.007296 | 0.007419 |
| HML | 0.00217 | -0.003225 |
| RMW | 0.003671 | 0.003298 |
| UMD | 0.005026 | 0.004474 |




**Results**

| Model | MAE (Cross-Sectional) | Key TS Premia (avg) | Key CS Premia (avg) |
|:------|:---------------------:|:-------------------:|:-------------------:|
| CAPM  | 0.00173 | MKT = 0.0073 | MKT = 0.0071 |
| FF3   | 0.00124 | MKT = 0.0073, SMB = 0.0005, HML = 0.0022 | MKT = 0.0085, SMB = −0.0052, HML = −0.0013 |
| FF5   | 0.00110 | MKT = 0.0073, SMB = 0.0005, HML = 0.0022, RMW = 0.0037, CMA = 0.0024 | MKT = 0.0080, SMB = −0.0045, HML = −0.0025, RMW = 0.0024, CMA = −0.0007 |
| AQR   | 0.00142 | MKT = 0.0073, HML = 0.0022, RMW = 0.0037, UMD = 0.0050 | MKT = 0.0074, HML = −0.0032, RMW = 0.0033, UMD = 0.0045 |

---

**Interpretation**

- The **cross-sectional MAEs (≈ 0.0011–0.0017)** are slightly **lower** than the time-series MAEs from Q2 (≈ 0.0017–0.0026), showing that factor exposures explain average portfolio returns quite well across industries.  
- **CAPM’s residuals** are largest, reflecting limited explanatory power with a single MKT factor.  
- **FF3 and FF5** reduce MAE noticeably; profitability (RMW) adds explanatory strength, while CMA and SMB remain weaker.  
- **AQR model** performs close to FF5 — suggesting the **momentum (UMD)** factor captures some pricing patterns similar to RMW and CMA.  
- **Premia comparison:**  
  - MKT and RMW have consistent positive signs in both time-series and cross-sectional estimates, confirming robustness.  
  - SMB and HML premia flip sign in cross-section, implying limited cross-sectional risk-price stability.
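The mechanical part of that comparison can be checked directly. The time-series alphas are the pricing errors $\bar R - B\lambda$ evaluated at $\lambda = \bar f$, while the cross-sectional regression picks $\lambda$ to minimize the sum of squared errors, so its squared-error criterion can never be worse (the guarantee covers squared errors, not MAE, although the two usually move together). A sketch on synthetic data (not the case data), using the same no-intercept cross-sectional regression as the `cs` function above:

```python
import numpy as np

# Synthetic panel (NOT the case data): T months, N assets, K factors
rng = np.random.default_rng(4)
T, N, K = 547, 49, 3
F = rng.normal(0.004, 0.03, size=(T, K))
B_true = rng.normal(1.0, 0.3, size=(N, K))
R = F @ B_true.T + rng.normal(0.001, 0.03, size=(T, N))   # asset excess returns

# Step 1: time-series regressions (with intercept) for all assets at once
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)   # shape (K + 1, N)
alpha_ts = coef[0]                             # time-series alphas
B = coef[1:].T                                 # (N, K) estimated betas

# Step 2: cross-sectional regression of mean returns on betas, no intercept
Rbar = R.mean(axis=0)
lam_cs, *_ = np.linalg.lstsq(B, Rbar, rcond=None)
resid_cs = Rbar - B @ lam_cs

print("SSE TS:", (alpha_ts ** 2).sum(), " SSE CS:", (resid_cs ** 2).sum())
print("MAE TS:", np.abs(alpha_ts).mean(), " MAE CS:", np.abs(resid_cs).mean())
```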

---

**Conclusion**

Cross-sectional regressions broadly **support the multifactor models (FF5 and AQR)**, which show lower pricing errors than the CAPM.  
While the size, value, and investment premia are unstable across estimation methods, **momentum and profitability appear to be priced sources of return**: their cross-sectional premia are close to their time-series averages, which supports their inclusion in modern linear factor frameworks.
