# FF3 Regression with Lagged IV Skew

This notebook estimates a standard Fama-French three-factor (FF3) regression of weekly excess returns, augmented with the previous week's implied-volatility (IV) skew as an additional regressor. We start with a single ticker (AAPL); the same steps can be applied to other firms later.
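
Written out, the specification appears to be the following. The symbols ($\alpha_i$, $\beta$, $\gamma$, $\varepsilon_{i,t}$) are notation introduced here for illustration and do not come from the original notebook; the left-hand side is the weekly return minus the risk-free rate, matching the `excess_return` column built below.

$$
R_{i,t} - RF_t = \alpha_i + \beta_{M}\,(\text{Mkt-RF})_t + \beta_{S}\,\text{SMB}_t + \beta_{H}\,\text{HML}_t + \gamma\,\text{IVskew}_{i,t-1} + \varepsilon_{i,t}
$$

The coefficient of interest is $\gamma$, the loading on the previous week's IV skew.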


In [1]:
import pandas as pd
import polars as pl
import numpy as np
import statsmodels.api as sm
from pathlib import Path

pd.set_option("display.max_columns", None)
print("✓ Libraries loaded")


✓ Libraries loaded


In [2]:
DATA_DIR = Path("processed_data")
MERGED_PATH = DATA_DIR / "merged_data_with_ff3.parquet"

print(f"Reading merged dataset from {MERGED_PATH} ...")
merged_df = pl.read_parquet(MERGED_PATH)
print(f"✓ Loaded: {merged_df.shape} (rows, cols)")


Reading merged dataset from processed_data/merged_data_with_ff3.parquet ...
✓ Loaded: (514768, 17) (rows, cols)


In [3]:
FIRM_TICKER = "AAPL"
REQUIRED_COLS = [
    "secid", "TICKER", "week_start", "week_end",
    "IV_skew", "weekly_return", "Mkt-RF", "SMB", "HML", "RF"
]

firm_pl = (
    merged_df
    .filter(pl.col("TICKER") == FIRM_TICKER)
    .select(REQUIRED_COLS)
    # Sort chronologically BEFORE shifting so the lag is the previous row in time
    .sort("week_start")
    .with_columns([
        # Previous observation's skew within each security (assumes no missing weeks)
        pl.col("IV_skew").shift(1).over("secid").alias("IV_skew_prev"),
        (pl.col("weekly_return") - pl.col("RF")).alias("excess_return")
    ])
    .drop_nulls(["IV_skew_prev", "excess_return", "Mkt-RF", "SMB", "HML"])
)

firm_pd = firm_pl.to_pandas()
firm_pd["week_start"] = pd.to_datetime(firm_pd["week_start"])
firm_pd["week_end"] = pd.to_datetime(firm_pd["week_end"])
print(f"Observations for {FIRM_TICKER}: {len(firm_pd):,}")
firm_pd.head()


Observations for AAPL: 243


,secid,TICKER,week_start,week_end,IV_skew,weekly_return,Mkt-RF,SMB,HML,RF,IV_skew_prev,excess_return
0,101594,AAPL,2019-01-07,2019-01-11,-0.026902,0.029747,0.0284,-0.0085,0.0089,0.0005,-0.023604,0.029247
1,101594,AAPL,2019-01-14,2019-01-18,-0.032017,0.005994,-0.0027,0.001,-0.0017,0.0005,-0.026902,0.005494
2,101594,AAPL,2019-01-21,2019-01-25,-0.031406,0.055527,0.0156,-0.0033,-0.008,0.0005,-0.032017,0.055027
3,101594,AAPL,2019-01-28,2019-02-01,-0.033564,0.027745,0.0011,0.0031,-0.0131,0.0005,-0.031406,0.027245
4,101594,AAPL,2019-02-04,2019-02-08,-0.040646,5.8e-05,0.0274,0.0156,0.0002,0.0005,-0.033564,-0.000442
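
Because `shift(1)` lags by one row rather than by one calendar week, `IV_skew_prev` is only "the previous week's" skew if no weeks are missing. A minimal sketch of a check for that, assuming weekly rows keyed by `week_start`; the helper name `weekly_gap_mask` and the toy dates are hypothetical and not part of the notebook:

```python
import pandas as pd

def weekly_gap_mask(weeks: pd.Series, expected_days: int = 7) -> pd.Series:
    """Flag rows whose predecessor (in time order) is not exactly `expected_days` earlier."""
    weeks = pd.to_datetime(weeks).sort_values()
    gaps = weeks.diff().dt.days
    # The first row has no predecessor (NaN gap), so it is never flagged
    return gaps.notna() & (gaps != expected_days)

# Toy dates for illustration only: the week of 2019-01-21 is missing
toy_weeks = pd.Series(pd.to_datetime(["2019-01-07", "2019-01-14", "2019-01-28"]))
mask = weekly_gap_mask(toy_weeks)
print(mask.tolist())  # → [False, False, True]
```

Applied to the notebook's data, something like `weekly_gap_mask(firm_pd["week_start"]).sum()` would count the rows whose lagged skew is older than one week; a nonzero count suggests building the lag from dates instead of row position.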


In [4]:
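def run_ff3_with_iv(data: pd.DataFrame, use_hac: bool = True) -> sm.regression.linear_model.RegressionResultsWrapper:
    """Regress excess returns on the FF3 factors plus lagged IV skew, optionally with HAC standard errors."""
    X = data[["Mkt-RF", "SMB", "HML", "IV_skew_prev"]]
    X = sm.add_constant(X)  # adds the intercept column "const"
    y = data["excess_return"]

    model = sm.OLS(y, X).fit()

    if use_hac:
        # Rule-of-thumb truncation lag: floor(4 * (n / 100)^(2/9)), at least 1
        maxlags = max(1, int(np.floor(4 * (len(data) / 100) ** (2 / 9))))
        robust = model.get_robustcov_results(cov_type="HAC", maxlags=maxlags)
        # Stash extras on the result for reporting in later cells
        robust.maxlags = maxlags
        robust.rsq = model.rsquared
        return robust

    # No HAC lag applies; set the attributes anyway so callers can still read them
    model.maxlags = None
    model.rsq = model.rsquared
    return model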
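
The `maxlags` expression in `run_ff3_with_iv` is a commonly used rule of thumb for the HAC truncation lag, floor(4 · (n/100)^(2/9)), clamped to at least 1. The sketch below isolates that arithmetic so the lag can be checked for any sample size; the helper name `rule_of_thumb_maxlags` is hypothetical and introduced only for illustration.

```python
import numpy as np

def rule_of_thumb_maxlags(n_obs: int) -> int:
    """HAC truncation lag: floor(4 * (n_obs / 100) ** (2 / 9)), never below 1."""
    return max(1, int(np.floor(4 * (n_obs / 100) ** (2 / 9))))

# 243 weekly observations: 4 * 2.43 ** (2 / 9) is about 4.87, which floors to 4
print(rule_of_thumb_maxlags(243))  # → 4
```

The lag grows slowly with sample size, so a few hundred weekly observations still yield only four or five lags.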


In [5]:
model = run_ff3_with_iv(firm_pd, use_hac=True)
print(f"AAPL FF3 + lagged IV skew regression (HAC max lag = {model.maxlags})")
display(model.summary())


AAPL FF3 + lagged IV skew regression (HAC max lag = 4)


Dep. Variable:,excess_return,R-squared:,0.662
Model:,OLS,Adj. R-squared:,0.656
Method:,Least Squares,F-statistic:,150.4
Date:,"Thu, 13 Nov 2025",Prob (F-statistic):,6.23e-64
Time:,19:20:08,Log-Likelihood:,567.0
No. Observations:,243,AIC:,-1124.0
Df Residuals:,238,BIC:,-1107.0
Df Model:,4,,
Covariance Type:,HAC,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0018,0.004,0.435,0.664,-0.006,0.010
Mkt-RF,1.1457,0.048,24.095,0.000,1.052,1.239
SMB,-0.5000,0.115,-4.351,0.000,-0.726,-0.274
HML,-0.3311,0.060,-5.521,0.000,-0.449,-0.213
IV_skew_prev,-0.0707,0.116,-0.611,0.542,-0.299,0.157

Omnibus:,20.983,Durbin-Watson:,1.951
Prob(Omnibus):,0.0,Jarque-Bera (JB):,41.658
Skew:,0.44,Prob(JB):,8.99e-10
Kurtosis:,4.827,Cond. No.,81.0
