# IV Skew Delta FF5 Diagnostics

Construct daily IV skew delta quintile portfolios and estimate FF5 time-series regressions (including a Q5–Q1 long/short spread) to gauge whether the factor delivers statistically significant alphas. The notebook also documents standard-error choices.


In [8]:
import pandas as pd
import polars as pl
import numpy as np
import statsmodels.api as sm
from pathlib import Path

pd.set_option('display.max_columns', None)
print('✓ Libraries ready')


✓ Libraries ready


In [9]:
PROCESSED_DATA_DIR = Path('processed_data')
IV_DELTA_PATH = PROCESSED_DATA_DIR / 'daily_iv_skew_delta.parquet'

print(f'Loading IV skew delta data from {IV_DELTA_PATH} ...')
required_cols = [
    'secid', 'iv_date', 'next_date',
    'IV_skew', 'IV_skew_delta_25', 'excess_return',
    'Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF'
]
iv_delta_pl = pl.read_parquet(IV_DELTA_PATH, columns=required_cols)
print(f'✓ Loaded {iv_delta_pl.shape[0]:,} rows / {iv_delta_pl.shape[1]} columns')
iv_delta_pl.head()


Loading IV skew delta data from processed_data/daily_iv_skew_delta.parquet ...
✓ Loaded 624,626 rows / 12 columns


secid,iv_date,next_date,IV_skew,IV_skew_delta_25,excess_return,Mkt-RF,SMB,HML,RMW,CMA,RF
i64,date,date,f32,f32,f64,f64,f64,f64,f64,f64,f64
6646,2023-03-06,2023-03-07,-0.174937,-0.193222,0.128164,-0.0145,0.0063,-0.0067,-0.0013,-0.0012,0.0002
6646,2023-04-04,2023-04-05,-0.145058,-0.243835,-0.038758,-0.0039,-0.01,0.014,0.0079,0.0129,0.0002
8170,2019-10-22,2019-10-23,-0.041055,-0.06043,0.026124,0.0025,-0.001,0.0027,-0.0001,0.0009,0.0001
8170,2019-10-23,2019-10-24,0.058631,0.128819,0.011257,0.0025,-0.005,-0.0089,-0.0005,-0.0061,0.0001
8170,2019-10-24,2019-10-25,0.078637,0.088066,0.024044,0.005,0.004,0.0007,0.0034,0.0,0.0001


In [10]:
print('Preparing daily quintile sorts on IV_skew_delta_25...')
needed_for_sort = iv_delta_pl.drop_nulls([
    'IV_skew_delta_25', 'excess_return',
    'Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA'
])
print(f'Rows after dropping key-factor nulls: {needed_for_sort.shape[0]:,}')

daily_pd = needed_for_sort.to_pandas()
daily_pd['next_date'] = pd.to_datetime(daily_pd['next_date'])
daily_pd['iv_date'] = pd.to_datetime(daily_pd['iv_date'])

def assign_quintiles(series: pd.Series) -> np.ndarray:
    # Require at least 25 observations with >=5 unique values to make clean quintiles.
    if series.nunique(dropna=True) < 5 or len(series) < 25:
        return np.full(len(series), np.nan)
    quint = pd.qcut(series, 5, labels=False, duplicates='drop')
    return quint.astype(float).to_numpy() + 1

quintile_col = (
    daily_pd.groupby('next_date')['IV_skew_delta_25']
            .transform(assign_quintiles)
)
daily_pd['iv_delta_quintile'] = quintile_col

print('Quintile coverage:')
print(daily_pd['iv_delta_quintile'].value_counts(dropna=False).sort_index())
daily_pd.head()


Preparing daily quintile sorts on IV_skew_delta_25...
Rows after dropping key-factor nulls: 624,626
Quintile coverage:
iv_delta_quintile
1.0    125175
2.0    124806
3.0    124789
4.0    124806
5.0    125049
NaN         1
Name: count, dtype: int64


Unnamed: 0,secid,iv_date,next_date,IV_skew,IV_skew_delta_25,excess_return,Mkt-RF,SMB,HML,RMW,CMA,RF,iv_delta_quintile
0,6646,2023-03-06,2023-03-07,-0.174937,-0.193222,0.128164,-0.0145,0.0063,-0.0067,-0.0013,-0.0012,0.0002,1.0
1,6646,2023-04-04,2023-04-05,-0.145058,-0.243835,-0.038758,-0.0039,-0.01,0.014,0.0079,0.0129,0.0002,1.0
2,8170,2019-10-22,2019-10-23,-0.041055,-0.06043,0.026124,0.0025,-0.001,0.0027,-0.0001,0.0009,0.0001,1.0
3,8170,2019-10-23,2019-10-24,0.058631,0.128819,0.011257,0.0025,-0.005,-0.0089,-0.0005,-0.0061,0.0001,5.0
4,8170,2019-10-24,2019-10-25,0.078637,0.088066,0.024044,0.005,0.004,0.0007,0.0034,0.0,0.0001,5.0


In [11]:
portfolio_cols = [f'Q{i}' for i in range(1, 6)] + ['Q5_Q1']
factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA']

valid_daily = daily_pd.dropna(subset=['iv_delta_quintile'])
print(f"Observations with valid quintiles: {len(valid_daily):,}")

# Equal-weight excess returns for each quintile per day
pivot_returns = (
    valid_daily
    .groupby(['next_date', 'iv_delta_quintile'])['excess_return']
    .mean()
    .unstack('iv_delta_quintile')
    .rename(columns=lambda q: f'Q{int(q)}')
    .sort_index()
)

# Ensure all quintile columns exist even if some dates dropped
for col in [f'Q{i}' for i in range(1, 6)]:
    if col not in pivot_returns.columns:
        pivot_returns[col] = np.nan
pivot_returns = pivot_returns[[f'Q{i}' for i in range(1, 6)]]
pivot_returns['Q5_Q1'] = pivot_returns['Q5'] - pivot_returns['Q1']

factor_panel = (
    valid_daily.groupby('next_date')[factor_cols].mean()
    .sort_index()
)

portfolio_panel = pivot_returns.join(factor_panel, how='inner').dropna()
print(f"Regression-ready observations: {len(portfolio_panel):,}")
portfolio_panel.head()


Observations with valid quintiles: 624,625
Regression-ready observations: 644


Unnamed: 0_level_0,Q1,Q2,Q3,Q4,Q5,Q5_Q1,Mkt-RF,SMB,HML,RMW,CMA
next_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2019-02-26,-0.000377,0.000487,-0.003776,-0.002084,0.002831,0.003207,-0.0016,-0.0068,-0.0033,0.001,-0.0016
2019-02-27,-0.006072,-0.009042,0.003979,-0.008655,0.008575,0.014647,0.001,0.0015,-0.0012,-0.0066,-0.0013
2019-02-28,-0.002248,-0.006767,-0.001383,-0.00306,-0.000808,0.001441,-0.0031,-0.0001,-0.0027,0.0029,0.0017
2019-03-01,0.002411,0.001911,0.004188,-0.001552,0.006719,0.004308,0.0072,0.0025,-0.0041,-0.0036,-0.0019
2019-03-05,-0.004118,0.000252,-0.004033,-0.002208,0.000954,0.005072,-0.0017,-0.0032,-0.0022,0.0029,-0.0001


In [12]:
def run_ff5_ts(y: pd.Series, X: pd.DataFrame, hac_lags: int = 21):
    X_with_const = sm.add_constant(X)
    model = sm.OLS(y, X_with_const).fit()
    setattr(model, '_rsquared', model.rsquared)
    return model

hac_lags = 21  # ≈ one trading month of serial-correlation adjustment
factor_matrix = portfolio_panel[factor_cols]

regression_rows = []
models = {}
for col in portfolio_cols:
    model = run_ff5_ts(portfolio_panel[col], factor_matrix, hac_lags=hac_lags)
    params = model.params
    tvals = model.tvalues
    regression_rows.append({
        'portfolio': col,
        'n_obs': int(model.nobs),
        'alpha': params['const'],
        'alpha_t': tvals['const'],
        'beta_mkt': params['Mkt-RF'],
        'beta_smb': params['SMB'],
        'beta_hml': params['HML'],
        'beta_rmw': params['RMW'],
        'beta_cma': params['CMA'],
        'r_squared': getattr(model, '_rsquared', np.nan)
    })
    models[col] = model

regression_df = pd.DataFrame(regression_rows)
regression_df


Unnamed: 0,portfolio,n_obs,alpha,alpha_t,beta_mkt,beta_smb,beta_hml,beta_rmw,beta_cma,r_squared
0,Q1,644,0.000178,1.020407,1.054727,0.599569,0.122341,-0.255008,-0.075893,0.933497
1,Q2,644,0.00011,0.720462,1.05475,0.383535,0.033265,-0.200974,-0.005125,0.940869
2,Q3,644,-1.4e-05,-0.102602,1.06139,0.341762,0.009293,-0.18236,-0.024359,0.953872
3,Q4,644,0.000229,1.536053,1.039988,0.382437,0.07512,-0.197704,-0.096296,0.943676
4,Q5,644,0.000225,1.306854,1.052452,0.709884,0.157923,-0.301677,-0.168427,0.939972
5,Q5_Q1,644,4.7e-05,0.263076,-0.002275,0.110315,0.035582,-0.046669,-0.092534,0.064245


In [13]:
display_cols = ['portfolio', 'n_obs', 'alpha', 'alpha_t', 'r_squared'] + [
    'beta_mkt', 'beta_smb', 'beta_hml', 'beta_rmw', 'beta_cma'
]
sorted_results = (
    regression_df[display_cols]
    .set_index('portfolio')
    .loc[portfolio_cols]
)
sorted_results = sorted_results.round(4)
print('FF5 HAC(21) regression summary (alphas in daily excess-return units):')
sorted_results


FF5 HAC(21) regression summary (alphas in daily excess-return units):


Unnamed: 0_level_0,n_obs,alpha,alpha_t,r_squared,beta_mkt,beta_smb,beta_hml,beta_rmw,beta_cma
portfolio,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Q1,644,0.0002,1.0204,0.9335,1.0547,0.5996,0.1223,-0.255,-0.0759
Q2,644,0.0001,0.7205,0.9409,1.0547,0.3835,0.0333,-0.201,-0.0051
Q3,644,-0.0,-0.1026,0.9539,1.0614,0.3418,0.0093,-0.1824,-0.0244
Q4,644,0.0002,1.5361,0.9437,1.04,0.3824,0.0751,-0.1977,-0.0963
Q5,644,0.0002,1.3069,0.94,1.0525,0.7099,0.1579,-0.3017,-0.1684
Q5_Q1,644,0.0,0.2631,0.0642,-0.0023,0.1103,0.0356,-0.0467,-0.0925


In [14]:
spread_model = models['Q5_Q1']
print('Detailed HAC(21) FF5 regression for Q5-Q1 spread:')
display(spread_model.summary())


Detailed HAC(21) FF5 regression for Q5-Q1 spread:


0,1,2,3
Dep. Variable:,Q5_Q1,R-squared:,0.064
Model:,OLS,Adj. R-squared:,0.057
Method:,Least Squares,F-statistic:,8.76
Date:,"Sat, 15 Nov 2025",Prob (F-statistic):,4.75e-08
Time:,18:34:20,Log-Likelihood:,2563.5
No. Observations:,644,AIC:,-5115.0
Df Residuals:,638,BIC:,-5088.0
Df Model:,5,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,4.73e-05,0.000,0.263,0.793,-0.000,0.000
Mkt-RF,-0.0023,0.015,-0.154,0.878,-0.031,0.027
SMB,0.1103,0.028,3.942,0.000,0.055,0.165
HML,0.0356,0.025,1.410,0.159,-0.014,0.085
RMW,-0.0467,0.036,-1.297,0.195,-0.117,0.024
CMA,-0.0925,0.045,-2.051,0.041,-0.181,-0.004

0,1,2,3
Omnibus:,64.408,Durbin-Watson:,1.827
Prob(Omnibus):,0.0,Jarque-Bera (JB):,360.562
Skew:,-0.192,Prob(JB):,5.07e-79
Kurtosis:,6.646,Cond. No.,272.0


### How the FF5 time-series test works
- **Portfolio construction:** Each trading day forms five equal-weight baskets sorted on `IV_skew_delta_25` (after requiring ≥25 names and ≥5 unique values). The `Q5_Q1` leg is a daily long/short of the extreme quintiles.
- **Regression specification:** For each portfolio, run a time-series regression of daily portfolio excess returns on the FF5 factors plus an intercept. This follows the standard Fama-French evaluation: a significant alpha after controls implies incremental predictive power.
- **Standard errors:** Portfolio returns inherit serial correlation from overlapping signals (25-day lag). To guard against downward-biased t-stats we use Newey–West/HAC errors with `maxlags = 21` ≈ one trading month. If you expect autocorrelation to persist longer (e.g., monthly rebalancing or additional smoothing), bump the lag to 63 (≈ one quarter). Clustered SEs across time are unnecessary here because each regression already pools a single time series, but they become relevant if you stack multiple portfolios and run SUR/panel setups.
