# VIX-Based Continuous Regression Analysis

This notebook estimates a continuous model where the regime dummies are replaced by the **VIX index** (CBOE Volatility Index). The VIX serves as a continuous proxy for market stress and uncertainty.

## Model Specification

We estimate the following regression model:

$$R_{t} = \alpha + \theta \cdot VIX_{t-1} + \beta \cdot Factors_t + \eta \cdot (VIX_{t-1} \times Factors_t) + \epsilon_t$$

### Variable Definitions

- **$R_t$**: Monthly long-short anomaly return.
- **$VIX_{t-1}$**: The **lagged** and **standardized** CBOE Volatility Index.
- **$Factors_t$**: Vector of Fama–French three factors (MKT, SMB, HML).
- **$\theta$**: Coefficient measuring the direct effect of market stress (VIX) on anomaly returns.
- **$\eta$**: Vector of interaction coefficients capturing how the factor loadings vary with the level of market stress.
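
Because the VIX enters in standardized form, the coefficients have a direct reading. Differentiating the specification above with respect to the $k$-th factor (the index $k$ is introduced here only for illustration) gives the loading conditional on market stress:

$$\frac{\partial R_t}{\partial Factors_{k,t}} = \beta_k + \eta_k \cdot VIX_{t-1}$$

So $\beta_k$ is the loading when the VIX sits at its sample mean ($VIX_{t-1} = 0$), and $\beta_k + \eta_k$ is the loading when the VIX is one standard deviation above that mean.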


In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from IPython.display import display

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

In [None]:
# 1. Load VIX Data
vix_df = pd.read_csv('VIX Regression Data/VIXCLS.csv', parse_dates=['observation_date'], index_col='observation_date')

# Handle potential non-numeric data (replace '.' with NaN if exists)
vix_df['VIXCLS'] = pd.to_numeric(vix_df['VIXCLS'], errors='coerce')

# 2. Standardization
vix_mean = vix_df['VIXCLS'].mean()
vix_std = vix_df['VIXCLS'].std()
vix_df['VIX_Std'] = (vix_df['VIXCLS'] - vix_mean) / vix_std

# 3. Lagging
# The model uses VIX_{t-1}. shift(1) lags by one row, which is one month
# only if the file holds exactly one observation per month.
vix_df['VIX_Lagged_Std'] = vix_df['VIX_Std'].shift(1)

# Display to verify
print("VIX Data Preview (Standardized and Lagged):")
display(vix_df[['VIXCLS', 'VIX_Std', 'VIX_Lagged_Std']].head())
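
The one-row lag above equals a one-month lag only when `VIXCLS.csv` has one row per month. If the file were a daily series (an assumption, not something this notebook confirms), one option is to collapse it to the last close of each month before standardizing and lagging. A minimal sketch on synthetic data; the names `daily` and `monthly` are illustrative:

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for VIXCLS (assumption: the CSV is daily)
idx = pd.date_range("2020-01-01", "2020-04-30", freq="B")
daily = pd.DataFrame({"VIXCLS": np.linspace(12.0, 40.0, len(idx))}, index=idx)

# Collapse to one observation per month: the last available close in that month
monthly = daily["VIXCLS"].groupby(daily.index.to_period("M")).last().to_frame()

# Standardize on the monthly series, then lag by exactly one month
monthly["VIX_Std"] = (monthly["VIXCLS"] - monthly["VIXCLS"].mean()) / monthly["VIXCLS"].std()
monthly["VIX_Lagged_Std"] = monthly["VIX_Std"].shift(1)
```

Taking the month-end close is one choice among several; a monthly average would be an equally defensible aggregation, and the right one depends on how the anomaly returns are dated.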

In [None]:
# Load Excess Returns and Fama-French Factors
excess_returns = pd.read_excel('./Regression Data/excess_returns.xlsx', index_col=0, parse_dates=True)
ff_factors = pd.read_excel('./Regression Data/fama_french_factors.xlsx', index_col='date', parse_dates=True)

# 1. Convert Returns and Factors to Period (safe because they are freshly loaded above)
excess_returns.index = excess_returns.index.to_period('M')
ff_factors.index = ff_factors.index.to_period('M')

# 2. Convert VIX to Period (Safely)
# vix_df is loaded in the previous cell, so we check if it's already a PeriodIndex
if not isinstance(vix_df.index, pd.PeriodIndex):
    vix_df.index = vix_df.index.to_period('M')

print("Data Loaded and Indices Aligned.")

In [None]:
# Select required columns
ff_subset = ff_factors[['Mkt-RF', 'SMB', 'HML']]
vix_subset = vix_df[['VIX_Lagged_Std']]

# Merge all datasets
df_vix_model = pd.merge(excess_returns, ff_subset, left_index=True, right_index=True, how='inner')
df_vix_model = pd.merge(df_vix_model, vix_subset, left_index=True, right_index=True, how='inner')

# Drop any rows with NaN (likely the first row due to lagging)
df_vix_model.dropna(inplace=True)

display(df_vix_model.head())
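
Once every index is a monthly period, a dataset with more than one row per month would silently multiply rows in an inner join. One way to guard against that is the `validate` argument of `pd.merge`, sketched here on toy frames whose values are made up for illustration:

```python
import pandas as pd

# Toy monthly frames (values are placeholders, not data from this notebook)
idx = pd.period_range("2020-01", periods=3, freq="M")
returns = pd.DataFrame({"Momentum": [0.01, -0.02, 0.03]}, index=idx)
factors = pd.DataFrame({"Mkt-RF": [0.5, -1.0, 2.0]}, index=idx)

# validate="one_to_one" raises MergeError if either index has duplicate keys
merged = pd.merge(returns, factors, left_index=True, right_index=True,
                  how="inner", validate="one_to_one")

# A frame with a repeated month is rejected instead of duplicating rows
factors_dup = pd.concat([factors, factors.iloc[[0]]])
try:
    pd.merge(returns, factors_dup, left_index=True, right_index=True,
             how="inner", validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```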

In [None]:
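def run_vix_regressions(data, anomalies):
    """Regress each anomaly on the factors, lagged VIX, and VIX-factor interactions."""
    # Work on a copy so the interaction columns are not added to the caller's DataFrame
    data = data.copy()
    results_list = []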
    
    # Define Core Variables
    factors = ['Mkt-RF', 'SMB', 'HML']
    vix_col = 'VIX_Lagged_Std'
    
    # Create Interaction Terms: VIX_Lagged * Factor
    # This captures the 'eta' coefficients (change in loading due to stress)
    interaction_cols = []
    for f in factors:
        int_col = f"VIX_x_{f}"
        data[int_col] = data[vix_col] * data[f]
        interaction_cols.append(int_col)
    
    # Independent Variables: Constant + Factors + VIX + Interactions
    X_cols = factors + [vix_col] + interaction_cols
    X = sm.add_constant(data[X_cols])
    
    for anomaly in anomalies:
        y = data[anomaly]
        
        # Fit OLS with Newey-West (HAC) standard errors
        model = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': 3})
        
        # Store Results
        results_list.append({
            'Anomaly': anomaly,
            'Alpha': model.params['const'],
            'Alpha_P_Value': model.pvalues['const'],
            
            # Direct VIX Effect (Theta)
            'Theta_VIX': model.params[vix_col],
            'Theta_P_Value': model.pvalues[vix_col],
            
            # Base Factor Loadings (Beta)
            'Beta_MKT': model.params['Mkt-RF'],
            'Beta_P_Value_MKT': model.pvalues['Mkt-RF'],
            'Beta_SMB': model.params['SMB'],
            'Beta_P_Value_SMB': model.pvalues['SMB'],
            'Beta_HML': model.params['HML'],
            'Beta_P_Value_HML': model.pvalues['HML'],
            
            # Interaction Coefficients (Eta)
            'Eta_MKT (VIX*MKT)': model.params['VIX_x_Mkt-RF'],
            'Eta_P_Value_MKT': model.pvalues['VIX_x_Mkt-RF'],
            'Eta_SMB (VIX*SMB)': model.params['VIX_x_SMB'],
            'Eta_P_Value_SMB': model.pvalues['VIX_x_SMB'],
            'Eta_HML (VIX*HML)': model.params['VIX_x_HML'],
            'Eta_P_Value_HML': model.pvalues['VIX_x_HML'],
            'Adj_R2': model.rsquared_adj
        })
        
    return pd.DataFrame(results_list)

In [None]:
# List of anomalies (Dependent Variables)
anomalies_list = ['Accruals', 'Asset Growth', 'BM', 'Gross Profit', 'Momentum', 'Leverage Ret']

print("Running VIX-Based Interaction Model...")
vix_results = run_vix_regressions(df_vix_model, anomalies_list)

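# Export and Display
# Create the output folder first; to_excel does not create missing directories
from pathlib import Path
Path('./VIX Regression Results').mkdir(parents=True, exist_ok=True)
vix_results.to_excel('./VIX Regression Results/vix_model_results.xlsx', index=False)
display(vix_results.round(4))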
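
Since the VIX is standardized, the factor loading implied by the model at a stress level of `z` standard deviations is `Beta + Eta * z`. A minimal sketch of that post-processing step, using the column names produced by `run_vix_regressions`; the helper `conditional_loadings` and the numbers in `example_row` are hypothetical placeholders, not estimates from this notebook:

```python
import pandas as pd

def conditional_loadings(row, vix_levels=(-1.0, 0.0, 1.0)):
    """Return factor loadings beta_k + eta_k * z for each standardized VIX level z."""
    factors = ["MKT", "SMB", "HML"]
    table = {}
    for z in vix_levels:
        table[z] = {
            f: row[f"Beta_{f}"] + row[f"Eta_{f} (VIX*{f})"] * z
            for f in factors
        }
    # Rows: standardized VIX level; columns: factor
    return pd.DataFrame(table).T.rename_axis("VIX_z")

# Placeholder coefficients for illustration only (not real estimates)
example_row = pd.Series({
    "Beta_MKT": 0.10, "Eta_MKT (VIX*MKT)": 0.05,
    "Beta_SMB": -0.20, "Eta_SMB (VIX*SMB)": 0.10,
    "Beta_HML": 0.30, "Eta_HML (VIX*HML)": -0.15,
})
loadings = conditional_loadings(example_row)
```

With real output, a row could be taken from the results table, for example `vix_results.set_index('Anomaly').loc['Momentum']`, and passed to the helper in place of `example_row`.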