# Stylized Facts of Financial Returns

This notebook analyzes empirical properties of financial returns that motivate volatility modeling:

1. **Fat tails**: Returns exhibit excess kurtosis relative to the normal distribution
2. **Volatility clustering**: Large moves (of either sign) tend to be followed by large moves
3. **Leverage effect**: Negative shocks raise future volatility more than positive shocks of the same size
4. **Negligible autocorrelation in returns**: Consistent with weak-form market efficiency
5. **Strong autocorrelation in |returns| and returns²**: Volatility persistence

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import sys
sys.path.append('..')

from src.data_loader import DataLoader
from src.returns import StylizedFacts

sns.set_style('whitegrid')
plt.rcParams['figure.dpi'] = 100

%matplotlib inline

## Load Data

In [None]:
loader = DataLoader()
returns, prices = loader.prepare_dataset(
    ticker="^GSPC",
    start_date="2010-01-01",
    return_method="log"
)

print(f"Loaded {len(returns)} daily returns")
print(f"Date range: {returns.index[0]} to {returns.index[-1]}")
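`DataLoader.prepare_dataset` is project code whose internals are not shown here. Assuming `return_method="log"` means continuously compounded returns, `r_t = ln(P_t / P_{t-1})`, the computation on a price series can be sketched as follows. The `log_returns` helper and the toy prices are hypothetical and are not the loader's actual code:

```python
import numpy as np
import pandas as pd

def log_returns(prices: pd.Series) -> pd.Series:
    """Hypothetical helper: r_t = ln(P_t / P_{t-1}); drops the leading NaN."""
    return np.log(prices).diff().dropna()

# Toy price path (made-up values, not market data)
prices = pd.Series([100.0, 101.0, 99.0, 99.0],
                   index=pd.date_range("2010-01-04", periods=4, freq="B"))
r = log_returns(prices)
print(r.round(6))

# Log returns are additive over time: their sum equals ln(P_T / P_0)
print(np.isclose(r.sum(), np.log(prices.iloc[-1] / prices.iloc[0])))
```

The additivity in the last line is one reason log returns are the usual choice for volatility modeling.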

## Summary Statistics

In [None]:
sf = StylizedFacts(returns)
summary = sf.summary_statistics()

summary_df = pd.DataFrame(summary, index=[0]).T
summary_df.columns = ['Value']
print(summary_df)

**Observations:**
- Mean ≈ 0: At daily frequency the drift is negligible relative to day-to-day volatility
- Excess kurtosis > 0: Fat tails (leptokurtic distribution)
- Negative skewness: The left tail is heavier (crash risk)
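`StylizedFacts` is not shown, so it is an assumption that `summary_statistics()` uses the usual moment definitions: skewness = m₃ / m₂^(3/2) and excess kurtosis = m₄ / m₂² − 3, where m_k is the k-th central sample moment. The sketch below computes both by hand and compares them with `scipy.stats.skew` and `scipy.stats.kurtosis`, whose defaults (`bias=True`, `fisher=True`) match these definitions. Simulated Student-t draws serve as a hypothetical stand-in for fat-tailed returns:

```python
import numpy as np
from scipy import stats

def sample_moments(x):
    """Return (skewness, excess kurtosis) from biased central sample moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d**p) for p in (2, 3, 4))
    skew = m3 / m2**1.5
    excess_kurt = m4 / m2**2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt

rng = np.random.default_rng(0)
normal_draws = rng.standard_normal(100_000)
t_draws = rng.standard_t(df=5, size=100_000)  # fat-tailed stand-in for returns

for name, x in [("normal", normal_draws), ("student-t(5)", t_draws)]:
    s, k = sample_moments(x)
    print(f"{name:13s} skew={s:+.3f}  excess kurtosis={k:.3f}")

# Same numbers as SciPy's defaults
s, k = sample_moments(t_draws)
print(np.isclose(s, stats.skew(t_draws)), np.isclose(k, stats.kurtosis(t_draws)))
```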

## Distribution Analysis

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
axes[0].hist(returns, bins=100, density=True, alpha=0.7, edgecolor='black')
x_range = np.linspace(returns.min(), returns.max(), 100)
axes[0].plot(x_range, stats.norm.pdf(x_range, returns.mean(), returns.std()), 
             'r-', linewidth=2, label='Normal')
axes[0].set_title('Return Distribution vs Normal')
axes[0].set_xlabel('Log Returns')
axes[0].set_ylabel('Density')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Q-Q plot
stats.probplot(returns, dist="norm", plot=axes[1])
axes[1].set_title('Q-Q Plot vs Normal')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

## Normality Tests

In [None]:
normality_tests = sf.normality_tests()

print("Normality Tests:")
for test_name, (stat, pval) in normality_tests.items():
    # .3g keeps tiny p-values readable instead of printing 0.000000
    print(f"  {test_name:20s}: stat={stat:10.2f}, p-value={pval:.3g}")
    if pval < 0.05:
        print("    → Reject normality at the 5% level")
    else:
        print("    → Fail to reject normality at the 5% level")
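`sf.normality_tests()` is not shown, so which tests it runs is unknown. A standard choice for return data is the Jarque-Bera test, which combines skewness and kurtosis: JB = n/6 · (S² + K²/4), where S is the sample skewness and K is the excess kurtosis. Under normality, JB is asymptotically χ² with 2 degrees of freedom. Below is a minimal sketch checked against `scipy.stats.jarque_bera`. The `jarque_bera_manual` name and the Student-t sample are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def jarque_bera_manual(x):
    """JB = n/6 * (S^2 + K^2/4); asymptotically chi-squared(2) under normality."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = stats.skew(x)       # biased sample skewness
    k = stats.kurtosis(x)   # biased excess (Fisher) kurtosis
    jb = n / 6.0 * (s**2 + k**2 / 4.0)
    return jb, stats.chi2.sf(jb, df=2)

rng = np.random.default_rng(1)
x = rng.standard_t(df=4, size=2_500)  # fat-tailed sample, roughly 10 years of daily data

jb, p = jarque_bera_manual(x)
ref_jb, ref_p = stats.jarque_bera(x)  # SciPy's implementation for comparison
print(f"manual: JB={jb:.2f}, p={p:.3g}")
print(f"scipy : JB={ref_jb:.2f}, p={ref_p:.3g}")
```

With thousands of daily observations, even modest excess kurtosis drives JB far into the rejection region. This is why normality is rejected so decisively for index returns.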

## Autocorrelation Structure

In [None]:
acf_df = sf.autocorrelation_structure(max_lag=30)

fig, ax = plt.subplots(figsize=(12, 6))

ax.plot(acf_df['lag'], acf_df['acf_returns'], 'o-', label='Returns', markersize=4)
ax.plot(acf_df['lag'], acf_df['acf_abs_returns'], 's-', label='|Returns|', markersize=4)
ax.plot(acf_df['lag'], acf_df['acf_squared_returns'], '^-', label='Returns²', markersize=4)

# Approximate 95% bands for white noise: ±1.96/√n
n = len(returns)
band = 1.96 / np.sqrt(n)
ax.axhline(band, color='gray', linestyle='--', linewidth=1, alpha=0.5)
ax.axhline(-band, color='gray', linestyle='--', linewidth=1, alpha=0.5)
ax.axhline(0, color='black', linestyle='-', linewidth=0.5)

ax.set_title('Autocorrelation Function', fontsize=12)
ax.set_xlabel('Lag')
ax.set_ylabel('Autocorrelation')
ax.legend()
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

**Key Finding:** Returns show little autocorrelation, while |returns| and returns² show strong, slowly decaying positive autocorrelation. This is the signature of **volatility clustering**.
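`autocorrelation_structure` is project code. A standard sample ACF, together with the ±1.96/√n band drawn in the plot, can be sketched as follows. The band is the approximate 95% range for a single lag when the series is white noise, so about 1 lag in 20 of a truly uncorrelated series is expected to fall outside it by chance. The `sample_acf` helper and the simulated noise are illustrative assumptions:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag (autocovariances divided by the lag-0 value)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d**2)
    return np.array([np.sum(d[k:] * d[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
n = 5_000
noise = rng.standard_normal(n)  # white noise: true ACF is zero at every lag
band = 1.96 / np.sqrt(n)

acf_noise = sample_acf(noise, 30)
print(f"band = ±{band:.4f}")
print("lags outside the band:", np.flatnonzero(np.abs(acf_noise) > band) + 1)
```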

## ARCH Effects Test

In [None]:
# Ljung-Box tests
lb_returns = sf.ljung_box_test(lags=10, series_type='returns')
lb_squared = sf.ljung_box_test(lags=10, series_type='squared')

print("Ljung-Box Test (10 lags):")
print(f"  Returns:  stat={lb_returns[0]:.2f}, p-value={lb_returns[1]:.3g}")
print(f"  Squared:  stat={lb_squared[0]:.2f}, p-value={lb_squared[1]:.3g}")

# ARCH-LM test
arch_lm = sf.arch_lm_test(lags=5)
print("\nARCH-LM Test (5 lags):")
print(f"  stat={arch_lm[0]:.2f}, p-value={arch_lm[1]:.3g}")

if arch_lm[1] < 0.05:
    print("  → Reject 'no ARCH effects' at the 5% level (time-varying volatility)")
else:
    print("  → No significant ARCH effects at the 5% level")
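`sf.ljung_box_test` and `sf.arch_lm_test` are project methods whose code is not shown. The textbook statistics behind them are short enough to sketch with NumPy and SciPy:

- The Ljung-Box statistic Q = n(n+2) Σ_{k=1..h} ρ̂_k² / (n − k) is compared with χ²(h).
- Engle's ARCH-LM statistic is T·R² from regressing squared (demeaned) returns on q of their own lags, compared with χ²(q).

The helper names and the simulated ARCH(1) series (σ_t² = 0.2 + 0.3·r²_{t−1}) are illustrative assumptions. For real work, `acorr_ljungbox` and `het_arch` in `statsmodels.stats.diagnostic` provide maintained implementations.

```python
import numpy as np
from scipy import stats

def ljung_box(x, lags):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n-k); ~ chi2(h) if no autocorrelation."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    denom = np.sum(d**2)
    ks = np.arange(1, lags + 1)
    rho = np.array([np.sum(d[k:] * d[:-k]) / denom for k in ks])
    q = n * (n + 2) * np.sum(rho**2 / (n - ks))
    return q, stats.chi2.sf(q, df=lags)

def arch_lm(x, lags):
    """Engle's ARCH-LM: regress e_t^2 on a constant and e_{t-1}^2..e_{t-q}^2.
    LM = T * R^2 (T = n - q usable rows); ~ chi2(q) if there are no ARCH effects."""
    x = np.asarray(x, dtype=float)
    e2 = (x - x.mean()) ** 2
    n = e2.size
    y = e2[lags:]
    # Column j+1 holds e^2 lagged by j+1 periods, aligned with y
    lagged = [e2[lags - j - 1 : n - j - 1] for j in range(lags)]
    X = np.column_stack([np.ones(n - lags)] + lagged)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    lm = y.size * r2
    return lm, stats.chi2.sf(lm, df=lags)

# Simulated ARCH(1) returns: sigma_t^2 = 0.2 + 0.3 * r_{t-1}^2 (illustrative parameters)
rng = np.random.default_rng(3)
n = 5_000
z = rng.standard_normal(n)
r = np.zeros(n)
for t in range(1, n):
    r[t] = np.sqrt(0.2 + 0.3 * r[t - 1] ** 2) * z[t]

for label, series in [("returns", r), ("squared", r ** 2)]:
    q, p = ljung_box(series, 10)
    print(f"Ljung-Box ({label:8s}): Q={q:8.2f}, p={p:.3g}")
lm, p_lm = arch_lm(r, 5)
print(f"ARCH-LM (5 lags)    : LM={lm:8.2f}, p={p_lm:.3g}")
```

Applying Ljung-Box to the squared series, as the notebook does, is itself a test for ARCH-type dependence. ARCH-LM targets the same alternative through a regression.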

## Leverage Effect

In [None]:
leverage_corr = sf.leverage_effect_correlation(lag=1)

print(f"Leverage Effect (correlation between r_t and r²_{{t+1}}): {leverage_corr:.4f}")

if leverage_corr < 0:
    print("  → Negative correlation is consistent with the leverage effect:")
    print("    negative returns tend to be followed by larger squared returns than positive ones")
else:
    print("  → No leverage effect detected at this lag")
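The leverage statistic printed above is a plain correlation between r_t and r²_{t+lag}, which can be sketched with pandas index alignment. `leverage_correlation` is a hypothetical helper and not necessarily how `StylizedFacts` computes it.

To show the sign one should expect, the helper is applied to a simulated GJR-GARCH(1,1) path. In this asymmetric model, negative returns feed extra variance (γ) into the next period. The parameter values are illustrative assumptions, not estimates:

```python
import numpy as np
import pandas as pd

def leverage_correlation(returns: pd.Series, lag: int = 1) -> float:
    """corr(r_t, r_{t+lag}^2); pandas drops the NaN pairs created by the shift."""
    return returns.corr((returns ** 2).shift(-lag))

# Simulated GJR-GARCH(1,1) path (illustrative parameters):
# sigma2_{t+1} = omega + (alpha + gamma * 1[r_t < 0]) * r_t^2 + beta * sigma2_t
omega, alpha, gamma, beta = 0.05, 0.0, 0.2, 0.8
rng = np.random.default_rng(4)
n = 50_000
z = rng.standard_normal(n)
r = np.empty(n)
sigma2 = omega / (1 - alpha - gamma / 2 - beta)  # start at the unconditional variance
for t in range(n):
    r[t] = np.sqrt(sigma2) * z[t]
    sigma2 = omega + (alpha + gamma * (r[t] < 0)) * r[t] ** 2 + beta * sigma2

sim = pd.Series(r)
print(f"corr(r_t, r^2_(t+1)) = {leverage_correlation(sim):.3f}")
```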

## Tail Behavior

In [None]:
tail_indices = sf.tail_index_estimation(tail_fraction=0.05)

print("Tail Index Estimation (Hill estimator):")
print(f"  Left tail (losses):  α = {tail_indices['left_tail_index']:.2f}")
print(f"  Right tail (gains):  α = {tail_indices['right_tail_index']:.2f}")
print("\n  Lower α = fatter tails (moments of order above α do not exist)")
print("  Normal distribution: α = ∞ (thin tails)")
print("  Typical for daily financial returns: α ≈ 3-5")
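The Hill estimator itself is short. Take the k = ⌊tail_fraction · n⌋ largest observations X_(1) ≥ X_(2) ≥ ... ≥ X_(k) and the threshold X_(k+1). The estimate is α̂ = k / Σ_{i=1..k} ln(X_(i) / X_(k+1)).

The sketch below makes two assumptions, since the method is not shown: that `tail_fraction` has this meaning in `tail_index_estimation`, and that the left tail is estimated by applying the same formula to losses, −r_t. It is sanity-checked on Pareto draws with a known tail index of 3:

```python
import numpy as np

def hill_tail_index(x, tail_fraction=0.05):
    """Hill estimator of the right-tail index alpha of x.

    Uses the k = floor(tail_fraction * n) largest values:
        alpha_hat = k / sum_{i=1..k} ln(X_(i) / X_(k+1))
    where X_(1) >= X_(2) >= ... are the descending order statistics."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]  # descending order
    k = int(tail_fraction * x.size)
    if k < 1 or x[k] <= 0:
        raise ValueError("need k >= 1 and a positive threshold X_(k+1)")
    return k / np.sum(np.log(x[:k] / x[k]))

rng = np.random.default_rng(6)
pareto = rng.pareto(3.0, size=100_000) + 1.0  # classical Pareto, tail index 3
print(f"Hill estimate on Pareto(3) draws: {hill_tail_index(pareto):.2f}")

# For returns: right tail uses gains, left tail uses losses (-returns)
t_draws = rng.standard_t(df=4, size=100_000)  # Student-t(4) has tail index 4
print(f"right tail: {hill_tail_index(t_draws):.2f}, left tail: {hill_tail_index(-t_draws):.2f}")
```

The estimate depends on the choice of k. A smaller tail fraction reduces bias from the non-Pareto body of the distribution but increases variance.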

## Volatility Clustering Visualization

In [None]:
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Returns over time
plot_returns = returns.iloc[-1000:]
axes[0].plot(plot_returns.index, plot_returns.values, linewidth=0.5, alpha=0.7)
axes[0].set_title('Daily Returns (Last 1000 Trading Days)', fontsize=12)
axes[0].set_ylabel('Log Returns')
axes[0].axhline(0, color='black', linestyle='-', linewidth=0.5)
axes[0].grid(alpha=0.3)

# Absolute returns (volatility proxy)
axes[1].plot(plot_returns.index, np.abs(plot_returns.values), linewidth=0.8, alpha=0.7)
axes[1].set_title('Absolute Returns (Volatility Proxy)', fontsize=12)
axes[1].set_ylabel('|Returns|')
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()
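Absolute daily returns are a noisy volatility proxy. A common, smoother alternative is a rolling standard deviation, annualised with √252 under the usual convention of about 252 trading days per year. Below is a minimal sketch. The `rolling_volatility` helper, the 21-day window (roughly one trading month), and the two-regime toy series are illustrative choices, not part of the original analysis:

```python
import numpy as np
import pandas as pd

def rolling_volatility(returns: pd.Series, window: int = 21, periods_per_year: int = 252) -> pd.Series:
    """Annualised rolling standard deviation of returns over `window` observations."""
    return returns.rolling(window).std() * np.sqrt(periods_per_year)

# Toy series: a calm regime followed by a turbulent one (simulated, illustrative only)
rng = np.random.default_rng(7)
calm = rng.normal(0, 0.005, 250)
turbulent = rng.normal(0, 0.02, 250)
toy = pd.Series(np.concatenate([calm, turbulent]),
                index=pd.bdate_range("2020-01-01", periods=500))
vol = rolling_volatility(toy)
print(vol.iloc[[100, 400]].round(3))
```

On the notebook's data, the same call would be `rolling_volatility(plot_returns)`, plotted against `plot_returns.index`. The first `window - 1` values are NaN because the window is not yet full.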

## Summary

**Stylized facts confirmed:**

1. ✓ Fat tails (excess kurtosis, non-normal distribution)
2. ✓ Volatility clustering (strong ACF in squared returns)
3. ✓ Leverage effect (negative correlation between r_t and r²_{t+1})
4. ✓ Negligible return autocorrelation (weak-form efficiency)
5. ✓ ARCH effects present (time-varying volatility)

**Implications:**
- The constant-volatility assumption is violated
- Conditional heteroskedasticity models (GARCH family) are needed
- Asymmetric models (EGARCH, GJR-GARCH) can capture the leverage effect
- Risk models must account for fat tails (e.g., Student-t innovations)
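To illustrate why GARCH is the natural next step, the sketch below simulates a GARCH(1,1) process, σ_t² = ω + α·r²_{t−1} + β·σ²_{t−1}, with Gaussian innovations. The parameters (ω = 0.05, α = 0.1, β = 0.85) are illustrative assumptions, not estimates for the S&P 500. Even with normal shocks, the simulated returns should show positive excess kurtosis, negligible lag-1 autocorrelation, and clearly positive lag-1 autocorrelation in r², in line with facts 1, 4, and 5:

```python
import numpy as np
from scipy import stats

def simulate_garch11(n, omega=0.05, alpha=0.1, beta=0.85, seed=0):
    """Simulate r_t = sigma_t * z_t with sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(sigma2) * z[t]
        sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2
    return r

def lag1_acf(x):
    """Lag-1 sample autocorrelation."""
    d = x - x.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d**2)

r = simulate_garch11(50_000)
print(f"excess kurtosis : {stats.kurtosis(r):.2f}")
print(f"lag-1 ACF of r  : {lag1_acf(r):+.3f}")
print(f"lag-1 ACF of r^2: {lag1_acf(r ** 2):+.3f}")
```

Time-varying variance alone thus produces fat unconditional tails and volatility clustering. Adding an asymmetric term or Student-t innovations then targets the remaining facts.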