[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/danpele/Time-Series-Analysis/blob/main/EN/Course_Notebooks/chapter5_garch_lecture_notebook.ipynb)

---

# Volatility Models: ARCH, GARCH and Extensions

**Course:** Time Series Analysis and Forecasting  
**Program:** Bachelor program, Faculty of Cybernetics, Statistics and Economic Informatics, Bucharest University of Economic Studies, Romania  
**Academic Year:** 2025-2026

---

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand volatility clustering and stylized facts of financial returns
2. Estimate and interpret ARCH and GARCH models
3. Apply asymmetric models (EGARCH, GJR-GARCH) to capture the leverage effect
4. Perform model diagnostics and selection
5. Forecast volatility and calculate Value at Risk (VaR)

## Setup and Imports

In [None]:
# Install packages if needed (for Colab)
try:
    from arch import arch_model
    import yfinance as yf
except ImportError:
    !pip install arch yfinance --quiet
    from arch import arch_model
    import yfinance as yf

# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Statistical models
from arch import arch_model
from arch.univariate import GARCH, EGARCH, ConstantMean
from statsmodels.stats.diagnostic import het_arch, acorr_ljungbox
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Plotting style
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.facecolor'] = 'none'
plt.rcParams['figure.facecolor'] = 'none'
plt.rcParams['savefig.facecolor'] = 'none'
plt.rcParams['savefig.transparent'] = True
plt.rcParams['axes.grid'] = False
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False
plt.rcParams['legend.frameon'] = False

# Colors (IDA scheme)
COLORS = {
    'blue': '#1A3A6E',
    'red': '#DC3545',
    'green': '#2E7D32',
    'orange': '#E67E22',
    'gray': '#666666'
}

print("All libraries loaded successfully!")

## 1. Why Model Volatility?

**ARIMA models assume constant variance (homoskedasticity)**

Financial time series exhibit:
- **Volatility clustering**: Large changes tend to be followed by large changes (of either sign), and small changes by small changes
- **Fat tails (leptokurtosis)**: More extreme values than a normal distribution would produce
- **Leverage effect**: Negative returns increase volatility more than positive returns

These features require **conditional heteroskedasticity models**.

In [None]:
# Download real S&P 500 data to demonstrate volatility clustering
print("Downloading S&P 500 data from Yahoo Finance...")

sp500 = yf.download('^GSPC', start='2000-01-01', end='2024-12-31', progress=False)
sp500_close = sp500['Close'].squeeze() if isinstance(sp500['Close'], pd.DataFrame) else sp500['Close']

# Calculate returns (percentage)
returns = (sp500_close.pct_change() * 100).dropna()
returns = pd.Series(returns.values, index=returns.index, name='returns')

print(f"\nS&P 500 Data Summary:")
print(f"Period: {returns.index[0].date()} to {returns.index[-1].date()}")
print(f"Observations: {len(returns)}")
print(f"\nBasic Statistics:")
print(f"  Mean return: {returns.mean():.4f}%")
print(f"  Std deviation: {returns.std():.4f}%")
print(f"  Min: {returns.min():.2f}%")
print(f"  Max: {returns.max():.2f}%")
print(f"  Skewness: {stats.skew(returns):.4f}")
print(f"  Kurtosis: {stats.kurtosis(returns)+3:.4f} (Normal = 3)")

In [None]:
# Visualize volatility clustering in real S&P 500 data
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Returns
axes[0].plot(returns.index, returns.values, color=COLORS['blue'], linewidth=0.5, alpha=0.8)
axes[0].axhline(y=0, color='black', linewidth=0.5)
axes[0].set_ylabel('Returns (%)')
axes[0].set_title('S&P 500 Daily Returns (2000-2024): Volatility Clustering', fontweight='bold')

# Mark crisis periods
crisis_periods = [
    ('2008-09-01', '2009-03-31', '2008 Financial Crisis'),
    ('2020-02-15', '2020-04-30', 'COVID-19'),
    ('2022-01-01', '2022-10-31', '2022 Bear Market')
]
for start, end, label in crisis_periods:
    axes[0].axvspan(pd.Timestamp(start), pd.Timestamp(end), alpha=0.2, color=COLORS['red'])

# Rolling volatility (20-day)
rolling_vol = returns.rolling(window=20).std()
axes[1].plot(rolling_vol.index, rolling_vol.values, color=COLORS['red'], linewidth=0.8)
axes[1].fill_between(rolling_vol.index, 0, rolling_vol.values, color=COLORS['red'], alpha=0.3)
axes[1].set_ylabel('Rolling Volatility (%)')
axes[1].set_xlabel('Date')
axes[1].set_title('20-Day Rolling Volatility', fontweight='bold')

for start, end, label in crisis_periods:
    axes[1].axvspan(pd.Timestamp(start), pd.Timestamp(end), alpha=0.2, color=COLORS['red'])

plt.tight_layout()
plt.show()

print("\nKey observations:")
print("  • Volatility clustering is clearly visible - large moves followed by large moves")
print("  • Crisis periods (2008, 2020) show extreme volatility spikes")
print("  • Periods of calm are followed by periods of calm")

## 2. Stylized Facts of Financial Returns

Key empirical regularities:
1. **Little or no autocorrelation** in returns $r_t$
2. **Significant autocorrelation** in $r_t^2$ and $|r_t|$
3. **Fat tails** (kurtosis > 3)
4. **Volatility clustering**

In [None]:
# Check stylized facts of real S&P 500 returns
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# ACF of returns
plot_acf(returns.values, lags=30, ax=axes[0, 0], color=COLORS['blue'],
         vlines_kwargs={'color': COLORS['blue']}, alpha=0.05)
axes[0, 0].set_title('ACF of Returns (should be ~0)', fontweight='bold')

# ACF of squared returns
plot_acf(returns.values**2, lags=30, ax=axes[0, 1], color=COLORS['red'],
         vlines_kwargs={'color': COLORS['red']}, alpha=0.05)
axes[0, 1].set_title('ACF of Squared Returns (significant!)', fontweight='bold')

# Distribution vs Normal
axes[1, 0].hist(returns.values, bins=100, density=True, color=COLORS['blue'],
                alpha=0.7, edgecolor='white', label='S&P 500')
x = np.linspace(returns.min(), returns.max(), 100)
axes[1, 0].plot(x, stats.norm.pdf(x, float(returns.mean()), float(returns.std())),
                color=COLORS['red'], linewidth=2, label='Normal')
kurtosis_val = float(stats.kurtosis(returns.values) + 3)
axes[1, 0].set_title(f'Distribution: Kurtosis = {kurtosis_val:.2f} (Normal = 3)', fontweight='bold')
axes[1, 0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), ncol=2)
axes[1, 0].set_xlim(-10, 10)

# QQ-Plot
stats.probplot(returns.values.flatten(), dist="norm", plot=axes[1, 1])
axes[1, 1].get_lines()[0].set_color(COLORS['blue'])
axes[1, 1].get_lines()[0].set_markersize(3)
axes[1, 1].get_lines()[1].set_color(COLORS['red'])
axes[1, 1].set_title('QQ-Plot: Fat Tails Visible', fontweight='bold')

plt.tight_layout()
plt.show()

print("\nStylized Facts Summary (Real S&P 500 Data):")
print(f"  Mean return: {float(returns.mean()):.4f}%")
print(f"  Std deviation: {float(returns.std()):.4f}%")
print(f"  Skewness: {float(stats.skew(returns.values)):.4f}")
print(f"  Kurtosis: {kurtosis_val:.4f} (Normal = 3)")
print(f"\n  → Fat tails confirmed (kurtosis >> 3)")
print(f"  → Negative skewness (crashes more severe than rallies)")

## 3. Testing for ARCH Effects

**Engle's ARCH-LM Test:**
1. Estimate mean model, get residuals $\hat{\varepsilon}_t$
2. Regress $\hat{\varepsilon}_t^2$ on its lags
3. Test statistic: $LM = T \cdot R^2 \sim \chi^2(q)$

- $H_0$: No ARCH effects
- $H_1$: ARCH effects present
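
The three steps above can be sketched by hand with NumPy. This is a minimal illustration on simulated data, not the `het_arch` implementation: the helper name `arch_lm_stat`, the seed, and the ARCH(1) parameters (0.5 and 0.4) are all assumptions chosen for the example.

```python
import numpy as np

def arch_lm_stat(resid, q):
    """LM statistic T * R^2 from regressing e_t^2 on a constant and q of its own lags."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[q:]
    # Columns: constant, e2_{t-1}, ..., e2_{t-q}
    X = np.column_stack([np.ones(len(y))] + [e2[q - k:-k] for k in range(1, q + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return len(y) * r2

rng = np.random.default_rng(42)
n = 5000

# Series 1: i.i.d. noise, so H0 (no ARCH effects) holds
iid = rng.standard_normal(n)

# Series 2: ARCH(1) with hypothetical parameters omega = 0.5, alpha = 0.4
z = rng.standard_normal(n)
arch1 = np.empty(n)
prev = 0.0
for t in range(n):
    sigma2 = 0.5 + 0.4 * prev ** 2
    arch1[t] = np.sqrt(sigma2) * z[t]
    prev = arch1[t]

stat_iid = arch_lm_stat(iid, q=5)
stat_arch = arch_lm_stat(arch1, q=5)
print(f"LM (i.i.d. noise): {stat_iid:.2f}")
print(f"LM (ARCH(1) data): {stat_arch:.2f}")
print("5% critical value of chi2(5): 11.07")
```

Under $H_0$ the statistic is compared with a $\chi^2(q)$ critical value, so the ARCH(1) series should produce a far larger value than the i.i.d. series.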

In [None]:
# ARCH-LM test on real S&P 500 data
print("ARCH-LM Test for Heteroskedasticity (S&P 500)")
print("="*50)

# Test with different lag orders
returns_demeaned = returns.values - returns.values.mean()
for q in [5, 10, 20]:
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(returns_demeaned, nlags=q)
    result = "Reject H0 → ARCH present" if lm_pvalue < 0.05 else "Cannot reject H0"
    print(f"  Lags = {q}: LM = {lm_stat:.2f}, p-value = {lm_pvalue:.6f} → {result}")

print("\n⚠️ Strong ARCH effects detected in S&P 500 returns!")
print("   This confirms the need for GARCH modeling.")

## 4. The ARCH(q) Model

**Engle (1982) - Nobel Prize 2003**

$$\varepsilon_t = \sigma_t z_t, \quad z_t \sim \text{i.i.d.}(0, 1)$$
$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2$$

**Constraints:**
- $\omega > 0$
- $\alpha_i \geq 0$
- $\sum \alpha_i < 1$ (stationarity)
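
The stationarity constraint can be motivated by a short derivation, sketched here under the assumption that $\varepsilon_t$ is covariance stationary with unconditional variance $\bar{\sigma}^2 = E[\varepsilon_t^2] = E[\sigma_t^2]$. Taking expectations on both sides of the variance equation gives:

$$\bar{\sigma}^2 = \omega + \sum_{i=1}^{q} \alpha_i \bar{\sigma}^2 \quad \Longrightarrow \quad \bar{\sigma}^2 = \frac{\omega}{1 - \sum_{i=1}^{q} \alpha_i}$$

With $\omega > 0$, this value is finite and positive only when $\sum \alpha_i < 1$.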

In [None]:
# Estimate ARCH(5) model on real S&P 500 data
print("ARCH(5) Model Estimation - S&P 500")
print("="*50)

model_arch = arch_model(returns.values, vol='ARCH', p=5, dist='normal')
res_arch = model_arch.fit(disp='off')

print(res_arch.summary())

## 5. The GARCH(p,q) Model

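**Bollerslev (1986)** - Adds lagged conditional variance for persistence. In the equation below, $q$ counts ARCH lags and $p$ counts GARCH lags; the `arch` package's `arch_model` labels them the other way round (`p` for lagged squared shocks, `q` for lagged variances), which makes no difference for a (1,1) model: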

$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$

**GARCH(1,1) - The workhorse model:**
$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$

- $\alpha$ = reaction to news (ARCH effect)
- $\beta$ = persistence (GARCH effect)
- $\alpha + \beta$ = total persistence
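
To make the recursion concrete, here is a minimal sketch of the GARCH(1,1) variance filter applied to a short, made-up series of shocks. The function name `garch11_filter`, the parameter values, and the choice to initialise at the sample variance are assumptions for illustration only.

```python
import numpy as np

def garch11_filter(eps, omega, alpha, beta):
    """Conditional variances implied by the shocks eps for given GARCH(1,1) parameters."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty(len(eps))
    sigma2[0] = eps.var()  # assumption: start the recursion at the sample variance
    for t in range(1, len(eps)):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Hypothetical shocks (in %), with one large move at position 2
shocks = np.array([0.5, -0.3, 3.0, -0.2, 0.1, 0.4])
omega, alpha, beta = 0.05, 0.10, 0.85

sigma2 = garch11_filter(shocks, omega, alpha, beta)
persistence = alpha + beta
half_life = np.log(0.5) / np.log(persistence)

for t, (e, s2) in enumerate(zip(shocks, sigma2)):
    print(f"t={t}: shock = {e:+.1f}, conditional variance = {s2:.4f}")
print(f"Persistence = {persistence:.2f}, half-life = {half_life:.1f} periods")
```

The variance jumps one period after the large shock ($\alpha$ at work) and then decays slowly ($\beta$ at work).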

In [None]:
# Estimate GARCH(1,1) model on real S&P 500 data
print("GARCH(1,1) Model Estimation - S&P 500")
print("="*50)

model_garch = arch_model(returns.values, vol='Garch', p=1, q=1, dist='normal')
res_garch = model_garch.fit(disp='off')

print(res_garch.summary())

In [None]:
# Interpret GARCH(1,1) parameters for S&P 500
omega_est = res_garch.params['omega']
alpha_est = res_garch.params['alpha[1]']
beta_est = res_garch.params['beta[1]']

print("GARCH(1,1) Parameter Interpretation - S&P 500")
print("="*50)
print(f"\nEstimated Parameters:")
print(f"  ω (omega) = {omega_est:.6f}")
print(f"  α (alpha) = {alpha_est:.4f}  ← News reaction (ARCH effect)")
print(f"  β (beta)  = {beta_est:.4f}  ← Persistence (GARCH effect)")

print(f"\nDerived Quantities:")
persistence = alpha_est + beta_est
print(f"  α + β = {persistence:.4f}  ← Total persistence (very high!)")

if persistence < 1:
    uncond_var = omega_est / (1 - persistence)
    uncond_vol = np.sqrt(uncond_var)
    half_life = np.log(0.5) / np.log(persistence)
    print(f"  Unconditional variance: {uncond_var:.6f}")
    print(f"  Unconditional volatility: {uncond_vol:.2f}% daily")
    print(f"  Annualized volatility: {uncond_vol * np.sqrt(252):.2f}%")
    print(f"  Half-life: {half_life:.1f} trading days")
    print(f"\n  → Shocks take ~{half_life:.0f} days to decay by half")
else:
    print("  ⚠️ Model is IGARCH (α + β ≥ 1), unconditional variance undefined")

In [None]:
# Visualize GARCH(1,1) conditional volatility for S&P 500
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Returns with conditional volatility bands
cond_vol = res_garch.conditional_volatility
axes[0].plot(returns.index, returns.values, color=COLORS['blue'], linewidth=0.5, alpha=0.7, label='Returns')
axes[0].plot(returns.index, 2*cond_vol, color=COLORS['red'], linewidth=0.8, label='±2σ bands')
axes[0].plot(returns.index, -2*cond_vol, color=COLORS['red'], linewidth=0.8)
axes[0].axhline(y=0, color='black', linewidth=0.5)
axes[0].set_ylabel('Return (%)')
axes[0].set_title('S&P 500 Returns with GARCH(1,1) Volatility Bands', fontweight='bold')
axes[0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), ncol=2)

# Conditional volatility
axes[1].plot(returns.index, cond_vol, color=COLORS['red'], linewidth=0.8)
axes[1].fill_between(returns.index, 0, cond_vol, color=COLORS['red'], alpha=0.3)
axes[1].set_ylabel('Volatility (%)')
axes[1].set_xlabel('Date')
axes[1].set_title('GARCH(1,1) Conditional Volatility', fontweight='bold')

# Mark crisis periods
for start, end, label in crisis_periods:
    axes[1].axvspan(pd.Timestamp(start), pd.Timestamp(end), alpha=0.15, color='gray')

plt.tight_layout()
plt.show()

print(f"\nVolatility Statistics:")
print(f"  Mean conditional volatility: {cond_vol.mean():.2f}%")
print(f"  Max volatility: {cond_vol.max():.2f}% (during crisis)")
print(f"  Min volatility: {cond_vol.min():.2f}% (calm periods)")

## 6. Alternative Distributions

Normal distribution underestimates tail risk. Alternatives:
- **Student-t**: Fat tails, estimated degrees of freedom $\nu$
- **GED**: Generalized Error Distribution
- **Skewed Student-t**: Asymmetry + fat tails
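
One detail worth seeing numerically: a raw Student-t variable has variance $\nu/(\nu-2)$, so it must be rescaled before it can play the role of the unit-variance innovation $z_t$. The sketch below checks this by simulation; $\nu = 8$, the seed, and the sample size are arbitrary choices for the example.

```python
import numpy as np

nu = 8.0                      # hypothetical degrees of freedom
rng = np.random.default_rng(1)

raw_t = rng.standard_t(nu, size=200_000)
std_t = raw_t * np.sqrt((nu - 2.0) / nu)   # rescale to unit variance

raw_var = raw_t.var()
std_var = std_t.var()
sample_kurt = np.mean((std_t - std_t.mean()) ** 4) / std_var ** 2
theory_kurt = 3.0 + 6.0 / (nu - 4.0)       # kurtosis of a t distribution, valid for nu > 4

print(f"Variance of raw t({nu:.0f}):        {raw_var:.3f} (theory {nu / (nu - 2):.3f})")
print(f"Variance after rescaling:   {std_var:.3f} (target 1)")
print(f"Sample kurtosis:            {sample_kurt:.2f} (theory {theory_kurt:.2f}, normal = 3)")
```

The rescaled draws keep the fat tails (kurtosis above 3) while matching the unit-variance assumption on $z_t$.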

In [None]:
# Compare different distributions for S&P 500
print("GARCH(1,1) with Different Distributions - S&P 500")
print("="*60)

distributions = ['normal', 't', 'skewt', 'ged']
results = {}

for dist in distributions:
    try:
        model = arch_model(returns.values, vol='Garch', p=1, q=1, dist=dist)
        res = model.fit(disp='off')
        results[dist] = res
        print(f"{dist:>8}: AIC = {res.aic:.2f}, BIC = {res.bic:.2f}, LogLik = {res.loglikelihood:.2f}")
    except Exception as exc:  # a bare except would also swallow KeyboardInterrupt
        print(f"{dist:>8}: Estimation failed ({exc})")

# Best model (lowest AIC among the fits that succeeded)
best_dist = min(results, key=lambda x: results[x].aic)
print(f"\nBest distribution by AIC: {best_dist}")
if best_dist != 'normal':
    print("→ A fat-tailed distribution fits better than the normal, consistent with the fat tails seen above!")

In [None]:
# Student-t GARCH details
if 't' in results:
    print("GARCH(1,1) with Student-t Distribution")
    print("="*50)
    print(results['t'].summary())

## 7. Asymmetric GARCH Models

### Leverage Effect
Negative returns increase volatility MORE than positive returns of the same magnitude.

Standard GARCH uses $\varepsilon_{t-1}^2$ → symmetric response

### Solutions:
- **EGARCH** (Nelson, 1991)
- **GJR-GARCH** (Glosten, Jagannathan, Runkle, 1993)
- **TGARCH** (Zakoian, 1994)
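
For reference, the (1,1) versions of the first two models can be written as follows. This is a sketch in a common parameterization, which appears to match the `alpha[1]`, `gamma[1]`, and `beta[1]` parameters interpreted in the code cells; $I(\cdot)$ is the indicator function and $z_{t-1} = \varepsilon_{t-1}/\sigma_{t-1}$.

$$\text{GJR-GARCH(1,1):} \quad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma \varepsilon_{t-1}^2 \, I(\varepsilon_{t-1} < 0) + \beta \sigma_{t-1}^2$$

$$\text{EGARCH(1,1):} \quad \ln \sigma_t^2 = \omega + \alpha \left( |z_{t-1}| - E|z_{t-1}| \right) + \gamma z_{t-1} + \beta \ln \sigma_{t-1}^2$$

In GJR-GARCH the leverage effect corresponds to $\gamma > 0$ (negative shocks have impact $\alpha + \gamma$). In EGARCH it corresponds to $\gamma < 0$, because a negative $z_{t-1}$ then raises $\ln \sigma_t^2$.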

In [None]:
# Demonstrate leverage effect with real S&P 500 data
print("Leverage Effect Analysis - S&P 500")
print("="*50)

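# Realized volatility over the 5 days that follow the conditioning return
window = 5
rolling_vol = returns.rolling(window=window).std()
lagged_returns = returns.shift(1)

# Create DataFrame for analysis
# At row t: 'lagged_return' is r_{t-1}; 'next_vol' is the std of r_t, ..., r_{t+4},
# so the volatility window starts strictly after the conditioning return
leverage_df = pd.DataFrame({
    'return': returns,
    'lagged_return': lagged_returns,
    'next_vol': rolling_vol.shift(-(window - 1))
}).dropna()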

# Separate positive and negative returns
neg_mask = leverage_df['lagged_return'] < 0
pos_mask = leverage_df['lagged_return'] >= 0

vol_after_neg = leverage_df.loc[neg_mask, 'next_vol'].mean()
vol_after_pos = leverage_df.loc[pos_mask, 'next_vol'].mean()

print(f"Mean volatility after negative returns: {vol_after_neg:.3f}%")
print(f"Mean volatility after positive returns: {vol_after_pos:.3f}%")
ratio = vol_after_neg / vol_after_pos
print(f"Ratio (after negative / after positive): {ratio:.2f}")
if ratio > 1:
    print("\n→ Volatility is higher after negative returns, consistent with the leverage effect!")

In [None]:
# Visualize leverage effect in S&P 500
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Scatter plot: returns vs next-period volatility
colors_scatter = [COLORS['red'] if r < 0 else COLORS['blue'] for r in leverage_df['lagged_return']]
axes[0].scatter(leverage_df['lagged_return'], leverage_df['next_vol'], 
                c=colors_scatter, alpha=0.3, s=10)
axes[0].axvline(x=0, color='gray', linestyle='--', linewidth=1)
axes[0].set_xlabel('Return at t (%)')
axes[0].set_ylabel('Volatility at t+1 (%)')
axes[0].set_title('S&P 500: Leverage Effect\nNegative Shocks → Higher Volatility', fontweight='bold')
axes[0].set_xlim(-12, 12)

# Box plot comparison
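bp = axes[1].boxplot([leverage_df.loc[neg_mask, 'next_vol'].values, 
                      leverage_df.loc[pos_mask, 'next_vol'].values],
                     patch_artist=True)
# Set tick labels separately: the `labels` keyword of boxplot is deprecated in Matplotlib 3.9+
axes[1].set_xticklabels(['After Negative Return', 'After Positive Return'])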
bp['boxes'][0].set_facecolor(COLORS['red'])
bp['boxes'][0].set_alpha(0.6)
bp['boxes'][1].set_facecolor(COLORS['blue'])
bp['boxes'][1].set_alpha(0.6)
axes[1].set_ylabel('Next Period Volatility (%)')
axes[1].set_title('Distribution of Volatility by Return Sign', fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
# Estimate asymmetric models on S&P 500
print("Asymmetric GARCH Models Comparison - S&P 500")
print("="*60)

# Standard GARCH
model_garch_t = arch_model(returns.values, vol='Garch', p=1, q=1, dist='t')
res_garch_t = model_garch_t.fit(disp='off')

# GJR-GARCH (o=1 adds asymmetry)
model_gjr = arch_model(returns.values, vol='Garch', p=1, o=1, q=1, dist='t')
res_gjr = model_gjr.fit(disp='off')

# EGARCH (o=1 is needed for asymmetry term gamma)
model_egarch = arch_model(returns.values, vol='EGARCH', p=1, o=1, q=1, dist='t')
res_egarch = model_egarch.fit(disp='off')

print(f"{'Model':<15} {'AIC':>10} {'BIC':>10} {'LogLik':>12}")
print("-"*50)
print(f"{'GARCH(1,1)':<15} {res_garch_t.aic:>10.2f} {res_garch_t.bic:>10.2f} {res_garch_t.loglikelihood:>12.2f}")
print(f"{'GJR-GARCH':<15} {res_gjr.aic:>10.2f} {res_gjr.bic:>10.2f} {res_gjr.loglikelihood:>12.2f}")
print(f"{'EGARCH':<15} {res_egarch.aic:>10.2f} {res_egarch.bic:>10.2f} {res_egarch.loglikelihood:>12.2f}")

# Best model
models_comparison = {'GARCH': res_garch_t.aic, 'GJR-GARCH': res_gjr.aic, 'EGARCH': res_egarch.aic}
best_model = min(models_comparison, key=models_comparison.get)
print(f"\n→ Best model by AIC: {best_model}")
print("→ Asymmetric models fit better because they capture the leverage effect!")

In [None]:
# GJR-GARCH details for S&P 500
print("GJR-GARCH(1,1,1) Estimation Results - S&P 500")
print("="*50)
print(res_gjr.summary())

# Interpret leverage
gamma_gjr = res_gjr.params['gamma[1]']
alpha_gjr = res_gjr.params['alpha[1]']
print(f"\nLeverage Effect Interpretation:")
print(f"  α = {alpha_gjr:.4f} (positive shock impact)")
print(f"  α + γ = {alpha_gjr + gamma_gjr:.4f} (negative shock impact)")
print(f"  Ratio: {(alpha_gjr + gamma_gjr)/alpha_gjr:.2f}x higher impact for negative shocks")

In [None]:
# EGARCH details for S&P 500
print("EGARCH(1,1) Estimation Results - S&P 500")
print("="*50)
print(res_egarch.summary())

gamma_egarch = res_egarch.params['gamma[1]']
print("\nInterpretation of EGARCH parameters:")
print(f"  gamma[1] = {gamma_egarch:.4f}")
if gamma_egarch < 0:
    print("  → Negative gamma confirms leverage effect!")
    print("  → Negative shocks increase volatility more than positive shocks")
    print("  → This is typical for equity markets (bad news has larger impact)")

## 8. News Impact Curve

The **News Impact Curve** shows how next period's conditional variance $\sigma_{t+1}^2$ responds to the current shock $\varepsilon_t$, holding $\sigma_t^2$ constant.

In [None]:
# Plot News Impact Curves
epsilon_range = np.linspace(-0.04, 0.04, 200)
sigma2_prev = 0.0004  # Fixed previous variance

# GARCH (symmetric)
omega_g = 0.0001
alpha_g = 0.10
beta_g = 0.85
sigma2_garch_curve = omega_g + alpha_g * epsilon_range**2 + beta_g * sigma2_prev

# GJR-GARCH (illustrative values; the "_nic" names avoid overwriting the estimates from the fitted model)
omega_nic = 0.0001
alpha_nic = 0.05
gamma_nic = 0.10
beta_nic = 0.85
indicator = (epsilon_range < 0).astype(float)
sigma2_gjr_curve = omega_nic + alpha_nic * epsilon_range**2 + gamma_nic * epsilon_range**2 * indicator + beta_nic * sigma2_prev

fig, ax = plt.subplots(figsize=(12, 6))

ax.plot(epsilon_range * 100, np.sqrt(sigma2_garch_curve) * 100,
        color=COLORS['blue'], linewidth=2, label='GARCH (Symmetric)')
ax.plot(epsilon_range * 100, np.sqrt(sigma2_gjr_curve) * 100,
        color=COLORS['red'], linewidth=2, label='GJR-GARCH (Asymmetric)')

ax.axvline(x=0, color='gray', linestyle=':', linewidth=1)
ax.set_xlabel('Shock εₜ (%)', fontsize=12)
ax.set_ylabel('Conditional Volatility σₜ₊₁ (%)', fontsize=12)
ax.set_title('News Impact Curve: GARCH vs GJR-GARCH', fontweight='bold', fontsize=14)
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2)

plt.tight_layout()
plt.show()

print("The asymmetric curve shows higher volatility for negative shocks.")

## 9. Model Diagnostics

After fitting a GARCH model, check:
1. **Standardized residuals** $\hat{z}_t = \hat{\varepsilon}_t / \hat{\sigma}_t$ should be i.i.d.
2. **No remaining ARCH effects** in $\hat{z}_t^2$
3. **No autocorrelation** in $\hat{z}_t$
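
As a rough illustration of checks 1 and 2, the sketch below computes the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{m} \hat{\rho}_k^2/(n-k)$ by hand on simulated GARCH(1,1) data. The helper `ljung_box_q`, the seed, and the parameter values are assumptions; the standardized residuals use the true simulated variances rather than estimated ones, so this shows the ideal case.

```python
import numpy as np

def ljung_box_q(x, m):
    """Ljung-Box Q statistic using the first m sample autocorrelations of x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.dot(x, x)
    rho = np.array([np.dot(x[k:], x[:-k]) / denom for k in range(1, m + 1)])
    return n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, m + 1)))

# Simulate a Gaussian GARCH(1,1) with hypothetical parameters
rng = np.random.default_rng(7)
n, omega, alpha, beta = 5000, 0.02, 0.10, 0.85
z = rng.standard_normal(n)
eps = np.empty(n)
sigma2 = np.empty(n)
sigma2[0] = omega / (1 - alpha - beta)   # start at the unconditional variance
eps[0] = np.sqrt(sigma2[0]) * z[0]
for t in range(1, n):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * z[t]

std_resid_sim = eps / np.sqrt(sigma2)    # recovers z because the true variances are used

q_raw = ljung_box_q(eps ** 2, 10)
q_std = ljung_box_q(std_resid_sim ** 2, 10)
print(f"Q(10) on squared raw shocks:            {q_raw:.1f}")
print(f"Q(10) on squared standardized residuals: {q_std:.1f}")
print("5% critical value of chi2(10): 18.31")
```

A well-specified volatility model should move the statistic for $\hat{z}_t^2$ from the first kind of value towards the second.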

In [None]:
# Diagnostics for GJR-GARCH on S&P 500
std_resid = res_gjr.std_resid

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Time series of standardized residuals
axes[0, 0].plot(returns.index, std_resid, color=COLORS['blue'], linewidth=0.5, alpha=0.8)
axes[0, 0].axhline(y=0, color='black', linewidth=0.5)
axes[0, 0].axhline(y=2, color=COLORS['red'], linestyle='--', linewidth=0.8, alpha=0.5)
axes[0, 0].axhline(y=-2, color=COLORS['red'], linestyle='--', linewidth=0.8, alpha=0.5)
axes[0, 0].set_title('Standardized Residuals - S&P 500', fontweight='bold')
axes[0, 0].set_ylabel('zₜ')

# ACF of squared standardized residuals
plot_acf(std_resid**2, lags=20, ax=axes[0, 1], color=COLORS['blue'],
         vlines_kwargs={'color': COLORS['blue']}, alpha=0.05)
axes[0, 1].set_title('ACF of z²ₜ (should be ≈ 0 if model adequate)', fontweight='bold')

# Histogram
axes[1, 0].hist(std_resid, bins=60, density=True, color=COLORS['blue'],
                alpha=0.7, edgecolor='white')
x = np.linspace(-6, 6, 100)
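nu_est = res_gjr.params['nu']
axes[1, 0].plot(x, stats.norm.pdf(x), color=COLORS['red'], linewidth=2, label='N(0,1)')
# Rescale the t density to unit variance so it is comparable with the standardized residuals
t_scale = np.sqrt((nu_est - 2) / nu_est)
axes[1, 0].plot(x, stats.t.pdf(x, df=nu_est, scale=t_scale), color=COLORS['green'], linewidth=2, label=f'Standardized t({nu_est:.1f})')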
axes[1, 0].set_title('Distribution of Standardized Residuals', fontweight='bold')
axes[1, 0].set_xlim(-6, 6)
axes[1, 0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2)

# QQ-plot
stats.probplot(std_resid, dist="norm", plot=axes[1, 1])
axes[1, 1].get_lines()[0].set_color(COLORS['blue'])
axes[1, 1].get_lines()[0].set_markersize(3)
axes[1, 1].get_lines()[1].set_color(COLORS['red'])
axes[1, 1].set_title('QQ-Plot of Standardized Residuals', fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
# Formal diagnostic tests for GJR-GARCH on S&P 500
print("Diagnostic Tests for GJR-GARCH Model - S&P 500")
print("="*50)

# Ljung-Box on standardized residuals
lb_z = acorr_ljungbox(std_resid, lags=10, return_df=True)
print("\n1. Ljung-Box Test on zₜ (no autocorrelation in mean):")
print(f"   Lag 10: Q = {lb_z['lb_stat'].iloc[-1]:.2f}, p-value = {lb_z['lb_pvalue'].iloc[-1]:.4f}")
print(f"   Result: {'✓ No autocorrelation' if lb_z['lb_pvalue'].iloc[-1] > 0.05 else '✗ Autocorrelation present'}")

# Ljung-Box on squared standardized residuals
lb_z2 = acorr_ljungbox(std_resid**2, lags=10, return_df=True)
print("\n2. Ljung-Box Test on z²ₜ (no remaining ARCH effects):")
print(f"   Lag 10: Q = {lb_z2['lb_stat'].iloc[-1]:.2f}, p-value = {lb_z2['lb_pvalue'].iloc[-1]:.4f}")
print(f"   Result: {'✓ No ARCH effects' if lb_z2['lb_pvalue'].iloc[-1] > 0.05 else '✗ ARCH effects remain'}")

# ARCH-LM test on standardized residuals
lm_stat, lm_pval, _, _ = het_arch(std_resid, nlags=5)
print("\n3. ARCH-LM Test on Standardized Residuals:")
print(f"   LM = {lm_stat:.2f}, p-value = {lm_pval:.4f}")
print(f"   Result: {'✓ No remaining ARCH' if lm_pval > 0.05 else '✗ ARCH effects remain'}")

if lm_pval > 0.05:
    print("\n→ Model diagnostics PASS - GJR-GARCH adequately captures volatility dynamics!")

## 10. Volatility Forecasting

**One-step forecast:**
$$\hat{\sigma}_{T+1}^2 = \omega + \alpha \varepsilon_T^2 + \beta \sigma_T^2$$

**Multi-step forecast:** Converges to unconditional variance
$$E_T[\sigma_{T+h}^2] = \bar{\sigma}^2 + (\alpha + \beta)^{h-1} (\sigma_{T+1}^2 - \bar{\sigma}^2)$$
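
The closed-form expression can be checked against the step-by-step recursion. The sketch below does this for a symmetric GARCH(1,1) with made-up parameters and an assumed one-step-ahead variance of 1.5; the function name `garch11_forecast` is hypothetical.

```python
import numpy as np

def garch11_forecast(omega, alpha, beta, sigma2_next, horizon):
    """Variance forecasts for h = 1..horizon, obtained by iterating the recursion."""
    out = np.empty(horizon)
    out[0] = sigma2_next
    for h in range(1, horizon):
        # E_T[eps^2_{T+h}] = E_T[sigma^2_{T+h}], so alpha and beta act together
        out[h] = omega + (alpha + beta) * out[h - 1]
    return out

omega, alpha, beta = 0.02, 0.10, 0.85
uncond_var_demo = omega / (1 - alpha - beta)   # long-run variance
sigma2_next = 1.5                              # assumed 1-step-ahead variance
horizon_demo = 20

recursive = garch11_forecast(omega, alpha, beta, sigma2_next, horizon_demo)
h = np.arange(1, horizon_demo + 1)
closed_form = uncond_var_demo + (alpha + beta) ** (h - 1) * (sigma2_next - uncond_var_demo)

print(f"Long-run variance: {uncond_var_demo:.3f}")
print(f"h=1:  {recursive[0]:.3f}")
print(f"h=20: {recursive[-1]:.3f}")
print(f"Recursion matches closed form: {np.allclose(recursive, closed_form)}")
```

Both routes give the same path, which decays geometrically towards the long-run variance at rate $\alpha + \beta$.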

In [None]:
# Volatility forecasting for S&P 500
horizon = 20

# Generate forecasts from GJR-GARCH
forecasts = res_gjr.forecast(horizon=horizon)
vol_forecast = np.sqrt(forecasts.variance.values[-1, :])

# Get historical volatility
hist_vol = res_gjr.conditional_volatility

# Unconditional volatility (γ enters with weight 1/2: under a symmetric distribution, half of the shocks are negative)
params = res_gjr.params
omega_hat = params['omega']
alpha_hat = params['alpha[1]']
gamma_hat = params.get('gamma[1]', 0)  # Handle case where gamma might not exist
beta_hat = params['beta[1]']
persistence = alpha_hat + gamma_hat/2 + beta_hat
uncond_vol = np.sqrt(omega_hat / (1 - persistence)) if persistence < 1 else np.nan

# Plot
fig, ax = plt.subplots(figsize=(14, 5))

# Historical (last 100 periods)
ax.plot(range(100), hist_vol[-100:], color=COLORS['blue'], linewidth=1, label='Historical Volatility')

# Forecast - handle both numpy array and pandas Series
last_hist_vol = hist_vol[-1] if isinstance(hist_vol, np.ndarray) else hist_vol.iloc[-1]
forecast_x = range(99, 99 + horizon + 1)
forecast_values = np.concatenate([[last_hist_vol], vol_forecast])
ax.plot(forecast_x, forecast_values, color=COLORS['red'], linewidth=2, linestyle='--', label='Forecast')

# Unconditional level
if not np.isnan(uncond_vol):
    ax.axhline(y=uncond_vol, color=COLORS['green'], linestyle=':',
               linewidth=1.5, label=f'Unconditional: {uncond_vol:.2f}%')

ax.axvline(x=99, color='black', linestyle='-', alpha=0.3)
ax.set_xlabel('Days')
ax.set_ylabel('Volatility (%)')
ax.set_title('S&P 500: GJR-GARCH Volatility Forecast (20 days ahead)', fontweight='bold')
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=3)

plt.tight_layout()
plt.show()

print(f"\nForecast Summary:")
print(f"  Last observed date: {returns.index[-1].date()}")
print(f"  1-day ahead forecast: {vol_forecast[0]:.3f}%")
print(f"  5-day ahead forecast: {vol_forecast[4]:.3f}%")
print(f"  20-day ahead forecast: {vol_forecast[-1]:.3f}%")
print(f"  Unconditional volatility: {uncond_vol:.3f}%")
print(f"\n  → Forecast converges to unconditional level as horizon increases")

## 11. Value at Risk (VaR)

**VaR at level $\alpha$**: Maximum loss that will not be exceeded with probability $1-\alpha$

$$\text{VaR}_\alpha = -\mu_{t+1} + z_\alpha \cdot \sigma_{t+1}$$

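For normal: $z_{0.05} = 1.645$, $z_{0.01} = 2.326$ (the 95% and 99% quantiles of $N(0,1)$)

For Student-t: Use t-quantiles rescaled to unit variance. Fat tails raise VaR at extreme levels such as 99%; at 95% the rescaled t-quantile can be slightly below the normal one.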
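
As a small worked example of the formula, the sketch below computes a 1-day normal VaR and then compares two ways of extending it to 10 days: the $\sqrt{10}$ scaling rule, and the sum of GARCH(1,1) variance forecasts over the horizon. All numbers (parameters, a 2% current daily volatility, $\mu = 0$) are hypothetical, and normal innovations are assumed throughout.

```python
from math import sqrt
from statistics import NormalDist

def parametric_var(sigma, level=0.99, mu=0.0):
    """VaR as a positive loss fraction: -mu + z_level * sigma (normal innovations)."""
    return -mu + NormalDist().inv_cdf(level) * sigma

# Hypothetical GARCH(1,1) parameters, in decimal (not percent) return units
omega, alpha, beta = 2e-6, 0.10, 0.85
sigma2_next = 0.02 ** 2        # assumed 1-day-ahead variance (2% daily volatility)

# Daily variance forecasts for days 1..10; their sum is the variance of the
# 10-day return when daily returns are serially uncorrelated
variances = [sigma2_next]
for _ in range(9):
    variances.append(omega + (alpha + beta) * variances[-1])
sigma_10d = sqrt(sum(variances))

var_1d = parametric_var(sqrt(sigma2_next))
var_10d_rule = var_1d * sqrt(10)
var_10d_garch = parametric_var(sigma_10d)

print(f"1-day 99% VaR:                 {var_1d:.2%}")
print(f"10-day 99% VaR (sqrt(10) rule): {var_10d_rule:.2%}")
print(f"10-day 99% VaR (GARCH path):    {var_10d_garch:.2%}")
```

Because the assumed current volatility is above its long-run level, the forecasts decline over the horizon and the $\sqrt{10}$ rule overstates the 10-day VaR here; with volatility below its long-run level the rule would understate it.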

In [None]:
# Calculate VaR for S&P 500 portfolio
print("Value at Risk Calculation - S&P 500")
print("="*50)

portfolio_value = 1_000_000  # EUR
sigma_1 = vol_forecast[0] / 100  # 1-step ahead volatility (as decimal)

# Normal distribution quantiles
z_95 = stats.norm.ppf(0.95)
z_99 = stats.norm.ppf(0.99)

# The conditional mean is omitted (μ ≈ 0 at a 1-day horizon), so VaR = z · σ · portfolio value
VaR_95_normal = z_95 * sigma_1 * portfolio_value
VaR_99_normal = z_99 * sigma_1 * portfolio_value

print(f"\nPortfolio value: €{portfolio_value:,.0f}")
print(f"1-day volatility forecast: {sigma_1*100:.3f}%")

print(f"\nNormal Distribution VaR:")
print(f"  VaR 95% (1 day): €{VaR_95_normal:,.0f}")
print(f"  VaR 99% (1 day): €{VaR_99_normal:,.0f}")

# Student-t distribution (estimated df from model)
nu = res_gjr.params.get('nu', 8)  # Default to 8 if not found
# Adjust quantile for unit variance
t_95 = stats.t.ppf(0.95, df=nu) * np.sqrt((nu-2)/nu)
t_99 = stats.t.ppf(0.99, df=nu) * np.sqrt((nu-2)/nu)

VaR_95_t = t_95 * sigma_1 * portfolio_value
VaR_99_t = t_99 * sigma_1 * portfolio_value

print(f"\nStudent-t Distribution (ν = {nu:.1f}):")
print(f"  VaR 95% (1 day): €{VaR_95_t:,.0f}")
print(f"  VaR 99% (1 day): €{VaR_99_t:,.0f}")

# 10-day VaR (√10 scaling rule; an approximation, since GARCH volatility mean-reverts over the horizon)
print(f"\n10-day VaR (√10 scaling rule):")
print(f"  Normal 99%: €{VaR_99_normal * np.sqrt(10):,.0f}")
print(f"  Student-t 99%: €{VaR_99_t * np.sqrt(10):,.0f}")

# Comparison
print(f"\n→ Student-t VaR is {(VaR_99_t/VaR_99_normal - 1)*100:.1f}% higher than Normal VaR")
print("→ This reflects the fat tails in real financial returns!")

## Summary

### Key Takeaways

1. **ARIMA assumes constant variance** - not realistic for financial data

2. **GARCH(1,1)** is the workhorse model:
   - $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$
   - α = news reaction, β = persistence
   - Covariance stationarity: α + β < 1

3. **Leverage effect** requires asymmetric models:
   - EGARCH: log-specification, no positivity constraints
   - GJR-GARCH: indicator function for negative shocks

4. **Student-t distribution** captures fat tails better than normal

5. **Applications:**
   - VaR and risk management
   - Option pricing
   - Portfolio optimization

### Practical Workflow
1. Test for ARCH effects (ARCH-LM test)
2. Estimate GARCH(1,1) with Student-t
3. Check for asymmetry (GJR/EGARCH)
4. Diagnostic checks on standardized residuals
5. Forecast volatility, calculate VaR