<a href="https://colab.research.google.com/github/OpusArtisbyRawlz/spy-risk-volatility-model/blob/main/notebooks/spy_volatility_risk.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SPY Volatility & Risk Modelling
## Project 2 - Quant Research Portfolio

Goal:
Model and forecast short-horizon risk metrics (realised volatility, volatility regimes, and drawdown probability) using walk-forward validation and baseline-first evaluation.

# Project 2 — SPY Volatility & Risk Modeling (Walk-Forward)

## Hypothesis
Volatility clusters: recent realized volatility contains information about future realized volatility.
Therefore, forecasting next 5-day realized volatility should beat a naïve baseline in walk-forward evaluation.

## Target
Next 5-day realized volatility (RV_5d_fwd), computed from returns t+1...t+5.

## Evaluation
Walk-forward validation with a strong baseline.
Primary metric: MAE (and RMSE optional).


In [1]:
import numpy as np
import pandas as pd

# If you already have `spy` loaded from earlier work, you can skip the download part.
import yfinance as yf

spy = yf.download("SPY", start="2010-01-01", progress=False)
spy = spy[["Close"]].copy()

# --- Returns ---
spy["ret_1d"] = spy["Close"].pct_change()

# --- FUTURE 5-day realized volatility (target) ---
# Use returns from t+1..t+5 (forward-looking). This avoids leakage.
# We compute forward RV by shifting returns up by 1 and taking a rolling std over 5 days.
spy["RV5_fwd"] = spy["ret_1d"].shift(-1).rolling(5).std()

# Optional: annualize (common in practice). Comment out if you want raw 5-day vol.
spy["RV5_fwd_ann"] = spy["RV5_fwd"] * np.sqrt(252)

# --- TRAILING realized volatility (baseline feature) ---
spy["RV20_trail"] = spy["ret_1d"].rolling(20).std() * np.sqrt(252)
spy["RV5_trail"]  = spy["ret_1d"].rolling(5).std()  * np.sqrt(252)

# Drop rows with NaNs created by rolling windows and shifting
spy = spy.dropna().copy()

# Choose target
y = spy["RV5_fwd_ann"]  # or use "RV5_fwd" if you want non-annualized

# Baseline prediction: predict next-week vol as current trailing 20d vol
baseline_pred = spy["RV20_trail"]

print("Rows:", len(spy))
print("Target mean (ann RV):", y.mean())
print("Baseline mean (RV20):", baseline_pred.mean())



  spy = yf.download("SPY", start="2010-01-01", progress=False)


Rows: 4009
Target mean (ann RV): 0.13759933981826042
Baseline mean (RV20): 0.14690625901547552
