
Julian516/midterm-project

Forecasting the 1-Day 99% Value-at-Risk (VaR) for Equity Returns Using Quantile Regression

Goal

Predict the loss quantile (99% VaR) of daily S&P 500 returns from historical market features.

Business Context

Banks must estimate daily VaR to measure potential losses under normal market conditions. A robust ML estimator could automate this estimation and complement or improve on traditional parametric models.

Data

For this midterm project, we use daily data for the S&P 500 index (^GSPC) from Yahoo Finance, covering 2000-01-01 to 2024-12-31. The main input is the closing price (Close), with Volume as a possible additional input.
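A minimal sketch of the data step, assuming the `yfinance` package and log returns (the README does not say whether simple or log returns are used). `load_sp500` and `log_returns` are hypothetical helper names, not necessarily what `src/train.py` contains.

```python
import numpy as np
import pandas as pd


def load_sp500(start: str = "2000-01-01", end: str = "2025-01-01") -> pd.DataFrame:
    """Download daily ^GSPC bars from Yahoo Finance (requires network access)."""
    import yfinance as yf  # imported lazily so log_returns works offline

    # Ticker.history treats `end` as exclusive, so 2025-01-01 keeps 2024-12-31.
    bars = yf.Ticker("^GSPC").history(start=start, end=end)
    return bars[["Close", "Volume"]]


def log_returns(close: pd.Series) -> pd.Series:
    """Daily log returns r_t = ln(C_t / C_{t-1}); the undefined first value is dropped."""
    return np.log(close).diff().dropna()
```

Usage: `ret = log_returns(load_sp500()["Close"])`.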

Problem Statement

We estimate the 1-day 99% Value-at-Risk as the conditional 0.01 quantile of future returns. The task is framed as supervised regression using quantile regression (LightGBM) on engineered time-series features.
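The pinball (quantile) loss used to score the models can be written in a few lines of NumPy. This is a sketch of the standard definition, the same quantity LightGBM optimises with `objective="quantile"` and `alpha=0.01`; the function name is our own.

```python
import numpy as np


def pinball_loss(y_true, y_pred, alpha: float = 0.01) -> float:
    """Mean quantile (pinball) loss at level alpha.

    Residuals above the forecast cost alpha per unit; residuals below it
    (VaR violations when y_pred is the alpha-quantile) cost (1 - alpha) per unit.
    """
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(alpha * diff, (alpha - 1.0) * diff)))
```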

EDA & Feature Engineering

Daily returns exhibit volatility clustering, heavy tails, negative skewness, and excess kurtosis. Features include lagged returns, rolling volatility (5/20-day), rolling skewness/kurtosis (20-day), EMA ratios, and drawdown. The target is the empirical VaR: a rolling 1% quantile computed over a 250-day window.
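The feature set and target described above can be sketched with pandas as follows. Column names match the API payload further down; log returns, EMA spans of 12 and 26 days, and the alignment of the target (the 1% quantile of the 250 returns ending at day t, with all features lagged to t-1) are assumptions, not taken from `src/features.py`.

```python
import numpy as np
import pandas as pd


def build_features(close: pd.Series) -> pd.DataFrame:
    """Feature matrix plus empirical-VaR target from a daily close series.

    Every feature is shifted by one day, so row t only uses data up to the
    close of day t-1 (assumed alignment).
    """
    ret = np.log(close).diff()
    ema_fast = close.ewm(span=12, adjust=False).mean()  # assumed span
    ema_slow = close.ewm(span=26, adjust=False).mean()  # assumed span

    feats = pd.DataFrame(index=close.index)
    feats["ret_lag1"] = ret.shift(1)
    feats["ret_lag5"] = ret.shift(5)
    feats["vol_5"] = ret.rolling(5).std().shift(1)
    feats["vol_20"] = ret.rolling(20).std().shift(1)
    feats["skew_20"] = ret.rolling(20).skew().shift(1)
    feats["kurt_20"] = ret.rolling(20).kurt().shift(1)  # pandas: excess kurtosis
    feats["ema_ratio"] = (ema_fast / ema_slow - 1.0).shift(1)
    feats["drawdown"] = (close / close.cummax() - 1.0).shift(1)  # <= 0

    # Target: empirical 99% VaR = 1% quantile of the last 250 daily returns.
    feats["target_var"] = ret.rolling(250).quantile(0.01)
    return feats.dropna()
```

The 250-day window means the first usable row appears roughly one trading year after the start of the data.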

Modeling

We evaluate:

  • A naive persistence baseline
  • Linear Regression
  • LightGBM Quantile Regression (α = 0.01)
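The two simpler models can be sketched with NumPy alone, assuming the naive persistence baseline means "today's VaR target is forecast by yesterday's" and that Linear Regression is ordinary least squares on the same features. The function names are hypothetical, and the LightGBM model itself is not reproduced here.

```python
import numpy as np


def pinball_loss(y, q, alpha: float = 0.01) -> float:
    """Mean pinball loss at quantile level alpha."""
    d = np.asarray(y, float) - np.asarray(q, float)
    return float(np.mean(np.maximum(alpha * d, (alpha - 1.0) * d)))


def persistence_baseline(y):
    """Forecast y[t] with y[t-1]; returns (targets, forecasts) for t = 1..n-1."""
    y = np.asarray(y, float)
    return y[1:], y[:-1]


def fit_linear(X_train, y_train):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return lambda X: np.column_stack([np.ones(len(X)), X]) @ coef
```

Scoring every model with the same `pinball_loss` on the same test rows keeps the comparison fair.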

The LightGBM model achieves the lowest pinball loss of the three and is exported as the final model (models/var_model.pkl).
A FastAPI service exposes a /predict_var endpoint for inference.

Architecture

var-forecasting-99-sp500/
├─ README.md
├─ data/
│  ├─ raw/
│  └─ processed/
├─ notebook.ipynb
├─ src/
│  ├─ features.py
│  ├─ metrics.py
│  ├─ train.py
│  └─ predict.py
├─ models/
│  ├─ var_model.pkl
│  └─ feature_spec.json
├─ requirements.txt
└─ Dockerfile

How to Run (Locally, without Docker)

# 1) Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate

# 2) Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 3) Build the dataset and train the model
python -m src.train

# 4) Start the API service
uvicorn src.predict:app --host 0.0.0.0 --port 8000

Test the API

curl -X POST "http://localhost:8000/predict_var" \
  -H "Content-Type: application/json" \
  -d '{
    "ret_lag1": -0.002,
    "ret_lag5": 0.001,
    "vol_5": 0.012,
    "vol_20": 0.010,
    "skew_20": -0.5,
    "kurt_20": 3.1,
    "ema_ratio": -0.01,
    "drawdown": -0.12
  }'
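For calling the service from Python instead of curl, a standard-library sketch could look like this. The helper names are hypothetical, and since this README does not document the response schema, `predict_var` simply returns the decoded JSON.

```python
import json
import urllib.request

URL = "http://localhost:8000/predict_var"

FEATURES = {
    "ret_lag1": -0.002, "ret_lag5": 0.001, "vol_5": 0.012, "vol_20": 0.010,
    "skew_20": -0.5, "kurt_20": 3.1, "ema_ratio": -0.01, "drawdown": -0.12,
}


def build_request(features: dict, url: str = URL) -> urllib.request.Request:
    """Build the same POST request as the curl example (no network call here)."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict_var(features: dict, url: str = URL) -> dict:
    """Send the request to a running service and decode its JSON reply."""
    with urllib.request.urlopen(build_request(features, url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service from step 4 running, `predict_var(FEATURES)` sends the same payload as the curl command.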

Build & Run Docker

docker build -t var-forecast .
docker run -p 8000:8000 var-forecast

Conclusion

This project implemented an end-to-end machine-learning pipeline for forecasting the 1-day 99% Value-at-Risk (VaR) of S&P 500 daily returns using quantile regression. Despite extensive feature engineering (rolling volatility, higher-moment estimators, EMA trend indicators, drawdown measures) and the use of a non-linear model (LightGBM quantile regression), the results are structurally limited: the model does not extract a useful conditional tail signal from these inputs.

Model Performance Summary

  • Pinball loss (α = 0.01): 0.00033
  • MSE: 0.00122
  • Violation rate on the test period: near 0% (no VaR violations, against a nominal rate of 1% for a 99% VaR)
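The violation rate can be backtested with a short NumPy helper. This sketch assumes VaR forecasts are stored on the return scale (negative numbers, like the 1% quantile target); the function name is hypothetical. For a calibrated 99% VaR, about 10 breaches would be expected over 1,000 test days.

```python
import numpy as np


def violation_rate(returns, var_forecasts) -> float:
    """Fraction of days on which the realised return falls below the VaR forecast."""
    r = np.asarray(returns, float)
    v = np.asarray(var_forecasts, float)
    return float(np.mean(r < v))
```

A rate well below 1%, as observed here, signals an over-conservative model rather than a good one.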

The LightGBM model converges, but it fails to capture the conditional structure of the extreme left tail. Instead, it collapses toward a highly conservative constant VaR, approximately equal to the worst quantile observed during the training period. This behaviour keeps the pinball loss low, because at α = 0.01 each unit by which the forecast is too conservative costs only 0.01, whereas each unit of violation costs 0.99. The price is an over-pessimistic VaR estimate that is never violated in the test window.
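A worked example with illustrative numbers (not the project's results) makes the asymmetry concrete: one realised return of -1% is scored against a very conservative VaR of -5% and against a tight VaR of -0.5%.

```python
ALPHA = 0.01  # quantile level of the 99% VaR


def pinball(y: float, q: float, alpha: float = ALPHA) -> float:
    """Pinball loss of a single return y against a VaR forecast q."""
    d = y - q
    return max(alpha * d, (alpha - 1.0) * d)


over_conservative = pinball(-0.01, -0.05)  # no breach: 0.01 * 0.04
violated = pinball(-0.01, -0.005)          # breached by 0.5 points: 0.99 * 0.005

print(f"{over_conservative:.5f} {violated:.5f}")  # prints 0.00040 0.00495
```

A single small breach costs roughly twelve times as much as a day on which the forecast is four percentage points too pessimistic, which pushes the model toward a low, rarely violated constant.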

Interpretation

This outcome is expected in quantitative finance when attempting to model extreme quantiles (1%) using daily returns and simple historical features. Financial return series exhibit:

  • weak autocorrelation,
  • strong volatility clustering,
  • regime-dependent tail behaviour,
  • highly non-stationary dynamics.

As a result, standard machine-learning models without specialised volatility or regime-switching inputs tend to default to conservative constant predictors for extreme quantiles.

What This Demonstrates

Although the predictive performance is not sufficient for real-world risk management, the project successfully delivers a complete VaR forecasting pipeline, including:

  • data acquisition
  • EDA
  • feature engineering
  • quantile regression
  • backtesting and evaluation
  • model export
  • API deployment with FastAPI
  • containerisation with Docker

The results highlight an important practical insight: estimating tail risk requires richer features (e.g., realised volatility, high-low range estimators, VIX, GARCH-type signals) or hybrid ML-volatility models.
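Two of the suggested richer inputs can be sketched with pandas: a RiskMetrics-style EWMA volatility (λ = 0.94, the classic daily value), which is a simple GARCH-type signal, and the Parkinson high-low range estimator. Both are illustrations only; the Parkinson feature assumes daily High and Low columns are also downloaded, whereas the current pipeline uses only Close and Volume.

```python
import numpy as np
import pandas as pd


def ewma_vol(returns: pd.Series, lam: float = 0.94) -> pd.Series:
    """EWMA volatility: s2_t = lam * s2_{t-1} + (1 - lam) * r_t^2 (IGARCH-like)."""
    var = (returns ** 2).ewm(alpha=1.0 - lam, adjust=False).mean()
    return np.sqrt(var)


def parkinson_vol(high: pd.Series, low: pd.Series, window: int = 20) -> pd.Series:
    """Parkinson estimator: mean of ln(H/L)^2 / (4 ln 2) over a rolling window."""
    daily_var = np.log(high / low) ** 2 / (4.0 * np.log(2.0))
    return np.sqrt(daily_var.rolling(window).mean())
```

Like the existing features, both series would need to be lagged by one day before being used to forecast the next day's VaR.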

The current model therefore serves as a technical demonstration, illustrating the challenges of ML-based VaR forecasting and the limitations of applying generic algorithms to extreme-risk estimation without domain-specific inputs.
