Forecasting the 1-Day 99% Value-at-Risk (VaR) for Equity Returns using Quantile Regression.

Goal

Predict the 1% quantile of daily S&P 500 returns (the 99% VaR) from historical market features.

Business Context:

Banks must estimate daily VaR to measure potential losses under normal market conditions. A robust ML estimator helps automate and improve traditional parametric models.

Data

For this midterm project, we use S&P 500 data from Yahoo Finance (ticker ^GSPC), covering 2000-01-01 to 2024-12-31. We use the closing price (Close) and, optionally, Volume.

Problem statement

We estimate the 1-day 99% Value-at-Risk as the conditional 0.01 quantile of future returns. The task is framed as supervised regression using quantile regression (LightGBM) on engineered time-series features.
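Quantile regression fits a quantile by minimising the pinball (quantile) loss rather than squared error. The sketch below is a minimal standard-library version of that loss at α = 0.01; the function name `pinball_loss` and its signature are illustrative and are not taken from src/metrics.py.

```python
def pinball_loss(y_true, y_pred, alpha=0.01):
    """Mean pinball (quantile) loss at level alpha.

    y_true: realised returns; y_pred: predicted alpha-quantiles (VaR forecasts).
    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        if diff >= 0:
            # Return above the predicted quantile: small penalty, weight alpha.
            total += alpha * diff
        else:
            # Return below the predicted quantile (a VaR violation):
            # large penalty, weight 1 - alpha.
            total += (alpha - 1.0) * diff
    return total / len(y_true)


# A violation of size 0.03 costs far more than an equally sized cushion.
loss_violation = pinball_loss([-0.05], [-0.02])
loss_cushion = pinball_loss([0.01], [-0.02])
```

At α = 0.01 a violation is weighted 99 times more heavily than the same distance on the safe side, which is why the fitted quantile sits deep in the left tail.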

EDA & Feature Engineering

Daily returns exhibit volatility clustering, heavy tails, negative skewness, and excess kurtosis. Features include lagged returns, rolling volatility (5/20-day), rolling skewness/kurtosis (20-day), EMA ratios, and drawdown. The target is the empirical VaR: a rolling 1% quantile computed over a 250-day window.
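A few of these quantities can be sketched with the standard library alone. The helpers below are illustrative and not the code in src/features.py; they assume simple (not log) returns, a sample standard deviation for volatility, and linear interpolation for the rolling quantile, none of which is specified above.

```python
import statistics


def simple_returns(closes):
    """Simple daily returns from a list of closing prices (assumed convention)."""
    return [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]


def rolling_vol(returns, window):
    """Sample standard deviation over a trailing window; None until it fills."""
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(statistics.stdev(returns[i + 1 - window : i + 1]))
    return out


def drawdown(closes):
    """Relative distance of each close from its running maximum (always <= 0)."""
    peak = float("-inf")
    out = []
    for close in closes:
        peak = max(peak, close)
        out.append(close / peak - 1.0)
    return out


def rolling_quantile(returns, window=250, q=0.01):
    """Empirical q-quantile over a trailing window, linearly interpolated."""
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
            continue
        ordered = sorted(returns[i + 1 - window : i + 1])
        pos = q * (window - 1)          # fractional rank of the quantile
        lo = int(pos)
        hi = min(lo + 1, window - 1)
        out.append(ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo]))
    return out


closes = [100.0, 110.0, 99.0, 121.0]
rets = simple_returns(closes)
vol_2 = rolling_vol(rets, window=2)
dd = drawdown(closes)
```

When such features are paired with a target, each row should only use information available up to that day, so that the model is not given a look at the returns it is meant to forecast.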

Modeling

We evaluate:

A naive persistence baseline
Linear Regression
LightGBM Quantile Regression (α = 0.01)

The LightGBM model achieves the lowest pinball loss and is exported as the final model (models/var_model.pkl).
A FastAPI service exposes a /predict_var endpoint for inference.

Architecture

var-forecasting-99-sp500/
├─ README.md
├─ data/
│  ├─ raw/
│  └─ processed/
├─ notebook.ipynb
├─ src/
│  ├─ features.py
│  ├─ metrics.py
│  ├─ train.py
│  └─ predict.py
├─ models/
│  ├─ var_model.pkl
│  └─ feature_spec.json
├─ requirements.txt
└─ Dockerfile

How to run (local, no Docker)

# 1) Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# 2) Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 3) Build the dataset and train the model
python -m src.train

# 4) Start the API service
uvicorn src.predict:app --host 0.0.0.0 --port 8000

Test API

curl -X POST "http://localhost:8000/predict_var" \
  -H "Content-Type: application/json" \
  -d '{
    "ret_lag1": -0.002,
    "ret_lag5": 0.001,
    "vol_5": 0.012,
    "vol_20": 0.010,
    "skew_20": -0.5,
    "kurt_20": 3.1,
    "ema_ratio": -0.01,
    "drawdown": -0.12
  }'
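The same request can be built from Python. This is a minimal sketch using only the standard library; the helper name `build_predict_request` is hypothetical, and the shape of the JSON response is not documented here, so the call itself is left commented out.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /predict_var endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict_var",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same feature values as in the curl example above.
features = {
    "ret_lag1": -0.002,
    "ret_lag5": 0.001,
    "vol_5": 0.012,
    "vol_20": 0.010,
    "skew_20": -0.5,
    "kurt_20": 3.1,
    "ema_ratio": -0.01,
    "drawdown": -0.12,
}
request = build_predict_request(features)

# With the service running, send the request and inspect the raw JSON reply:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```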

Build & Run Docker

docker build -t var-forecast .
docker run -p 8000:8000 var-forecast

Conclusion

This project implemented an end-to-end machine-learning pipeline for forecasting the 1-day 99% Value-at-Risk (VaR) of S&P500 daily returns using quantile regression. Despite extensive feature engineering (rolling volatility, higher-moment estimators, EMA trend indicators, drawdown measures) and the use of a non-linear model (LightGBM quantile), the results are structurally limited.

Model Performance Summary

Pinball loss (α = 0.01): 0.00033
MSE: 0.00122
Violation rate on the test period: near 0% (no VaR violations, against a nominal rate of 1%)
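The violation check behind the last figure can be sketched as follows. It assumes the VaR forecast is expressed as a (negative) return quantile, as in the problem framing, and that a violation is a day on which the realised return falls below the forecast; the function name `violation_rate` is illustrative.

```python
def violation_rate(returns, var_forecasts):
    """Share of days on which the realised return fell below the VaR forecast."""
    if len(returns) != len(var_forecasts) or not returns:
        raise ValueError("inputs must be non-empty and of equal length")
    violations = sum(1 for r, v in zip(returns, var_forecasts) if r < v)
    return violations / len(returns)


# Toy data: one of four days breaches a constant -2.5% VaR.
realised = [0.004, -0.031, 0.002, -0.008]
forecasts = [-0.025] * 4
rate = violation_rate(realised, forecasts)  # 1 violation in 4 days -> 0.25
```

A well-calibrated 99% VaR should be violated on roughly 1% of days, about 2.5 days per 250 trading days, so a rate near 0% signals a forecast that is too conservative rather than one that is accurate.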

The LightGBM model converges, but it fails to capture the conditional structure of the extreme left tail. Instead, it collapses toward a highly conservative constant VaR, approximately equal to the worst quantile observed during the training period. This behaviour minimizes pinball loss but yields an over-pessimistic VaR estimate that is never violated in the test window.

Interpretation

This outcome is expected in quantitative finance when attempting to model extreme quantiles (1%) using daily returns and simple historical features. Financial return series exhibit:

weak autocorrelation,
strong volatility clustering,
regime-dependent tail behaviour,
highly non-stationary dynamics.

As a result, standard machine-learning models without specialised volatility or regime-switching inputs tend to default to conservative constant predictors for extreme quantiles.

What This Demonstrates

Although the predictive performance is not sufficient for real-world risk management, the project successfully delivers a complete VaR forecasting pipeline, including:

data acquisition
EDA
feature engineering
quantile regression
backtesting and evaluation
model export
API deployment with FastAPI
containerisation with Docker

The results highlight an important practical insight: estimating tail risk requires richer features (e.g., realised volatility, high-low range estimators, VIX, GARCH-type signals) or hybrid ML-volatility models.
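As one example of a high-low range estimator, the Parkinson volatility estimate uses daily high and low prices instead of closes. The sketch below is illustrative only: the project as described uses Close and optionally Volume, so the availability of High and Low columns is an assumption, and the function name is hypothetical.

```python
import math


def parkinson_volatility(highs, lows):
    """Parkinson range-based estimate of daily volatility.

    sigma^2 = mean(ln(high / low)^2) / (4 * ln 2), taken over the window supplied.
    """
    if len(highs) != len(lows) or not highs:
        raise ValueError("inputs must be non-empty and of equal length")
    sum_sq = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(sum_sq / (4.0 * math.log(2.0) * len(highs)))


# Toy window of three trading days (hypothetical prices).
highs = [101.0, 102.5, 100.8]
lows = [99.0, 100.1, 98.9]
vol_estimate = parkinson_volatility(highs, lows)
```

Computed over a trailing window, such an estimate could be added alongside the close-to-close volatility features as an extra input to the quantile model.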

The current model therefore serves as a technical demonstration, illustrating the challenges of ML-based VaR forecasting and the limitations of applying generic algorithms to extreme-risk estimation without domain-specific inputs.


Julian516/midterm-project
