Comparing traditional volatility forecasting models (GARCH, EGARCH) with deep learning methods (Transformer, LSTM) for high-frequency Bitcoin (BTC-USD) intraday data.
Question: Do deep learning architectures outperform econometric models for 1-minute BTC-USD volatility forecasting?
Key Findings:
- EGARCH(2,3,2) performs comparably to LSTM/Transformer on 1-min intraday forecasts
- Transformer architectures show potential on millisecond-level order-book data
- Traditional models are faster, interpretable, and require less data
- DL models overfit to short windows; their advantage emerges only with longer training horizons
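EGARCH Advantage: Captures the leverage effect, in which a negative shock raises volatility more than a positive shock of the same magnitude (in EGARCH, a negative coefficient γ on the signed standardized shock). This matters for crypto markets, where volatility responds differently to price drops and rallies.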
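A minimal sketch of the asymmetry, assuming a single lag and hypothetical coefficients α = 0.10 and γ = -0.05 (illustrative values, not estimates from this project). The news-impact term α|z| + γz adds more to log σ² after a shock of z = -2 than after z = +2.

```python
import math

# Hypothetical one-lag EGARCH coefficients (illustrative only, not fitted values)
ALPHA = 0.10   # weight on the shock magnitude |z|
GAMMA = -0.05  # weight on the signed shock z; gamma < 0 encodes the leverage effect


def news_impact(z: float, alpha: float = ALPHA, gamma: float = GAMMA) -> float:
    """Contribution of a standardized shock z to next-step log(sigma^2)."""
    return alpha * abs(z) + gamma * z


for z in (-2.0, 2.0):
    g = news_impact(z)
    # exp(g) is the multiplicative effect on next-step variance, all else equal
    print(f"z = {z:+.1f}: delta log var = {g:.3f}, variance multiplier = {math.exp(g):.3f}")
```

With these values the variance multiplier is about 1.350 after z = -2 versus about 1.105 after z = +2. Setting γ = 0 makes the response symmetric, which is the behaviour of a plain GARCH-style magnitude term.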
vol_forecasting/
├── src/
│ ├── egarch_model.py # EGARCH(2,3,2) + GARCH spec comparison
│ ├── dl_models.py # LSTM + Transformer forecasters (PyTorch)
│ └── __init__.py
├── notebooks/
│ └── egarch_btc.ipynb # Full experiment: data → models → comparison
├── models/ # Saved model weights
├── data/ # BTC-USD 1-min returns (download via yfinance)
└── reports/ # Figures, metrics tables
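$$
\log \sigma_t^2 = \omega + \beta_1 \log \sigma_{t-1}^2 + \beta_2 \log \sigma_{t-2}^2 + \alpha_1 |z_{t-1}| + \alpha_2 |z_{t-2}| + \gamma_1 z_{t-1} + \gamma_2 z_{t-2} + \gamma_3 z_{t-3}
$$
where z_t = ε_t/σ_t is the standardized innovation.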
- p=2: ARCH order (lags of the shock magnitude |z_t|; α₁, α₂)
- o=3: Leverage/asymmetry order (lags of the signed shock z_t; γ₁, γ₂, γ₃)
- q=2: GARCH order (lags of log σ²_t; β₁, β₂)
- Distribution: Student-t (fat tails for crypto)
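The recursion above can be sketched in plain Python as a variance filter plus the standardized Student-t negative log-likelihood that estimation would minimize. This is an illustrative sketch, not the code in src/egarch_model.py. The function names are hypothetical, and initializing pre-sample lags with log(init_var) and z = 0 is an assumption. In practice a library such as arch would handle estimation, e.g. `arch_model(returns, vol="EGARCH", p=2, o=3, q=2, dist="t")`. arch also subtracts the constant sqrt(2/π) from each |z| term, which only shifts ω.

```python
import math
from typing import Sequence


def egarch_filter(eps: Sequence[float], omega: float, alpha: Sequence[float],
                  gamma: Sequence[float], beta: Sequence[float],
                  init_var: float) -> list[float]:
    """Conditional variances sigma^2_t from the EGARCH(p, o, q) recursion.

    eps: demeaned returns. alpha has length p, gamma length o, beta length q.
    Pre-sample lags use log(init_var) and z = 0 (an assumption of this sketch).
    """
    p, o, q = len(alpha), len(gamma), len(beta)
    log_var: list[float] = []
    z: list[float] = []
    for t in range(len(eps)):
        lv = omega
        for k in range(1, q + 1):  # GARCH terms: lags of log sigma^2
            lv += beta[k - 1] * (log_var[t - k] if t - k >= 0 else math.log(init_var))
        for i in range(1, p + 1):  # ARCH terms: lags of |z|
            lv += alpha[i - 1] * (abs(z[t - i]) if t - i >= 0 else 0.0)
        for j in range(1, o + 1):  # leverage terms: lags of signed z
            lv += gamma[j - 1] * (z[t - j] if t - j >= 0 else 0.0)
        log_var.append(lv)
        z.append(eps[t] / math.exp(0.5 * lv))  # standardized innovation z_t
    return [math.exp(v) for v in log_var]


def student_t_nll(eps: Sequence[float], sigma2: Sequence[float], nu: float) -> float:
    """NLL of eps_t = sigma_t * z_t with unit-variance Student-t z_t (requires nu > 2)."""
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * (nu - 2)))
    nll = 0.0
    for e, s2 in zip(eps, sigma2):
        loglik = const - 0.5 * math.log(s2) - 0.5 * (nu + 1) * math.log1p(e * e / (s2 * (nu - 2)))
        nll -= loglik
    return nll
```

Minimizing `student_t_nll(eps, egarch_filter(eps, ...), nu)` over (ω, α, γ, β, ν) with a numerical optimizer gives maximum-likelihood estimates. The constraint ν > 2 is needed because the t innovations are scaled to unit variance.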
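Both deep models take inputs of shape (B, 60, 1): batch size, a 60-step lookback window, and one feature.
LSTM: Input (B, 60, 1) → LSTM(64, 2-layer) → Linear → σ²
Transformer: Input (B, 60, 1) → Linear(32) → 2×TransformerEncoder(nhead=4) → AvgPool → Linear → σ²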
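A minimal PyTorch sketch of the two architectures above. The class names are hypothetical, and three details are assumptions not stated in the spec: a softplus output to keep σ² positive, a learned positional embedding for the Transformer, and `dim_feedforward=64`. The LSTM reads its last hidden state, and the Transformer averages over the time steps to match the AvgPool stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMVolForecaster(nn.Module):
    """(B, 60, 1) -> LSTM(64, 2 layers) -> Linear -> sigma^2 of shape (B,)."""

    def __init__(self, hidden: int = 64, num_layers: int = 2) -> None:
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)               # (B, T, hidden)
        raw = self.head(out[:, -1, :])      # last time step -> (B, 1)
        return F.softplus(raw).squeeze(-1)  # softplus keeps sigma^2 > 0 (assumption)


class TransformerVolForecaster(nn.Module):
    """(B, 60, 1) -> Linear(32) -> 2x TransformerEncoder(nhead=4) -> AvgPool -> Linear -> sigma^2."""

    def __init__(self, d_model: int = 32, nhead: int = 4, num_layers: int = 2,
                 seq_len: int = 60) -> None:
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        # Learned positional embedding (assumption: the spec names none)
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=64,  # assumption
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.embed(x) + self.pos[:, : x.size(1)]  # (B, T, d_model)
        h = self.encoder(h)
        pooled = h.mean(dim=1)                        # average pooling over time
        return F.softplus(self.head(pooled)).squeeze(-1)
```

Both return a (B,) tensor of variance forecasts. They can therefore be trained with an MSE or QLIKE loss against next-step squared returns.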
| Model | RMSE | MAE | QLIKE | MZ-R² | Interpretable |
|---|---|---|---|---|---|
| GARCH(1,1) | baseline | — | — | — | ✓ |
| EGARCH(1,1) | lower | — | — | — | ✓ |
| EGARCH(2,3,2) | lowest (trad.) | — | — | — | ✓ |
| LSTM | ~EGARCH | — | — | — | ✗ |
| Transformer | competitive | — | — | — | ✗ |
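Run the notebook to populate the numeric values. Until then, the RMSE column records only a qualitative ranking, and the other metric cells are placeholders.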
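RMSE: root mean squared error of the variance forecast σ²_t against the squared-return proxy r²_t
MAE: mean absolute error against the same proxy
QLIKE: mean of log(σ²_t) + r²_t/σ²_t; robust to proxy noise, meaning it ranks forecasts consistently even though r²_t is only a noisy stand-in for the true variance (Patton 2011)
Mincer-Zarnowitz R²: R² of the MZ regression r²_t = a + b·σ²_t + u_t; an unbiased, efficient forecast has a = 0 and b = 1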
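The metric definitions above translate directly into code. This is a minimal pure-Python sketch, assuming `proxy` holds squared returns r²_t and `forecast` holds strictly positive variance forecasts σ²_t aligned to the same timestamps. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(proxy: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error of variance forecasts against the r^2 proxy."""
    return math.sqrt(sum((r - f) ** 2 for r, f in zip(proxy, forecast)) / len(proxy))


def mae(proxy: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error against the same proxy."""
    return sum(abs(r - f) for r, f in zip(proxy, forecast)) / len(proxy)


def qlike(proxy: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean of log(sigma^2_t) + r^2_t / sigma^2_t; forecasts must be > 0."""
    return sum(math.log(f) + r / f for r, f in zip(proxy, forecast)) / len(proxy)


def mz_regression(proxy: Sequence[float],
                  forecast: Sequence[float]) -> tuple[float, float, float]:
    """OLS fit of r^2_t = a + b * sigma^2_t + u_t; returns (a, b, R^2)."""
    n = len(proxy)
    mx = sum(forecast) / n
    my = sum(proxy) / n
    sxx = sum((x - mx) ** 2 for x in forecast)
    syy = sum((y - my) ** 2 for y in proxy)
    sxy = sum((x - mx) * (y - my) for x, y in zip(forecast, proxy))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy * sxy / (sxx * syy)  # simple-regression R^2 = squared correlation
    return a, b, r2
```

If a forecast matches the proxy exactly, RMSE and MAE are 0, and `mz_regression` returns a = 0, b = 1 and R² = 1. For any fixed r²_t, QLIKE is minimized when the forecast equals it.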
pip install -r requirements.txt
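jupyter notebook notebooks/egarch_btc.ipynb
Data: Auto-downloaded via yfinance (BTC-USD 1-min, 2024-2025). Yahoo Finance serves 1-minute bars only for roughly the most recent 30 days, at most 7 days per request, so a 2024-2025 1-min history must be collected over time or taken from another source.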
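A sketch of the data step follows. It assumes close prices are converted to 1-min log returns and then cut into (N, 60, 1) windows, with the next squared return as the target. The helper names, the target alignment, and the `yf.Ticker(...).history(period="7d", interval="1m")` call are assumptions for illustration. The notebook's actual pipeline may differ.

```python
import numpy as np


def log_returns(close: np.ndarray) -> np.ndarray:
    """1-min log returns r_t = log(P_t) - log(P_{t-1}) from close prices."""
    return np.diff(np.log(np.asarray(close, dtype=np.float64)))


def make_windows(returns: np.ndarray, lookback: int = 60) -> tuple[np.ndarray, np.ndarray]:
    """Stack (N, lookback, 1) input windows and next-step squared-return targets (N,)."""
    r = np.asarray(returns, dtype=np.float32)
    n = len(r) - lookback
    if n <= 0:
        raise ValueError("need more than `lookback` returns")
    X = np.stack([r[i : i + lookback] for i in range(n)])[..., np.newaxis]
    y = r[lookback:] ** 2  # r^2 of the step after each window, used as the variance proxy
    return X, y


def download_btc_close(period: str = "7d") -> np.ndarray:
    """Fetch recent BTC-USD 1-min closes via yfinance (network access required)."""
    import yfinance as yf  # imported lazily so the helpers above work without it

    hist = yf.Ticker("BTC-USD").history(period=period, interval="1m")
    return hist["Close"].to_numpy()
```

For example, `X, y = make_windows(log_returns(download_btc_close()))` produces inputs shaped (N, 60, 1), the layout the deep models expect.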
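- Nelson, D.B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica.
- Patton, A.J. (2011). Volatility Forecast Comparison Using Imperfect Volatility Proxies. Journal of Econometrics.
- Engle, R.F. (2002). Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models. Journal of Business & Economic Statistics.
- Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems (NeurIPS).
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation.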