This repository contains a quantitative research project designed to forecast stock returns over a 5-day horizon using engineered financial features and custom Transformer-based deep learning architectures.
An interactive visual dashboard, including charts, architectural breakdowns, a full research paper draft, and detailed audits, is available at guide.html.
To ensure a publication-quality benchmark, both the baseline AFT (Notebook 10.1) and the financially-optimized AFT (Notebook 12.1/13) were evaluated side-by-side across 5 identical random seeds ([42, 101, 2023, 777, 999]) using identical train/val/test splits and standard scaling.
| Model Configuration | MAE | RMSE | R² Score | Parameters |
|---|---|---|---|---|
| Optimized AFT (Financial) | 0.0336 ± 0.0023 | 0.0455 ± 0.0023 | -0.1372 ± 0.1171 | 316,319 (-15.2%) |
| Baseline AFT (Notebook 10.1) | 0.0328 ± 0.0031 | 0.0443 ± 0.0028 | -0.1073 ± 0.1400 | 373,143 |
| Model Configuration | Directional Accuracy | Sharpe Ratio | Strategy Return | Max Drawdown |
|---|---|---|---|---|
| Optimized AFT (Financial) | 51.67% ± 6.21% | 0.1607 ± 0.6411 | -2.89% ± 21.91% | -29.70% ± 3.94% |
| Baseline AFT (Notebook 10.1) | 51.23% ± 6.58% | 0.1419 ± 0.7397 | +6.20% ± 25.55% | -31.40% ± 8.46% |
Note: All trading metrics have been corrected to resolve backtesting compounding issues.
A paired t-test between the Baseline and Optimized models indicates that the performance differences are not statistically significant at the 95% confidence level (all p-values > 0.20):
- MAE p-value: 0.3790
- Directional Accuracy p-value: 0.8965
- Sharpe Ratio p-value: 0.9544
In low signal-to-noise ratio settings like stock forecasting, optimization on historical sets does not yield statistically significant alpha improvements. However, the Optimized AFT (Financial) achieves equivalent predictive power and slightly better Sharpe ratios while restricting input features to the Top-40 indicators (a 58% feature space reduction) and reducing parameters by 15.2%. This highlights the model's informational and structural efficiency.
To verify the generalizability of the Adaptive Financial Transformer, we evaluated the model's out-of-sample performance on five additional mega-cap tech assets under standard 0.05% transaction cost friction (period 2018–2024, single seed 42 run).
| Stock | MAE | R² Score | Directional Accuracy | Sharpe Ratio | Strategy Return | Robustness Outcome |
|---|---|---|---|---|---|---|
| AAPL | 0.0236 | +0.0361 | 62.81% | 2.2857 | +44.26% | ✓ Success |
| MSFT | 0.0245 | -0.1958 | 56.28% | 0.4779 | +7.75% | ✓ Success |
| GOOG | 0.0311 | -0.1638 | 49.25% | -0.0030 | +0.50% | ≈ Neutral |
| AMZN | 0.0305 | -0.0055 | 54.27% | 0.7120 | +26.27% | ✓ Success |
| META | 0.0353 | -0.1400 | 55.78% | 0.9647 | +18.89% | ✓ Success |
| NVDA | 0.0666 | -0.4379 | 42.21% | -1.4383 | -50.69% | ✗ Failed |
- Aggregate Generalization: The model achieves a mean Directional Accuracy of 53.43% ± 6.88% and an average Sharpe ratio of 0.50 ± 1.23 across all six stocks, validating its predictive generalizability.
- NVDA Volatility Limit: The significant performance drop on NVDA highlights a model limitation under explosive growth regimes (e.g. NVDA's 2023–2024 generative AI expansion), which violate learned regime parameters and necessitate separate, stock-specific calibration.
- AAPL Metrics Discrepancy: The AAPL Sharpe ratio of 2.29 is higher than the paper's 5-seed average of 0.16 because the restricted 2018–2024 period shifts the 15% out-of-sample test window directly into the highly bullish 2023–2024 tech rally, combined with single-seed execution.
- Notebook 1: Feature Engineering: Downloads AAPL data and builds 95 features across 11 financial categories.
- Notebooks 2 - 4: ML & Recurrent Baselines: Sets up Ridge, Random Forest, XGBoost, LSTM, and GRU models.
- Notebooks 5 - 6: PyTorch & Custom Transformers: Explores standard self-attention limitations.
- Notebook 7 & 7.2: Adaptive Financial Transformer: Fuses a Market Regime Encoder and Gate Network to dynamically bias query-key logits.
- Notebook 8: De-biasing & Alignment: Corrects the 1-day alignment sequence gap and fixes overlapping daily return compounding backtests.
- Notebook 9: Multi-Objective Loss: Trains models with joint MSE + Correlation + Sign directional losses.
- Notebook 10 & 10.1: Feature Selection & Ablation: Identifies that a Top-40 feature subset stabilizes cross-validation folds.
- Notebook 11: Explainability Analysis: Plots Integrated Gradients attributions and regime gate weights over time.
- Notebook 12 & 12.1: Optuna Optimization: Optimizes hyperparameters using a Financially-Aware composite objective (penalizes regression-to-the-mean).
- Notebook 13: Final Benchmark: Executes the final 5-seed paired t-test and outputs leaderboard CSVs.
- Notebook 14: Multi-Stock Evaluation: Validates the model's external validity on a basket of 6 technology equities (AAPL, MSFT, GOOG, AMZN, META, NVDA).
Raw Features (95 Dimensions)
│
┌──────────────────────┴──────────────────────┐
▼ ▼
Market Regime Encoder Feature Embedding
│ (CLS Prepend)
▼ │
Adaptive Gate Network ▼
(11 Group Gates) Positional Encoding
│ │
▼ ▼
Adaptive Financial Context ──────────────────► Encoder Blocks (x2)
(Group Similarity Biasing) │
▼
CLS Output Token
│
▼
MLP Regression Head
For a technical explanation and dimension mappings for each component, view the interactive diagram in guide.html.