Institutional-grade order flow backtesting, ML pipeline, and portfolio management platform.
Built for quant research interviews. Not a toy — a system with real execution modeling, walk-forward validation, and portfolio-level risk analysis.
Most backtesters are glorified spreadsheets. This one simulates what happens when you trade:
- Slippage: half-spread + volatility impact + size impact (square-root market impact model)
- Latency: 1-bar signal delay; your signal fires, but execution happens on the next bar
- Partial fills: each order fills completely with 85% probability; the rest get 50-95% partial fills
- Fees: configurable in basis points, tracked per-trade
- Position sizing: fixed fraction, volatility-targeted (15% ann. vol), or quarter-Kelly
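The fixed-fraction and volatility-targeted rules can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in `position.py`: the function names, the trailing-window input, the 2x leverage cap, and `bars_per_year=252` (which assumes daily bars) are all hypothetical choices.

```python
import numpy as np


def fixed_fraction_size(equity: float, price: float, fraction: float = 0.1) -> float:
    """Hypothetical helper: commit a fixed fraction of equity to the position."""
    return fraction * equity / price


def vol_target_size(
    equity: float,
    price: float,
    recent_returns: np.ndarray,
    target_ann_vol: float = 0.15,
    bars_per_year: int = 252,      # assumption: daily bars
    max_leverage: float = 2.0,     # assumption: cap exposure when vol is tiny
) -> float:
    """Hypothetical helper: size so the position's annualized vol is ~target_ann_vol.

    recent_returns: per-bar simple returns from a trailing window (past data only).
    """
    realized_ann_vol = np.std(recent_returns, ddof=1) * np.sqrt(bars_per_year)
    if realized_ann_vol <= 0:
        return 0.0
    # Exposure as a multiple of equity, capped to avoid huge positions in calm markets.
    exposure = min(target_ann_vol / realized_ann_vol, max_leverage)
    return exposure * equity / price
```

Volatility targeting shrinks positions when recent volatility rises and grows them when it falls, so the cap matters most in quiet regimes.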
The ML pipeline uses walk-forward validation (not k-fold — that leaks future data in time series) and runs feature leakage detection on every training run.
Frontend (React + Vite)                   Backend (Python + FastAPI)
┌─────────────────────┐                   ┌──────────────────────────┐
│ Dashboard           │                   │ /backtest                │
│ • Single / Portfolio│◄──── api.js ─────►│ /portfolio/backtest      │
│ • Data source select│  (retry+timeout)  │ /ml/train                │
│ • Position sizing   │                   │ /ml/insights             │
│                     │                   │ /ws/logs (WebSocket)     │
│ Results             │                   ├──────────────────────────┤
│ • Equity + Drawdown │                   │ Engine                   │
│ • Trade log (paged) │                   │ ├── strategies (5)       │
│ • Portfolio corr.   │                   │ ├── execution model      │
│ • Cost analysis     │                   │ ├── position sizing      │
│                     │                   │ ├── metrics (20+)        │
│ ML / Alpha          │                   │ └── portfolio combiner   │
│ • Walk-forward table│                   ├──────────────────────────┤
│ • Leakage check     │                   │ ML Pipeline              │
│ • Model comparison  │                   │ ├── XGBoost + LR         │
│ • SHAP + importance │                   │ ├── Walk-forward (5w)    │
│ • Signal quality    │                   │ ├── Leakage detection    │
└─────────────────────┘                   │ └── SHAP (OOS only)      │
                                          ├──────────────────────────┤
                                          │ Data Sources             │
                                          │ ├── Synthetic (GARCH)    │
                                          │ └── CSV (Yahoo, custom)  │
                                          └──────────────────────────┘
| Component | Prevention Method |
|---|---|
| Signals | Strategies process bars sequentially; each bar only sees past data |
| Execution | 1-bar latency delay between signal and fill |
| ML Labels | Future returns used for labels, but train/test split is strictly temporal |
| ML Split | 70% train → 10% val → 20% test, chronological order, no shuffling |
| Walk-Forward | Expanding window: each fold trains on all prior data only |
| SHAP | Computed on out-of-sample test set only |
| Features | All derived from rolling windows of past data |
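The ML Labels and ML Split rows can be sketched as below. This is a minimal illustration with hypothetical names (`make_labels`, `chronological_split`) and an assumed binary up/down label over a `horizon`-bar forward return. It also adds a `purge` gap that the table does not mention: without it, the last training labels look `horizon` bars ahead into the validation period.

```python
import numpy as np


def make_labels(close, horizon: int = 5) -> np.ndarray:
    """Hypothetical label: 1.0 if the forward `horizon`-bar return is positive, else 0.0.

    The last `horizon` bars have no complete future window and are set to NaN,
    so they must be dropped before training.
    """
    close = np.asarray(close, dtype=float)
    labels = np.full(len(close), np.nan)
    fwd_ret = close[horizon:] / close[:-horizon] - 1.0
    labels[:-horizon] = (fwd_ret > 0).astype(float)
    return labels


def chronological_split(n: int, train: float = 0.7, val: float = 0.1, purge: int = 0):
    """Index arrays for a 70/10/20 split in time order, with no shuffling.

    `purge` bars are dropped before each boundary so that labels near the end of
    one segment (which look ahead) cannot use prices from the next segment.
    """
    train_end = round(n * train)
    val_end = round(n * (train + val))   # round() avoids 0.7 + 0.1 float truncation
    train_idx = np.arange(0, train_end - purge)
    val_idx = np.arange(train_end, val_end - purge)
    test_idx = np.arange(val_end, n)
    return train_idx, val_idx, test_idx
```

Setting `purge` equal to the label horizon is the usual choice.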
# Backend
cd orderflow-backtester-v3
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload --port 8000
# Frontend (new terminal)
cd frontend
npm install
npm run dev

| Strategy | Signal Logic | Exit Logic |
|---|---|---|
| `order_flow_imbalance` | Z-score of rolling OFI > ±1.5σ | Mean reversion to ±0.3σ |
| `queue_exhaustion` | Book-side depletion + intensity spike | Flow reversal or 12-bar timeout |
| `momentum_burst` | 5-bar momentum + volume spike (1.8x) | Trailing stop (vol-adjusted) |
| `mean_reversion` | Price z-score > ±2σ + tight spread | Z-score crosses ±0.5σ |
| `composite_alpha` | Weighted ensemble vote of all four | Combined signal threshold ±0.4 |
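The `order_flow_imbalance` row reads as a small state machine. A minimal sketch, assuming a 50-bar rolling window and that a positive z-score means going long; the table gives the ±1.5σ entry and ±0.3σ exit thresholds but not the window length or trade direction, and the function name is hypothetical.

```python
import numpy as np


def ofi_zscore_positions(ofi, window: int = 50, entry_z: float = 1.5,
                         exit_z: float = 0.3) -> np.ndarray:
    """Position (+1 / 0 / -1) per bar from a rolling OFI z-score.

    Enter when |z| > entry_z (long if z > 0: an assumed direction), exit once
    |z| < exit_z. The z-score at bar t uses only ofi[t - window + 1 .. t].
    """
    ofi = np.asarray(ofi, dtype=float)
    positions = np.zeros(len(ofi), dtype=int)
    pos = 0
    for t in range(window - 1, len(ofi)):
        hist = ofi[t + 1 - window : t + 1]          # trailing window, no future bars
        std = hist.std()
        z = (ofi[t] - hist.mean()) / std if std > 0 else 0.0
        if pos == 0 and abs(z) > entry_z:
            pos = 1 if z > 0 else -1                # entry on an extreme imbalance
        elif pos != 0 and abs(z) < exit_z:
            pos = 0                                 # exit once OFI reverts toward its mean
        positions[t] = pos
    return positions
```

The returned position is the signal at bar t; under the 1-bar latency model it would only be filled on bar t + 1.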
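Fill price for a buy (every cost term is adverse, so a sell fills below mid by the same amount):

Fill Price = Mid Price
 + (Spread / 2) ← always pay the half-spread
 + (Volatility × Price × 0.1) ← vol-proportional impact
 + (Price × 0.0001 × √size_ratio) ← square-root market impact

Each order fills in full with 85% probability; the remaining 15% receive a partial fill of 50-95% of the requested size.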
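The fill model can be sketched as follows. This is a minimal illustration, not `execution.py`: it assumes `Price` in the formula is the mid price, `vol` is per-bar return volatility, and partial-fill fractions are drawn uniformly from 50-95% (the distribution is not stated). The function names are hypothetical.

```python
import numpy as np


def fill_price(mid: float, spread: float, vol: float, size_ratio: float, side: int) -> float:
    """Fill price under the slippage model above; side=+1 for a buy, -1 for a sell.

    All three cost terms are adverse, so buys fill above mid and sells below it.
    """
    half_spread = spread / 2.0
    vol_impact = vol * mid * 0.1                      # vol-proportional impact
    size_impact = mid * 0.0001 * np.sqrt(size_ratio)  # square-root market impact
    return mid + side * (half_spread + vol_impact + size_impact)


def filled_quantity(qty: float, rng: np.random.Generator,
                    full_fill_prob: float = 0.85,
                    partial_range: tuple[float, float] = (0.50, 0.95)) -> float:
    """Full fill with probability 0.85, else a uniform 50-95% partial fill (assumed)."""
    if rng.random() < full_fill_prob:
        return qty
    return qty * rng.uniform(*partial_range)
```

For example, a buy at mid 100.00 with a 0.02 spread, 1% bar volatility, and `size_ratio=4` fills at about 100.13: 0.01 of half-spread, 0.10 of volatility impact, and 0.02 of size impact.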
- Training: XGBoost (200 trees, max depth 5, learning rate 0.05, regularized)
- Validation: walk-forward with 4-5 expanding windows
- Comparison: XGBoost vs. logistic regression baseline
- Features: 20 (12 raw order flow + 8 derived: z-scores, rolling stats, composites)
- Leakage check: flags any feature with |corr| > 0.5 to the target
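A minimal sketch of the leakage check, assuming it uses Pearson correlation between each feature column and the label (the README only says |corr| > 0.5). The function name is hypothetical.

```python
import numpy as np


def flag_leaky_features(X: np.ndarray, y: np.ndarray, names: list[str],
                        threshold: float = 0.5) -> dict[str, float]:
    """Return {feature_name: corr} for features with |Pearson corr| to y above threshold."""
    flagged = {}
    if np.std(y) == 0:
        return flagged                      # constant target: correlation undefined
    for j, name in enumerate(names):
        col = X[:, j]
        if np.std(col) == 0:
            continue                        # constant feature: correlation undefined
        corr = np.corrcoef(col, y)[0, 1]
        if abs(corr) > threshold:
            flagged[name] = float(corr)
    return flagged
```

The 0.5 threshold is presumably chosen because genuine order flow features rarely correlate that strongly with forward-return labels, so a hit is more likely a construction bug than an edge.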
- Equal weight: simple 1/N allocation
- Risk parity: inverse-volatility weighting
- Correlation matrix: computed from equity curve returns
- Diversification ratio: weighted avg vol / portfolio vol
- Warnings: auto-flagged when |corr| > 0.7 between assets
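The portfolio statistics above can be sketched with NumPy. This minimal version assumes a `(T, N)` array of per-bar returns taken from each asset's equity curve and uses sample (ddof=1) statistics throughout; the function name and return shape are hypothetical.

```python
import numpy as np


def portfolio_stats(returns: np.ndarray, names: list[str], corr_warn: float = 0.7):
    """Inverse-vol weights, correlation matrix, diversification ratio, and warnings."""
    vols = returns.std(axis=0, ddof=1)
    # Risk parity as described: inverse-volatility weights, normalized to sum to 1.
    inv_vol = 1.0 / vols
    weights = inv_vol / inv_vol.sum()
    cov = np.cov(returns, rowvar=False)          # ddof=1 by default, matches vols
    corr = np.corrcoef(returns, rowvar=False)
    port_vol = np.sqrt(weights @ cov @ weights)
    # Diversification ratio: weighted average asset vol / portfolio vol.
    div_ratio = float((weights @ vols) / port_vol)
    warnings = [
        (names[i], names[j], float(corr[i, j]))
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if abs(corr[i, j]) > corr_warn
    ]
    return weights, corr, div_ratio, warnings
```

With non-negative weights the diversification ratio is at least 1, and it equals 1 only when all assets are perfectly positively correlated.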
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/strategies` | GET | List strategies |
| `/symbols?source=synthetic` | GET | List symbols by source |
| `/data-sources` | GET | List data sources |
| `/backtest` | POST | Single-asset backtest |
| `/portfolio/backtest` | POST | Multi-asset portfolio backtest |
| `/ml/train` | POST | Train + evaluate + walk-forward |
| `/ml/insights` | POST | Feature importance + SHAP |
| `/ws/logs` | WS | Live log streaming |
- Performance: Total Return, Annualized Return, Sharpe, Sortino, Calmar
- Risk: Max Drawdown, Drawdown Duration, VaR 95%, CVaR 95%, Volatility, Skewness, Kurtosis
- Trade: Win Rate, Avg Win/Loss, Profit Factor, Hold Duration, Max Win/Loss Streaks
- Cost: Total Fees, Total Slippage (tracked per-trade)
- ML: OOS Accuracy, AUC-ROC, Precision, Recall, IC, ICIR, Turnover, Signal Decay
- Portfolio: Diversification Ratio, Correlation Matrix, Per-Asset Breakdown
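A few of the return and risk metrics can be sketched as follows. This is a minimal illustration, not `metrics.py`: it assumes per-bar simple returns, a zero risk-free rate, `periods_per_year=252` (daily bars), historical (non-parametric) VaR/CVaR, and a Sortino downside deviation taken over all periods with a 0% target. The project's exact definitions may differ.

```python
import numpy as np


def performance_metrics(returns, periods_per_year: int = 252, alpha: float = 0.95) -> dict:
    r = np.asarray(returns, dtype=float)
    ann = np.sqrt(periods_per_year)
    equity = np.cumprod(1.0 + r)                       # equity curve starting from 1.0
    drawdown = equity / np.maximum.accumulate(equity) - 1.0
    max_dd = drawdown.min()                            # most negative drawdown
    ann_return = equity[-1] ** (periods_per_year / len(r)) - 1.0
    # Downside deviation: RMS of returns below a 0% target, over all periods.
    downside_dev = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    var = -np.quantile(r, 1.0 - alpha)                 # historical VaR, reported as a positive loss
    cvar = -r[r <= -var].mean()                        # mean loss at or beyond VaR
    return {
        "sharpe": r.mean() / r.std(ddof=1) * ann,
        "sortino": r.mean() / downside_dev * ann if downside_dev > 0 else np.inf,
        "max_drawdown": max_dd,
        "calmar": ann_return / abs(max_dd) if max_dd < 0 else np.inf,
        "var": var,
        "cvar": cvar,
    }
```

Annualizing by √252 is what separates this from the bare `returns.mean() / returns.std()` ratio, which is a per-bar Sharpe.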
orderflow-backtester-v3/
├── backend/
│ ├── config.py ← centralized config
│ ├── main.py ← FastAPI app + WebSocket
│ ├── requirements.txt
│ ├── data/
│ │ ├── base.py ← abstract DataSource interface
│ │ ├── generator.py ← synthetic GARCH + order flow
│ │ └── csv_loader.py ← CSV ingestion (Yahoo, custom)
│ ├── engine/
│ │ ├── backtest.py ← event-driven engine
│ │ ├── execution.py ← slippage, latency, partial fills
│ │ ├── metrics.py ← 20+ performance metrics
│ │ ├── portfolio.py ← multi-asset portfolio engine
│ │ ├── position.py ← sizing: fixed, vol-target, Kelly
│ │ └── strategies.py ← 5 strategies incl. ensemble
│ ├── ml/
│ │ ├── features.py ← feature engineering
│ │ ├── pipeline.py ← XGBoost + SHAP + comparison
│ │ └── validation.py ← walk-forward + leakage detection
│ └── routes/
│ ├── backtest.py ← /backtest + /portfolio/backtest
│ └── ml.py ← /ml/train + /ml/insights
└── frontend/
├── index.html
├── package.json
├── vite.config.js
└── src/
├── main.jsx
├── App.jsx ← tabs + keyboard shortcuts
├── api.js ← API service (retry, timeout)
├── Navbar.jsx ← status + clock + tabs
├── Dashboard.jsx ← config + portfolio mode
├── ResultsView.jsx ← metrics + trade log + correlation
├── MLInsights.jsx ← walk-forward + leakage + SHAP
└── hooks/
└── useBackend.js ← connection + clock hooks
Why synthetic data? Exchange tick data costs $10K+/year. The GARCH(1,1) generator produces realistic volatility clustering and order flow correlations, and the CSV loader supports real data when available.
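A minimal GARCH(1,1) return simulator illustrates the volatility clustering mentioned above. The parameter values are illustrative assumptions, not the generator's settings, and this sketch produces returns only; it does not model the order flow side.

```python
import numpy as np


def simulate_garch11(n: int, omega: float = 1e-6, alpha: float = 0.08,
                     beta: float = 0.90, seed: int = 0):
    """Simulate r_t = sigma_t * z_t with sigma_t^2 = omega + alpha*r_{t-1}^2 + beta*sigma_{t-1}^2."""
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1 for stationarity")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    sigma2 = np.empty(n)
    r = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    r[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        # Variance reacts to yesterday's shock (alpha) and persists (beta).
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * z[t]
    return r, np.sqrt(sigma2)
```

Because alpha + beta is close to 1, large shocks raise variance for many bars afterward, which shows up as positive autocorrelation in squared returns even though the returns themselves are serially uncorrelated.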
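Why event-driven? Vectorized backtests make it easy to use future data by accident, for example through a misaligned shift or a normalization computed over the full sample. Bar-by-bar processing with explicit state makes this structurally hard: at each step the strategy sees only bars up to the current one.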
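A minimal sketch of the bar-by-bar loop with the 1-bar latency from the execution model. `Bar`, `run_event_driven`, and the fill-at-next-open convention are hypothetical; `backtest.py` may structure this differently.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Bar:
    open_price: float
    close_price: float


def run_event_driven(
    bars: Sequence[Bar],
    strategy: Callable[[tuple], int],
) -> tuple[list[int], list[tuple[int, float]]]:
    """Process bars one at a time with a 1-bar execution delay.

    At bar t the strategy receives only bars[0..t] as an immutable tuple, so it
    cannot index into the future. A change in target position becomes an order
    that fills at the next bar's open.
    """
    position = 0
    pending = None                                   # target decided last bar, not yet filled
    positions: list[int] = []
    fills: list[tuple[int, float]] = []
    for t, bar in enumerate(bars):
        if pending is not None:
            position = pending                       # latency: fill one bar after the signal
            fills.append((t, bar.open_price))
            pending = None
        target = strategy(tuple(bars[: t + 1]))      # history up to and including bar t
        if target != position:
            pending = target
        positions.append(position)                   # position held during bar t
    return positions, fills
```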
Why XGBoost over LSTM? For tabular order flow features with 20 columns, gradient-boosted trees typically outperform sequence models, and SHAP provides the interpretability trading desks require. Each run also compares XGBoost against a logistic regression baseline, so the extra model complexity has to earn its place empirically.
Why walk-forward over k-fold? Standard k-fold, especially when shuffled, puts bars from after the test fold into the training set, so the model learns from the future. Walk-forward expanding windows guarantee the model never trains on data later than the window it is tested on. The overfit ratio per window quantifies model stability.
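Expanding-window splits can be sketched as a generator of index arrays. This is a minimal illustration: the function name, the 50% initial training fraction, and the optional `purge` gap are assumptions, and the README does not define the overfit ratio (train score divided by test score per window is one common choice).

```python
import numpy as np


def expanding_walk_forward(n_bars: int, n_windows: int = 5,
                           initial_train_frac: float = 0.5, purge: int = 0):
    """Yield (train_idx, test_idx) for expanding-window walk-forward validation.

    The first initial_train_frac of bars seeds the training set; the remainder is
    cut into n_windows consecutive test blocks. Each fold trains on every bar
    before its test block (minus `purge` bars) and on nothing after it.
    """
    start = int(n_bars * initial_train_frac)
    edges = np.linspace(start, n_bars, n_windows + 1).astype(int)
    for lo, hi in zip(edges[:-1], edges[1:]):
        yield np.arange(0, lo - purge), np.arange(lo, hi)
```

Unlike k-fold, every test block here sits strictly after its training data, and the training set grows by one block per fold.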
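Why quarter-Kelly? Full Kelly maximizes long-run log growth only if the edge is known exactly; in practice edge estimates are noisy, and overestimating the edge makes full Kelly overbet and suffer deep drawdowns. Staking a quarter of the Kelly fraction trades some growth for far more stability: under the standard continuous-time approximation, a fraction c of Kelly earns c(2 - c) of the maximum growth rate with c times the volatility, so quarter-Kelly keeps about 44% of the growth with 25% of the volatility (about 94% less variance).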
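The Kelly arithmetic can be checked with a simple binary-bet model. This sketch assumes a bet that wins `b` times the stake with probability `p` and loses the stake otherwise; the helper names are hypothetical, and the project's Kelly inputs may be estimated differently.

```python
import math


def kelly_fraction(win_rate: float, win_loss_ratio: float) -> float:
    """Full-Kelly stake for a binary bet paying b-to-1: f* = p - (1 - p) / b."""
    return win_rate - (1.0 - win_rate) / win_loss_ratio


def quarter_kelly(win_rate: float, win_loss_ratio: float, cap: float = 1.0) -> float:
    # No edge means no bet; the cap avoids leverage on noisy estimates.
    return min(max(kelly_fraction(win_rate, win_loss_ratio) / 4.0, 0.0), cap)


def expected_log_growth(f: float, win_rate: float, win_loss_ratio: float) -> float:
    """Expected log return per bet when staking fraction f of equity."""
    return (win_rate * math.log1p(win_loss_ratio * f)
            + (1.0 - win_rate) * math.log1p(-f))
```

For p = 0.55 and an even payoff, full Kelly stakes 10% and quarter-Kelly 2.5%; the quarter-Kelly stake earns roughly 44% of the full-Kelly expected log growth per bet.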
| Tutorial Project | This Project |
|---|---|
| `if price > MA: buy` | Z-score of rolling OFI with adaptive exits |
| `returns.mean() / returns.std()` | Properly annualized Sharpe from daily returns |
| `random_state=42, train_test_split()` | Walk-forward validation, no shuffling |
| Mock data in frontend | All data from backend API, error states everywhere |
| Single asset only | Portfolio with correlation + risk parity |
| No execution costs | Slippage + fees + latency + partial fills, tracked per-trade |
- Backend: Python 3.12, FastAPI, NumPy, pandas, XGBoost, SHAP, scikit-learn
- Frontend: React 18, Vite, Canvas charts
- Protocol: REST + WebSocket
Built as a quantitative research platform for prop trading interviews. Every metric is computed from actual backtest PnL and model outputs. No mock values. No silent fallbacks.