▲ OrderFlow Backtester v3.0

Institutional-grade order flow backtesting, ML pipeline, and portfolio management platform.

Built for quant research interviews. Not a toy — a system with real execution modeling, walk-forward validation, and portfolio-level risk analysis.

What This Actually Does

Most backtesters are glorified spreadsheets. This one simulates what happens when you trade:

Slippage: half-spread + volatility impact + size impact (√ market impact model)
Latency: 1-bar signal delay — your signal fires, but execution happens next bar
Partial fills: 85% probability of a full fill per bar; otherwise a partial fill of 50-95% of the order
Fees: configurable in basis points, tracked per-trade
Position sizing: fixed fraction, volatility-targeted (15% ann. vol), or quarter-Kelly

The ML pipeline uses walk-forward validation (not k-fold — that leaks future data in time series) and runs feature leakage detection on every training run.

Architecture

Frontend (React + Vite)                    Backend (Python + FastAPI)
┌─────────────────────┐                   ┌──────────────────────────┐
│ Dashboard            │                   │ /backtest                │
│  • Single / Portfolio│◄──── api.js ─────►│ /portfolio/backtest      │
│  • Data source select│  (retry+timeout)  │ /ml/train                │
│  • Position sizing   │                   │ /ml/insights             │
│                      │                   │ /ws/logs (WebSocket)     │
│ Results              │                   ├──────────────────────────┤
│  • Equity + Drawdown │                   │ Engine                   │
│  • Trade log (paged) │                   │  ├── strategies (5)      │
│  • Portfolio corr.   │                   │  ├── execution model     │
│  • Cost analysis     │                   │  ├── position sizing     │
│                      │                   │  ├── metrics (20+)       │
│ ML / Alpha           │                   │  └── portfolio combiner  │
│  • Walk-forward table│                   ├──────────────────────────┤
│  • Leakage check     │                   │ ML Pipeline              │
│  • Model comparison  │                   │  ├── XGBoost + LR        │
│  • SHAP + importance │                   │  ├── Walk-forward (5w)   │
│  • Signal quality    │                   │  ├── Leakage detection   │
└─────────────────────┘                   │  └── SHAP (OOS only)     │
                                           ├──────────────────────────┤
                                           │ Data Sources             │
                                           │  ├── Synthetic (GARCH)   │
                                           │  └── CSV (Yahoo, custom) │
                                           └──────────────────────────┘

No Lookahead Bias — Here's How

Component	Prevention Method
Signals	Strategies process bars sequentially; each bar only sees past data
Execution	1-bar latency delay between signal and fill
ML Labels	Future returns used for labels, but train/test split is strictly temporal
ML Split	70% train → 10% val → 20% test, chronological order, no shuffling
Walk-Forward	Expanding window: each fold trains on all prior data only
SHAP	Computed on out-of-sample test set only
Features	All derived from rolling windows of past data
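As an illustration of the "ML Split" row, here is a minimal sketch of a chronological 70/10/20 split. The function name `chronological_split` and its parameters are hypothetical and not taken from the repository; the only point is that indices stay in time order and are never shuffled.

```python
def chronological_split(n_rows, train_pct=70, val_pct=10):
    """Return (train, val, test) index ranges in time order, with no shuffling.

    Hypothetical helper: percentages are integers so the boundaries are exact.
    """
    train_end = n_rows * train_pct // 100
    val_end = train_end + n_rows * val_pct // 100
    return range(0, train_end), range(train_end, val_end), range(val_end, n_rows)


train, val, test = chronological_split(1000)
print(len(train), len(val), len(test))
```

Every training index precedes every validation index, which in turn precedes every test index, so nothing from the future reaches the model during fitting.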

Quick Start

# Backend
cd orderflow-backtester-v3
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload --port 8000

# Frontend (new terminal, from the repo root)
cd frontend
npm install
npm run dev

Open http://localhost:5173

Strategies

Strategy	Signal Logic	Exit Logic
`order_flow_imbalance`	Z-score of rolling OFI > ±1.5σ	Mean reversion to ±0.3σ
`queue_exhaustion`	Book-side depletion + intensity spike	Flow reversal or 12-bar timeout
`momentum_burst`	5-bar momentum + volume spike (1.8x)	Trailing stop (vol-adjusted)
`mean_reversion`	Price z-score > ±2σ + tight spread	Z-score crosses ±0.5σ
`composite_alpha`	Weighted ensemble vote of all four	Combined signal threshold ±0.4
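To make the first row concrete, the following is a rough sketch of a z-score entry/exit rule on rolling OFI. It is not the repository's `order_flow_imbalance` implementation: the window length, the trade direction (long on positive imbalance, short on negative), and the function name `ofi_positions` are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def ofi_positions(ofi, window=20, entry_z=1.5, exit_z=0.3):
    """Return the target position (-1, 0, +1) after each bar.

    Each bar only sees the current and earlier OFI values (no lookahead).
    Direction is an assumption: go with the sign of the imbalance.
    """
    history = deque(maxlen=window)
    position = 0
    positions = []
    for value in ofi:
        history.append(value)
        if len(history) == window:
            sigma = pstdev(history)
            z = (value - mean(history)) / sigma if sigma > 0 else 0.0
            if position == 0 and abs(z) > entry_z:
                position = 1 if z > 0 else -1   # enter on an extreme z-score
            elif position != 0 and abs(z) < exit_z:
                position = 0                    # exit once z reverts toward zero
        positions.append(position)
    return positions


print(ofi_positions([0, 0, 0, 0, 10, 0, 2], window=5))
```

In the small example, the spike to 10 gives a z-score of 2.0 and opens a long; the position is held while |z| is still 0.5 and closed when |z| drops to about 0.1.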

Execution Model

Fill Price = Mid Price
           + (Spread / 2)                    ← always pay half the spread
           + (Volatility × Price × 0.1)      ← vol-proportional impact
           + (Price × 0.0001 × √size_ratio)  ← square-root market impact

Orders fill in full with 85% probability per bar. The remaining 15% receive partial fills of 50-95% of the requested size.
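The fill model can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `execution.py`: "Price" is taken to be the mid price, `size_ratio` is assumed to be order size relative to bar volume, and the `side` argument (costs are subtracted for sells) is added here because the formula only shows the buy side.

```python
import math
import random


def fill_price(mid, spread, volatility, size_ratio, side=1):
    """Fill price per the formula above; side=+1 buys, side=-1 sells (assumed)."""
    cost = spread / 2                               # half-spread
    cost += volatility * mid * 0.1                  # vol-proportional impact
    cost += mid * 0.0001 * math.sqrt(size_ratio)    # square-root market impact
    return mid + side * cost


def fill_fraction(rng):
    """Fraction of the order filled: 1.0 with probability 0.85, else 50-95%."""
    if rng.random() < 0.85:
        return 1.0
    return rng.uniform(0.50, 0.95)


rng = random.Random(7)
print(fill_price(100.0, 0.02, 0.001, 0.25), fill_fraction(rng))
```

For a mid of 100, a 2-cent spread, volatility of 0.001, and a size ratio of 0.25, the three cost terms are 0.01, 0.01, and 0.005, so a buy fills at about 100.025.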

ML Pipeline

Training: XGBoost (200 trees, depth 5, learning rate 0.05, regularized)
Validation: Walk-forward with 4-5 expanding windows
Comparison: XGBoost vs Logistic Regression baseline
Features: 20 features (12 raw order flow + 8 derived: z-scores, rolling stats, composites)
Leakage check: Flags features with |corr| > 0.5 to target
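The leakage check is simple enough to sketch directly. The version below assumes Pearson correlation and uses hypothetical names (`pearson`, `flag_leaky_features`); the repository's `validation.py` may compute the correlation differently.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def flag_leaky_features(features, target, threshold=0.5):
    """Return names of features whose |corr| with the target exceeds the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) > threshold]


target = [0, 1, 0, 1, 1, 0]
features = {
    "suspicious": [0.1, 0.9, 0.2, 0.8, 0.95, 0.05],  # tracks the label closely
    "benign": [1, 1, 0, 0, 1, 0],                    # weakly related
}
print(flag_leaky_features(features, target))
```

A feature that correlates this strongly with a future-return label is more likely to contain the label than to predict it, which is why a flag is a prompt to inspect how the feature was built.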

Portfolio System

Equal weight: simple 1/N allocation
Risk parity: inverse-volatility weighting
Correlation matrix: computed from equity curve returns
Diversification ratio: weighted avg vol / portfolio vol
Warnings: auto-flagged when |corr| > 0.7 between assets
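Two of these quantities are short formulas, sketched here with only the standard library. The function names are hypothetical, and population standard deviation of per-period returns is an assumption about how volatility is measured.

```python
import math
from statistics import pstdev


def inverse_vol_weights(vols):
    """Risk-parity weights proportional to 1/volatility, normalized to sum to 1."""
    inverse = [1.0 / v for v in vols]
    total = sum(inverse)
    return [x / total for x in inverse]


def diversification_ratio(returns_by_asset, weights):
    """Weighted average asset vol divided by the vol of the weighted portfolio."""
    vols = [pstdev(r) for r in returns_by_asset]
    weighted_avg_vol = sum(w * v for w, v in zip(weights, vols))
    n_periods = len(returns_by_asset[0])
    portfolio = [sum(w * r[t] for w, r in zip(weights, returns_by_asset))
                 for t in range(n_periods)]
    return weighted_avg_vol / pstdev(portfolio)


a = [0.01, -0.01, 0.01, -0.01]
b = [0.01, 0.01, -0.01, -0.01]   # uncorrelated with a
print(inverse_vol_weights([0.1, 0.2]))
print(diversification_ratio([a, b], [0.5, 0.5]))
```

A ratio of 1 means no diversification benefit; two equally weighted, uncorrelated assets with equal volatility give √2, as in the example.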

API Reference

Endpoint	Method	Description
`/health`	GET	Health check
`/strategies`	GET	List strategies
`/symbols?source=synthetic`	GET	List symbols by source
`/data-sources`	GET	List data sources
`/backtest`	POST	Single-asset backtest
`/portfolio/backtest`	POST	Multi-asset portfolio
`/ml/train`	POST	Train + evaluate + walk-forward
`/ml/insights`	POST	Feature importance + SHAP
`/ws/logs`	WS	Live log streaming

Metrics (Computed, Not Mocked)

Performance: Total Return, Annualized Return, Sharpe, Sortino, Calmar
Risk: Max Drawdown, Drawdown Duration, VaR 95%, CVaR 95%, Vol, Skewness, Kurtosis
Trade: Win Rate, Avg Win/Loss, Profit Factor, Hold Duration, Max Win/Loss Streaks
Cost: Total Fees, Total Slippage (tracked per-trade)
ML: OOS Accuracy, AUC-ROC, Precision, Recall, IC, ICIR, Turnover, Signal Decay
Portfolio: Diversification Ratio, Correlation Matrix, Per-Asset Breakdown
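For reference, here is one common way to compute three of these metrics. The conventions are assumptions rather than a description of `metrics.py`: 252 periods per year, a zero risk-free rate, sample standard deviation, and historical (empirical) VaR/CVaR reported as positive loss numbers.

```python
import math
from statistics import mean, stdev


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample stdev of daily returns, scaled by sqrt(periods per year)."""
    sigma = stdev(daily_returns)
    if sigma == 0:
        return 0.0
    return mean(daily_returns) / sigma * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline of the equity curve, as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def historical_var_cvar(returns, level=0.95):
    """Empirical VaR and CVaR at the given level, as positive loss numbers."""
    ordered = sorted(returns)                          # worst returns first
    k = max(1, round((1.0 - level) * len(ordered)))    # size of the loss tail
    tail = ordered[:k]
    return -tail[-1], -mean(tail)


print(max_drawdown([100, 120, 90, 110]))
```

In the example the equity curve peaks at 120 and falls to 90, a 25% drawdown, and the later recovery to 110 does not change the result.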

Project Structure

orderflow-backtester-v3/
├── backend/
│   ├── config.py                 ← centralized config
│   ├── main.py                   ← FastAPI app + WebSocket
│   ├── requirements.txt
│   ├── data/
│   │   ├── base.py               ← abstract DataSource interface
│   │   ├── generator.py          ← synthetic GARCH + order flow
│   │   └── csv_loader.py         ← CSV ingestion (Yahoo, custom)
│   ├── engine/
│   │   ├── backtest.py           ← event-driven engine
│   │   ├── execution.py          ← slippage, latency, partial fills
│   │   ├── metrics.py            ← 20+ performance metrics
│   │   ├── portfolio.py          ← multi-asset portfolio engine
│   │   ├── position.py           ← sizing: fixed, vol-target, Kelly
│   │   └── strategies.py         ← 5 strategies incl. ensemble
│   ├── ml/
│   │   ├── features.py           ← feature engineering
│   │   ├── pipeline.py           ← XGBoost + SHAP + comparison
│   │   └── validation.py         ← walk-forward + leakage detection
│   └── routes/
│       ├── backtest.py           ← /backtest + /portfolio/backtest
│       └── ml.py                 ← /ml/train + /ml/insights
└── frontend/
    ├── index.html
    ├── package.json
    ├── vite.config.js
    └── src/
        ├── main.jsx
        ├── App.jsx               ← tabs + keyboard shortcuts
        ├── api.js                ← API service (retry, timeout)
        ├── Navbar.jsx            ← status + clock + tabs
        ├── Dashboard.jsx         ← config + portfolio mode
        ├── ResultsView.jsx       ← metrics + trade log + correlation
        ├── MLInsights.jsx        ← walk-forward + leakage + SHAP
        └── hooks/
            └── useBackend.js     ← connection + clock hooks

Design Decisions

Why synthetic data? Exchange tick data can cost $10K+/year. The GARCH(1,1) generator produces realistic vol clustering and order flow correlations. The CSV loader supports real data when available.
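For readers unfamiliar with GARCH(1,1), the recursion is sketched below. The parameter values and the function name `garch_returns` are illustrative assumptions, not the settings used in `generator.py`, and the sketch covers returns only, not the order flow fields.

```python
import math
import random


def garch_returns(n, omega=1e-6, alpha=0.08, beta=0.90, seed=0):
    """Simulate n returns with variance var[t] = omega + alpha*r[t-1]^2 + beta*var[t-1]."""
    rng = random.Random(seed)
    variance = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variance = omega + alpha * r * r + beta * variance   # update for the next bar
    return returns


sample = garch_returns(1000, seed=42)
print(len(sample))
```

Because a large return raises the next bar's variance, large moves tend to follow large moves, which is the volatility clustering mentioned above. The recursion requires `alpha + beta < 1` for the unconditional variance to exist.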

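Why event-driven? Vectorized backtests make it easy to leak future data by accident, for example through a centered rolling window or a signal that is not shifted before it is applied. Bar-by-bar processing with explicit state makes this far harder, because each step can only read bars that have already been seen.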
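A stripped-down loop shows the bar-by-bar idea together with a 1-bar execution delay. This is a sketch, not the engine in `backtest.py`: the names `run_backtest` and `up_bar_signal` are hypothetical, costs and sizing are omitted, and filling at the next bar's close is an assumption.

```python
def up_bar_signal(history):
    """Toy signal: long (1) if the latest close rose, otherwise flat (0)."""
    if len(history) < 2:
        return 0
    return 1 if history[-1] > history[-2] else 0


def run_backtest(closes, signal_fn):
    """Compute the signal on bar t from closes[:t+1]; fill it on bar t+1."""
    position, pending = 0, None
    fills = []
    for t, price in enumerate(closes):
        if pending is not None:              # execute the order queued on the previous bar
            position = pending
            fills.append((t, price, position))
            pending = None
        target = signal_fn(closes[: t + 1])  # the signal sees only bars up to t
        if target != position:
            pending = target                 # queue for the next bar
    return fills


print(run_backtest([100, 101, 102, 101, 100], up_bar_signal))
```

The signal turns long on bar 1 but fills on bar 2 at 102, and the exit decided on bar 3 fills on bar 4 at 100. The strategy never trades at the price that triggered it.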

Why XGBoost over LSTM? For tabular order flow features with 20 columns, gradient-boosted trees are a strong default and often match or beat sequence models. SHAP provides the interpretability trading desks require. Each run also compares XGBoost against a logistic regression baseline, so the added model complexity has to justify itself empirically.

Why walk-forward over k-fold? k-fold trains on folds that come after the test fold in time, whether or not the rows are shuffled, so future bars end up in the training set. Walk-forward expanding windows train only on data that precedes each test window. The overfit ratio per window quantifies model stability.
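Expanding-window splits can be generated as follows. The block layout (equal-sized blocks, each fold testing on the block right after its training data) and the name `expanding_windows` are assumptions for illustration; `validation.py` may size its windows differently.

```python
def expanding_windows(n_rows, n_folds=5):
    """Return (train, test) index ranges; each fold trains on all rows before its test block."""
    block = n_rows // (n_folds + 1)
    folds = []
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder rows.
        test_end = n_rows if k == n_folds else train_end + block
        folds.append((range(0, train_end), range(train_end, test_end)))
    return folds


for train, test in expanding_windows(600):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

With 600 rows and 5 folds, the first fold trains on rows 0-99 and tests on 100-199, and the last trains on rows 0-499 and tests on 500-599. In every fold the largest training index is smaller than the smallest test index.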

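Why quarter-Kelly? Full Kelly is theoretically growth-optimal but practically dangerous, because it assumes the edge is estimated perfectly and overbetting on an overestimated edge is heavily penalized. Under the usual continuous approximation, quarter-Kelly keeps about 44% of the full-Kelly growth rate while cutting the volatility of outcomes by about 75% (variance by about 94%).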
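The two non-trivial sizing rules can be sketched as below. Both functions are hypothetical: the Kelly formula shown is the standard one for a binary win/loss bet with a given win rate and win/loss ratio, and the volatility-target rule simply scales exposure by target vol over realized vol. The leverage cap is an added assumption, and `position.py` may estimate its inputs differently.

```python
def kelly_fraction(win_rate, win_loss_ratio, multiplier=0.25):
    """Fraction of capital to risk: multiplier * (p - (1 - p) / b), floored at zero."""
    full_kelly = win_rate - (1.0 - win_rate) / win_loss_ratio
    return max(0.0, multiplier * full_kelly)


def vol_target_scale(realized_ann_vol, target_ann_vol=0.15, max_leverage=1.0):
    """Exposure multiplier that brings realized vol to the target, capped (cap is assumed)."""
    if realized_ann_vol <= 0:
        return 0.0
    return min(max_leverage, target_ann_vol / realized_ann_vol)


print(kelly_fraction(0.55, 1.0), vol_target_scale(0.30))
```

With a 55% win rate and even payoffs, full Kelly is 10% of capital and quarter-Kelly is 2.5%. A negative edge sizes to zero. An asset running at 30% annualized vol is held at half exposure to hit the 15% target.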

What Makes This Different From Tutorial Projects

Tutorial Project	This Project
`if price > MA: buy`	Z-score of rolling OFI with adaptive exits
`returns.mean() / returns.std()`	Proper annualized Sharpe from daily returns
`random_state=42, train_test_split()`	Walk-forward validation, no shuffling
Mock data in frontend	All data from backend API, error states everywhere
Single asset only	Portfolio with correlation + risk parity
No execution costs	Slippage + fees + latency + partial fills, tracked per-trade

Technologies

Backend: Python 3.12, FastAPI, NumPy, pandas, XGBoost, SHAP, scikit-learn
Frontend: React 18, Vite, Canvas charts
Protocol: REST + WebSocket

Built as a quantitative research platform for prop trading interviews. Every metric is computed from actual PnL. No mock values. No silent fallbacks.
