backtest-audit

Catch overfitting before it costs you money.

Evidence it works

Validation experiment: 712 strategies (MA crossover, RSI, Bollinger Band, and pure noise) across 8 assets: SPY, QQQ, GLD, BTC-USD, ETH-USD, TLT, EEM, VXX. In-sample (IS) period: 2018-2021. Out-of-sample (OOS) period: 2022-2023.

Finding	Result	What it means
IS/OOS Spearman correlation	r = 0.038	IS Sharpe rank has near-zero power to predict OOS rank
DSR-pass strategies (OOS+)	58%	Audit-approved strategies survive OOS at an above-chance rate
Noise strategies (OOS+)	47%	Pure-noise strategies survive OOS less often than DSR-approved ones
Consensus filter false positives	0%	Requiring DSR+MC consensus produced no false positives in this experiment

Key claims:

IS/OOS rank correlation is near zero (r = 0.038), so selecting on IS Sharpe alone is little better than picking at random
In this experiment, requiring both DSR and Monte Carlo (MC) to agree cut false-positive overfitting alerts to zero across the 712 strategies
Noise strategies survive OOS less often (47%) than DSR-approved strategies (58%)

python examples/validation_experiment.py   # reproduce in ~3 minutes
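
For readers who want to see what the IS/OOS rank correlation measures, here is a minimal sketch, not the experiment's actual code. It assumes you already have one IS Sharpe and one OOS Sharpe per strategy; the function name `is_oos_rank_corr` is hypothetical. Spearman correlation is computed as the Pearson correlation of the ranks.

```python
import pandas as pd


def is_oos_rank_corr(is_sharpe, oos_sharpe) -> float:
    """Spearman rank correlation between per-strategy IS and OOS Sharpe ratios."""
    # Rank each list (ties get the average rank), then correlate the ranks.
    is_rank = pd.Series(is_sharpe, dtype=float).rank()
    oos_rank = pd.Series(oos_sharpe, dtype=float).rank()
    return float(is_rank.corr(oos_rank))  # Pearson on ranks equals Spearman


# Toy inputs: identical ordering vs. fully reversed ordering.
same_order = is_oos_rank_corr([0.1, 0.5, 0.9, 1.3], [0.0, 0.2, 0.4, 2.0])
reversed_order = is_oos_rank_corr([0.1, 0.5, 0.9, 1.3], [2.0, 0.4, 0.2, 0.0])
```

A value near zero, like the 0.038 reported above, means the IS ranking carries almost no information about the OOS ranking.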

What a strategy with Sharpe 1.2 actually looks like

Deflated Sharpe Ratio     DSR=-75.6   [FAIL]   <- selected from 19 combos
Probability of Overfitting  PBO=1.00  [FAIL]   <- 100% chance this is luck
Economic Significance       d=0.051   [WARN]   <- negligible effect size
Regime: trend_down          SR=-3.48  [FAIL]   <- collapses in bear markets
Robustness: 3/7 survived    FRAGILE   [WARN]   <- edge breaks under stress

OVERALL VERDICT: FAIL

python examples/audit_demo.py   # real SPY data, runs in 30 seconds
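
To show why the number of combinations tried matters, the sketch below implements the textbook Deflated Sharpe Ratio of Bailey & Lopez de Prado (2014) as a probability in [0, 1]. It is an illustration under stated assumptions, not necessarily how backtest-audit computes or scales its own DSR figure. The function names and the `trial_sr_std` input (the standard deviation of per-period Sharpe ratios across the trials) are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials: int, trial_sr_std: float) -> float:
    """Expected best per-period Sharpe among n_trials strategies with no skill."""
    if n_trials < 2:
        return 0.0  # a single trial needs no deflation
    z = NormalDist()
    return trial_sr_std * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(returns, n_trials: int, trial_sr_std: float) -> float:
    """Probability that the true Sharpe exceeds the expected maximum under luck."""
    r = np.asarray(returns, dtype=float)
    t = r.size
    sr = r.mean() / r.std(ddof=1)              # per-period, not annualised
    centered = (r - r.mean()) / r.std(ddof=0)
    skew = float(np.mean(centered ** 3))
    kurt = float(np.mean(centered ** 4))       # non-excess kurtosis (3 for a normal)
    sr0 = expected_max_sharpe(n_trials, trial_sr_std)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(t - 1) / denom)


# Same synthetic return series, judged as a single trial vs. the best of 50.
rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, 1000)
dsr_single = deflated_sharpe(daily, n_trials=1, trial_sr_std=0.03)
dsr_many = deflated_sharpe(daily, n_trials=50, trial_sr_std=0.03)
```

The same returns score lower once they are treated as the winner of 50 trials, which is the multiple-testing penalty the audit applies.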

What it audits

Module	Method	What it catches
Deflated Sharpe Ratio	Bailey & Lopez de Prado (2014)	Multiple-testing inflation
Monte Carlo Permutation	White (2000)	Performance no better than randomly reordered returns
PBO	Bailey & Lopez de Prado (2014)	IS winners losing OOS
Parameter Sensitivity	—	Narrow, brittle parameter windows
Economic Significance	Cohen (1988)	Statistically significant but economically useless
Walk-Forward OOS	—	In-sample edge not holding out-of-sample
Regime Audit	—	Edge disappearing in high-vol or bear regimes
Robustness Stress Test	—	Edge collapsing under noise / cost / tail events
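
As an illustration of the PBO row above, the following sketch estimates the probability of backtest overfitting with combinatorially symmetric cross-validation (CSCV), the procedure described in the PBO literature. It is a simplified sketch and may differ from the library's implementation. The function names `pbo_cscv` and `_sharpe`, and the default of 8 blocks, are assumptions.

```python
from itertools import combinations

import numpy as np


def _sharpe(block: np.ndarray) -> np.ndarray:
    """Per-column Sharpe ratio (per-period, not annualised)."""
    return block.mean(axis=0) / block.std(axis=0, ddof=1)


def pbo_cscv(returns_matrix, n_blocks: int = 8) -> float:
    """Share of IS/OOS splits where the IS winner ranks at or below the OOS median."""
    m = np.asarray(returns_matrix, dtype=float)   # rows: periods, cols: configs
    n_obs, n_strats = m.shape
    usable = n_obs - n_obs % n_blocks             # drop the remainder rows
    blocks = np.split(m[:usable], n_blocks)
    logits = []
    for is_idx in combinations(range(n_blocks), n_blocks // 2):
        oos_idx = [i for i in range(n_blocks) if i not in is_idx]
        is_data = np.vstack([blocks[i] for i in is_idx])
        oos_data = np.vstack([blocks[i] for i in oos_idx])
        best = int(np.argmax(_sharpe(is_data)))   # config chosen in-sample
        oos_sr = _sharpe(oos_data)
        rank = int((oos_sr <= oos_sr[best]).sum())  # 1 = worst, n_strats = best
        omega = rank / (n_strats + 1)               # relative OOS rank in (0, 1)
        logits.append(math_logit := np.log(omega / (1 - omega)))
    return float(np.mean(np.array(logits) <= 0))


rng = np.random.default_rng(1)
noise = rng.normal(0.0, 0.01, size=(800, 5))      # five configs with no edge
dominant = noise.copy()
dominant[:, 0] += 0.01                            # config 0 gets a large, stable edge
pbo_dominant = pbo_cscv(dominant)
pbo_noise = pbo_cscv(noise)
```

When one configuration wins in every split, the estimate is 0. For pure noise the IS winner has no reason to rank well OOS, so the estimate carries no such guarantee.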

Quickstart

pip install backtest-audit

import pandas as pd
from backtest_audit import BacktestAuditor

returns = pd.read_csv("my_strategy_returns.csv").squeeze()
auditor = BacktestAuditor(returns, n_trials=50)  # 50 param combos tried
report  = auditor.run_all()
report.print_report()

print(report.overall_verdict)  # "PASS" | "WARN" | "FAIL"
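
If you have no returns file yet, a small synthetic one is enough to try the Quickstart. The sketch below writes a single-column CSV, which is the layout that `pd.read_csv(...).squeeze()` turns into a Series. The helper name `write_synthetic_returns`, the column name, and the return distribution are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def write_synthetic_returns(path, n_days: int = 252, seed: int = 0) -> pd.Series:
    """Write a one-column CSV of i.i.d. normal daily returns with no real edge."""
    rng = np.random.default_rng(seed)
    returns = pd.Series(rng.normal(0.0, 0.01, n_days), name="returns")
    returns.to_csv(path, index=False)  # header row plus one value per line
    return returns


write_synthetic_returns("my_strategy_returns.csv")

# A one-column frame squeezes to a Series, as the Quickstart expects.
loaded = pd.read_csv("my_strategy_returns.csv").squeeze()
```

A CSV that also carries a date column would not squeeze to a Series with this call, so adjust the `read_csv` arguments to your own file layout.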

Full output

report.economic_result      # Cohen's d, MDE, R^2, break-even cost
report.walk_forward_result  # OOS hit rate, IS/OOS Sharpe correlation
report.regime_result        # Per-regime DSR+MC (low/high vol, trend/counter)
report.robustness_report    # 7-scenario stress test survival rate
report.to_dict()            # JSON-serialisable — pipe to any dashboard
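
To make the Cohen's d figure concrete, here is a minimal sketch of a one-sample effect size for a return series measured against a zero benchmark. Whether the library uses this exact form is an assumption. The labels follow Cohen's conventional 0.2 / 0.5 / 0.8 cut-offs, with "negligible" used below 0.2.

```python
import numpy as np


def cohens_d(returns, benchmark: float = 0.0) -> float:
    """One-sample Cohen's d: mean excess return in units of its standard deviation."""
    r = np.asarray(returns, dtype=float)
    return float((r.mean() - benchmark) / r.std(ddof=1))


def effect_label(d: float) -> str:
    """Map |d| onto Cohen's conventional size bands."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


d_example = cohens_d([0.01, 0.03])   # mean 0.02, sample std about 0.0141
label_example = effect_label(0.051)  # the d value shown in the sample audit
```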

REST API

pip install "backtest-audit[api]"
uvicorn backtest_audit.api:app --reload
# -> http://localhost:8000/docs

Endpoint	What
`POST /audit`	Full 8-test audit
`POST /audit/dsr`	DSR only
`POST /audit/mc`	Monte Carlo only
`POST /audit/pbo`	PBO (returns matrix)
`POST /audit/sensitivity`	Parameter sensitivity
`POST /audit/economic`	Effect size, MDE, R^2
`POST /audit/walk-forward`	OOS validation
`POST /audit/regime`	Regime-conditional audit
`POST /audit/robustness`	Stress battery
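
A client call might look like the sketch below, which only uses the Python standard library. The request body is an assumption: the field names `returns` and `n_trials` mirror the Python constructor but are not confirmed for the REST API, so check the schema at http://localhost:8000/docs before relying on them. The helper name `build_audit_request` is hypothetical.

```python
import json
from urllib import request


def build_audit_request(returns, n_trials, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /audit request with a JSON body."""
    # Hypothetical payload shape; confirm field names against /docs.
    payload = {"returns": list(returns), "n_trials": n_trials}
    return request.Request(
        url=f"{base_url}/audit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_audit_request([0.01, -0.02, 0.005], n_trials=50)

# With the server running, send it like this:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```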

Docker

docker build --target production -t backtest-audit .
docker run -p 8000:8000 backtest-audit
# -> http://localhost:8000/health

Development

pip install -e ".[dev,api]"
pytest tests/ -v          # 124 tests, ~5s, zero network calls
ruff check src/ tests/    # lint clean

References

Bailey, D. & Lopez de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5).
White, H. (2000). A Reality Check for Data Snooping. Econometrica, 68(5).
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
