Rigorous cross-validation for time series with leakage detection and gap enforcement.
Documentation • Examples • Leakage Tutorial
```bash
pip install temporalcv
```

Validate your model for temporal leakage in 4 lines:

```python
from temporalcv import run_gates
from temporalcv.gates import gate_signal_verification

# model, X, y: your fitted-or-fittable estimator, feature matrix, and target
report = run_gates([gate_signal_verification(model, X, y, n_shuffles=100)])
print(report.status)  # HALT, WARN, or PASS
```

| Status | Meaning |
|---|---|
| HALT | Signal detected — investigate (legitimate temporal pattern or leakage?) |
| WARN | Marginal signal — proceed with caution |
| PASS | No signal — model has no detectable predictive power |
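The idea behind a shuffled-target check can be sketched in plain Python. This is an illustration of the general permutation technique, not the implementation of gate_signal_verification: the helper names `ols_mae` and `shuffled_target_p_value`, the one-feature least-squares model, and the use of in-sample MAE are all assumptions made to keep the example short.

```python
import random


def ols_mae(x, y):
    """Fit y ~ a + b*x by least squares and return the in-sample MAE."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum(abs(yi - (intercept + slope * xi)) for xi, yi in zip(x, y)) / n


def shuffled_target_p_value(x, y, n_shuffles=100, seed=0):
    """Permutation p-value: how often a shuffled target fits as well as the real one."""
    rng = random.Random(seed)
    real_mae = ols_mae(x, y)
    hits = 0
    for _ in range(n_shuffles):
        y_perm = list(y)
        rng.shuffle(y_perm)  # destroys any real link between x and y
        if ols_mae(x, y_perm) <= real_mae:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical data: a noisy linear relationship between x and y.
rng = random.Random(42)
x = [float(i) for i in range(60)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
print(f"p-value: {shuffled_target_p_value(x, y):.3f}")
```

A small p-value means the model does much better on the real target than on shuffled ones, which corresponds to the "signal detected" row above. Whether that signal is a legitimate temporal pattern or leakage still has to be investigated by hand.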
Standard cross-validation shuffles data randomly. For time series, this means training on future data to predict the past — a form of data leakage that inflates metrics.
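To see the problem concretely, the sketch below builds shuffled K-fold splits by hand and measures how much of each training set lies after the earliest test row. The helpers `shuffled_kfold` and `future_fraction` are hypothetical names written for this illustration; they are not part of temporalcv or scikit-learn.

```python
import random


def shuffled_kfold(n_samples, n_splits, seed=0):
    """Yield (train, test) index lists from a randomly shuffled K-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def future_fraction(train, test):
    """Share of training indices that come after the earliest test index."""
    first_test = min(test)
    return sum(i > first_test for i in train) / len(train)


for fold, (train, test) in enumerate(shuffled_kfold(100, 5)):
    share = future_fraction(train, test)
    print(f"fold {fold}: {share:.0%} of training rows postdate the first test row")

# A time-ordered split has no such rows.
print(future_fraction(list(range(80)), list(range(80, 100))))
```

Any non-zero share means the model was fitted on observations that would not have existed when the earliest test prediction was made.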
temporalcv provides:
- Leakage detection via validation gates (shuffled target test, suspicious improvement)
- Gap enforcement for h-step forecasting (no lag-feature contamination)
- High-persistence metrics (MASE, MC-SS) that measure actual skill
- Conformal coverage with caveats: the guarantee is marginal and holds under exchangeability, which time-series autocorrelation can violate; use AdaptiveConformalPredictor (Gibbs & Candès 2021) to handle distribution shift
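The adaptive conformal idea mentioned in the last bullet can be sketched as follows. This is a simplified illustration of the update rule from Gibbs & Candès (2021), alpha_next = alpha_t + gamma * (alpha - miss_t), and not the AdaptiveConformalPredictor API. The function name `adaptive_conformal`, its parameters, and the choice to cap the interval at the largest past residual (the paper uses an infinite interval when the level reaches 1) are assumptions.

```python
import math
import random


def adaptive_conformal(abs_residuals, alpha=0.1, gamma=0.01, warmup=20):
    """Online interval half-widths with an adaptively tuned miscoverage level.

    abs_residuals: |y_t - prediction_t| in time order.
    Returns (half_widths, empirical_coverage) for the steps after warmup.
    """
    alpha_t = alpha
    half_widths, covered = [], []
    for t in range(warmup, len(abs_residuals)):
        past = sorted(abs_residuals[:t])  # only information available before step t
        level = min(max(1.0 - alpha_t, 0.0), 1.0)
        # Conservative empirical quantile, clipped to the observed range.
        k = min(len(past) - 1, max(0, math.ceil(level * (len(past) + 1)) - 1))
        width = past[k]
        miss = 1.0 if abs_residuals[t] > width else 0.0
        half_widths.append(width)
        covered.append(1.0 - miss)
        # A miss lowers alpha_t (wider intervals); a hit raises it slightly.
        alpha_t += gamma * (alpha - miss)
    return half_widths, sum(covered) / len(covered)


# Hypothetical residuals whose scale triples halfway through (a distribution shift).
rng = random.Random(0)
residuals = [abs(rng.gauss(0, 1)) for _ in range(300)]
residuals += [abs(rng.gauss(0, 3)) for _ in range(300)]
widths, coverage = adaptive_conformal(residuals)
print(f"coverage: {coverage:.3f}, first width: {widths[0]:.2f}, last width: {widths[-1]:.2f}")
```

With gamma set to 0 the level never moves and the procedure reduces to a plain rolling conformal quantile, which is the case the exchangeability caveat applies to.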
| Feature | temporalcv | sklearn | sktime | darts |
|---|---|---|---|---|
| Horizon-derived gap (auto) | ✓ | Manual¹ | Manual | Manual |
| Leakage detection gates | ✓ | ✗ | ✗ | ✗ |
| Conformal prediction | ✓ | ✗ | Partial | ✓ |
| sklearn-compatible API | ✓ | ✓ | ✓ | ✗ |
| Statistical tests (DM, PT) | ✓ | ✗ | Partial | ✗ |
¹ sklearn TimeSeriesSplit(gap=N) (since v0.24) requires the user to compute N from the forecast horizon themselves; temporalcv derives gap = horizon + extra_gap automatically from the horizon parameter.
```python
from temporalcv import WalkForwardCV

cv = WalkForwardCV(
    window_type="sliding",
    window_size=104,  # 2 years of weekly data
    horizon=2,        # 2-step ahead forecast
    test_size=1,
)

# model, X, y: your estimator and index-aligned arrays
for train_idx, test_idx in cv.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
```

The horizon parameter enforces a gap between training and test sets, preventing lagged features from leaking target information.
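The index arithmetic behind a gapped sliding window can be sketched in a few lines. This is not WalkForwardCV's implementation: the function `sliding_window_splits` is hypothetical, and the convention that the gap counts the rows skipped between the last training row and the first test row is an assumption (it matches how scikit-learn's TimeSeriesSplit defines gap).

```python
def sliding_window_splits(n_samples, window_size, horizon, test_size=1, extra_gap=0):
    """Yield (train, test) index lists for a sliding window with an enforced gap."""
    gap = horizon + extra_gap  # rows skipped between the train end and the test start
    start = 0
    while True:
        train_end = start + window_size  # exclusive
        test_start = train_end + gap
        test_end = test_start + test_size  # exclusive
        if test_end > n_samples:
            break
        yield list(range(start, train_end)), list(range(test_start, test_end))
        start += test_size  # slide forward by one test block


splits = list(sliding_window_splits(n_samples=120, window_size=104, horizon=2))
train, test = splits[0]
print(f"{len(splits)} splits; first train ends at {train[-1]}, first test starts at {test[0]}")
```

With horizon=2 the first training window covers rows 0 to 103 and the first test row is 106, so rows 104 and 105 are never used for fitting or scoring in that split.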
```python
from temporalcv import run_gates
from temporalcv.gates import gate_signal_verification, gate_suspicious_improvement

# model_mae, baseline_mae: errors you have already computed for the model and a baseline
gates = [
    gate_signal_verification(model, X, y, n_shuffles=100),
    gate_suspicious_improvement(model_mae, baseline_mae, threshold=0.20),
]
report = run_gates(gates)

if report.status == "HALT":
    raise ValueError(f"Signal detected, investigate: {report.summary()}")
```

When your series is "sticky" (ACF(1) > 0.9), standard MAE misleads: predicting "same as yesterday" scores well but adds no value.
```python
from temporalcv.metrics import mase, mc_skill_score

print(f"MASE: {mase(actual, predicted, y_train):.3f}")
print(f"MC-SS: {mc_skill_score(actual, predicted):.3f}")
```

| Metric | What It Measures |
|---|---|
| MASE | Error relative to naive forecast (scale-free) |
| MC-SS | Skill only when target moved |
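For intuition, both metrics can be approximated in plain Python. `mase_sketch` follows the standard MASE definition (forecast MAE divided by the in-sample MAE of the one-step naive forecast). `move_conditional_skill` is only a guess at what a "skill only when the target moved" score could look like: its name, its extra `previous` and `threshold` arguments, and its formula are assumptions and may not match mc_skill_score.

```python
def mase_sketch(actual, predicted, y_train):
    """Forecast MAE scaled by the in-sample MAE of the one-step naive forecast."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    naive_errors = [abs(y_train[i] - y_train[i - 1]) for i in range(1, len(y_train))]
    return mae / (sum(naive_errors) / len(naive_errors))


def move_conditional_skill(actual, predicted, previous, threshold=0.0):
    """1 - MAE(model) / MAE(persistence), using only steps where the target moved.

    previous[i] is the last observed value before actual[i], which is also the
    persistence ("same as yesterday") forecast for that step.
    """
    moved = [i for i in range(len(actual)) if abs(actual[i] - previous[i]) > threshold]
    if not moved:
        return None  # the target never moved, so skill is undefined
    model_mae = sum(abs(actual[i] - predicted[i]) for i in moved) / len(moved)
    persistence_mae = sum(abs(actual[i] - previous[i]) for i in moved) / len(moved)
    return 1.0 - model_mae / persistence_mae


# Hypothetical sticky series: the target moves on two of four steps.
previous = [4.0, 5.0, 6.0, 6.0]
actual = [5.0, 6.0, 6.0, 6.0]
print(move_conditional_skill(actual, predicted=previous, previous=previous))
print(move_conditional_skill(actual, predicted=[4.5, 5.5, 6.0, 6.0], previous=previous))
print(mase_sketch([5.0, 6.0], [5.5, 6.5], y_train=[1.0, 2.0, 3.0, 4.0]))
```

A persistence forecast gets zero skill under the conditional score even though its plain MAE on a sticky series looks small, which is the point of conditioning on moves.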
```mermaid
flowchart TD
    A[Data + Model] --> B{Validation Gates}
    B -->|HALT| C[Stop & Investigate]
    B -->|WARN| D[Proceed with Caution]
    B -->|PASS| E[Walk-Forward CV]
    E --> F[Statistical Tests]
    F --> G[Deploy]
```
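The gate step of this workflow can be pictured as a set of checks whose worst status decides what happens next. The sketch below is an assumption about how such a pipeline might be wired, not a description of run_gates: `GateResult`, `suspicious_improvement`, `overall_status`, the "relative improvement above threshold means HALT" rule, and the "worst status wins" aggregation are all hypothetical.

```python
from dataclasses import dataclass

SEVERITY = {"PASS": 0, "WARN": 1, "HALT": 2}


@dataclass
class GateResult:
    name: str
    status: str  # "PASS", "WARN", or "HALT"
    detail: str


def suspicious_improvement(model_mae, baseline_mae, threshold=0.20):
    """Flag a model whose relative gain over the baseline looks too good."""
    improvement = (baseline_mae - model_mae) / baseline_mae
    status = "HALT" if improvement > threshold else "PASS"
    detail = f"improvement over baseline: {improvement:.1%}"
    return GateResult("suspicious_improvement", status, detail)


def overall_status(results):
    """Aggregate gate results so that the most severe status wins."""
    return max((r.status for r in results), key=SEVERITY.__getitem__, default="PASS")


results = [
    suspicious_improvement(model_mae=0.50, baseline_mae=1.00),
    GateResult("example_gate", "WARN", "marginal signal"),
]
for r in results:
    print(r.name, r.status, r.detail)
print("overall:", overall_status(results))
```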
Learn from these failure modes:
| Pattern | Example | Why It's Bad |
|---|---|---|
| Rolling stats on full series | .rolling().mean() without .shift() | Features encode future |
| No gap for h-step forecast | horizon=0 when predicting 2 steps ahead | Lag features leak target |
| Threshold on full data | Regime boundary uses all data | Classification cheats |
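The first row of the table can be reproduced without pandas. The helper `trailing_mean` below is a hypothetical stand-in: with include_current=True it behaves like .rolling(window).mean(), and with include_current=False it behaves like .shift(1).rolling(window).mean().

```python
def trailing_mean(values, window, include_current):
    """Rolling mean over `window` values, ending at t (leaky) or at t - 1 (safe)."""
    out = []
    for t in range(len(values)):
        end = t + 1 if include_current else t
        chunk = values[max(0, end - window):end]
        # Mirror pandas: no value until a full window is available.
        out.append(sum(chunk) / window if len(chunk) == window else None)
    return out


target = [1.0, 2.0, 3.0, 4.0, 5.0]
leaky = trailing_mean(target, window=2, include_current=True)
safe = trailing_mean(target, window=2, include_current=False)

for t, (y, a, b) in enumerate(zip(target, leaky, safe)):
    print(f"t={t} target={y} leaky_feature={a} safe_feature={b}")
```

If the feature at row t is used to predict the target at row t, the leaky version already contains that target inside its average, while the shifted version only sees values up to t - 1.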
See the failure examples gallery for detailed walkthroughs:
- 16: Rolling Stats Leak
- 17: Threshold Leak
- 19: Missing Gap
- 20: KFold Trap — 47.8% fake improvement
```bash
pip install temporalcv[benchmarks]   # M4/M5 benchmarks
pip install temporalcv[changepoint]  # PELT algorithm
pip install temporalcv[dev]          # Testing, linting
pip install temporalcv[all]          # Everything
```

Core: numpy >= 1.21, scipy >= 1.7, scikit-learn >= 1.0, statsmodels >= 0.13, matplotlib >= 3.5 • Optional: pandas >= 1.3 (pip install temporalcv[pandas])
Platforms: Linux, macOS, Windows | Python: 3.10+
| Resource | Description |
|---|---|
| Quickstart | Get running in 5 minutes |
| Leakage Tutorial | Deep dive on detection |
| API Reference | Full API docs |
| Examples Gallery | 21 real-world cases |
| Test | Reference | Result |
|---|---|---|
| DM test golden values | R forecast::dm.test() | ✓ Match |
| Type I error rate | 500 Monte Carlo sims | 5% ± 2% |
| Conformal coverage | Synthetic AR(1) | 95% nominal |
| Benchmark | M4 Competition (4,773 series) | ✓ Validated |
See Testing Strategy for details.
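A Type I error check of the kind listed in the table can be sketched as a small Monte Carlo experiment. The code below is a simplified illustration and not the project's test suite: `dm_test` implements the basic one-step Diebold-Mariano statistic under squared-error loss with a normal approximation (no HAC variance for longer horizons, no small-sample correction), and the sample size, seed, and function names are assumptions.

```python
import math
import random


def dm_test(errors_a, errors_b):
    """Return (statistic, two-sided p-value) for equal one-step forecast accuracy."""
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]  # loss differential
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal tail
    return stat, p_value


def type_one_error_rate(n_sims=500, n_obs=200, level=0.05, seed=0):
    """Share of false rejections when both forecasts are equally accurate."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        errors_a = [rng.gauss(0, 1) for _ in range(n_obs)]
        errors_b = [rng.gauss(0, 1) for _ in range(n_obs)]
        _, p_value = dm_test(errors_a, errors_b)
        rejections += p_value < level
    return rejections / n_sims


print(f"empirical Type I error: {type_one_error_rate():.3f}")
```

Under the null hypothesis a well-calibrated test should reject in roughly 5% of simulations, which is what the tolerance band in the table expresses.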
```bibtex
@software{temporalcv2025,
  author = {Behring, Brandon},
  title = {temporalcv: Temporal cross-validation with leakage protection},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/brandonmbehring-dev/temporalcv}
}
```

Part of the Rigorous AI Engineering ecosystem:
| Project | Description |
|---|---|
| temporalcv (this repo) | Temporal cross-validation with leakage detection |
| ir-eval | Statistical retrieval evaluation with drift detection |
| research-kb | Graph-boosted semantic search for research literature |
See CONTRIBUTING.md for development setup and guidelines.
MIT License — see LICENSE