v2.0.0 — Phase 3B + 4A + dashboard + Sessions 1–7 correctness
42 commits, +7,570 lines, 96 files since v1.1.0.
This release covers the Phase 3B + 4A roadmap, a dashboard rebuild, and a 7-session local code-review correctness pass. The major version bump to v2.0.0 reflects changed behavioural semantics (see Breaking changes).
⚠️ Breaking changes
- A1 bar-timing (faf1254): the runner now decides on bar t-1's close and fills any approved entry/exit at bar t's open, not on the same bar. Backtests will produce numerically different results from any pre-v2.0 pinned fixture. N bars produce N-1 decisions (day 0 has no prior data).
- `BacktestResult` shape expanded. New fields: `sharpe`, `sortino`, `calmar`, `max_drawdown_duration`, `avg_recovery_time`, `mc_success_prob`, `on_day_errors`. Consumers reading the dataclass by field name are unaffected; consumers relying on positional unpacking will break.
- `ConstraintSchema` shape expanded with 7 new optional fields and defaults: `max_daily_loss_pct` (0.02), `consistency_rule_pct` (0.50), `consistency_warn_pct` (0.40), `trading_cutoff_time` ("16:20"), `min_stop_atr_multiple` (1.5), `prefer_entry_below_vwap` (`True`), `trailing_max_drawdown_pct` (`None`, which falls back to `max_drawdown_pct`). Old preset JSON without these keys still loads.
- `run_parallel` signature in `dashboard/parallel.py`: the unused `on_events` parameter was dropped (R8). `PresetWorker` is now `Callable[[str], Any]`.
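A minimal sketch of the A1 timing rule, assuming a plain list of bars and a decision callback; `Bar`, `Fill`, `run` and the callback shape are hypothetical illustrations, not the package's runner API. The decision at step t sees only bars up to t-1, and any fill uses bar t's open, which is why N bars yield N-1 decisions:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Bar:
    open: float
    close: float

@dataclass(frozen=True)
class Fill:
    decided_on: int  # index of the bar whose close informed the decision
    filled_on: int   # index of the bar whose open sets the fill price
    price: float

# Hypothetical callback: sees bars[0..t-1] and returns an action or None.
Decide = Callable[[list[Bar]], Optional[str]]

def run(bars: list[Bar], decide: Decide) -> tuple[int, list[Fill]]:
    """Decide on bar t-1's close and fill at bar t's open (never the same bar)."""
    decisions = 0
    fills: list[Fill] = []
    for t in range(1, len(bars)):  # bar 0 has no prior bar, so no decision
        action = decide(bars[:t])  # only data available at bar t-1's close
        decisions += 1
        if action is not None:
            fills.append(Fill(decided_on=t - 1, filled_on=t, price=bars[t].open))
    return decisions, fills

bars = [Bar(100.0, 101.0), Bar(102.0, 103.0), Bar(104.0, 105.0)]
n_decisions, fills = run(bars, lambda hist: "buy" if hist[-1].close > hist[-1].open else None)
print(n_decisions, [f.price for f in fills])  # 2 [102.0, 104.0]
```

Slicing `bars[:t]` is what keeps bar t's own data out of the decision; a same-bar fill would let the decision peek at the price it trades at.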
Phase 3B — Prop trading infra
- F11 trailing drawdown breaker (via Seam 3 — breaker registry)
- F12 challenge phase management (`challenge_pnl_pct` → building / conservation)
- F13 daily consistency enforcement (graduated `approaching` / `breached` signal)
- F14 Monte Carlo challenge simulator (P(pass) projection from realised edge)
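The notes describe F14 only as a P(pass) projection from realised edge. One common way to build such a projection is a bootstrap: resample realised daily returns into many simulated challenges and count how many reach the profit target before a loss limit. The sketch below assumes that approach; the function name, the 8% target, the 5% trailing drawdown and the 30-day horizon are hypothetical, and only the 2% daily-loss default mirrors `max_daily_loss_pct`.

```python
import random
from typing import Sequence

def mc_pass_probability(
    daily_returns: Sequence[float],  # realised daily P&L as a fraction of the account
    profit_target: float = 0.08,     # hypothetical challenge target
    trailing_max_dd: float = 0.05,   # hypothetical drawdown limit, measured from peak equity
    max_daily_loss: float = 0.02,    # mirrors the max_daily_loss_pct default
    days: int = 30,                  # hypothetical challenge length in trading days
    n_paths: int = 10_000,
    seed: int = 0,
) -> float:
    """Bootstrap realised daily returns into simulated challenges; return the pass rate."""
    if not daily_returns:
        raise ValueError("need at least one realised daily return")
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    passed = 0
    for _ in range(n_paths):
        equity = peak = 1.0
        for _ in range(days):
            r = rng.choice(daily_returns)
            if r <= -max_daily_loss:  # a single day at or beyond the loss limit fails the path
                break
            equity *= 1.0 + r
            peak = max(peak, equity)
            if equity <= peak * (1.0 - trailing_max_dd):  # trailing drawdown breach
                break
            if equity >= 1.0 + profit_target:  # target reached: challenge passed
                passed += 1
                break
    return passed / n_paths

print(mc_pass_probability([0.01] * 5))  # 1.0 (target reached on day 8 of every path)
print(mc_pass_probability([-0.03]))     # 0.0 (daily-loss limit hit on day 1)
```

Resampling days independently ignores any serial correlation in the realised edge; a block bootstrap would be the usual refinement if streaks matter.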
Phase 4A — Backtesting quality
- F15 walk-forward windows + OOS aggregation
- F16 transaction cost modelling
- F17 risk-adjusted metrics (Sharpe, Sortino, Calmar)
- F18 drawdown profile analysis (duration, avg recovery)
- F19 regime-conditional decomposition
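The notes do not state the exact formulas behind F17 and F18. The sketch below uses common textbook definitions under stated assumptions: daily returns, 252-day annualisation, zero risk-free rate, a 0% target for Sortino's downside deviation, and Calmar as CAGR over maximum drawdown. All function names are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence

TRADING_DAYS = 252  # assumed annualisation factor for daily bars

def equity_curve(returns: Sequence[float], start: float = 1.0) -> list[float]:
    curve = [start]
    for r in returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve

def sharpe(returns: Sequence[float]) -> float:
    sd = statistics.stdev(returns)  # sample std dev; needs at least 2 returns
    return 0.0 if sd == 0 else statistics.fmean(returns) / sd * math.sqrt(TRADING_DAYS)

def sortino(returns: Sequence[float]) -> float:
    # Downside deviation against a 0% target: only losing days contribute.
    downside = math.sqrt(statistics.fmean(min(r, 0.0) ** 2 for r in returns))
    return math.inf if downside == 0 else statistics.fmean(returns) / downside * math.sqrt(TRADING_DAYS)

def max_drawdown(curve: Sequence[float]) -> float:
    peak, worst = curve[0], 0.0
    for v in curve:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst

def calmar(returns: Sequence[float]) -> float:
    curve = equity_curve(returns)
    cagr = (curve[-1] / curve[0]) ** (TRADING_DAYS / len(returns)) - 1.0
    mdd = max_drawdown(curve)
    return math.inf if mdd == 0 else cagr / mdd

def drawdown_profile(curve: Sequence[float]) -> tuple[int, Optional[float]]:
    """(longest run of bars below a prior peak, mean bars from peak to recovery)."""
    peak, peak_idx = curve[0], 0
    longest, recoveries = 0, []
    for i, v in enumerate(curve):
        if v >= peak:
            if i - peak_idx > 1:  # we were underwater and just regained the peak
                recoveries.append(i - peak_idx)
            peak, peak_idx = v, i
        else:
            longest = max(longest, i - peak_idx)
    return longest, (statistics.fmean(recoveries) if recoveries else None)

rets = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]
print(round(sharpe(rets), 2), round(sortino(rets), 2), round(calmar(rets), 2))
print(drawdown_profile([1.0, 0.9, 0.95, 1.0, 1.1, 1.05]))  # (2, 3.0)
```

An unrecovered drawdown at the end of the curve still counts toward the longest duration but not toward the recovery average, since it has no recovery bar yet.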
Dashboard workstream
- A1 Seam 0 bar-timing fidelity (faf1254)
- A2 Phase 4A metrics surfaced in backtest dashboard (64c59fb)
- A3 Monte Carlo P(pass) per configuration (806953f)
- T0 backtest persistence & recovery (569d1fb)
- T1 prompt caching for Player + Coach (14bede3)
- T2 parallel preset execution (2825167)
- T3 Fast / Standard backtest depth toggle (3f2cf28)
- Live backtest progress bar + sparkline + status caption (bbae90c)
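A sketch of the T2 parallel-runner shape after the v2.0 signature change, assuming a thread pool. `PresetWorker` follows the `Callable[[str], Any]` type given in the notes; `thread_init` stands in for the hook that, in the dashboard, carries `ScriptRunContext` into worker threads (R3). Here it is simply a plain callable passed as `ThreadPoolExecutor`'s `initializer`, and `max_workers` is a hypothetical parameter.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional

PresetWorker = Callable[[str], Any]  # worker receives a preset id and returns its result

def run_parallel(
    presets: list[str],
    worker: PresetWorker,
    thread_init: Optional[Callable[[], None]] = None,  # runs once in each worker thread
    max_workers: int = 2,                              # hypothetical pool size
) -> dict[str, Any]:
    """Run one worker call per preset on a thread pool; results keyed by preset id."""
    with ThreadPoolExecutor(max_workers=max_workers, initializer=thread_init) as pool:
        futures = {preset: pool.submit(worker, preset) for preset in presets}
        # .result() re-raises any worker exception in the calling thread
        return {preset: fut.result() for preset, fut in futures.items()}

initialised: list[str] = []  # thread names that ran the init hook

def init_hook() -> None:
    # In the dashboard, this is where per-thread context would be attached.
    initialised.append(threading.current_thread().name)

results = run_parallel(["A", "B"], lambda preset: f"preset {preset} done", thread_init=init_hook)
print(results)  # {'A': 'preset A done', 'B': 'preset B done'}
```

Keying results by preset id means two runs of the same preset would overwrite each other, which is the kind of collision an explicit per-run slot avoids.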
Sessions 1–7 correctness pass
22 of 25 local /code-review findings shipped over 7 sessions.
🟥 Critical (7) — silent bugs in v1.x
- N1 (5161dc8): `daily_pnl=0.0` at decision time disabled the pre-LLM daily-loss and consistency breakers. The runner now passes the overnight-gap mark-to-market (MTM) value.
- N2 (b7f36ce): the runner discarded the Player's proposed `entry_price`. It now re-validates direction + RR against `today_open` and skips the trade on broken geometry.
- N3 (11cb49d): parallel presets now write to per-strategy artifact dirs (`artifacts/<strategy_id>/`).
- R1 + N13 (11cb49d): same-preset slot collision. `_persist_preset` now takes `slot` as a parameter and never derives it.
- R2 (11cb49d): recovery banner showed a phantom winner. The `_b` fields are now `None`, not `0.0`.
- R3 (4dc057d): Streamlit widget mutations from worker threads now carry `ScriptRunContext` via a `thread_init` hook in `run_parallel`.
- N4 (11cb49d): `_render_metrics_panel` raised `KeyError` on partial persistence; it now iterates `metrics_panel_slots(m)`.
- N7 (5161dc8): day-1 `consistency_status` fixed as part of N1.
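N2 is easiest to see as a geometry check. Because the fill happens at `today_open` rather than at the Player's proposed `entry_price`, the stop and target have to be re-checked from the actual open. A minimal sketch, assuming a proposal carries a direction, stop and target, and that a hypothetical minimum reward:risk of 2.0 applies; `Proposal` and `revalidate` are illustrative names, not the runner's API.

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass(frozen=True)
class Proposal:
    direction: Literal["long", "short"]
    stop: float
    target: float

def revalidate(p: Proposal, today_open: float, min_rr: float = 2.0) -> Optional[float]:
    """Reward:risk measured from today's open, or None if the trade should be skipped."""
    if p.direction == "long":
        risk, reward = today_open - p.stop, p.target - today_open
    else:
        risk, reward = p.stop - today_open, today_open - p.target
    if risk <= 0 or reward <= 0:  # the open gapped through the stop or the target
        return None
    rr = reward / risk
    return rr if rr >= min_rr else None  # min_rr is a hypothetical threshold

long_trade = Proposal("long", stop=98.0, target=106.0)
print(revalidate(long_trade, today_open=100.0))  # 3.0
print(revalidate(long_trade, today_open=97.0))   # None (opened below the stop)
print(revalidate(long_trade, today_open=104.0))  # None (2 / 6 is below min_rr)
```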
🟧 High (5)
- R4 (7a3c11c): guarded the empty-list `IndexError` on `artifact["rounds"][-1]`.
- N5 (7a3c11c): `CoachLoop` forwards every `tokens_used` key, so `cache_read_player` / `cache_read_coach` now reach the artifact.
- N6 (7a3c11c): T1 caching is plumbed through `reasoning_evaluator.py` and both streaming clones via `build_cached_system()`.
- N8 (11cb49d): `fresh_metrics_state()` is installed on every Run click, so no stale `preset_b` state leaks from prior runs.
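A sketch of the R4 and N5 patterns, assuming the artifact is a plain dict and `tokens_used` is a flat mapping of counter names to integers (both assumptions; `last_round`, `merge_tokens` and the `input`/`output` keys are hypothetical). Summing with `collections.Counter` forwards every key, so counters such as `cache_read_player` are carried along instead of being dropped by a fixed whitelist.

```python
from collections import Counter
from typing import Any, Mapping, Optional

def last_round(artifact: Mapping[str, Any]) -> Optional[dict]:
    """R4-style guard: the final round, or None when no rounds were recorded."""
    rounds = artifact.get("rounds") or []  # also tolerates a missing or None key
    return rounds[-1] if rounds else None

def merge_tokens(total: Counter, tokens_used: Mapping[str, int]) -> Counter:
    """N5-style forwarding: add every counter, including keys this code has never seen."""
    total.update(tokens_used)  # Counter.update adds counts rather than replacing them
    return total

print(last_round({"rounds": []}))                    # None
print(last_round({"rounds": [{"n": 1}, {"n": 2}]}))  # {'n': 2}

totals: Counter = Counter()
merge_tokens(totals, {"input": 120, "output": 40})
merge_tokens(totals, {"input": 80, "cache_read_player": 500})
print(dict(totals))  # {'input': 200, 'output': 40, 'cache_read_player': 500}
```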
🟨 Medium (6) — efficiency + observability
- R5 (891f234): narrowed `BaseException` → `Exception` (`KeyboardInterrupt` now propagates).
- R6 (891f234): strict typed encoder in `recovery.save_snapshot`; numpy types are coerced cleanly and unknown types raise.
- N9 (891f234): sparkline redraw throttled to every 5 bars plus the final bar via `should_render_sparkline()`.
- N10 (891f234): `ConstraintSchema.to_json()` caches per instance, eliminating ~1500 redundant dumps per backtest.
- N11 (891f234): runner inner loop optimised: numpy SMA, cached `trade_stats`, single-pass MTM.
- N12 (891f234): `BacktestResult.on_day_errors` counter + log, so broken subscribers surface instead of looking frozen.
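N9 and N10 both trade a little bookkeeping for less repeated work. A minimal sketch, assuming 0-based bar indices and a frozen dataclass standing in for `ConstraintSchema` (`Schema` and this `should_render_sparkline` signature are hypothetical): `functools.cached_property` stores the serialised JSON on the instance after the first call.

```python
import json
from dataclasses import asdict, dataclass
from functools import cached_property

def should_render_sparkline(bar_idx: int, total_bars: int, every: int = 5) -> bool:
    """Redraw on every `every`-th bar (0-based index) and always on the final bar."""
    return (bar_idx + 1) % every == 0 or bar_idx == total_bars - 1

@dataclass(frozen=True)
class Schema:  # hypothetical stand-in with two of the ConstraintSchema fields
    max_daily_loss_pct: float = 0.02
    consistency_rule_pct: float = 0.50

    @cached_property
    def _json(self) -> str:
        # Computed on first access, then stored in the instance __dict__.
        return json.dumps(asdict(self), sort_keys=True)

    def to_json(self) -> str:
        return self._json

print([i for i in range(12) if should_render_sparkline(i, 12)])  # [4, 9, 11]
schema = Schema()
print(schema.to_json())  # {"consistency_rule_pct": 0.5, "max_daily_loss_pct": 0.02}
```

Caching is only safe because the instance is frozen; if the schema could be mutated after its first `to_json()` call, the cached string would go stale.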
⬜ Low (4, +3 rehoused)
- R7 (fixed incidentally in Session 3): unused `other_label` param removed.
- R8 / N17 (3a6e34e): dead `on_events` API surface dropped.
- R9 (3a6e34e): B-preset success toast added.
- R10 (3a6e34e): `list_recoverable()` wrapped in `@st.cache_data(ttl=10)`.
- N14 / N15 / N16: deferred to AgentCore backlog D6–D8 (architectural).
Infrastructure
- Seam 0: backtest fidelity (8754365)
- Seam 3: breaker registry (30866aa)
- Seam 5: equity curve + metrics module (d69ce11)
- Streamlit Cloud deploy: Python 3.12 is set via the Cloud Settings UI (the misleading `.python-version` was dropped in ee586d5).
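A sketch of how a breaker registry (Seam 3) could host the F11 trailing-drawdown breaker and the F13 graduated consistency signal. Everything here is an assumption: the registry is a plain dict filled by a decorator, the consistency rule is taken to be the best day's share of total profit (a common prop-firm form), the 0.40 / 0.50 thresholds mirror the `consistency_warn_pct` / `consistency_rule_pct` defaults, and the 5% trailing limit is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable

class Signal(Enum):
    OK = "ok"
    APPROACHING = "approaching"
    BREACHED = "breached"

@dataclass
class AccountState:  # hypothetical snapshot handed to every breaker
    equity: float
    peak_equity: float  # highest equity seen so far
    day_profits: list[float] = field(default_factory=list)  # realised profit per day

Breaker = Callable[[AccountState], Signal]
BREAKERS: dict[str, Breaker] = {}

def register(name: str) -> Callable[[Breaker], Breaker]:
    """Decorator that adds a breaker to the registry under `name`."""
    def deco(fn: Breaker) -> Breaker:
        BREAKERS[name] = fn
        return fn
    return deco

@register("trailing_drawdown")
def trailing_drawdown(s: AccountState, limit: float = 0.05) -> Signal:
    # Trailing: measured from peak equity, so the floor rises as the account grows.
    drawdown = (s.peak_equity - s.equity) / s.peak_equity
    return Signal.BREACHED if drawdown >= limit else Signal.OK

@register("consistency")
def consistency(s: AccountState, warn: float = 0.40, rule: float = 0.50) -> Signal:
    # Assumed rule: best single day's share of total profit, graded in two steps.
    total = sum(s.day_profits)
    if total <= 0:
        return Signal.OK  # no net profit yet, so no share to measure
    share = max(s.day_profits) / total
    if share >= rule:
        return Signal.BREACHED
    return Signal.APPROACHING if share >= warn else Signal.OK

def evaluate(s: AccountState) -> dict[str, Signal]:
    """Run every registered breaker against one account snapshot."""
    return {name: breaker(s) for name, breaker in BREAKERS.items()}

state = AccountState(equity=96_000, peak_equity=101_000, day_profits=[400, 300, 500])
print({name: sig.value for name, sig in evaluate(state).items()})
# {'trailing_drawdown': 'ok', 'consistency': 'approaching'}
```

A registry keeps the runner ignorant of individual rules: adding a breaker is one decorated function, and the graduated `approaching` state gives the caller room to tighten sizing before a hard breach.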
Verification
- 513 unit tests passing, 1 skipped (up from 460 at the start of Sessions 1–7)
- Local `pytest`: clean
- Lint: clean on every file touched
- Streamlit Cloud (Python 3.12): R3 `ScriptRunContext` attachment verified, with progress bars updating live across a full 18-day Fast run; both R9 success toasts displayed; comparison summary rendered
Installation
pip install player-coach-core==2.0.0