Release v2.0.0 — Phase 3B + 4A + dashboard + Sessions 1–7 correctness · MaverickHQ/crucible-player-coach

42 commits, +7,570 lines, 96 files since v1.1.0.

This release ships the Phase 3B and 4A roadmap items, a dashboard rebuild, and a 7-session local code-review correctness pass. The major version bump (v1.0.1 → v2.0.0) reflects changed behavioural semantics (see Breaking changes).

⚠️ Breaking changes

A1 bar-timing (faf1254) — the runner now decides on bar t-1's close and fills any approved entry/exit at bar t's open, instead of deciding and filling on the same bar. Backtest results will differ numerically from any fixture pinned before v2.0. N bars now produce N-1 decisions, because day 0 has no prior bar to decide on.
BacktestResult shape expanded — new fields: sharpe, sortino, calmar, max_drawdown_duration, avg_recovery_time, mc_success_prob, on_day_errors. Consumers reading the dataclass by field name are unaffected; consumers relying on positional unpacking will break.
ConstraintSchema shape expanded — 7 new optional fields with defaults: max_daily_loss_pct (0.02), consistency_rule_pct (0.50), consistency_warn_pct (0.40), trading_cutoff_time ("16:20"), min_stop_atr_multiple (1.5), prefer_entry_below_vwap (True), trailing_max_drawdown_pct (None → defaults to max_drawdown_pct). Old preset JSON without these keys still loads.
run_parallel signature (dashboard/parallel.py) — the unused on_events parameter was dropped (R8). PresetWorker is now Callable[[str], Any].
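A minimal sketch of the A1 next-bar-open timing, assuming daily bars that carry only open and close prices. Bar, Fill, Decide and run_next_open are hypothetical names, not the runner's actual API. The point it illustrates: the decision at step t sees only bars 0..t-1, and any fill uses bar t's open.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Bar:
    open: float
    close: float


@dataclass(frozen=True)
class Fill:
    decided_on: int  # index of the bar whose close informed the decision
    filled_on: int   # index of the bar whose open set the fill price
    side: str
    price: float


# Hypothetical decision hook: sees completed bars only, returns "buy", "sell" or None.
Decide = Callable[[list[Bar]], Optional[str]]


def run_next_open(bars: list[Bar], decide: Decide) -> tuple[int, list[Fill]]:
    """Decide on bar t-1's close, fill at bar t's open; bar 0 yields no decision."""
    decisions = 0
    fills: list[Fill] = []
    for t in range(1, len(bars)):
        history = bars[:t]  # bars 0..t-1; bar t is not visible to the decision
        decisions += 1
        side = decide(history)
        if side is not None:
            fills.append(Fill(decided_on=t - 1, filled_on=t, side=side, price=bars[t].open))
    return decisions, fills


if __name__ == "__main__":
    bars = [Bar(100.0, 101.0), Bar(102.0, 103.0), Bar(104.0, 99.0)]
    # Toy rule: buy after an up-close bar.
    n, fills = run_next_open(bars, lambda h: "buy" if h[-1].close > h[-1].open else None)
    print(n, [(f.decided_on, f.filled_on, f.price) for f in fills])
```

With three bars the loop makes two decisions, which is the N-1 rule described above, and each fill is priced at the following bar's open.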

Phase 3B — Prop trading infra

F11 trailing drawdown breaker (via Seam 3 — breaker registry)
F12 challenge phase management (challenge_pnl_pct → building / conservation)
F13 daily consistency enforcement (graduated approaching / breached signal)
F14 Monte Carlo challenge simulator (P(pass) projection from realised edge)
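A minimal Monte Carlo sketch of a P(pass) projection, assuming the realised edge is summarised as a list of daily returns (fractions of the starting balance) that are resampled with replacement. The function name, the additive (non-compounding) equity model, and the pass/fail rules (reach the profit target before the drawdown from the running peak reaches the limit, within a fixed number of days) are assumptions for illustration, not the simulator's actual rules.

```python
import random
from typing import Sequence


def mc_pass_probability(
    daily_returns: Sequence[float],
    profit_target_pct: float,
    max_drawdown_pct: float,
    days: int,
    n_paths: int = 10_000,
    seed: int = 0,
) -> float:
    """Estimate P(pass) by bootstrapping realised daily returns.

    Equity is tracked additively as a fraction of the starting balance.
    A path passes if it reaches profit_target_pct before its drawdown from
    the running peak reaches max_drawdown_pct, within `days` days.
    """
    if not daily_returns:
        raise ValueError("need at least one realised daily return")
    rng = random.Random(seed)  # seeded so projections are reproducible
    passes = 0
    for _ in range(n_paths):
        equity = peak = 0.0
        for _ in range(days):
            equity += rng.choice(daily_returns)  # resample one day with replacement
            peak = max(peak, equity)
            if peak - equity >= max_drawdown_pct:
                break  # trailing drawdown breached: the path fails
            if equity >= profit_target_pct:
                passes += 1
                break
    return passes / n_paths
```

Resampling days independently ignores autocorrelation and regime clustering in the realised returns, so the estimate is only as good as that i.i.d. assumption. Paths that neither pass nor fail within the horizon are counted as failures here.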

Phase 4A — Backtesting quality

F15 walk-forward windows + OOS aggregation
F16 transaction cost modelling
F17 risk-adjusted metrics (Sharpe, Sortino, Calmar)
F18 drawdown profile analysis (duration, avg recovery)
F19 regime-conditional decomposition
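A stdlib-only sketch of the F17 and F18 metrics. Conventions vary, so the choices here are assumptions: 252 trading days for annualisation, a zero target for Sortino's downside deviation (averaged over all periods), drawdown measured from the running peak of an equity curve, and Calmar taken as annualised mean return over maximum drawdown rather than CAGR over maximum drawdown. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS = 252  # annualisation factor (assumption: daily bars)


def sharpe(returns: Sequence[float], rf: float = 0.0) -> float:
    """Annualised Sharpe ratio of per-period returns (needs at least 2 points)."""
    excess = [r - rf for r in returns]
    sd = statistics.stdev(excess)
    return statistics.mean(excess) / sd * math.sqrt(TRADING_DAYS) if sd > 0 else math.nan


def sortino(returns: Sequence[float], target: float = 0.0) -> float:
    """Annualised Sortino ratio; downside deviation is averaged over all periods."""
    downside_var = sum(min(r - target, 0.0) ** 2 for r in returns) / len(returns)
    dd = math.sqrt(downside_var)
    mean_excess = statistics.mean(returns) - target
    if dd == 0:
        # No downside at all: unbounded if there is positive excess, undefined otherwise.
        return math.inf if mean_excess > 0 else math.nan
    return mean_excess / dd * math.sqrt(TRADING_DAYS)


def drawdown_profile(equity: Sequence[float]) -> tuple[float, int]:
    """Return (max drawdown as a fraction of the running peak, longest underwater run in bars)."""
    peak = equity[0]
    max_dd, run, longest = 0.0, 0, 0
    for value in equity:
        if value >= peak:
            peak, run = value, 0  # at or above the previous high: recovered
        else:
            run += 1
            longest = max(longest, run)
            max_dd = max(max_dd, (peak - value) / peak)
    return max_dd, longest


def calmar(returns: Sequence[float], equity: Sequence[float]) -> float:
    """Annualised mean return divided by max drawdown (a simplification of CAGR / MDD)."""
    max_dd, _ = drawdown_profile(equity)
    annual = statistics.mean(returns) * TRADING_DAYS
    return annual / max_dd if max_dd > 0 else math.inf
```

For the equity curve [100, 110, 99, 105, 120], drawdown_profile reports a 10% maximum drawdown (110 to 99) and a two-bar underwater run before the new high at 120.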

Dashboard workstream

A1 Seam 0 bar-timing fidelity (faf1254)
A2 Phase 4A metrics surfaced in backtest dashboard (64c59fb)
A3 Monte Carlo P(pass) per configuration (806953f)
T0 backtest persistence & recovery (569d1fb)
T1 prompt caching for Player + Coach (14bede3)
T2 parallel preset execution (2825167)
T3 Fast / Standard backtest depth toggle (3f2cf28)
Live backtest progress bar + sparkline + status caption (bbae90c)
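A minimal sketch of the T2 preset fan-out with a per-thread setup hook, assuming a standard thread pool. The thread_init parameter name and the max_workers default are illustrative; only the PresetWorker alias comes from these notes. ThreadPoolExecutor's initializer runs once in each worker thread before it picks up tasks, which makes it a natural place to attach per-thread context. In the dashboard that context would be the captured Streamlit ScriptRunContext (R3), for example via Streamlit's add_script_run_ctx helper, whose import path has varied between Streamlit releases.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Optional

PresetWorker = Callable[[str], Any]  # preset id -> result


def run_parallel(
    preset_ids: Iterable[str],
    worker: PresetWorker,
    max_workers: int = 2,
    thread_init: Optional[Callable[[], None]] = None,
) -> dict[str, Any]:
    """Run one worker per preset on a thread pool and collect results by preset id.

    thread_init runs once inside each pool thread before it executes any task.
    """
    with ThreadPoolExecutor(max_workers=max_workers, initializer=thread_init) as pool:
        futures = {pid: pool.submit(worker, pid) for pid in preset_ids}
        # .result() re-raises any worker exception in the calling thread.
        return {pid: fut.result() for pid, fut in futures.items()}


if __name__ == "__main__":
    ctx = threading.local()  # stands in for per-thread context such as ScriptRunContext

    def init() -> None:
        ctx.ready = True

    def worker(pid: str) -> tuple[str, bool]:
        return pid.upper(), getattr(ctx, "ready", False)

    print(run_parallel(["a", "b"], worker, thread_init=init))
```

Because the hook runs inside each pool thread, every worker sees its own initialised context; setting it once in the calling thread would not reach the pool threads.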

Sessions 1–7 correctness pass

22 of 25 local /code-review findings shipped over 7 sessions.

🟥 Critical (7) — silent bugs in v1.x

N1 (5161dc8) — daily_pnl was 0.0 at decision time, which silently disabled the daily-loss and consistency breakers that run before the LLM call. The runner now passes the overnight-gap mark-to-market (MTM) P&L.
N2 (b7f36ce) — the runner discarded the Player's proposed entry_price. It now re-validates direction and risk:reward (RR) against today_open, and skips the trade when the geometry is broken.
N3 (11cb49d) — parallel presets now write to per-strategy artifact dirs (artifacts/<strategy_id>/).
R1 + N13 (11cb49d) — same-preset slot collision: _persist_preset now takes the slot as a parameter instead of deriving it.
R2 (11cb49d) — the recovery banner reported a phantom winner; unset _b fields are now None rather than 0.0.
R3 (4dc057d) — Streamlit widget mutations from worker threads now carry ScriptRunContext via thread_init hook in run_parallel.
N4 (11cb49d) — _render_metrics_panel raised KeyError on partially persisted results; it now iterates metrics_panel_slots(m).
N7 (5161dc8) — day-1 consistency_status fixed as part of N1.
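A sketch of the N2 geometry check, assuming the Player's proposal carries a side, a stop and a target, and that the trade is re-priced at today_open. The function name, its parameters and the min_rr threshold are hypothetical. The idea: an open that gaps through the stop or the target flips the sign of risk or reward, and the trade is skipped instead of being filled at a price the proposal never validated.

```python
from typing import Optional


def revalidate_at_open(
    side: str, stop: float, target: float, today_open: float, min_rr: float
) -> Optional[float]:
    """Re-price a proposed trade at today's open.

    Returns the reward:risk ratio measured from today_open, or None when the
    trade should be skipped (geometry broken, or RR below min_rr).
    """
    if side == "long":
        risk, reward = today_open - stop, target - today_open
    elif side == "short":
        risk, reward = stop - today_open, today_open - target
    else:
        raise ValueError(f"unknown side: {side!r}")
    if risk <= 0 or reward <= 0:
        return None  # the open gapped through the stop or the target
    rr = reward / risk
    return rr if rr >= min_rr else None
```

For a long with stop 95 and target 110, an open at 100 gives RR 2.0; an open at 94 is already through the stop, and an open at 106 leaves too little reward for the risk, so both are skipped.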

🟧 High (5)

R4 (7a3c11c) — guarded the IndexError raised by artifact["rounds"][-1] when the list is empty.
N5 (7a3c11c) — CoachLoop now forwards every tokens_used key, so cache_read_player / cache_read_coach reach the artifact.
N6 (7a3c11c) — T1 caching plumbed through reasoning_evaluator.py + both streaming clones via build_cached_system().
N8 (11cb49d) — fresh_metrics_state() is installed on every Run click, so stale preset_b state no longer leaks from prior runs.
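A sketch of the N5 idea, assuming per-round usage arrives as a flat dict of integer counts. Accumulating every key, instead of copying a fixed set of known keys, is what lets keys such as cache_read_player reach the artifact. The function name is hypothetical.

```python
from typing import Mapping


def accumulate_tokens(total: Mapping[str, int], usage: Mapping[str, int]) -> dict[str, int]:
    """Add every key in `usage` into a copy of `total`, including keys seen for the first time."""
    merged = dict(total)
    for key, count in usage.items():
        merged[key] = merged.get(key, 0) + count
    return merged
```

A plain dict loop is used rather than collections.Counter addition, because Counter's + drops keys whose total is zero or negative, which would hide a cache_read_* key that was reported as 0.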

🟨 Medium (6) — efficiency + observability

R5 (891f234) — narrowed except BaseException to except Exception, so KeyboardInterrupt now propagates.
R6 (891f234) — strict typed encoder in recovery.save_snapshot: numpy types are coerced cleanly and unknown types raise.
N9 (891f234) — sparkline throttled to every 5 bars + final bar via should_render_sparkline().
N10 (891f234) — ConstraintSchema.to_json() caches its result per instance, eliminating ~1500 redundant dumps per backtest.
N11 (891f234) — runner inner loop: numpy SMA, cached trade_stats, single-pass MTM.
N12 (891f234) — BacktestResult.on_day_errors counter + log; broken subscribers surface instead of looking frozen.
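A sketch of per-instance to_json() caching, assuming ConstraintSchema is an immutable dataclass (a cache like this would go stale on a mutable one). ConstraintSchemaSketch is a hypothetical stand-in: its defaults are the ones listed under Breaking changes, max_drawdown_pct is left required because its default is not stated in these notes, and from_json shows how preset JSON missing the new keys can still load.

```python
import json
from dataclasses import asdict, dataclass
from functools import cached_property
from typing import Optional


@dataclass(frozen=True)
class ConstraintSchemaSketch:
    max_drawdown_pct: float  # default not stated in these notes, so required here
    max_daily_loss_pct: float = 0.02
    consistency_rule_pct: float = 0.50
    consistency_warn_pct: float = 0.40
    trading_cutoff_time: str = "16:20"
    min_stop_atr_multiple: float = 1.5
    prefer_entry_below_vwap: bool = True
    trailing_max_drawdown_pct: Optional[float] = None  # None -> fall back to max_drawdown_pct

    @property
    def effective_trailing_max_drawdown_pct(self) -> float:
        if self.trailing_max_drawdown_pct is None:
            return self.max_drawdown_pct
        return self.trailing_max_drawdown_pct

    @cached_property
    def _json(self) -> str:
        # Computed on first access, then stored in the instance __dict__.
        # cached_property writes to __dict__ directly, so it works on a frozen dataclass.
        return json.dumps(asdict(self), sort_keys=True)

    def to_json(self) -> str:
        return self._json

    @classmethod
    def from_json(cls, text: str) -> "ConstraintSchemaSketch":
        # Keys missing from older preset JSON fall back to the dataclass defaults.
        return cls(**json.loads(text))
```

Repeated to_json() calls on the same instance return the same string object, so json.dumps runs once per schema rather than once per call.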

⬜ Low (4, +3 rehoused)

R7 (fixed incidentally in Session 3) — removed the unused other_label parameter.
R8 / N17 (3a6e34e) — dead on_events API surface dropped.
R9 (3a6e34e) — B-preset success toast added.
R10 (3a6e34e) — list_recoverable() wrapped in @st.cache_data(ttl=10).
N14 / N15 / N16 deferred to AgentCore backlog D6–D8 (architectural).

Infrastructure

Seam 0 — backtest fidelity (8754365)
Seam 3 — breaker registry (30866aa)
Seam 5 — equity curve + metrics module (d69ce11)
Streamlit Cloud deploy: Python 3.12 is set via the Cloud Settings UI (the misleading .python-version file was dropped in ee586d5).
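A minimal sketch of a breaker registry in the spirit of Seam 3, with a trailing drawdown breaker (F11) and a daily-loss breaker registered in it. AccountState, BreakerRegistry and the two factory functions are hypothetical. The sketch measures trailing drawdown as the drop from peak equity as a fraction of that peak, which is one common convention but not necessarily the one the project uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AccountState:
    equity: float
    peak_equity: float
    daily_pnl: float  # P&L so far today, including any overnight-gap MTM
    start_of_day_equity: float


# A breaker inspects the state and returns a reason string when it trips, else None.
Breaker = Callable[[AccountState], Optional[str]]


class BreakerRegistry:
    def __init__(self) -> None:
        self._breakers: dict[str, Breaker] = {}

    def register(self, name: str, breaker: Breaker) -> None:
        if name in self._breakers:
            raise ValueError(f"breaker {name!r} is already registered")
        self._breakers[name] = breaker

    def check(self, state: AccountState) -> dict[str, str]:
        """Run every registered breaker; return {name: reason} for those that tripped."""
        tripped = {}
        for name, breaker in self._breakers.items():
            reason = breaker(state)
            if reason is not None:
                tripped[name] = reason
        return tripped


def trailing_drawdown_breaker(limit_pct: float) -> Breaker:
    def check(state: AccountState) -> Optional[str]:
        drawdown = (state.peak_equity - state.equity) / state.peak_equity
        return f"trailing drawdown {drawdown:.2%} >= {limit_pct:.2%}" if drawdown >= limit_pct else None
    return check


def daily_loss_breaker(limit_pct: float) -> Breaker:
    def check(state: AccountState) -> Optional[str]:
        loss = -state.daily_pnl / state.start_of_day_equity
        return f"daily loss {loss:.2%} >= {limit_pct:.2%}" if loss >= limit_pct else None
    return check
```

If daily_pnl is stuck at 0.0 when check() runs, daily_loss_breaker can never trip, which is the failure mode N1 describes.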

Verification

513 unit tests passing, 1 skipped (up from 460 at the start of Sessions 1–7)
Local pytest: clean
Lint: clean on every file touched
Streamlit Cloud (Python 3.12): verified the R3 ScriptRunContext attach, with progress bars updating live across a full 18-day Fast run; both R9 success toasts displayed and the comparison summary rendered.

Installation

pip install player-coach-core==2.0.0
