v2.0.0 — Phase 3B + 4A + dashboard + Sessions 1–7 correctness

@MaverickHQ MaverickHQ released this 21 Jun 08:40
· 2 commits to main since this release

42 commits, +7,570 lines, 96 files since v1.1.0.

Phase 3B + 4A roadmap, dashboard rebuild, and a 7-session local code-review correctness pass. v1.0.1 → v2.0.0 because behavioural semantics changed (see Breaking Changes).


⚠️ Breaking changes

  • A1 bar-timing (faf1254) — the runner now decides on bar t-1's close and fills any approved entry/exit at bar t's open, not the same bar. Backtests will produce numerically different results from any pre-v2.0 pinned fixture. N bars produce N-1 decisions (day 0 has no prior data).
  • BacktestResult shape expanded — new fields: sharpe, sortino, calmar, max_drawdown_duration, avg_recovery_time, mc_success_prob, on_day_errors. Consumers reading the dataclass by field name are unaffected; consumers relying on positional unpacking will break.
  • ConstraintSchema shape expanded — 7 new optional fields with defaults: max_daily_loss_pct (0.02), consistency_rule_pct (0.50), consistency_warn_pct (0.40), trading_cutoff_time ("16:20"), min_stop_atr_multiple (1.5), prefer_entry_below_vwap (True), trailing_max_drawdown_pct (None → defaults to max_drawdown_pct). Old preset JSON without these keys still loads.
  • run_parallel signature (dashboard/parallel.py) — dropped the unused on_events parameter (R8). PresetWorker is now Callable[[str], Any].
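
The A1 bar-timing rule in the first item above can be illustrated with a minimal sketch. The `Bar` type, the `decide` callback, and the `run` function are hypothetical names chosen for illustration, not the project's runner API. The sketch only shows the decide-on-previous-close, fill-on-next-open pattern, and why N bars yield N-1 decisions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Bar:
    open: float
    close: float


# Hypothetical decision callback: it sees bars up to and including t-1
# and returns an action such as "buy", or None for no trade.
Decide = Callable[[List[Bar]], Optional[str]]


def run(bars: List[Bar], decide: Decide) -> List[Tuple[int, Optional[str], float]]:
    """Return (fill_bar_index, action, fill_price) for each decision."""
    decisions = []
    # Start at t=1: bar 0 has no prior bar to decide on.
    for t in range(1, len(bars)):
        action = decide(bars[:t])   # history ends at bar t-1's close
        fill_price = bars[t].open   # an approved order fills at bar t's open
        decisions.append((t, action, fill_price))
    return decisions


bars = [Bar(100.0, 101.0), Bar(101.5, 103.0), Bar(102.0, 102.5)]
# Toy rule (an assumption for the demo): buy if the last bar closed above its open.
out = run(bars, lambda hist: "buy" if hist[-1].close > hist[-1].open else None)
```

Because the decision for bar t never sees bar t's own prices, a fixture pinned under same-bar fills would be expected to differ numerically from this scheme.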

Phase 3B — Prop trading infra

  • F11 trailing drawdown breaker (via Seam 3 — breaker registry)
  • F12 challenge phase management (challenge_pnl_pct → building / conservation)
  • F13 daily consistency enforcement (graduated approaching / breached signal)
  • F14 Monte Carlo challenge simulator (P(pass) projection from realised edge)
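
One common way to project P(pass) from realised edge, as F14 describes, is to bootstrap-resample realised daily returns and count the simulated paths that reach a profit target before breaching a trailing drawdown limit. The sketch below assumes that approach. The function name, the thresholds, and the horizon are all hypothetical, since the release notes do not state the simulator's actual rules.

```python
import random
from typing import Sequence


def estimate_pass_probability(
    daily_returns: Sequence[float],
    profit_target: float = 0.08,    # assumed target, as a fraction of starting equity
    max_trailing_dd: float = 0.05,  # assumed trailing drawdown limit
    horizon_days: int = 30,         # assumed challenge length
    n_paths: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate the fraction of resampled paths that pass the challenge."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    passes = 0
    for _path in range(n_paths):
        equity = 1.0
        peak = 1.0
        for _day in range(horizon_days):
            # Bootstrap: draw one realised daily return with replacement.
            equity *= 1.0 + rng.choice(daily_returns)
            peak = max(peak, equity)
            if (peak - equity) / peak >= max_trailing_dd:
                break  # drawdown breached: this path fails
            if equity - 1.0 >= profit_target:
                passes += 1
                break  # target reached: this path passes
    return passes / n_paths
```

Paths that neither breach the limit nor reach the target within the horizon count as failures in this sketch; a real challenge may treat them differently.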

Phase 4A — Backtesting quality

  • F15 walk-forward windows + OOS aggregation
  • F16 transaction cost modelling
  • F17 risk-adjusted metrics (Sharpe, Sortino, Calmar)
  • F18 drawdown profile analysis (duration, avg recovery)
  • F19 regime-conditional decomposition
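
For readers unfamiliar with the F17 metrics, the sketch below uses common textbook definitions. It assumes daily returns, 252 periods per year, and a zero risk-free rate. The project's exact conventions are not given in these notes, so treat the constants and function names as assumptions.

```python
import math
from typing import Sequence

PERIODS_PER_YEAR = 252  # assumption: daily bars


def sharpe(returns: Sequence[float]) -> float:
    """Annualised mean return over sample standard deviation."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var) * math.sqrt(PERIODS_PER_YEAR)


def sortino(returns: Sequence[float]) -> float:
    """Like Sharpe, but only negative returns contribute to the deviation."""
    mean = sum(returns) / len(returns)
    downside = sum(min(r, 0.0) ** 2 for r in returns) / len(returns)
    return mean / math.sqrt(downside) * math.sqrt(PERIODS_PER_YEAR)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss as a fraction of the running peak."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def calmar(returns: Sequence[float]) -> float:
    """Annualised compound return divided by maximum drawdown."""
    growth = math.prod(1.0 + r for r in returns)
    annualised = growth ** (PERIODS_PER_YEAR / len(returns)) - 1.0
    return annualised / max_drawdown(returns)


returns = [0.01, -0.02, 0.015, 0.01, -0.005]
```

These helpers raise ZeroDivisionError on degenerate inputs (zero variance, no losing days, or no drawdown), so a caller would need to handle those cases explicitly.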

Dashboard workstream

  • A1 Seam 0 bar-timing fidelity (faf1254)
  • A2 Phase 4A metrics surfaced in backtest dashboard (64c59fb)
  • A3 Monte Carlo P(pass) per configuration (806953f)
  • T0 backtest persistence & recovery (569d1fb)
  • T1 prompt caching for Player + Coach (14bede3)
  • T2 parallel preset execution (2825167)
  • T3 Fast / Standard backtest depth toggle (3f2cf28)
  • Live backtest progress bar + sparkline + status caption (bbae90c)

Sessions 1–7 correctness pass

22 of 25 local /code-review findings shipped over 7 sessions.

🟥 Critical (7) — silent bugs in v1.x

  • N1 (5161dc8) — daily_pnl=0.0 at decision time disabled daily-loss and consistency breakers pre-LLM. Now passes overnight-gap MTM.
  • N2 (b7f36ce) — runner discarded Player's proposed entry_price. Now re-validates direction + RR against today_open, skips on broken geometry.
  • N3 (11cb49d) — parallel presets now write to per-strategy artifact dirs (artifacts/<strategy_id>/).
  • R1 + N13 (11cb49d) — same-preset slot collision: _persist_preset now takes slot as a parameter and never derives it.
  • R2 (11cb49d) — recovery banner phantom winner — _b fields are None, not 0.0.
  • R3 (4dc057d) — Streamlit widget mutations from worker threads now carry ScriptRunContext via thread_init hook in run_parallel.
  • N4 (11cb49d) — _render_metrics_panel KeyError on partial persistence — iterates metrics_panel_slots(m).
  • N7 (5161dc8) — day-1 consistency_status fixed as part of N1.
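
The N2 fix above re-validates direction and reward:risk (RR) against today_open. A minimal sketch of that kind of check follows. The function names, the stop and target arguments, and the min_rr threshold are hypothetical; only the idea of re-checking the trade geometry at the actual fill price comes from the notes.

```python
from typing import Optional


def reward_risk_at_open(
    direction: str,  # "long" or "short"
    today_open: float,
    stop: float,
    target: float,
) -> Optional[float]:
    """Return reward:risk measured from today's open, or None if geometry is broken."""
    if direction == "long":
        risk, reward = today_open - stop, target - today_open
    elif direction == "short":
        risk, reward = stop - today_open, today_open - target
    else:
        return None
    # Broken geometry: the open has gapped through the stop or the target.
    if risk <= 0 or reward <= 0:
        return None
    return reward / risk


def should_enter(
    direction: str,
    today_open: float,
    stop: float,
    target: float,
    min_rr: float = 1.5,  # assumed threshold, not taken from the project
) -> bool:
    """Skip the entry when geometry is broken or RR at the open is too low."""
    rr = reward_risk_at_open(direction, today_open, stop, target)
    return rr is not None and rr >= min_rr
```

For example, a long planned with stop 98 and target 104 has RR 2.0 if the bar opens at 100, but only 0.2 if it gaps up to 103, and no valid geometry at all if it opens at 97.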

🟧 High (5)

  • R4 (7a3c11c) — artifact["rounds"][-1] empty-list IndexError guarded.
  • N5 (7a3c11c) — CoachLoop forwards every tokens_used key — cache_read_player / cache_read_coach reach the artifact.
  • N6 (7a3c11c) — T1 caching plumbed through reasoning_evaluator.py + both streaming clones via build_cached_system().
  • N8 (11cb49d) — fresh_metrics_state() installed on every Run click — no stale preset_b leak from prior runs.

🟨 Medium (6) — efficiency + observability

  • R5 (891f234) — narrowed BaseException → Exception (KeyboardInterrupt propagates).
  • R6 (891f234) — strict typed encoder in recovery.save_snapshot: numpy types coerce cleanly, unknown types raise.
  • N9 (891f234) — sparkline throttled to every 5 bars + final bar via should_render_sparkline().
  • N10 (891f234) — ConstraintSchema.to_json() caches per-instance — ~1500 redundant dumps eliminated per backtest.
  • N11 (891f234) — runner inner-loop — numpy SMA, cached trade_stats, single-pass MTM.
  • N12 (891f234) — BacktestResult.on_day_errors counter + log; broken subscribers surface instead of looking frozen.
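
R5 and N12 fit together: catching Exception instead of BaseException lets KeyboardInterrupt and SystemExit propagate, while a counter makes failing subscribers visible. The sketch below shows that pattern under assumed names. `notify_subscribers` and its callback signature are hypothetical, not the project's API.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def notify_subscribers(subscribers: Iterable[Callable[[int], None]], day: int) -> int:
    """Call each subscriber for the given day and return how many raised."""
    errors = 0
    for callback in subscribers:
        try:
            callback(day)
        except Exception:
            # Exception, not BaseException: KeyboardInterrupt and SystemExit
            # are not caught here, so Ctrl-C still stops the run.
            errors += 1
            logger.exception("on_day subscriber failed on day %d", day)
    return errors
```

A runner could add the returned count to a field such as on_day_errors, so a broken subscriber shows up in the result instead of silently doing nothing.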

⬜ Low (4, +3 rehoused)

  • R7 (fixed incidentally in Session 3) — unused other_label param removed.
  • R8 / N17 (3a6e34e) — dead on_events API surface dropped.
  • R9 (3a6e34e) — B-preset success toast added.
  • R10 (3a6e34e) — list_recoverable() wrapped in @st.cache_data(ttl=10).
  • N14 / N15 / N16 deferred to AgentCore backlog D6–D8 (architectural).

Infrastructure

  • Seam 0 — backtest fidelity (8754365)
  • Seam 3 — breaker registry (30866aa)
  • Seam 5 — equity curve + metrics module (d69ce11)
  • Streamlit Cloud deploy: Python 3.12 set via Cloud Settings UI (the misleading .python-version was dropped in ee586d5).

Verification

  • 513 unit tests passing, 1 skipped (was 460 at start of Sessions 1–7)
  • Local pytest: clean
  • Lint: clean on every file touched
  • Streamlit Cloud (Python 3.12): R3 ScriptRunContext attach verified with progress bars updating live across full 18-day Fast run; both R9 success toasts displayed; comparison summary rendered

Installation

pip install player-coach-core==2.0.0