Releases: MaverickHQ/crucible-player-coach

v2.0.1 — packaging fix for v2.0.0

21 Jun 10:15

Packaging-only patch. No runtime code changed. v2.0.0 → v2.0.1 fixes two latent defects that have been present since v1.0.0 but only surface when installing from PyPI (not from git clone, which is how this repo is normally deployed).

What was broken in v2.0.0

pip install player-coach-core==2.0.0 produced a package where:

  1. player_coach/artifacts/ was missing entirely from the wheel. Importing CoachLoop (from player_coach.loop.coach_loop import CoachLoop) chains through player_coach.artifacts.writer.ArtifactWriter, so the import crashed with ModuleNotFoundError.
  2. yfinance wasn't declared in any extra. Even pip install player-coach-core[market] (which declared numpy, hmmlearn, arch) couldn't import BacktestRunner, because yfinance is imported at module level for OHLCV fetching.

Neither bug was caught by the existing 513-test suite because development happens from git clone where requirements.txt covers everything. They were caught during the post-publish smoke test on a clean venv.

Fixes

| Defect | Cause | Fix |
|---|---|---|
| D1: player_coach/artifacts/ missing from wheel | .gitignore:12 had unanchored artifacts/; hatchling's pathspec parser matched the Python sub-module too, even though git check-ignore correctly said it wasn't ignored | .gitignore: artifacts/ → /artifacts/ (anchor to repo root) |
| D2: yfinance not declared | [project.optional-dependencies.market] listed numpy, hmmlearn, arch; yfinance was missed | pyproject.toml [market]: add yfinance>=0.2.40 |
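
The D1 mismatch comes down to anchoring. As a rough illustration (a simplified model written for these notes, not hatchling's or git's actual matcher), an unanchored directory pattern matches that directory name at any depth, while a leading slash pins it to the repo root. The function name dir_pattern_matches is hypothetical, and wildcards and negation are deliberately out of scope.

```python
from pathlib import PurePosixPath


def dir_pattern_matches(pattern: str, path: str) -> bool:
    """Simplified model of a gitignore directory pattern.

    Handles only a plain directory name with an optional leading slash,
    e.g. 'artifacts/' (unanchored) or '/artifacts/' (anchored to the root).
    """
    anchored = pattern.startswith("/")
    name = pattern.strip("/")
    # Directory components of the path, excluding the file name itself.
    parents = PurePosixPath(path).parts[:-1]
    if anchored:
        # Anchored: only the top-level directory can match.
        return bool(parents) and parents[0] == name
    # Unanchored: a directory with this name at any depth matches.
    return name in parents


if __name__ == "__main__":
    module = "player_coach/artifacts/writer.py"
    print(dir_pattern_matches("artifacts/", module))   # unanchored pattern
    print(dir_pattern_matches("/artifacts/", module))  # anchored pattern
```

Under this model the unanchored pattern catches the Python sub-module while the anchored one only catches the repo-root output directory, which is the behaviour the D1 fix relies on.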

TDD coverage added

Two new tests in tests/unit/test_packaging.py pin both defects so they cannot regress:

  • test_no_tracked_player_coach_file_matches_gitignore — walks git ls-files player_coach/ and asserts no path is matched by any .gitignore pattern under hatchling-equivalent semantics. Would have failed on v2.0.0.
  • test_declared_dependencies_cover_runtime_imports — AST-parses every player_coach/**/*.py, asserts every non-stdlib top-level import is in [dependencies] or an [optional-dependencies] table. Would have failed on v2.0.0.

Plus two guardrail tests that were already passing and remain unchanged:

  • test_pyproject_declares_llm_extra_for_anthropic — pins the CLAUDE.md install ladder
  • test_tomllib_available — pins the Python 3.11+ floor needed for the static analysis above

Stale test removed

test_python_version_pinned_for_streamlit_cloud (in tests/unit/test_requirements.py) was pinning the .python-version file we removed in ee586d5. That file turned out to be a pyenv-only convention that Streamlit Cloud ignores (Cloud reads its Python version from the per-app Settings UI, not from any repo file). The test was pinning a contract that no longer applies; the deploy-mechanism docs moved to CLAUDE.md's "Deploy environment" section.

Verification

# clean venv, install from PyPI
python3 -m venv /tmp/v
/tmp/v/bin/pip install 'player-coach-core[market,llm]==2.0.1'

# the v2.0.0-broken import chain
/tmp/v/bin/python -c "
from player_coach.loop.coach_loop import CoachLoop
from player_coach.backtest.runner import BacktestRunner
from player_coach.artifacts.writer import ArtifactWriter   # was missing in v2.0.0
print('all imports OK')
"

Test count

513 → 516 (+4 new packaging tests, -1 stale .python-version test = +3 net), 1 skipped, lint clean.

Who needs to upgrade

  • pip install player-coach-core users: yes, upgrade. v2.0.0 is fundamentally broken on import.
  • Streamlit Cloud users (this app's actual deploy path): no change needed. Cloud reads from git, not PyPI.
  • git clone developers: no change needed; requirements.txt was always covering this.

Installation

pip install --upgrade 'player-coach-core[market,llm]==2.0.1'



v2.0.0 — Phase 3B + 4A + dashboard + Sessions 1–7 correctness

21 Jun 08:40

42 commits, +7,570 lines, 96 files since v1.1.0.

Phase 3B + 4A roadmap, dashboard rebuild, and a 7-session local code-review correctness pass. v1.1.0 → v2.0.0 because behavioural semantics changed (see Breaking Changes).


⚠️ Breaking changes

  • A1 bar-timing (faf1254) — the runner now decides on bar t-1's close and fills any approved entry/exit at bar t's open, not the same bar. Backtests will produce numerically different results from any pre-v2.0 pinned fixture. N bars produce N-1 decisions (day 0 has no prior data).
  • BacktestResult shape expanded — new fields: sharpe, sortino, calmar, max_drawdown_duration, avg_recovery_time, mc_success_prob, on_day_errors. Consumers reading the dataclass by field name are unaffected; consumers relying on positional unpacking will break.
  • ConstraintSchema shape expanded — 7 new optional fields with defaults: max_daily_loss_pct (0.02), consistency_rule_pct (0.50), consistency_warn_pct (0.40), trading_cutoff_time ("16:20"), min_stop_atr_multiple (1.5), prefer_entry_below_vwap (True), trailing_max_drawdown_pct (None → defaults to max_drawdown_pct). Old preset JSON without these keys still loads.
  • run_parallel signature (dashboard/parallel.py) — dropped the unused on_events parameter (R8). PresetWorker is now Callable[[str], Any].
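
The A1 bar-timing rule can be pictured with a small sketch. This is an illustration of the timing described above, not the runner's code: Bar and decision_schedule are hypothetical names, and real bars carry more fields than open and close.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    open: float
    close: float


def decision_schedule(bars: list[Bar]) -> list[tuple[int, int, float, float]]:
    """Pair each decision bar (t-1) with the bar whose open fills it (t).

    Each tuple is (decision_index, fill_index, decision_close, fill_open).
    Bar 0 has no prior bar, so N bars yield N-1 decisions.
    """
    return [
        (t - 1, t, bars[t - 1].close, bars[t].open)
        for t in range(1, len(bars))
    ]


if __name__ == "__main__":
    bars = [Bar(100.0, 101.0), Bar(102.0, 103.0), Bar(104.0, 105.0)]
    for decided_on, filled_on, close, fill in decision_schedule(bars):
        print(f"decide on bar {decided_on} close {close}, fill at bar {filled_on} open {fill}")
```

Because the fill price is the next bar's open rather than the decision bar's close, any overnight gap lands in the result, which is why fixtures pinned before v2.0 no longer match.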

Phase 3B — Prop trading infra

  • F11 trailing drawdown breaker (via Seam 3 — breaker registry)
  • F12 challenge phase management (challenge_pnl_pct → building / conservation)
  • F13 daily consistency enforcement (graduated approaching / breached signal)
  • F14 Monte Carlo challenge simulator (P(pass) projection from realised edge)
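
As a rough sketch of what a P(pass) projection like F14 can look like, the function below resamples realised daily returns and counts the paths that reach a profit target before a trailing drawdown limit is hit. Everything here is an assumption made for illustration: the function name, the bootstrap resampling, the pass/fail rules and the parameter names are not taken from the project's simulator.

```python
import random


def estimate_pass_probability(daily_returns, target_pct, max_drawdown_pct,
                              horizon_days, n_paths=2000, seed=0):
    """Estimate the share of simulated paths that hit the target first.

    daily_returns: realised fractional daily returns to resample from.
    A path fails when equity falls max_drawdown_pct below its running peak,
    or when the horizon ends without reaching the target.
    """
    rng = random.Random(seed)  # seeded so repeated runs agree
    passes = 0
    for _ in range(n_paths):
        equity, peak = 1.0, 1.0
        for _ in range(horizon_days):
            equity *= 1.0 + rng.choice(daily_returns)
            peak = max(peak, equity)
            if equity <= peak * (1.0 - max_drawdown_pct):
                break  # trailing drawdown breached
            if equity >= 1.0 + target_pct:
                passes += 1
                break  # target reached
    return passes / n_paths


if __name__ == "__main__":
    sample_returns = [0.012, -0.008, 0.004, -0.003, 0.009]  # made-up inputs
    print(estimate_pass_probability(sample_returns, 0.08, 0.05, 30))
```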

Phase 4A — Backtesting quality

  • F15 walk-forward windows + OOS aggregation
  • F16 transaction cost modelling
  • F17 risk-adjusted metrics (Sharpe, Sortino, Calmar)
  • F18 drawdown profile analysis (duration, avg recovery)
  • F19 regime-conditional decomposition
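
For readers unfamiliar with the F17 metrics, here is one common textbook formulation. It is a sketch under stated assumptions (252 periods per year, zero risk-free rate, downside deviation measured against 0) and may not match the conventions used by the project's metrics module.

```python
import math
import statistics


def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe(returns, periods_per_year=252):
    """Annualised mean return over sample standard deviation."""
    spread = statistics.stdev(returns)
    if not spread:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.mean(returns) / spread


def sortino(returns, periods_per_year=252):
    """Like Sharpe, but only returns below 0 count towards the deviation."""
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    if not downside:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.mean(returns) / downside


def calmar(equity, periods_per_year=252):
    """Annualised return divided by maximum drawdown."""
    periods = len(equity) - 1
    annualised = (equity[-1] / equity[0]) ** (periods_per_year / periods) - 1.0
    drawdown = max_drawdown(equity)
    return annualised / drawdown if drawdown else 0.0


if __name__ == "__main__":
    curve = [100.0, 120.0, 90.0, 130.0]
    print(max_drawdown(curve))
```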

Dashboard workstream

  • A1 Seam 0 bar-timing fidelity (faf1254)
  • A2 Phase 4A metrics surfaced in backtest dashboard (64c59fb)
  • A3 Monte Carlo P(pass) per configuration (806953f)
  • T0 backtest persistence & recovery (569d1fb)
  • T1 prompt caching for Player + Coach (14bede3)
  • T2 parallel preset execution (2825167)
  • T3 Fast / Standard backtest depth toggle (3f2cf28)
  • Live backtest progress bar + sparkline + status caption (bbae90c)

Sessions 1–7 correctness pass

22 of 25 local /code-review findings shipped over 7 sessions.

🟥 Critical (7) — silent bugs in v1.x

  • N1 (5161dc8) — daily_pnl=0.0 at decision time disabled daily-loss and consistency breakers pre-LLM. Now passes overnight-gap MTM.
  • N2 (b7f36ce) — runner discarded Player's proposed entry_price. Now re-validates direction + RR against today_open, skips on broken geometry.
  • N3 (11cb49d) — parallel presets now write to per-strategy artifact dirs (artifacts/<strategy_id>/).
  • R1 + N13 (11cb49d) — same-preset slot collision — _persist_preset now takes slot as parameter, never derives.
  • R2 (11cb49d) — recovery banner phantom winner — _b fields are None, not 0.0.
  • R3 (4dc057d) — Streamlit widget mutations from worker threads now carry ScriptRunContext via thread_init hook in run_parallel.
  • N4 (11cb49d) — _render_metrics_panel KeyError on partial persistence — iterates metrics_panel_slots(m).
  • N7 (5161dc8) — day-1 consistency_status fixed as part of N1.
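
The N2 re-validation can be sketched as a geometry check at the fill price. This is an illustration only: validate_at_open, its parameters and the min_rr threshold are assumptions, and the runner's actual rules are not shown in these notes.

```python
def validate_at_open(direction, today_open, stop, target, min_rr):
    """Recompute risk/reward at the open; return None when the trade is skipped.

    For a long, the stop must sit below the open and the target above it;
    for a short, the reverse. A gap through either level breaks the geometry.
    """
    if direction == "long":
        risk, reward = today_open - stop, target - today_open
    elif direction == "short":
        risk, reward = stop - today_open, today_open - target
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    if risk <= 0 or reward <= 0:
        return None  # broken geometry after the gap
    ratio = reward / risk
    return ratio if ratio >= min_rr else None


if __name__ == "__main__":
    # Proposed long with stop 98 and target 106, evaluated at two opens.
    print(validate_at_open("long", 100.0, 98.0, 106.0, 2.0))
    print(validate_at_open("long", 107.0, 98.0, 106.0, 2.0))
```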

🟧 High (5)

  • R4 (7a3c11c) — artifact["rounds"][-1] empty-list IndexError guarded.
  • N5 (7a3c11c) — CoachLoop forwards every tokens_used key — cache_read_player / cache_read_coach reach the artifact.
  • N6 (7a3c11c) — T1 caching plumbed through reasoning_evaluator.py + both streaming clones via build_cached_system().
  • N8 (11cb49d) — fresh_metrics_state() installed on every Run click — no stale preset_b leak from prior runs.

🟨 Medium (6) — efficiency + observability

  • R5 (891f234) — narrowed BaseException → Exception (KeyboardInterrupt propagates).
  • R6 (891f234) — strict typed encoder in recovery.save_snapshot — numpy types coerce cleanly, unknown types raise.
  • N9 (891f234) — sparkline throttled to every 5 bars + final bar via should_render_sparkline().
  • N10 (891f234) — ConstraintSchema.to_json() caches per-instance — ~1500 redundant dumps eliminated per backtest.
  • N11 (891f234) — runner inner-loop — numpy SMA, cached trade_stats, single-pass MTM.
  • N12 (891f234) — BacktestResult.on_day_errors counter + log; broken subscribers surface instead of looking frozen.
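
R5 and N12 fit together naturally: catching Exception rather than BaseException lets Ctrl-C through, and counting failures makes a broken subscriber visible. The sketch below shows that pattern with hypothetical names (notify_subscribers and the logger name); it is not the runner's implementation.

```python
import logging

logger = logging.getLogger("backtest_sketch")


def notify_subscribers(subscribers, day):
    """Call each subscriber for the given day and return the failure count.

    Exception covers ordinary errors; KeyboardInterrupt and SystemExit derive
    from BaseException, so they are not caught here and still propagate.
    """
    errors = 0
    for callback in subscribers:
        try:
            callback(day)
        except Exception:
            errors += 1
            logger.exception("subscriber %r failed on day %s", callback, day)
    return errors


if __name__ == "__main__":
    def broken(day):
        raise ValueError("boom")

    print(notify_subscribers([print, broken], 1))
```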

⬜ Low (4, +3 rehoused)

  • R7 (fixed incidentally in Session 3) — unused other_label param removed.
  • R8 / N17 (3a6e34e) — dead on_events API surface dropped.
  • R9 (3a6e34e) — B-preset success toast added.
  • R10 (3a6e34e) — list_recoverable() wrapped in @st.cache_data(ttl=10).
  • N14 / N15 / N16 deferred to AgentCore backlog D6–D8 (architectural).

Infrastructure

  • Seam 0 — backtest fidelity (8754365)
  • Seam 3 — breaker registry (30866aa)
  • Seam 5 — equity curve + metrics module (d69ce11)
  • Streamlit Cloud deploy: Python 3.12 set via Cloud Settings UI (the misleading .python-version was dropped in ee586d5).

Verification

  • 513 unit tests passing, 1 skipped (was 460 at start of Sessions 1–7)
  • Local pytest: clean
  • Lint: clean on every file touched
  • Streamlit Cloud (Python 3.12): R3 ScriptRunContext attach verified with progress bars updating live across full 18-day Fast run; both R9 success toasts displayed; comparison summary rendered

Installation

pip install player-coach-core==2.0.0

v1.1.0 — Phase 3: Evidence Loop, Pattern Memory, Backtest, Reasoning Eval

24 May 09:35

What's new in v1.1.0

Feature 1 — ConstraintDeriver live connection

Constraints are now derived from accumulated exchange evidence. The Constraints page gains a 'Derive from History' button that reads approved runs, extracts trade size and risk/reward, and updates all constraint sliders in one click.

Feature 3 — Pattern memory

After each exchange, named violations are recorded to coach_memory with confidence scores. Before each PlayerAgent call, relevant patterns for the current symbol are injected into context — cross-session learning without retraining.

Feature 4 — Comparative backtest dashboard page

A fifth dashboard page runs two constraint presets over historical data side by side. Approval rate, average rounds, days aborted, total return, and max drawdown are shown, with the winner highlighted per metric.

Feature 5 — Reasoning quality evaluation

A secondary adversarial check on the Player's reasoning. After each CoachAgent evaluation, a ReasoningEvaluator scores the Player's reasoning on consistency with market state, internal coherence, and proportionality. The score is displayed as a coloured badge on each round card.

Bug fixes

  • Player now receives Coach critique on revision rounds in all paths
  • Pattern memory and patterns injection wired into dashboard streaming path
  • Circuit breaker trips now persist abort artifacts to disk and SQLite
  • Artifact metadata (daily_pnl_at_time, enriched fields) flushed to disk correctly
  • Reasoning score and critique now appear in History page round cards
  • Derive from History now correctly seeds constraint form sliders
  • anthropic>=0.49.0 required for get_final_usage()

Tests

118 unit tests + 1 integration test, all passing.

v1.0.1 — Dashboard and Streamlit Cloud deployment

17 May 19:16

First full release. Includes backend, four-page Streamlit dashboard, PyPI package, and Streamlit Cloud deployment. See README for installation and usage.