Releases: MaverickHQ/crucible-player-coach
v2.0.1 — packaging fix for v2.0.0
Packaging-only patch. No runtime code changed. v2.0.0 → v2.0.1 fixes two latent defects that have been present since v1.0.0 but only surface when installing from PyPI (not from git clone, which is how this repo is normally deployed).
What was broken in v2.0.0
pip install player-coach-core==2.0.0 produced a package where:
- `player_coach/artifacts/` was missing entirely from the wheel. Any `from player_coach.loop.coach_loop import CoachLoop` chained through `player_coach.artifacts.writer.ArtifactWriter` and crashed with `ModuleNotFoundError`.
- `yfinance` wasn't declared in any extra. Even `pip install player-coach-core[market]` (which had numpy, hmmlearn, arch) couldn't import `BacktestRunner`, because yfinance is needed for OHLCV fetching at module level.
Neither bug was caught by the existing 513-test suite because development happens from git clone where requirements.txt covers everything. They were caught during the post-publish smoke test on a clean venv.
Fixes
| Defect | Cause | Fix |
|---|---|---|
| D1: `player_coach/artifacts/` missing from wheel | `.gitignore:12` had unanchored `artifacts/`; hatchling's pathspec parser matched the Python sub-module too, even though `git check-ignore` correctly said it wasn't ignored | `.gitignore`: `artifacts/` → `/artifacts/` (anchor to repo root) |
| D2: `yfinance` not declared | `[project.optional-dependencies.market]` listed numpy, hmmlearn, arch; yfinance was missed | `pyproject.toml` `[market]`: add `yfinance>=0.2.40` |
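The difference between the unanchored and anchored pattern in D1 can be illustrated with a small sketch. This is a deliberately simplified matcher for single-name directory patterns only (no globs, no negation); it is not hatchling's or git's implementation, and the file name `artifacts/run_001.json` is a hypothetical example.

```python
from pathlib import PurePosixPath


def dir_pattern_matches(pattern: str, path: str) -> bool:
    """Simplified gitignore check for literal directory patterns.

    Supports only patterns such as 'artifacts/' or '/artifacts/'.
    """
    anchored = pattern.startswith("/")
    name = pattern.strip("/")
    parents = PurePosixPath(path).parts[:-1]  # directory components only
    if anchored:
        # '/artifacts/' matches only a directory at the repo root.
        return len(parents) > 0 and parents[0] == name
    # 'artifacts/' matches a directory with that name at any depth.
    return name in parents


paths = ["artifacts/run_001.json", "player_coach/artifacts/writer.py"]
for pattern in ("artifacts/", "/artifacts/"):
    print(pattern, [p for p in paths if dir_pattern_matches(pattern, p)])
```

Under these simplified semantics the unanchored pattern catches both paths, while the anchored one leaves the Python sub-module alone.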
TDD coverage added
Two new tests in tests/unit/test_packaging.py pin both defects so they cannot regress:
- `test_no_tracked_player_coach_file_matches_gitignore`: walks `git ls-files player_coach/` and asserts no path is matched by any `.gitignore` pattern under hatchling-equivalent semantics. Would have failed on v2.0.0.
- `test_declared_dependencies_cover_runtime_imports`: AST-parses every `player_coach/**/*.py` and asserts every non-stdlib top-level import is in `[dependencies]` or an `[optional-dependencies]` table. Would have failed on v2.0.0.
Plus two guardrail tests that were already passing and remain unchanged:
- `test_pyproject_declares_llm_extra_for_anthropic`: pins the CLAUDE.md install ladder
- `test_tomllib_available`: pins the Python 3.11+ floor needed for the static analysis above
Stale test removed
test_python_version_pinned_for_streamlit_cloud (in tests/unit/test_requirements.py) was pinning the .python-version file we removed in ee586d5. That file turned out to be a pyenv-only convention that Streamlit Cloud ignores (Cloud reads its Python version from the per-app Settings UI, not from any repo file). The test was pinning a contract that no longer applies; the deploy-mechanism docs moved to CLAUDE.md's "Deploy environment" section.
Verification
# clean venv, install from PyPI
python3 -m venv /tmp/v
/tmp/v/bin/pip install 'player-coach-core[market,llm]==2.0.1'
# the v2.0.0-broken import chain
/tmp/v/bin/python -c "
from player_coach.loop.coach_loop import CoachLoop
from player_coach.backtest.runner import BacktestRunner
from player_coach.artifacts.writer import ArtifactWriter # was missing in v2.0.0
print('all imports OK')
"
Test count
513 → 516 (+4 new packaging tests, –1 stale .python-version test = +3 net), 1 skipped, lint clean.
Who needs to upgrade
pip install player-coach-core users — yes, upgrade. v2.0.0 is fundamentally broken on import.
Streamlit Cloud users (this app's actual deploy path) — no change needed. Cloud reads from git, not PyPI.
git clone developers — no change needed; requirements.txt was always covering this.
Installation
pip install --upgrade 'player-coach-core[market,llm]==2.0.1'
---
## Why this body is shorter than v2.0.0's
v2.0.0 shipped features. v2.0.1 ships a packaging correction. The body reflects that — single section per defect, the TDD evidence, verification recipe, and a clear "who needs to upgrade" guide. ~50 lines instead of 100.
v2.0.0 — Phase 3B + 4A + dashboard + Sessions 1–7 correctness
42 commits, +7,570 lines, 96 files since v1.1.0.
Phase 3B + 4A roadmap, dashboard rebuild, and a 7-session local code-review correctness pass. v1.0.1 → v2.0.0 because behavioural semantics changed (see Breaking Changes).
⚠️ Breaking changes
- A1 bar-timing (`faf1254`): the runner now decides on bar t-1's close and fills any approved entry/exit at bar t's open, not the same bar. Backtests will produce numerically different results from any pre-v2.0 pinned fixture. N bars produce N-1 decisions (day 0 has no prior data).
- `BacktestResult` shape expanded. New fields: `sharpe`, `sortino`, `calmar`, `max_drawdown_duration`, `avg_recovery_time`, `mc_success_prob`, `on_day_errors`. Consumers reading the dataclass by field name are unaffected; consumers relying on positional unpacking will break.
- `ConstraintSchema` shape expanded. 7 new optional fields with defaults: `max_daily_loss_pct` (0.02), `consistency_rule_pct` (0.50), `consistency_warn_pct` (0.40), `trading_cutoff_time` ("16:20"), `min_stop_atr_multiple` (1.5), `prefer_entry_below_vwap` (True), `trailing_max_drawdown_pct` (None → defaults to `max_drawdown_pct`). Old preset JSON without these keys still loads.
- `dashboard/parallel.py.run_parallel` signature: dropped the unused `on_events` parameter (R8). `PresetWorker` is now `Callable[[str], Any]`.
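The A1 timing rule can be sketched as a loop that looks only at the previous bar's close and fills at the current bar's open. This is a minimal illustration under assumed names (`Bar`, `run`, `decide` are hypothetical), not the project's runner.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    close: float


def run(bars, decide):
    """Decide on bar t-1's close; fill any signal at bar t's open."""
    decisions, fills = 0, []
    for t in range(1, len(bars)):  # day 0 has no prior bar, so no decision
        signal = decide(bars[t - 1].close)
        decisions += 1
        if signal is not None:
            fills.append((t, signal, bars[t].open))
    return decisions, fills


bars = [Bar(100.0, 101.0), Bar(102.0, 103.0), Bar(104.0, 99.0)]
decisions, fills = run(bars, lambda close: "buy" if close > 100 else None)
print(decisions)  # 2 decisions for 3 bars
print(fills)      # [(1, 'buy', 102.0), (2, 'buy', 104.0)]
```

Because the fill price is the next bar's open rather than the close that triggered the decision, any fixture pinned to same-bar fills will differ.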
Phase 3B — Prop trading infra
- F11 trailing drawdown breaker (via Seam 3, the breaker registry)
- F12 challenge phase management (`challenge_pnl_pct` → building / conservation)
- F13 daily consistency enforcement (graduated `approaching` / `breached` signal)
- F14 Monte Carlo challenge simulator (P(pass) projection from realised edge)
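To give a feel for what a P(pass) projection like F14 estimates, here is a minimal Monte Carlo sketch. It assumes i.i.d. Gaussian daily returns and a simple trailing-drawdown fail rule; the function name, parameters, and return model are all assumptions for illustration and may differ from how the project derives paths from realised edge.

```python
import random


def pass_probability(mean_daily, std_daily, target, max_drawdown, days,
                     n_paths=2000, seed=0):
    """Estimate P(reach target before breaching a trailing drawdown)."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_paths):
        pnl, peak = 0.0, 0.0
        for _ in range(days):
            pnl += rng.gauss(mean_daily, std_daily)
            peak = max(peak, pnl)
            if peak - pnl >= max_drawdown:  # trailing drawdown breached: fail
                break
            if pnl >= target:               # profit target reached: pass
                passed += 1
                break
    return passed / n_paths


p = pass_probability(mean_daily=0.002, std_daily=0.01,
                     target=0.08, max_drawdown=0.05, days=30)
print(f"P(pass) ~ {p:.2f}")
```

A path that neither hits the target nor breaches the drawdown within `days` counts as not passed in this sketch.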
Phase 4A — Backtesting quality
- F15 walk-forward windows + OOS aggregation
- F16 transaction cost modelling
- F17 risk-adjusted metrics (Sharpe, Sortino, Calmar)
- F18 drawdown profile analysis (duration, avg recovery)
- F19 regime-conditional decomposition
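For readers unfamiliar with the F17 metrics, the sketch below shows one common set of definitions. Conventions vary (annualisation factor, risk-free rate, downside-deviation formula), so these are assumptions and not necessarily the formulas the project uses; a zero risk-free rate and 252 periods per year are assumed.

```python
import math
import statistics


def sharpe(returns, periods_per_year=252):
    """Annualised mean return over sample standard deviation."""
    return (statistics.fmean(returns) / statistics.stdev(returns)
            * math.sqrt(periods_per_year))


def sortino(returns, periods_per_year=252):
    """Like Sharpe, but penalises only returns below zero."""
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    return statistics.fmean(returns) / downside * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar(returns, equity, periods_per_year=252):
    """Annualised mean return over maximum drawdown."""
    return statistics.fmean(returns) * periods_per_year / max_drawdown(equity)


returns = [0.01, -0.005, 0.02, -0.01]
equity = [100.0, 120.0, 90.0, 130.0]
print(max_drawdown(equity))  # 0.25
print(sharpe(returns), sortino(returns), calmar(returns, equity))
```

The sketch does not guard against zero volatility, zero downside, or zero drawdown, all of which would divide by zero.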
Dashboard workstream
- A1 Seam 0 bar-timing fidelity (`faf1254`)
- A2 Phase 4A metrics surfaced in backtest dashboard (`64c59fb`)
- A3 Monte Carlo P(pass) per configuration (`806953f`)
- T0 backtest persistence & recovery (`569d1fb`)
- T1 prompt caching for Player + Coach (`14bede3`)
- T2 parallel preset execution (`2825167`)
- T3 Fast / Standard backtest depth toggle (`3f2cf28`)
- Live backtest progress bar + sparkline + status caption (`bbae90c`)
Sessions 1–7 correctness pass
22 of 25 local /code-review findings shipped over 7 sessions.
🟥 Critical (7) — silent bugs in v1.x
- N1 (`5161dc8`): `daily_pnl=0.0` at decision time disabled daily-loss and consistency breakers pre-LLM. Now passes overnight-gap MTM.
- N2 (`b7f36ce`): runner discarded Player's proposed `entry_price`. Now re-validates direction + RR against `today_open`, skips on broken geometry.
- N3 (`11cb49d`): parallel presets now write to per-strategy artifact dirs (`artifacts/<strategy_id>/`).
- R1 + N13 (`11cb49d`): same-preset slot collision. `_persist_preset` now takes `slot` as a parameter and never derives it.
- R2 (`11cb49d`): recovery banner phantom winner. `_b` fields are `None`, not `0.0`.
- R3 (`4dc057d`): Streamlit widget mutations from worker threads now carry `ScriptRunContext` via the `thread_init` hook in `run_parallel`.
- N4 (`11cb49d`): `_render_metrics_panel` `KeyError` on partial persistence. Now iterates `metrics_panel_slots(m)`.
- N7 (`5161dc8`): day-1 `consistency_status` fixed as part of N1.
🟧 High (5)
- R4 (`7a3c11c`): `artifact["rounds"][-1]` empty-list `IndexError` guarded.
- N5 (`7a3c11c`): CoachLoop forwards every `tokens_used` key, so `cache_read_player` / `cache_read_coach` reach the artifact.
- N6 (`7a3c11c`): T1 caching plumbed through `reasoning_evaluator.py` + both streaming clones via `build_cached_system()`.
- N8 (`11cb49d`): `fresh_metrics_state()` installed on every Run click, so no stale `preset_b` leak from prior runs.
🟨 Medium (6) — efficiency + observability
- R5 (`891f234`): narrowed `BaseException` → `Exception` (`KeyboardInterrupt` propagates).
- R6 (`891f234`): strict typed encoder in `recovery.save_snapshot`. numpy types coerce cleanly, unknown types raise.
- N9 (`891f234`): sparkline throttled to every 5 bars + final bar via `should_render_sparkline()`.
- N10 (`891f234`): `ConstraintSchema.to_json()` caches per-instance. ~1500 redundant dumps eliminated per backtest.
- N11 (`891f234`): runner inner-loop now uses numpy SMA, cached `trade_stats`, single-pass MTM.
- N12 (`891f234`): `BacktestResult.on_day_errors` counter + log; broken subscribers surface instead of looking frozen.
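The N9 throttle amounts to a small predicate. The release only names `should_render_sparkline()`; the signature and zero-based indexing below are assumptions made for illustration.

```python
def should_render_sparkline(bar_index: int, total_bars: int, every: int = 5) -> bool:
    """Render on every `every`-th bar and always on the final bar."""
    is_final = bar_index == total_bars - 1
    return is_final or bar_index % every == 0


rendered = [i for i in range(18) if should_render_sparkline(i, 18)]
print(rendered)  # [0, 5, 10, 15, 17]
```

For an 18-bar run this cuts sparkline redraws from 18 to 5 while still showing the final state.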
⬜ Low (4, +3 rehoused)
- R7 (incidentally Session 3): unused `other_label` param removed.
- R8 / N17 (`3a6e34e`): dead `on_events` API surface dropped.
- R9 (`3a6e34e`): B-preset success toast added.
- R10 (`3a6e34e`): `list_recoverable()` wrapped in `@st.cache_data(ttl=10)`.
- N14 / N15 / N16 deferred to AgentCore backlog D6–D8 (architectural).
Infrastructure
- Seam 0: backtest fidelity (`8754365`)
- Seam 3: breaker registry (`30866aa`)
- Seam 5: equity curve + metrics module (`d69ce11`)
- Streamlit Cloud deploy: Python 3.12 set via Cloud Settings UI (the misleading `.python-version` was dropped in `ee586d5`).
Verification
- 513 unit tests passing, 1 skipped (was 460 at start of Sessions 1–7)
- Local `pytest`: clean
- Lint: clean on every file touched
- Streamlit Cloud (Python 3.12): R3 ScriptRunContext attach verified with progress bars updating live across full 18-day Fast run; both R9 success toasts displayed; comparison summary rendered
Installation
pip install player-coach-core==2.0.0
v1.1.0 — Phase 3: Evidence Loop, Pattern Memory, Backtest, Reasoning Eval
What's new in v1.1.0
Feature 1 — ConstraintDeriver live connection
Constraints are now derived from accumulated exchange evidence. The Constraints page gains a 'Derive from History' button that reads approved runs, extracts trade size and risk/reward, and updates all constraint sliders in one click.
Feature 3 — Pattern memory
After each exchange, named violations are recorded to coach_memory with confidence scores. Before each PlayerAgent call, relevant patterns for the current symbol are injected into context — cross-session learning without retraining.
Feature 4 — Comparative backtest dashboard page
A fifth dashboard page runs two constraint presets over historical data side by side. Approval rate, average rounds, days aborted, total return, and max drawdown shown with winner highlighted per metric.
Feature 5 — Reasoning quality evaluation
A secondary adversarial check on the Player's reasoning. After each CoachAgent evaluation, a ReasoningEvaluator scores the Player's reasoning on consistency with market state, internal coherence, and proportionality. Score displayed as a coloured badge on each round card.
Bug fixes
- Player now receives Coach critique on revision rounds in all paths
- Pattern memory and patterns injection wired into dashboard streaming path
- Circuit breaker trips now persist abort artifacts to disk and SQLite
- Artifact metadata (daily_pnl_at_time, enriched fields) flushed to disk correctly
- Reasoning score and critique now appear in History page round cards
- Derive from History now correctly seeds constraint form sliders
- anthropic>=0.49.0 required for get_final_usage()
Tests
118 unit tests + 1 integration test, all passing.
v1.0.1 — Dashboard and Streamlit Cloud deployment
First full release. Includes backend, four-page Streamlit dashboard, PyPI package, and Streamlit Cloud deployment. See README for installation and usage.