Releases · MaverickHQ/crucible-player-coach

21 Jun 10:15

v2.0.1

e3ddbb4

v2.0.1 — packaging fix for v2.0.0 Latest


Packaging-only patch. No runtime code changed. v2.0.0 → v2.0.1 fixes two latent defects that have been present since v1.0.0 but only surface when installing from PyPI (not from git clone, which is how this repo is normally deployed).

What was broken in v2.0.0

pip install player-coach-core==2.0.0 produced a package where:

player_coach/artifacts/ was missing entirely from the wheel. Importing CoachLoop (from player_coach.loop.coach_loop import CoachLoop) pulls in player_coach.artifacts.writer.ArtifactWriter, so the import crashed with ModuleNotFoundError.
yfinance wasn't declared in any extra. Even pip install player-coach-core[market] (which pulled in numpy, hmmlearn, and arch) couldn't import BacktestRunner, because yfinance is imported at module level for OHLCV fetching.

Neither bug was caught by the existing 513-test suite because development happens from git clone where requirements.txt covers everything. They were caught during the post-publish smoke test on a clean venv.

Fixes

| Defect | Cause | Fix |
| --- | --- | --- |
| D1: `player_coach/artifacts/` missing from wheel | `.gitignore:12` had an unanchored `artifacts/` pattern; hatchling's pathspec parser matched the Python sub-module too, even though `git check-ignore` correctly said it wasn't ignored | `.gitignore`: `artifacts/` → `/artifacts/` (anchor to repo root) |
| D2: `yfinance` not declared | `[project.optional-dependencies.market]` listed numpy, hmmlearn, arch; yfinance was missed | `pyproject.toml [market]`: add `yfinance>=0.2.40` |
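
The difference between the two patterns in D1 can be illustrated with a minimal sketch. It models only the standard gitignore rule for single-name directory patterns: a trailing slash with no leading slash matches a directory of that name at any depth, while a leading slash anchors the pattern to the directory that holds the .gitignore. This is not hatchling's or pathspec's implementation; the helper name and the runtime-output path are hypothetical.

```python
from pathlib import PurePosixPath


def dir_pattern_matches(pattern: str, path: str) -> bool:
    """Simplified gitignore check for single-name directory patterns.

    Handles only patterns shaped like 'name/' or '/name/'. Hypothetical
    helper for illustration; real matchers support far more syntax.
    """
    anchored = pattern.startswith("/")
    name = pattern.strip("/")
    # Directory components of the path, excluding the file name itself.
    dirs = PurePosixPath(path).parts[:-1]
    if anchored:
        # '/name/' only matches a directory at the repository root.
        return bool(dirs) and dirs[0] == name
    # 'name/' matches a directory called `name` at any depth.
    return name in dirs


package_file = "player_coach/artifacts/writer.py"
runtime_output = "artifacts/example_strategy/run.json"  # hypothetical path

print(dir_pattern_matches("artifacts/", package_file))     # True
print(dir_pattern_matches("/artifacts/", package_file))    # False
print(dir_pattern_matches("/artifacts/", runtime_output))  # True
```

With the anchored form, the root-level output directory stays ignored while the package sub-module is left alone.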

TDD coverage added

Two new tests in tests/unit/test_packaging.py pin both defects so they cannot regress:

test_no_tracked_player_coach_file_matches_gitignore — walks git ls-files player_coach/ and asserts no path is matched by any .gitignore pattern under hatchling-equivalent semantics. Would have failed on v2.0.0.
test_declared_dependencies_cover_runtime_imports — AST-parses every player_coach/**/*.py, asserts every non-stdlib top-level import is in [dependencies] or an [optional-dependencies] table. Would have failed on v2.0.0.

Plus two guardrail tests that were already passing and remain unchanged:

test_pyproject_declares_llm_extra_for_anthropic — pins the CLAUDE.md install ladder
test_tomllib_available — pins the Python 3.11+ floor needed for the static analysis above

Stale test removed

test_python_version_pinned_for_streamlit_cloud (in tests/unit/test_requirements.py) was pinning the .python-version file we removed in ee586d5. That file turned out to be a pyenv-only convention that Streamlit Cloud ignores (Cloud reads its Python version from the per-app Settings UI, not from any repo file). The test was pinning a contract that no longer applies; the deploy-mechanism docs moved to CLAUDE.md's "Deploy environment" section.

Verification

# clean venv, install from PyPI
python3 -m venv /tmp/v
/tmp/v/bin/pip install 'player-coach-core[market,llm]==2.0.1'

# the v2.0.0-broken import chain
/tmp/v/bin/python -c "
from player_coach.loop.coach_loop import CoachLoop
from player_coach.backtest.runner import BacktestRunner
from player_coach.artifacts.writer import ArtifactWriter   # was missing in v2.0.0
print('all imports OK')
"

Test count

513 → 516 (+4 new packaging tests, –1 stale .python-version test = +3 net), 1 skipped, lint clean.

Who needs to upgrade

pip install player-coach-core users: yes, upgrade. v2.0.0 is fundamentally broken on import.
Streamlit Cloud users (this app's actual deploy path): no change needed. Cloud reads from git, not PyPI.
git clone developers: no change needed; requirements.txt was always covering this.

Installation

pip install --upgrade 'player-coach-core[market,llm]==2.0.1'

---

## Why this body is shorter than v2.0.0's

v2.0.0 shipped features. v2.0.1 ships a packaging correction. The body reflects that — single section per defect, the TDD evidence, verification recipe, and a clear "who needs to upgrade" guide. ~50 lines instead of 100.



21 Jun 08:40

MaverickHQ

v2.0.0

f5ed0c3

v2.0.0 — Phase 3B + 4A + dashboard + Sessions 1–7 correctness

42 commits, +7,570 lines, 96 files since v1.1.0.

Phase 3B + 4A roadmap, dashboard rebuild, and a 7-session local code-review correctness pass. The version jumps from v1.1.0 to v2.0.0 because behavioural semantics changed (see Breaking Changes).

⚠️ Breaking changes

A1 bar-timing (faf1254) — the runner now decides on bar t-1's close and fills any approved entry/exit at bar t's open, not the same bar. Backtests will produce numerically different results from any pre-v2.0 pinned fixture. N bars produce N-1 decisions (day 0 has no prior data).
BacktestResult shape expanded — new fields: sharpe, sortino, calmar, max_drawdown_duration, avg_recovery_time, mc_success_prob, on_day_errors. Consumers reading the dataclass by field name are unaffected; consumers relying on positional unpacking will break.
ConstraintSchema shape expanded — 7 new optional fields with defaults: max_daily_loss_pct (0.02), consistency_rule_pct (0.50), consistency_warn_pct (0.40), trading_cutoff_time ("16:20"), min_stop_atr_multiple (1.5), prefer_entry_below_vwap (True), trailing_max_drawdown_pct (None → defaults to max_drawdown_pct). Old preset JSON without these keys still loads.
dashboard/parallel.py.run_parallel signature — dropped the unused on_events parameter (R8). PresetWorker is now Callable[[str], Any].
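
The A1 timing rule can be sketched as a loop that decides on the previous bar's close and fills at the current bar's open. This is a minimal illustration under assumptions, not the runner's code: Bar, run_next_open, and the wants_entry callback are hypothetical names, and the prices are made up.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Bar:
    open: float
    close: float


def run_next_open(bars: list[Bar], wants_entry: Callable[[float], bool]):
    """Decide on bar t-1's close, fill at bar t's open.

    `wants_entry` is a hypothetical stand-in for the Player/Coach decision.
    Returns (number_of_decisions, [(fill_bar_index, fill_price), ...]).
    """
    decisions = 0
    fills = []
    for t in range(1, len(bars)):  # bar 0 has no prior bar to decide on
        decisions += 1
        if wants_entry(bars[t - 1].close):
            fills.append((t, bars[t].open))
    return decisions, fills


bars = [Bar(100.0, 101.0), Bar(102.0, 99.0), Bar(98.5, 103.0), Bar(104.0, 105.0)]
decisions, fills = run_next_open(bars, lambda prev_close: prev_close > 100.0)
print(decisions)  # 3
print(fills)      # [(1, 102.0), (3, 104.0)]
```

Four bars yield three decisions, and every fill price is an open from the bar after the one that triggered it, which is why fixtures pinned before v2.0 no longer match.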

Phase 3B — Prop trading infra

F11 trailing drawdown breaker (via Seam 3 — breaker registry)
F12 challenge phase management (challenge_pnl_pct → building / conservation)
F13 daily consistency enforcement (graduated approaching / breached signal)
F14 Monte Carlo challenge simulator (P(pass) projection from realised edge)
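
To make F14 concrete, here is one way a P(pass) estimate could be produced: bootstrap daily returns from realised history and count the paths that reach a profit target before breaching a trailing drawdown limit. Everything here is an assumption for illustration (function name, target, limit, horizon, path count); the notes do not describe the simulator's actual rules.

```python
import random


def estimate_pass_probability(
    daily_returns: list[float],
    profit_target: float = 0.10,  # assumed challenge target
    max_drawdown: float = 0.05,   # assumed trailing drawdown limit
    horizon_days: int = 30,       # assumed challenge length
    n_paths: int = 2000,
    seed: int = 0,
) -> float:
    """Fraction of bootstrapped paths that hit the target before the limit."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_paths):
        equity = peak = 1.0
        for _ in range(horizon_days):
            # Resample one realised daily return with replacement.
            equity *= 1.0 + rng.choice(daily_returns)
            peak = max(peak, equity)
            if equity <= peak * (1.0 - max_drawdown):
                break  # trailing drawdown breached: path fails
            if equity >= 1.0 + profit_target:
                passed += 1
                break  # target reached: path passes
    return passed / n_paths


print(estimate_pass_probability([0.01]))   # 1.0
print(estimate_pass_probability([-0.01]))  # 0.0
```

The two degenerate inputs bracket the result: a history of only winning days always passes, and a history of only losing days never does.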

Phase 4A — Backtesting quality

F15 walk-forward windows + OOS aggregation
F16 transaction cost modelling
F17 risk-adjusted metrics (Sharpe, Sortino, Calmar)
F18 drawdown profile analysis (duration, avg recovery)
F19 regime-conditional decomposition
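
For readers unfamiliar with the F17 metrics, the sketch below uses their common textbook definitions on a series of daily returns. The annualisation factor of 252, the zero risk-free rate, and the use of mean daily return times 252 as the annual return are assumptions; the project's metrics module may define them differently.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualisation factor


def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough fall of the compounded equity curve."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def risk_adjusted(returns: list[float]) -> dict[str, float]:
    """Sharpe, Sortino and Calmar with a zero risk-free rate (assumption).

    Requires at least one losing day; otherwise the downside deviation and
    the drawdown are zero and the ratios are undefined.
    """
    avg = mean(returns)
    downside = math.sqrt(mean([min(r, 0.0) ** 2 for r in returns]))
    mdd = max_drawdown(returns)
    return {
        "sharpe": avg / stdev(returns) * math.sqrt(TRADING_DAYS),
        "sortino": avg / downside * math.sqrt(TRADING_DAYS),
        "calmar": avg * TRADING_DAYS / mdd,
        "max_drawdown": mdd,
    }


metrics = risk_adjusted([0.01, -0.02, 0.03, -0.01])
print({name: round(value, 3) for name, value in metrics.items()})
```

Sortino divides by downside deviation only, so it exceeds Sharpe whenever upside volatility dominates.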

Dashboard workstream

A1 Seam 0 bar-timing fidelity (faf1254)
A2 Phase 4A metrics surfaced in backtest dashboard (64c59fb)
A3 Monte Carlo P(pass) per configuration (806953f)
T0 backtest persistence & recovery (569d1fb)
T1 prompt caching for Player + Coach (14bede3)
T2 parallel preset execution (2825167)
T3 Fast / Standard backtest depth toggle (3f2cf28)
Live backtest progress bar + sparkline + status caption (bbae90c)

Sessions 1–7 correctness pass

22 of 25 local /code-review findings shipped over 7 sessions.

🟥 Critical (7) — silent bugs in v1.x

N1 (5161dc8) — daily_pnl=0.0 at decision time disabled daily-loss and consistency breakers pre-LLM. Now passes overnight-gap MTM.
N2 (b7f36ce) — runner discarded Player's proposed entry_price. Now re-validates direction + RR against today_open, skips on broken geometry.
N3 (11cb49d) — parallel presets now write to per-strategy artifact dirs (artifacts/<strategy_id>/).
R1 + N13 (11cb49d) — same-preset slot collision — _persist_preset now takes slot as parameter, never derives.
R2 (11cb49d) — recovery banner phantom winner — _b fields are None, not 0.0.
R3 (4dc057d) — Streamlit widget mutations from worker threads now carry ScriptRunContext via thread_init hook in run_parallel.
N4 (11cb49d) — _render_metrics_panel KeyError on partial persistence — iterates metrics_panel_slots(m).
N7 (5161dc8) — day-1 consistency_status fixed as part of N1.
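
The N2 re-validation can be pictured as recomputing risk and reward from the actual fill price rather than the proposed entry. The sketch below is an assumption about what "broken geometry" might mean, not the runner's logic; the function name, parameters, and the minimum ratio are hypothetical.

```python
from typing import Optional


def revalidate_at_open(
    direction: str,
    today_open: float,
    stop: float,
    target: float,
    min_rr: float = 1.5,  # hypothetical minimum reward/risk ratio
) -> Optional[float]:
    """Return reward/risk measured from today's open, or None to skip."""
    if direction == "long":
        risk, reward = today_open - stop, target - today_open
    elif direction == "short":
        risk, reward = stop - today_open, today_open - target
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    if risk <= 0 or reward <= 0:
        return None  # open is beyond the stop or the target: geometry is broken
    rr = reward / risk
    return rr if rr >= min_rr else None


print(revalidate_at_open("long", today_open=100.0, stop=98.0, target=104.0))  # 2.0
print(revalidate_at_open("long", today_open=105.0, stop=98.0, target=104.0))  # None
```

In the second call the market gapped above the target overnight, so the trade is skipped instead of being filled at a price that no longer matches the plan.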

🟧 High (5)

R4 (7a3c11c) — artifact["rounds"][-1] empty-list IndexError guarded.
N5 (7a3c11c) — CoachLoop forwards every tokens_used key — cache_read_player / cache_read_coach reach the artifact.
N6 (7a3c11c) — T1 caching plumbed through reasoning_evaluator.py + both streaming clones via build_cached_system().
N8 (11cb49d) — fresh_metrics_state() installed on every Run click — no stale preset_b leak from prior runs.

🟨 Medium (6) — efficiency + observability

R5 (891f234) — narrowed BaseException → Exception (KeyboardInterrupt propagates).
R6 (891f234) — strict typed encoder in recovery.save_snapshot — numpy coerce cleanly, unknown types raise.
N9 (891f234) — sparkline throttled to every 5 bars + final bar via should_render_sparkline().
N10 (891f234) — ConstraintSchema.to_json() caches per-instance — ~1500 redundant dumps eliminated per backtest.
N11 (891f234) — runner inner-loop — numpy SMA, cached trade_stats, single-pass MTM.
N12 (891f234) — BacktestResult.on_day_errors counter + log; broken subscribers surface instead of looking frozen.
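
The idea behind R6 can be shown with a json.dumps fallback that converts numpy-like values and rejects everything else, as opposed to a permissive fallback such as default=str that silently stringifies unknown objects. This is a sketch under assumptions, not recovery.save_snapshot itself; strict_default and FakeScalar are hypothetical, and the stand-in class lets the example run without numpy installed.

```python
import json


def strict_default(obj):
    """json.dumps fallback: coerce numpy-like values, reject everything else."""
    # numpy scalars and arrays expose tolist(), which yields plain Python values.
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()
    raise TypeError(f"cannot serialise {type(obj).__name__} in snapshot")


class FakeScalar:
    """Stand-in for a numpy scalar so the example runs without numpy."""

    def __init__(self, value):
        self.value = value

    def tolist(self):
        return self.value


print(json.dumps({"equity": FakeScalar(1.25)}, default=strict_default))
# {"equity": 1.25}
```

Passing an unsupported object, for example a bare object(), raises TypeError at save time, so a bad snapshot fails loudly rather than being written with unusable fields.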

⬜ Low (4, +3 rehoused)

R7 (incidentally Session 3) — unused other_label param removed.
R8 / N17 (3a6e34e) — dead on_events API surface dropped.
R9 (3a6e34e) — B-preset success toast added.
R10 (3a6e34e) — list_recoverable() wrapped in @st.cache_data(ttl=10).
N14 / N15 / N16 deferred to AgentCore backlog D6–D8 (architectural).

Infrastructure

Seam 0 — backtest fidelity (8754365)
Seam 3 — breaker registry (30866aa)
Seam 5 — equity curve + metrics module (d69ce11)
Streamlit Cloud deploy: Python 3.12 set via Cloud Settings UI (the misleading .python-version was dropped in ee586d5).

Verification

513 unit tests passing, 1 skipped (was 460 at start of Sessions 1–7)
Local pytest: clean
Lint: clean on every file touched
Streamlit Cloud (Python 3.12): R3 ScriptRunContext attach verified with progress bars updating live across full 18-day Fast run; both R9 success toasts displayed; comparison summary rendered

Installation

pip install player-coach-core==2.0.0


24 May 09:35

MaverickHQ

v1.1.0

2b3ed82

v1.1.0 — Phase 3: Evidence Loop, Pattern Memory, Backtest, Reasoning Eval

What's new in v1.1.0

Feature 1 — ConstraintDeriver live connection

Constraints are now derived from accumulated exchange evidence. The Constraints page gains a 'Derive from History' button that reads approved runs, extracts trade size and risk/reward, and updates all constraint sliders in one click.

Feature 3 — Pattern memory

After each exchange, named violations are recorded to coach_memory with confidence scores. Before each PlayerAgent call, relevant patterns for the current symbol are injected into context — cross-session learning without retraining.
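
A minimal in-memory sketch of that record-then-inject cycle is shown below. It is an illustration under assumptions: the class, method names, confidence threshold, and prompt wording are hypothetical, and the real coach_memory storage is not described in these notes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    symbol: str
    violation: str
    confidence: float


class CoachMemory:
    """Hypothetical in-memory stand-in for the coach_memory store."""

    def __init__(self) -> None:
        self._by_symbol: dict[str, list[Pattern]] = defaultdict(list)

    def record(self, pattern: Pattern) -> None:
        self._by_symbol[pattern.symbol].append(pattern)

    def relevant(self, symbol: str, min_confidence: float = 0.5, limit: int = 3) -> list[Pattern]:
        # Highest-confidence patterns for this symbol first.
        hits = [p for p in self._by_symbol.get(symbol, []) if p.confidence >= min_confidence]
        return sorted(hits, key=lambda p: p.confidence, reverse=True)[:limit]


def build_context(memory: CoachMemory, symbol: str) -> str:
    """Render remembered violations as lines to prepend to the Player prompt."""
    lines = [f"- {p.violation} (confidence {p.confidence:.2f})" for p in memory.relevant(symbol)]
    return "Known patterns:\n" + "\n".join(lines) if lines else ""


memory = CoachMemory()
memory.record(Pattern("AAPL", "stop too tight", 0.8))
memory.record(Pattern("AAPL", "oversized position", 0.4))
memory.record(Pattern("MSFT", "late entry", 0.9))
print(build_context(memory, "AAPL"))
```

Only the AAPL pattern above the threshold is injected; the low-confidence entry and the other symbol's entry are left out.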

Feature 4 — Comparative backtest dashboard page

A fifth dashboard page runs two constraint presets over historical data side by side. Approval rate, average rounds, days aborted, total return, and max drawdown are shown, with the winner highlighted per metric.

Feature 5 — Reasoning quality evaluation

A secondary adversarial check on the Player's reasoning. After each CoachAgent evaluation, a ReasoningEvaluator scores the Player's reasoning on consistency with market state, internal coherence, and proportionality. The score is displayed as a coloured badge on each round card.

Bug fixes

Player now receives Coach critique on revision rounds in all paths
Pattern memory and pattern injection wired into the dashboard streaming path
Circuit breaker trips now persist abort artifacts to disk and SQLite
Artifact metadata (daily_pnl_at_time, enriched fields) flushed to disk correctly
Reasoning score and critique now appear in History page round cards
Derive from History now correctly seeds constraint form sliders
anthropic>=0.49.0 required for get_final_usage()

Tests

118 unit tests + 1 integration test, all passing.


17 May 19:16

MaverickHQ

v1.0.1

d837140

v1.0.1 — Dashboard and Streamlit Cloud deployment

First full release. Includes backend, four-page Streamlit dashboard, PyPI package, and Streamlit Cloud deployment. See README for installation and usage.

