PR #67 follow-ups: run-lifecycle unification, heartbeat recovery, sys.exit deep fix, green CI#71
Merged
Conversation
…ipped bundle pages/not-found.tsx was fully dead (App.tsx renders its own inline 404) and ui/card.tsx survived the LOW sweep only as its import — both deleted; tsc + vite build verified green. The committed frontend bundle was still the PRE-prune build: its CSS (107.76 kB) predates the 65-file component deletion (3618b6a). Swapped in the fresh assets (CSS 31.24 kB, JS byte-identical) and updated index.html's references, dropping the now-redundant ?v= cache-busters (filenames are content-hashed and vercel.json serves /assets/* immutable, / no-cache). Headless smoke: root renders, auth modal present, no console errors. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…TH backstop in CI B0/B1 follow-ups: _owner_context's X-Session-Id fallback makes session_id an ownership credential for import-created agents — document that 'session_id is never a credential' only holds for built-in agents. CI gets an explicit DATABASE_PATH env so a conftest isolation regression can never write into the committed seed DB. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
H4 follow-up. RunStore (protocol steps) and BacktestDatabase (equity/trades) share one file; under the rollback journal a finalize's heavy writes block every reader. WAL lets readers proceed during commits — the actual concurrency shape here. Best-effort with fallback for filesystems without shm support; journal_mode persists in the file so one switch at construction covers all later connections from either layer (RunStore's own _enable_wal lands with the lifecycle-unification commit that reworks that module). Verified on WSL drvfs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
… sweep, heartbeat recovery Closes the three H4 reconciliation follow-ups in one mechanism: v2 runs now write protocol_runs rows through their whole lifecycle (create=loading, load-failure/finalize/cancel update the row), so * the per-agent active-run cap counts run_store.count_active_runs() on BOTH surfaces under the SHARED v1 create lock — an agent can no longer hold 2× MAX_ACTIVE_RUNS_PER_AGENT by splitting across v1+v2. The v2 create path first reconciles its own finished backends (passive is_active() peek only; live runs are never touched under the lock); * the reaper reaches v2: reap_runs() drives sweeps registered via register_reaper_sweep() (composition root registers reap_v2_runs — domain never imports api/*). The sweep drains abandoned v2 runs through elapsed deadlines and swaps terminal backends for a DB-backed ArchivedBacktestBackend tombstone: market-data buffers freed, while status/result/decisions/idempotent-replay stay serviceable — and v2's _require_run rehydrates terminal rows post-restart (owners see 'failed' instead of a 404; anti-oracle 404 shape preserved for foreign probes); * startup orphan recovery now covers v2 rows for free, and multi-worker deployments get heartbeat-based recovery: rows carry owner_instance/heartbeat_at (schema migrated in place), every reaper pass heartbeats runs whose session is live in this process, and fail_stale_runs() only fails runs nobody heartbeats (RUN_HEARTBEAT_STALE_SECONDS, default 300s; blanket RUN_RECOVERY_ON_STARTUP stays for single-process). 16 new tests (tests/test_run_lifecycle_unification.py) pin the cross-surface cap both directions, sweep archival + deadline draining, restart rehydration, oracle shape, heartbeat stamping/refresh, stale recovery, and the legacy schema migration. FakeBackend gained an honest is_active(); test_active_run_count_is_side_effect_free re-pinned to the ledger-based counting (passivity invariant unchanged). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…t() (B0 deep fix) The class fix behind B0 and its recurring guard sites: AlpacaDataLoader (missing creds / missing alpaca-py), BaselineGenerator (same) and HourlyBacktester.load_data (no data) used to sys.exit(1). SystemExit is a BaseException — it evaded every 'except Exception' server boundary, silently killed daemon loader threads, and wedged the ASGI loop. They now raise MarketDataUnavailableError (a plain RuntimeError subclass); the CLI entrypoints (backtest_hourly_agent, backtest_custom_algo) translate it to exit code 1 at the process boundary. The existing (Exception, SystemExit) guards stay as defense-in-depth; their comments now describe them as such. app.py's paper-baseline init, the legacy /backtest/run background thread, and the algo-service subprocess path are covered by their existing 'except Exception' handlers with no local changes. Tests: tests/test_market_data_errors.py (red-green) pins raise-not-exit for all three libraries, the plain-Exception contract, and the CLI boundary. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…_TOKENS
LOW-sweep residual: the per-request output-token ceiling is an env knob that
changes a run's spend/behavior, but nothing recorded which value was in
force. agent_runs gains a JSON metadata column (added in BOTH _init_schema
and _migrate_schema, per the schema convention); insert_run accepts an
optional metadata dict; every agent_runs reader returns it parsed (one
_parse_run_row helper) so listings and get_run agree. The engine's LLM
agent run records {llm_max_output_tokens: <effective value>}; rule-based
runs record nothing. 6 red-green tests incl. legacy-schema migration.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
The new CI ran RED on 5 failures that predate this PR (they fail on main identically). All five pinned contracts the product deliberately no longer has: * GET /runs is a PUBLIC listing (route docstring; the dashboard home page fetches it session-less for the seed runs) — the strict per-session listing tests now pin the real contract: public metadata listing, session-scoped per-run data access, middleware 400 on malformed session ids for scoped routes (fixture also now patches the backtests router's import-time db binding, which the old tests silently missed). * /runs/latest/metrics filters agent_name == 'Agent' (the internal hourly agent); the old fixture's 'Agent A'/'Agent B' rows could never match. * create_safe_prompt's strategy body was rewritten; the old CANNOT-phrased constraint sentences only exist in a retired commented-out prompt. The active prompt regains a compact no-tools constraint line (parity with CUSTOM_STRATEGY_OUTPUT_CONTRACT — tightens, does not loosen) and the test pins the current wording; the hard boundary remains response-side validate_llm_response enforcement (unchanged, still pinned). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…ise contract test_missing_credentials_exits pinned the retired sys.exit(1) behavior; it now pins MarketDataUnavailableError (companion to the B0 deep-fix commit). Full backend suite: 829 passed, 0 failed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…fication
A 50-agent review (5 lenses, 3-vote refute panels) of the follow-ups diff
confirmed 12 findings; the real defects, all fixed red-green:
* CRITICAL cancel-clobber: POST /v2/runs/{id}/cancel hardcoded status
"closed" into the run's ledger row. A cleanup cancel landing after the
final decision (before the sweep archives the backend) downgraded a
genuinely completed run to "closed" — in the DB row AND the in-memory
session — permanently hiding its metrics. cancel_run now no-ops on
terminal runs and reports the run's true state; BacktestBackend.cancel()
guards the terminal check under the session lock (closes the peek→cancel
race with a finalizing submit).
* _archive_run no longer swaps in the tombstone when the terminal row write
fails — the backend stays live so the next sweep retries, instead of the
row freezing in an active status with nothing left to reconcile it.
* v2 rows now leave "loading" (→ "running") when the background market-data
load succeeds.
* RunStore's heartbeat-column migration tolerates losing the probe→ALTER
race to a concurrently-starting worker (duplicate-column errors are the
goal state, not a crash).
* ArchivedBacktestBackend.from_record recovers step_index/total_steps for
completed runs from the decision log (restart-rehydrated runs reported a
misleading 0/0).
* test_active_run_count_is_side_effect_free regained its count assertion
(ledger-based) alongside the passivity invariants.
Accepted + documented in code: cross-surface create serialization on the
shared lock, and deliberate v2<->v1 cross-surface run visibility.
Suites after fixes: backend 837 passed / 0 failed; packaging 41 passed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
Upstream merged PR #67 plus two landing refreshes (#69/#70) while this follow-up pass was in flight. Conflicts were confined to the shipped frontend bundle: upstream's refresh was built before this branch's dead component prune, and this branch's refresh predated upstream's brand changes (Hero/Navbar/index.css, atltransparent logo, Inter 800). Resolved by REBUILDING the bundle from the merged landing source — brand changes + prune together (tsc green, vite build green, index-CoQ1VbAs.css 31.71 kB, headless smoke: renders, auth modal, 0 console errors) — and pointing index.html at the rebuilt hashed assets (content hashes + the vercel.json immutable/no-cache split make the ?v= params redundant). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
… reads, bounded registry Seven findings from the PR #71 review, each red-green pinned in tests/test_pr71_review_fixes.py: 1. cancel-during-loading race — load_market_data published waiting_decision WITHOUT _step_lock, defeating cancel()'s lock guard; the background _load also wrote failed/running unconditionally. A cancel landing during the load was silently reverted (run kept running, cap slot freed, client misled). Both load paths now respect an already-terminal (cancelled) status under the lock. 2. archived status() dropped baseline_run_ids/compare_url — the live get_status() returns them for a completed run but the tombstone omitted them, so the same endpoint changed shape once the reaper archived the run. ArchivedBacktestBackend.status() now rebuilds both from the persisted baseline columns (shared build_compare_url helper). 3. rehydrated failed/closed runs reported 0/0 steps — from_record only recovered counts for completed runs (from the decision log). protocol_runs now persists step_index/total_steps at archive time; from_record reads them (decision-log fallback kept for legacy rows). 4. cancel hung during finalize — cancel() blocked on _step_lock held through _finalize's baseline generation (seconds-to-minutes). _finalize now publishes "completed" before the best-effort baseline block, and cancel() short-circuits on a lock-free terminal read. 5. v2 _runs never pruned — the reaper now evicts prior-pass archived tombstones (a later read rehydrates from the terminal row), bounding _runs by ACTIVE runs instead of total historical runs. 6/7. dedupe — extracted a shared enable_wal(db_path) helper (was duplicated verbatim across both DB layers) and a single TERMINAL_STATUSES constant in execution/base.py (was defined three times). Suite: 843 passed / 0 failed; packaging 41. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KoZVnrSPdYNyrxSfrUwaok
…eanup Quality-only cleanups (no behavior change; suite 843/0). From a 4-lens review of the PR diff: - Extract shared build_final_metrics(run) in external_run_service (mirrors build_compare_url); the 9-key completed-run metrics dict was duplicated verbatim in ArchivedBacktestBackend.status(). - BacktestBackend.advance() calls session.drain_expired() instead of get_current_step(), which rebuilt + discarded a full market snapshot under _step_lock on every reaper pass, per live run. - _archive_run derived step_index/total_steps twice with divergent is-None vs or fallbacks (disagreeing on a real 0) — collapse to one helper. - Drop the dead `not isinstance(backend, ArchivedBacktestBackend)` guard in cancel_run (active is read from is_active(), which a tombstone always reports False, so active already implies a live backend). - reap_runs and the _v2_fakes test double use the TERMINAL_STATUSES constant instead of hardcoded status tuples (the constant's own anti-drift purpose). - get_runs_by_mode routes through _parse_run_row like the other five agent_runs readers (no consumer depends on the raw-string metadata shape). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KoZVnrSPdYNyrxSfrUwaok
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes out every open follow-up recorded on
docs/reviews/PR67-FIX-CHECKLIST.mdfrom the PR #67 review cycle. PR #67 merged while this pass was in flight, so these land as their own PR (upstreammain, including the #69/#70 landing refreshes, is merged in — the shipped bundle was rebuilt from the merged source so the brand changes and the dead-component prune ship together).What's here
Run-lifecycle unification (H4 follow-ups) — v2 runs now write
protocol_runsrows through their whole lifecycle, which yields three fixes with one mechanism:run_store.count_active_runs()under the shared v1 create lock — an agent can no longer hold 2×MAX_ACTIVE_RUNS_PER_AGENTby splitting across v1+v2.register_reaper_sweep, wired in the composition root sodomain/never importsapi/) that drains abandoned runs through elapsed deadlines and swaps terminal backends for DB-backedArchivedBacktestBackendtombstones — market-data buffers freed while status/result/decisions/idempotent-replay stay serviceable.failedinstead of a 404; the anti-oracle 404 shape for foreign probes is preserved and pinned).Multi-worker recovery hardening —
protocol_runsgainsowner_instance/heartbeat_at(migrated in place, ALTER race-tolerant); the reaper heartbeats every run whose session is live in this process;fail_stale_runs()(RUN_HEARTBEAT_STALE_SECONDS, default 300s) only fails runs nobody heartbeats — a sibling worker's live runs are safe, unlike the blanket startup UPDATE (which stays, for single-process).sys.exit()deep fix (B0 class) —AlpacaDataLoader/BaselineGenerator/HourlyBacktester.load_dataraiseMarketDataUnavailableError(plain exception) instead of exiting; the backtest CLIs translate it to exit-code 1 at the process boundary. The(Exception, SystemExit)guards remain as defense-in-depth.SQLite WAL on the shared DB file (both connection layers, best-effort with fallback).
agent_runs.metadataJSON column — records the effectiveLLM_MAX_OUTPUT_TOKENSper LLM run (init + migrate paths, parsed consistently by every reader).Green CI end to end — the 5 pre-existing failures were stale expectations (public
/runslisting is a deliberate 2026-05-25 product decision;/runs/latest/metricsfiltersagent_name == 'Agent'; the safe prompt was rewritten). Tests re-pinned to the real contracts; the active prompt regained a compact no-tools constraint line (tightens, never loosens — enforcement stays response-side invalidate_llm_response). Suite: 837 passed / 0 failed; packaging 41.Landing — dead
pages/not-found.tsx+ui/card.tsxdeleted; shipped bundle rebuilt from the merged source (tsc + vite green, headless smoke: renders, auth modal, 0 console errors).Docs/CI —
_owner_contextsession-credential caveat documented; CIDATABASE_PATHbackstop; checklist updated with the full follow-up record.Review
A 50-agent adversarial workflow (5 lens finders + 3-vote refute panels over 15 findings) reviewed the full diff; 12 confirmed findings were all resolved before this PR — including a critical cancel-clobber (a cleanup cancel racing the final decision downgraded a completed run's ledger row to
closed, hiding its metrics). 30+ new red-green tests acrosstest_run_lifecycle_unification.py,test_market_data_errors.py,test_agent_runs_metadata.py,test_sqlite_wal.py.🤖 Generated with Claude Code
https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu