feat(research): L1995 Phase 5 — consume standalone scanner candidates.json (L4464 timeout fix)#256
Merged
Research now reads s3://.../candidates/{run_date}/candidates.json (written
by the standalone Scanner SF state) and feeds the sector teams the
pre-filtered candidate set (~60) ∪ the held population — instead of the
raw ~900-by-sector slice (92-217/sector) that overran the Lambda recursion
budget (every quant ReAct agent hit recursion_limit → 0 picks → retry
storm → 900s timeout; signals.json stale since 2026-05-22).
- archive/manager.py: load_candidates_json(run_date) reader.
- graph/research_graph.py: _resolve_agent_input_set helper (scanner_tickers
∪ population_tickers; held pop sourced from Research state, NOT the
cold-start-empty candidates.json::population). fetch_data wires it into
the new ResearchState.agent_input_set. scanner_universe RETAINED (full)
for the exit_evaluator constituent whitelist.
- sector_team.py / dry_run.py: get_team_tickers screens ctx.agent_input_set
(~10/sector → converges first attempt), retiring the raw-universe handoff.
- Fail-loud: missing/empty candidates.json raises (no silent fallback to
the raw ~900). ALPHA_ENGINE_DRY_RUN_STUB sentinel (set by the stub/offline
installers only) relaxes to a full-universe fallback for wiring validation;
prod never sets it.
- Tests: tests/test_scanner_cutover_phase5.py (union, held-pop retention,
size bound, hard-fail w/o sentinel, fallback w/ sentinel, reader, source
contract). Suite 1663 → 1672.
Perf cleanup (skip_dry_run_gate in the scheduled path) + the SF topology
fix (Predictor parallel to Scanner) + the CW timeout alarm ship in the
companion alpha-engine-data PR. Deploys via Lambda image rebuild on merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
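The union and fail-loud behaviour described in the commit message above can be illustrated with a minimal sketch. This is not the code from graph/research_graph.py: the function signature, the exception class, and the `full_universe` parameter are hypothetical, and only the sentinel name ALPHA_ENGINE_DRY_RUN_STUB comes from the commit message.

```python
import os
from collections.abc import Iterable, Mapping
from typing import Optional

# Sentinel name taken from the commit message; all other names are hypothetical.
DRY_RUN_SENTINEL = "ALPHA_ENGINE_DRY_RUN_STUB"


class MissingCandidatesError(RuntimeError):
    """Raised when the scanner candidate set is missing or empty outside stub mode."""


def resolve_agent_input_set(
    scanner_tickers: Optional[Iterable[str]],
    population_tickers: Iterable[str],
    full_universe: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> set[str]:
    """Return scanner candidates united with the held population.

    Assumption: an empty or missing candidate list is an error unless the
    dry-run sentinel is set, in which case the full universe is returned
    so that wiring can still be validated offline.
    """
    env = os.environ if env is None else env
    scanner = set(scanner_tickers or ())
    if not scanner:
        if env.get(DRY_RUN_SENTINEL):
            return set(full_universe)  # stub/offline path only
        raise MissingCandidatesError(
            "candidates.json missing or empty; refusing to fall back to the raw universe"
        )
    # Held names stay in the input set even if the scanner dropped them.
    return scanner | set(population_tickers)


if __name__ == "__main__":
    picked = resolve_agent_input_set(
        scanner_tickers=["AAA", "BBB"],
        population_tickers=["BBB", "CCC"],
        full_universe=["AAA", "BBB", "CCC", "DDD"],
        env={},
    )
    print(sorted(picked))
```

Passing `env` explicitly keeps the sketch testable without touching the process environment; the real helper may read the sentinel differently.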
cipher813 added a commit that referenced this pull request on May 30, 2026
…L4464 recovery) (#257)

The 2026-05-30 L4464 recovery failed at Research: the standalone Scanner wrote candidates/2026-05-30/ (calendar date, from the SF's date(Execution.StartTime)), but Research reads candidates/2026-05-29/ (its trading day, most_recent_trading_day(today)), the same axis used by signals.json, sector_team_runs, and scanner_evaluations. The Phase-5 cutover's fail-loud (research #256) correctly caught the producer/consumer date-axis mismatch instead of silently producing nothing.

Fix: scanner_handler normalizes run_date to the trading day via the alpha_engine_lib.trading_calendar chokepoint (on-or-before semantics: Sat/Sun/holiday → most recent trading day; trading day → unchanged). Now the Scanner and Research key off the identical date, and the write matches what ARTIFACT_REGISTRY already expects (scanner_candidates_json was flipped to the trading_day axis in config #356). Per DATE_CONVENTIONS, every trade artifact keys by trading day.

Tests: test_run_date_normalized_to_trading_day (Sat→Fri, Sun→Fri, Fri→Fri) + existing happy/threading assertions updated. Suite 1672 → 1673.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
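The on-or-before semantics described in this commit can be sketched as follows. This is an illustration only, not the alpha_engine_lib.trading_calendar implementation: the function name `on_or_before_trading_day` and the explicit `holidays` argument are assumptions, and a real calendar would also need an exchange holiday source.

```python
from datetime import date, timedelta


def on_or_before_trading_day(day: date, holidays: frozenset[date] = frozenset()) -> date:
    """Walk backwards until the date is a weekday that is not a holiday.

    A trading day maps to itself; Sat/Sun/holiday map to the most recent
    trading day before them.
    """
    while day.weekday() >= 5 or day in holidays:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


if __name__ == "__main__":
    # 2026-05-30 is a Saturday, so the scanner's calendar date and
    # Research's trading day (2026-05-29, a Friday) now agree.
    run_date = on_or_before_trading_day(date(2026, 5, 30))
    print(f"candidates/{run_date.isoformat()}/candidates.json")
```

Applying the same function on both the producer and the consumer side is what removes the date-axis mismatch: both derive the key from one normalized date rather than from two different clocks.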
Fixes the 2026-05-30 Research `States.Timeout` (900s Lambda hard ceiling; `signals.json` stale since 2026-05-22). Root cause was a convergence pathology, not research growth: the sector-team quant ReAct agents were handed their full sector slice of the raw ~903 universe (92–217 tickers) with ~9–10 reasoning iterations → every agent hit the recursion limit → 0 picks → `retry-on-empty` storm → overran 15 min.

This completes the already-decided L1995 scanner cutover (plan: `alpha-engine-docs/private/scanner-cutover-phase5-260530.md`): Research consumes the standalone Scanner SF state's `candidates.json` and screens the pre-filtered set instead of the raw universe.

Changes

- `archive/manager.py`: `load_candidates_json(run_date)` reader.
- `graph/research_graph.py`: `_resolve_agent_input_set` helper: `scanner_tickers ∪ population_tickers` (held population sourced from Research's own state, not the cold-start-empty `candidates.json::population`). New `ResearchState.agent_input_set`. `scanner_universe` retained (full) for the `exit_evaluator` constituent whitelist.
- `sector_team.py` / `dry_run.py`: `get_team_tickers` screens `ctx.agent_input_set` (~10/sector → converges first attempt); raw-universe handoff retired.
- Fail-loud: missing/empty `candidates.json` raises (no silent fallback to the raw ~900). The `ALPHA_ENGINE_DRY_RUN_STUB` sentinel (set only by the stub/offline installers) relaxes to a full-universe fallback for wiring validation; prod never sets it.

Tests

`tests/test_scanner_cutover_phase5.py` (9): union, held-pop retention, size bound (≤65 not ~900), hard-fail without sentinel, fallback with sentinel, reader key/parse, source contract. Full suite 1663 → 1672 passing.

Companion / deploy

`skip_dry_run_gate` perf cleanup + SF topology fix (Predictor ∥ Scanner) + CW timeout alarm ship in the alpha-engine-data PR.

🤖 Generated with Claude Code
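As a rough illustration of the per-sector screening listed under Changes, the sketch below filters a pre-resolved input set down to one sector's tickers. It is not the `get_team_tickers` code from `sector_team.py`: the function name `screen_team_tickers`, its signature, and the `sector_by_ticker` mapping are hypothetical, and the tickers are placeholders.

```python
from collections.abc import Iterable, Mapping


def screen_team_tickers(
    sector: str,
    agent_input_set: Iterable[str],
    sector_by_ticker: Mapping[str, str],
) -> list[str]:
    """Return the tickers from the agent input set that belong to `sector`.

    Only names already in the input set (scanner candidates plus held
    population) are considered, so the list stays small regardless of how
    many tickers the sector has in the full universe.
    """
    return sorted(t for t in agent_input_set if sector_by_ticker.get(t) == sector)


if __name__ == "__main__":
    # Placeholder data: the mapping covers the whole universe, while the
    # input set is the much smaller pre-filtered subset.
    sector_by_ticker = {"AAA": "tech", "BBB": "tech", "CCC": "energy", "DDD": "tech"}
    agent_input_set = {"AAA", "CCC"}
    print(screen_team_tickers("tech", agent_input_set, sector_by_ticker))
```

The point of the design is that the size of each team's work list is bounded by the input set, not by the sector's share of the raw universe.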