feat(research): L1995 Phase 5 — consume standalone scanner candidates.json (L4464 timeout fix) by cipher813 · Pull Request #256 · cipher813/alpha-engine-research

cipher813 · 2026-05-30T15:53:33Z

Fixes the 2026-05-30 Research States.Timeout (900s Lambda hard ceiling; signals.json stale since 2026-05-22). Root cause was a convergence pathology, not research growth: the sector-team quant ReAct agents were handed their full sector slice of the raw ~903 universe (92–217 tickers) with ~9–10 reasoning iterations → every agent hit the recursion limit → 0 picks → retry-on-empty storm → overran 15 min.

This completes the already-decided L1995 scanner cutover (plan: alpha-engine-docs/private/scanner-cutover-phase5-260530.md): Research consumes the standalone Scanner SF state's candidates.json and screens the pre-filtered set instead of the raw universe.

Changes

archive/manager.py — load_candidates_json(run_date) reader.
graph/research_graph.py — _resolve_agent_input_set helper: scanner_tickers ∪ population_tickers (held population sourced from Research's own state, not the cold-start-empty candidates.json::population). New ResearchState.agent_input_set. scanner_universe retained (full) for the exit_evaluator constituent whitelist.
sector_team.py / dry_run.py — get_team_tickers screens ctx.agent_input_set (~10/sector → converges first attempt); raw-universe handoff retired.
Fail-loud — missing/empty candidates.json raises (no silent fallback to the raw ~900). The ALPHA_ENGINE_DRY_RUN_STUB sentinel (set only by the stub/offline installers) relaxes to a full-universe fallback for wiring validation; prod never sets it.
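
A minimal sketch of the resolution rule described above, not the code in graph/research_graph.py. The function signature, the `env` parameter, and the exception type are assumptions made for illustration. Only the union semantics, the fail-loud behaviour, and the ALPHA_ENGINE_DRY_RUN_STUB sentinel name come from this PR.

```python
import os
from typing import Iterable, Mapping, Optional

# Sentinel name taken from the PR; how the real code reads it is an assumption.
STUB_SENTINEL = "ALPHA_ENGINE_DRY_RUN_STUB"


def resolve_agent_input_set(
    scanner_tickers: Optional[Iterable[str]],
    population_tickers: Iterable[str],
    scanner_universe: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> set[str]:
    """Return the tickers the sector teams should screen (hypothetical helper)."""
    env = os.environ if env is None else env
    candidates = set(scanner_tickers or ())

    if not candidates:
        # Stub/offline runs only: fall back to the full universe for wiring validation.
        # Returning the universe alone (no population union) is an assumption.
        if env.get(STUB_SENTINEL):
            return set(scanner_universe)
        # Fail loud: never silently hand the raw universe to the agents.
        raise RuntimeError(
            "candidates.json missing or empty; refusing to fall back to the raw universe"
        )

    # Held positions stay in scope even when the scanner did not re-select them.
    return candidates | set(population_tickers)
```

Passing `env` explicitly keeps the sentinel check testable without mutating the process environment.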

Tests

tests/test_scanner_cutover_phase5.py (9): union, held-pop retention, size bound (≤65 not ~900), hard-fail without sentinel, fallback with sentinel, reader key/parse, source contract. Full suite 1663 → 1672 passing.

Companion / deploy

skip_dry_run_gate perf cleanup + SF topology fix (Predictor ∥ Scanner) + CW timeout alarm ship in the alpha-engine-data PR.
Deploys via Lambda image rebuild on merge. Held for review.

🤖 Generated with Claude Code

….json (L4464 timeout fix)

Research now reads s3://.../candidates/{run_date}/candidates.json (written by the standalone Scanner SF state) and feeds the sector teams the pre-filtered candidate set (~60) ∪ the held population, instead of the raw ~900-by-sector slice (92-217/sector) that overran the Lambda recursion budget (every quant ReAct agent hit recursion_limit → 0 picks → retry storm → 900s timeout; signals.json stale since 2026-05-22).

- archive/manager.py: load_candidates_json(run_date) reader.
- graph/research_graph.py: _resolve_agent_input_set helper (scanner_tickers ∪ population_tickers; held pop sourced from Research state, NOT the cold-start-empty candidates.json::population). fetch_data wires it into the new ResearchState.agent_input_set. scanner_universe RETAINED (full) for the exit_evaluator constituent whitelist.
- sector_team.py / dry_run.py: get_team_tickers screens ctx.agent_input_set (~10/sector → converges first attempt), retiring the raw-universe handoff.
- Fail-loud: missing/empty candidates.json raises (no silent fallback to the raw ~900). ALPHA_ENGINE_DRY_RUN_STUB sentinel (set by the stub/offline installers only) relaxes to a full-universe fallback for wiring validation; prod never sets it.
- Tests: tests/test_scanner_cutover_phase5.py (union, held-pop retention, size bound, hard-fail w/o sentinel, fallback w/ sentinel, reader, source contract). Suite 1663 → 1672.

Perf cleanup (skip_dry_run_gate in the scheduled path) + the SF topology fix (Predictor parallel to Scanner) + the CW timeout alarm ship in the companion alpha-engine-data PR. Deploys via Lambda image rebuild on merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
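
For illustration, the reader's key/parse contract could look roughly like the sketch below. It is not the actual load_candidates_json. The JSON field names (`candidates`, `ticker`) and both function names are hypothetical, and the bucket and prefix are left out because the commit message elides them. Only the `candidates/{run_date}/candidates.json` key shape and the raise-on-empty behaviour are taken from the text above.

```python
import json
from datetime import date


def candidates_key(run_date: date) -> str:
    """Relative object key; the bucket/prefix is elided in the PR and omitted here."""
    return f"candidates/{run_date.isoformat()}/candidates.json"


def parse_candidates(raw: str) -> list[str]:
    """Parse a candidates.json body into a sorted, de-duplicated ticker list."""
    payload = json.loads(raw)
    # Hypothetical schema: {"candidates": [{"ticker": "AAA"}, ...], "population": [...]}
    rows = payload.get("candidates") or []
    tickers = sorted({row["ticker"] for row in rows})
    if not tickers:
        # Mirrors the fail-loud rule: an empty file is an error, not an empty screen.
        raise ValueError("candidates.json parsed but contains no candidates")
    return tickers
```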

…L4464 recovery) (#257)

The 2026-05-30 L4464 recovery failed at Research: the standalone Scanner wrote candidates/2026-05-30/ (calendar date, from the SF's date(Execution.StartTime)), but Research reads candidates/2026-05-29/ (its trading day, most_recent_trading_day(today)), the same axis used by signals.json, sector_team_runs, and scanner_evaluations. The Phase-5 cutover's fail-loud (research #256) correctly caught the producer/consumer date-axis mismatch instead of silently producing nothing.

Fix: scanner_handler normalizes run_date to the trading day via the alpha_engine_lib.trading_calendar chokepoint (on-or-before semantics: Sat/Sun/holiday → most recent trading day; trading day → unchanged). Now the Scanner and Research key off the identical date, and the write matches what ARTIFACT_REGISTRY already expects (scanner_candidates_json was flipped to the trading_day axis in config #356). Per DATE_CONVENTIONS, every trade artifact keys by trading day.

Tests: test_run_date_normalized_to_trading_day (Sat→Fri, Sun→Fri, Fri→Fri) + existing happy/threading assertions updated. Suite 1672 → 1673.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
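
The on-or-before semantics can be sketched as follows. This is a stand-in, not the alpha_engine_lib.trading_calendar API, whose real signature is not shown in this PR. The function name and the caller-supplied `holidays` set are assumptions.

```python
from datetime import date, timedelta
from typing import AbstractSet


def normalize_to_trading_day(
    day: date, holidays: AbstractSet[date] = frozenset()
) -> date:
    """Walk back from `day` to the most recent weekday that is not a holiday.

    A day that is already a trading day is returned unchanged.
    """
    # weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    while day.weekday() >= 5 or day in holidays:
        day -= timedelta(days=1)
    return day


# The dates from this incident: the SF started on Saturday 2026-05-30,
# while Research keyed off Friday 2026-05-29.
saturday = date(2026, 5, 30)
print(normalize_to_trading_day(saturday))  # 2026-05-29
```

Applying the same normalization on both the producer and the consumer side means the key cannot drift when a run starts on a weekend or holiday.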

cipher813 merged commit 031e7a1 into main May 30, 2026
2 checks passed

cipher813 deleted the feat/l1995-phase5-scanner-cutover branch May 30, 2026 15:57

This was referenced May 30, 2026

feat(sf): research perf (skip_dry_run_gate) + named timeout alarm (L4464) cipher813/alpha-engine-data#350

Merged

fix(scanner): key candidates.json by trading day, not calendar date (L4464 recovery) #257

Merged
