feat(research): L1995 Phase 5 — consume standalone scanner candidates.json (L4464 timeout fix)#256

Merged
cipher813 merged 1 commit into main from feat/l1995-phase5-scanner-cutover
May 30, 2026
Conversation

@cipher813
Owner

Fixes the 2026-05-30 Research States.Timeout (900s Lambda hard ceiling; signals.json stale since 2026-05-22). Root cause was a convergence pathology, not research growth: the sector-team quant ReAct agents were handed their full sector slice of the raw ~903 universe (92–217 tickers) with ~9–10 reasoning iterations → every agent hit the recursion limit → 0 picks → retry-on-empty storm → overran 15 min.

This completes the already-decided L1995 scanner cutover (plan: alpha-engine-docs/private/scanner-cutover-phase5-260530.md): Research consumes the standalone Scanner SF state's candidates.json and screens the pre-filtered set instead of the raw universe.
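A minimal sketch of the consumer side, assuming candidates.json is a JSON object that lists tickers as plain strings under a top-level `candidates` key. That key and both helper names (`candidates_key`, `parse_candidates`) are hypothetical; the real `load_candidates_json` reader and the actual schema are not shown in this PR. Only the `candidates/{run_date}/candidates.json` key layout comes from the commit message.

```python
import json


def candidates_key(run_date: str) -> str:
    """Object key for a run date (layout taken from the commit message)."""
    return f"candidates/{run_date}/candidates.json"


def parse_candidates(raw: bytes) -> list[str]:
    """Parse a candidates.json payload into a de-duplicated ticker list.

    ASSUMPTION: tickers are plain strings under a top-level "candidates"
    key; the real schema is not shown in this PR.
    """
    doc = json.loads(raw)
    tickers = doc.get("candidates") or []
    if not tickers:
        # Fail loud: an empty payload is an error, not "no picks today".
        raise ValueError("candidates.json is missing or empty")
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(t.upper() for t in tickers))
```

Keeping the parse step separate from the S3 fetch means the key format and the empty-payload behavior can be tested without any AWS access.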

Changes

  • archive/manager.py: load_candidates_json(run_date) reader.
  • graph/research_graph.py: _resolve_agent_input_set helper (scanner_tickers ∪ population_tickers; held population sourced from Research's own state, not the cold-start-empty candidates.json::population). New ResearchState.agent_input_set. scanner_universe retained (full) for the exit_evaluator constituent whitelist.
  • sector_team.py / dry_run.py: get_team_tickers screens ctx.agent_input_set (~10/sector → converges first attempt); raw-universe handoff retired.
  • Fail-loud — missing/empty candidates.json raises (no silent fallback to the raw ~900). The ALPHA_ENGINE_DRY_RUN_STUB sentinel (set only by the stub/offline installers) relaxes to a full-universe fallback for wiring validation; prod never sets it.
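The union and fail-loud rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in graph/research_graph.py: the function name, its signature, the sorted-list return type, and the choice to treat any non-empty sentinel value as "set" are all hypothetical. Only the `ALPHA_ENGINE_DRY_RUN_STUB` variable name and the overall behavior come from the PR description.

```python
import os

STUB_SENTINEL = "ALPHA_ENGINE_DRY_RUN_STUB"  # env var named in the PR


def resolve_agent_input_set(scanner_tickers, population_tickers,
                            full_universe, env=None):
    """Return the tickers the sector agents should screen.

    Hypothetical signature: the real helper may take different inputs.
    """
    env = os.environ if env is None else env
    if not scanner_tickers:
        if env.get(STUB_SENTINEL):
            # Stub/offline wiring validation only: fall back to everything.
            return sorted(set(full_universe))
        # Production path: never silently fall back to the raw universe.
        raise RuntimeError("candidates.json missing or empty")
    # Normal path: pre-filtered candidates plus currently held names.
    return sorted(set(scanner_tickers) | set(population_tickers))
```

Passing `env` explicitly keeps the sentinel check testable without mutating the process environment.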

Tests

tests/test_scanner_cutover_phase5.py (9): union, held-pop retention, size bound (≤65 not ~900), hard-fail without sentinel, fallback with sentinel, reader key/parse, source contract. Full suite 1663 → 1672 passing.

Companion / deploy

  • skip_dry_run_gate perf cleanup + SF topology fix (Predictor ∥ Scanner) + CW timeout alarm ship in the alpha-engine-data PR.
  • Deploys via Lambda image rebuild on merge. Held for review.

🤖 Generated with Claude Code

….json (L4464 timeout fix)

Research now reads s3://.../candidates/{run_date}/candidates.json (written
by the standalone Scanner SF state) and feeds the sector teams the
pre-filtered candidate set (~60) ∪ the held population — instead of the
raw ~900-by-sector slice (92-217/sector) that overran the Lambda recursion
budget (every quant ReAct agent hit recursion_limit → 0 picks → retry
storm → 900s timeout; signals.json stale since 2026-05-22).

- archive/manager.py: load_candidates_json(run_date) reader.
- graph/research_graph.py: _resolve_agent_input_set helper (scanner_tickers
  ∪ population_tickers; held pop sourced from Research state, NOT the
  cold-start-empty candidates.json::population). fetch_data wires it into
  the new ResearchState.agent_input_set. scanner_universe RETAINED (full)
  for the exit_evaluator constituent whitelist.
- sector_team.py / dry_run.py: get_team_tickers screens ctx.agent_input_set
  (~10/sector → converges first attempt), retiring the raw-universe handoff.
- Fail-loud: missing/empty candidates.json raises (no silent fallback to
  the raw ~900). ALPHA_ENGINE_DRY_RUN_STUB sentinel (set by the stub/offline
  installers only) relaxes to a full-universe fallback for wiring validation;
  prod never sets it.
- Tests: tests/test_scanner_cutover_phase5.py (union, held-pop retention,
  size bound, hard-fail w/o sentinel, fallback w/ sentinel, reader, source
  contract). Suite 1663 → 1672.
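As an illustration of the per-sector screening step, here is a small sketch of slicing the agent input set by sector. The name `screen_team_tickers`, the ticker-to-sector mapping argument, and the sector labels are hypothetical; the real `get_team_tickers` reads from `ctx.agent_input_set` and its exact signature is not shown here.

```python
def screen_team_tickers(agent_input_set, sector_by_ticker, team_sectors):
    """Keep only the input-set tickers that belong to a team's sectors.

    Hypothetical helper: tickers with no known sector are dropped.
    """
    wanted = set(team_sectors)
    return [t for t in agent_input_set if sector_by_ticker.get(t) in wanted]


# Example with made-up sector labels.
sectors = {"AAPL": "tech", "MSFT": "tech", "XOM": "energy"}
tech_slice = screen_team_tickers(["AAPL", "MSFT", "XOM", "ZZZ"], sectors, ["tech"])
```

Because the filter runs over the small pre-filtered set rather than the raw universe, each team's slice stays near the ~10 tickers per sector cited above.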

Perf cleanup (skip_dry_run_gate in the scheduled path) + the SF topology
fix (Predictor parallel to Scanner) + the CW timeout alarm ship in the
companion alpha-engine-data PR. Deploys via Lambda image rebuild on merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cipher813 merged commit 031e7a1 into main on May 30, 2026
2 checks passed
cipher813 deleted the feat/l1995-phase5-scanner-cutover branch on May 30, 2026 15:57
cipher813 added a commit that referenced this pull request May 30, 2026
…L4464 recovery) (#257)

The 2026-05-30 L4464 recovery failed at Research: the standalone Scanner
wrote candidates/2026-05-30/ (calendar date, from the SF's
date(Execution.StartTime)), but Research reads candidates/2026-05-29/
(its trading day, most_recent_trading_day(today)) — the same axis used by
signals.json, sector_team_runs, and scanner_evaluations. The Phase-5
cutover's fail-loud (research #256) correctly caught the producer/consumer
date-axis mismatch instead of silently producing nothing.

Fix: scanner_handler normalizes run_date to the trading day via the
alpha_engine_lib.trading_calendar chokepoint (on-or-before semantics:
Sat/Sun/holiday → most recent trading day; trading day → unchanged). Now
the Scanner and Research key off the identical date, and the write matches
what ARTIFACT_REGISTRY already expects (scanner_candidates_json was flipped
to the trading_day axis in config #356).

Per DATE_CONVENTIONS — every trade artifact keys by trading day.
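The on-or-before semantics can be sketched as below. This is a stand-in, not the `alpha_engine_lib.trading_calendar` API, whose function names and holiday source are not shown in this commit message; `on_or_before_trading_day` and the caller-supplied `holidays` set are hypothetical.

```python
from datetime import date, timedelta


def on_or_before_trading_day(d: date, holidays: frozenset = frozenset()) -> date:
    """Walk back from d until it lands on a weekday that is not a holiday.

    ASSUMPTION: holidays are supplied by the caller; a real calendar
    would load exchange holidays from its own source.
    """
    while d.weekday() >= 5 or d in holidays:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


# Saturday 2026-05-30 resolves to Friday 2026-05-29.
run_date = on_or_before_trading_day(date(2026, 5, 30))
```

Applying the same function on both the producer and consumer sides is what makes the Scanner write and the Research read agree on `candidates/2026-05-29/`.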

Tests: test_run_date_normalized_to_trading_day (Sat→Fri, Sun→Fri,
Fri→Fri) + existing happy/threading assertions updated. Suite 1672 → 1673.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>