fix(research): all-agents-strict — long 429 retry, hard-fail, stub quarantine (reworks #193/#194)#195
Merged
Merged
Conversation
…arantine (reworks #193/#194) Authoritative directive (Brian, 2026-05-16) — supersedes #193/#194's degrade-and-continue philosophy: "If the sector agents don't run, Research shouldn't complete until all sectors are run. We should have a long retry mechanism and after this long period if we still don't have all sectors it should fail. We don't get anything from this process if the sectors, or any other agent for that matter, fail/don't run." Four behavior changes: 1. Long 429 retry. invoke_with_rate_limit_retry is now an overall wall-clock DEADLINE (default RATE_LIMIT_RETRY_DEADLINE_SECONDS = 75 min, env-overridable, clamped 5 min .. 3 hr) of persistent 429 retry with capped expo backoff, honoring retry-after — NOT a fixed ~6-attempt cap. Non-429 errors still propagate immediately. 2. All-agents hard-fail. score_aggregator raises (-> handler status:ERROR, NO signals.json / email / DB write) if ANY sector team is missing (absent from ALL_TEAM_IDS), failed, or partial. CIO + macro economist + macro critic LLM calls wrapped in the same deadline helper with explicit max_retries; their failure hard-fails the run (strict-mode default raise) — no synthetic substitute promoted. 3. Removed #193 carry-forward. _update_thesis_for_held_stock no longer carries the prior thesis forward on persistent failure: a bounded parse re-roll for the transient tool-XML schema leak, then RAISE (no isolation fallback). A 429 past the deadline fails fast. 4. Reverted #194 isolation. A failed/partial sector team aborts the whole run even when other teams produced usable picks. KEPT from #194 (composes with the directive): per-team S3 persistence + sector_team_node resume short-circuit. Extended the same persist+resume pattern to CIO + macro (save_agent_run / load_agent_run, archive/agent_runs/{run_date}/{agent_id}.json) so an SF redrive after a hard-fail reuses every already-succeeded agent with zero LLM calls and the long retry only re-attempts the still-missing one(s) — what makes a 60-90 min window bounded. Stub-quarantine root-cause + structural fix (the dangerous bug): s3://alpha-engine-research/signals/2026-05-15/signals.json (written 2026-05-16T17:08:46Z) shipped synthetic [DRY-RUN] stub theses promoted as real (GOOG/AFL/AXP/ABT/APD/ADBE/AMD). Leak path: install_dry_run_stubs only no-op'd write_signals_json + upload_db, NOT save_sector_team_run. The stub-pass runs the full graph; _stub_run_sector_team returns error=None so sector_team_node PERSISTED synthetic [DRY-RUN] output to archive/sector_team_runs/{run_date}/{team_id}.json. The subsequent REAL pass's #194 resume short-circuit LOADED that stub-persisted output and promoted it — zero real Haiku calls. Fix (defense in depth): (a) install_dry_run_stubs now also no-ops save_sector_team_run + save_agent_run so the stub-pass cannot write resume keys; (b) graph.stub_quarantine.assert_no_stub_output at the top of archive_writer refuses to write if the [DRY-RUN marker appears in any promotable surface or a sector team is missing. Tests: full suite 1305 passed, 1 pre-existing unrelated test_scoring RSI failure (known stale-local-config, passes on CI). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 16, 2026
cipher813
added a commit
that referenced
this pull request
May 17, 2026
yfinance-centralization arc (plan: alpha-engine-docs/private/yfinance-centralization-260516.md),
PR 2 — R3, the last HOT research yfinance socket.
Rewrote agents/sector_teams/quant_tools.py::get_balance_sheet to read
alpha-engine-data's weekly Finnhub fundamentals snapshot instead of
yfinance.Ticker().info. Removed `import yfinance as yf` from that path —
quant_tools.py is now yfinance-free for PR2's scope (the get_balance_sheet
`import yfinance as yf` was the only yfinance reference).
Mechanics (mirrors the in-repo factor_profiles pattern + predictor's
inference/stages/fetch_alt_data.py read/date-resolution):
- New module-level `read_fundamentals_from_s3()` — reads
archive/fundamentals/{date}.json; if no run_date, scans the prefix and
reads the most-recent snapshot (predictor's fallback; the snapshot is a
slow weekly signal so stale-by-days is acceptable). Returns None on any
failure (caller graceful-degrades, never raises).
- Cached once at tool-creation time (one S3 read per sector team per
Saturday SF run, same as factor_profiles) via a new `fundamentals_data`
context key (also injectable for tests — no S3/network in tests).
Schema-mapping correction vs the plan doc: the data-module collector does
NOT persist raw Finnhub keys — collectors/fundamentals.py writes a
NORMALIZED/CLIPPED per-ticker schema {pe_ratio, pb_ratio, debt_to_equity,
revenue_growth_yoy, fcf_yield, gross_margin, roe, current_ratio}. Followed
the plan's intent (consume the S3 key the predictor already reads) and
mapped that ACTUAL schema onto the tool's exact existing output keys:
debt_to_equity ← debt_to_equity (already a ratio — NO yfinance %/100
scaling, per plan)
current_ratio ← current_ratio
pe_ratio ← pe_ratio
price_to_book ← pb_ratio
revenue_growth ← revenue_growth_yoy
gross_margins ← gross_margin
forward_pe → None (not persisted by the collector)
market_cap → None (computed transiently for fcf_yield, not persisted)
The agent consumes these as soft directional context (not a hard gate —
the hard D/E gate R2 is being deleted in PR1), so the normalized-scale /
trailing-vs-TTM delta is tolerable per the plan's risk note.
Graceful-degrade preserved: exact return-schema keys unchanged; a ticker
absent from the snapshot (or an empty/missing snapshot) returns
{"error": ...} per the existing contract — never raises (all-agents-strict,
#195). No read that degraded before now raises.
Tests: added tests/test_get_balance_sheet_s3.py (8 tests, monkeypatch/fakes,
NO unittest.mock.patch per the documented full-suite bleed) — mapped schema
from a faked fundamentals JSON, D/E not %/100-rescaled, {"error":...} on
missing ticker + empty snapshot (no raise), no yfinance import, S3-reader
scan-for-latest + None-on-failure. Full suite: 1329 passed
(1321 baseline + 8 new), 1 pre-existing acceptable failure
(tests/test_scoring.py::TestRSIScoring::test_bull_overbought_matches_neutral_post_revert
— stale-local-config, passes on CI). Zero new failures.
**DEPLOY HELD — research auto-deploys on merge; do not merge until user
directs. Part of the yfinance-centralization arc (plan doc
alpha-engine-docs/private/yfinance-centralization-260516.md); intended to
land before the held Research re-run.**
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813
added a commit
that referenced
this pull request
May 17, 2026
… (R5) (#199) yfinance-centralization arc (plan: alpha-engine-docs/private/yfinance-centralization-260516.md), PR 3 — R5. With PR1+PR2 this makes alpha-engine-research yfinance-free at runtime (only local/time_scanner.py keeps yfinance — explicitly excluded by the plan as a dev-only scanner not in the Lambda/SF path). scoring/performance_tracker.py::run_performance_checks — replaced the `yf.download(period="2d")` FALLBACK leg with the in-repo data/fetchers/feature_store_reader.py::read_latest_daily_closes() (alpha-engine-data's staging/daily_closes/ parquet → {ticker: close}). Deleted `import yfinance as yf` + `import pandas as pd` (pandas was only used by the removed yfinance multi-ticker DataFrame parsing); updated the module docstring. Polygon grouped-daily stays the PRIMARY path — completely unchanged. Only the fallback leg swapped. The new get_latest_price() returns polygon_prices[t] first, then daily_closes.get(t) — exact same precedence as before (polygon, then fallback). Graceful-degrade preserved (all-agents-strict, #195): - read_latest_daily_closes() returns {ticker: close} or None and NEVER raises (broad except → None), strictly matching the old try/except contract — no read that degraded before now raises. - If both polygon and the fallback yield nothing, the function returns _compute_accuracy_stats() with no new evaluations recorded — identical net behavior to the old yfinance-failed branch (which proceeded with price_data=None and recorded nothing). - Missing-ticker → row skipped (current_price None), unchanged. Tests: reworked tests/test_performance_tracker.py — the 5 tests that patched `scoring.performance_tracker.yf.download` would break (yf removed), so rewrote them to fake feature_store_reader.read_latest_daily_closes via `monkeypatch` (NOT unittest.mock.patch — documented full-suite bleed; mirrors tests/test_held_thesis_strict.py). Replaced the brittle `@patch("...yf.download")` shape; removed the now-unused yf-shaped `_mock_price_data` + pandas import. Added: module-is-yfinance-free assertion, daily_closes-fallback 10d/beat-SPY/missing-ticker evals, graceful-degrade-when-fallback-unavailable, and an explicit polygon-primary-path-unaffected test with a tripwire that fails if the fallback reader is called when polygon covers all tickers. test_archive.py only references performance_tracker in a docstring — unaffected. Full suite: 1323 passed (perf-tracker class 5 yfinance tests → 7 daily-closes tests; net +2 vs the 1321 baseline), 1 pre-existing acceptable failure (tests/test_scoring.py::TestRSIScoring::test_bull_overbought_matches_neutral_post_revert — stale-local-config, passes on CI). Zero new failures. **DEPLOY HELD — research auto-deploys on merge; do not merge until user directs. Part of the yfinance-centralization arc (plan doc alpha-engine-docs/private/yfinance-centralization-260516.md); intended to land before the held Research re-run.** Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
DEPLOY HELD
DEPLOY HELD — reworks just-merged #193/#194; do not merge/deploy/re-run; after merge the Research Lambda must be redeployed before live; user will re-run manually. This reverses the degrade-and-continue philosophy of two PRs that merged earlier today. The user reviews before deploy.
The directive (Brian, 2026-05-16)
This OVERRIDES #193's carry-forward and #194's per-team isolation. All-agents-strict scope: sector teams, held-thesis updates, CIO, macro economist (+ critic) — every agent.
The 4 behavior changes
1. Long 429 retry (
agents/langchain_utils.py)invoke_with_rate_limit_retryis now an overall wall-clock deadline of persistent 429 retry, not a fixed ~6-attempt cap. Capped exponential backoff (base 4s, cap 60s) + decorrelated jitter between attempts, honoring theretry-afterheader. When the deadline is exceeded the 429 propagates and the caller hard-fails.RATE_LIMIT_RETRY_DEADLINE_SECONDS, default 75 min (4500s), set module-level inagents/langchain_utils.pyvia_resolve_deadline_seconds(). Env overrideRATE_LIMIT_RETRY_DEADLINE_SECONDS, clamped to 5 min .. 3 hr so a typo can't make it unbounded or trivial. The wrapper reads the module constant at call time (tests monkeypatch it; the Lambda sets the env before cold-start import).max_attemptskwarg was removed (replaced bydeadline_seconds). Non-429 errors still propagate immediately and unchanged.2. All-agents hard-fail (
graph/research_graph.pyscore_aggregator)Reverts #194.
score_aggregatorraisesRuntimeError(→ handlerstatus:ERROR, no signals.json / email / DB write) if ANY sector team is missing (absent fromALL_TEAM_IDScoverage), failed (carrieserror), or partial. Surviving picks from other teams no longer save the run.CIO (
agents/investment_committee/ic_cio.py), macro economist + critic (agents/macro_agent.py): their.invoke()calls are wrapped in the same deadline helper, with explicitmax_retries=SECTOR_TEAM_LLM_MAX_RETRIESon everyChatAnthropic. A 429 past the deadline → propagate → strict-mode (production default) raise → run hard-fails. No synthetic/empty CIO or macro substitute is promoted.3. Removed #193 carry-forward (
agents/sector_teams/sector_team.py)_update_thesis_for_held_stockno longer carries the prior thesis forward on persistent failure. It does a bounded parse re-roll (_MAX_PARSE_ATTEMPTS = 3) for the transient tool-XML schema leak (the 2026-05-16catalystsstring-not-list nondeterminism — these recover on a re-roll), then RAISES. A 429 (detected via_is_rate_limit_error) past the deadline-bounded wrapper fails fast — re-rolling can't fix an org TPM ceiling. The raise surfaces as the team's harderror→score_aggregatorhard-fails.4. Reverted #194 isolation
Covered by change 2 — a failed/partial team aborts the run even with other teams' picks.
What #194 behavior is KEPT and why it composes
Per-team S3 persistence +
sector_team_noderesume short-circuit are KEPT unchanged (archive/manager.pysave_sector_team_run/load_sector_team_run, thearchive/sector_team_runs/{run_date}/{team_id}.jsonscheme). It composes with the directive: each agent that succeeds is persisted the moment it completes, so when one agent hard-fails and the run ERRORs, an SF redrive reuses the persisted succeeded teams (zero Haiku calls) and the long 75-min retry only ever re-attempts the still-missing agent(s). That is exactly what makes a 60-90 min window affordable and bounded rather than re-paying the whole 6-team fan-out every redrive.Extended the same persist+resume to CIO + macro (new
ArchiveManager.save_agent_run/load_agent_run,archive/agent_runs/{run_date}/{agent_id}.json; resume short-circuit + persist-on-success incio_nodeandmacro_economist_node). Macro runs upstream of sector dispatch (Stage B) and CIO is the single most expensive Sonnet call — without this, every redrive after a sector hard-fail would re-pay them before even reaching the still-missing team.Stub-quarantine — root-cause + structural fix (the dangerous one)
Evidence:
s3://alpha-engine-research/signals/2026-05-15/signals.json, written2026-05-16T17:08:46Zby the recovery-postfix194 run, shipped with syntheticdry_run.pystub output promoted as real — every new buy_candidate (GOOG/AFL/AXP/ABT/APD/ADBE/AMD) had athesis_summarystarting"[DRY-RUN] Strong fundamentals…"and the email rendered them as real picks, on a run that was NOTdry_run_llm.Precise leak path (root-caused): The default dry-run gate runs a synthetic stub-pass first via
install_dry_run_stubs(archive), then_restore().install_dry_run_stubsonly no-op'dwrite_signals_json+upload_db. The stub-pass runs the full graph:sector_team_nodecallssave_sector_team_runthe moment a team "succeeds", and_stub_run_sector_teamreturns a normal-looking dict witherror=None— so the stub-pass persisted synthetic[DRY-RUN]sector-team output toarchive/sector_team_runs/{run_date}/{team_id}.json. The subsequent real pass'ssector_team_noderesume short-circuit (#194) then loaded that stub-persisted output viaload_sector_team_runand promoted the synthetic theses straight into signals.json + the email — with zero real Haiku calls. The per-team resume persistence path (added by #194) was never suppressed for the stub-pass, so the stub-pass leaked synthetic state into the real pass.Structural fix (defense in depth):
install_dry_run_stubsnow also no-opssave_sector_team_run+save_agent_runso the stub-pass cannot write the resume keys at all.graph/stub_quarantine.pyassert_no_stub_output, called at the top ofarchive_writer(before any write; archive_writer precedes email_sender so a raise here also blocks the email). It raisesStubQuarantineErrorif the[DRY-RUNmarker (canonicaldry_run.DRY_RUN_MARKER, embedded in every stub string) appears anywhere in the signals payload / consolidated report / investment theses / sector-team outputs / entry theses / IC decisions / macro report, or if any sector team is missing/failed/partial. A promoted artifact may ONLY be produced by a fully-real, all-agents-complete pass. The signals payload is built once and reused at the write site (no double-build, no TOCTOU).Pre-existing #193/#194 tests changed (in-scope contract changes)
tests/test_held_thesis_isolation.py→tests/test_held_thesis_strict.py(renamed). Every test rewritten to the new contract:test_malformed_catalysts_degrades_to_prior_thesis→test_malformed_catalysts_raises_after_reroll(carry-forward removed → now RAISES).test_retry_recovers_when_second_attempt_valid→test_reroll_recovers_when_a_later_attempt_valid(kept; parse budget 2→3, post-budget = raise not carry).test_validation_error_does_not_propagate_in_strict_mode→test_validation_error_DOES_propagate(the OPPOSITE assertion — the whole point of the rework).test_no_prior_thesis_isolation_marks_score_failed→test_no_prior_thesis_still_raises(no isolation fallback).tests/test_score_aggregator_failure.pyrewritten.TestScoreAggregatorPartialTolerance/TestScoreAggregatorIsolationpinned fix(research): per-sector-team S3 persistence + 429 backoff + isolation — re-runs never re-pay completed teams #194's tolerate-and-continue; replaced withTestPartialNoLongerTolerated/TestNoIsolationasserting RAISE.test_passes_through_when_no_errors/_when_error_key_absent(a single clean team used to pass) →test_missing_teams_hard_fail*+ atest_full_clean_set_passes_throughthat supplies the full 6-team set. Thetest_2026_05_16_multi_team_429_does_not_abort_when_picks_surviveregression is inverted totest_2026_05_16_multi_team_429_now_hard_fails.tests/test_sector_team_persist_backoff.py:test_persistent_429_eventually_raises→test_persistent_429_raises_at_deadline_not_attempt_count(themax_attemptskwarg was removed). Addedtest_deadline_constant_default_is_75_min+test_long_429_then_success_within_deadline.tests/test_thesis_update_recompute.py: the_statehelper now pads absentALL_TEAM_IDSteams with clean stubs so these tests (which exercise the recompute/normalization path after the strict gate, using a non-ALL_TEAM_IDSteam_id like"energy") reach the path under test. Zero recommendations/theses contributed by the padding → every test'sinvestment_thesesassertion is unchanged.tests/test_max_tokens_lint.py: allowlist line for the macro criticmax_tokens=512moved 462 → 476 (my import +max_retries=kwarg shifted it). Same intentional 512-literal call.New tests added:
tests/test_stub_quarantine.py(the precise 2026-05-15 failure shape — stub-pass cannot persist resume keys + write-site guard blocks the marker/missing-team),tests/test_cio_macro_strict_persist.py(CIO/macro 429-past-deadline propagation + agent_run persist/resume).Test count
Full suite: 1305 passed, 1 pre-existing unrelated failure (
tests/test_scoring.py::TestRSIScoring::test_bull_overbought_matches_neutral_post_revert— known stale-local-config artifact, passes on CI). Validated with~/Development/alpha-engine-research/.venv/bin/python -m pytest tests/ -q -p no:cacheprovider.Ambiguity resolved
The directive said "extend persist+resume to CIO/macro if practical; if high-risk, persist sector teams only and document why." It was practical and low-risk —
save_agent_run/load_agent_runmirror the proven #194 sector-team envelope + identity-guard contract exactly, and the stub-quarantine suppression ofsave_agent_runkeeps it consistent with the structural fix — so I extended it to both CIO and macro.🤖 Generated with Claude Code