feat(preflight): add Research-side static checks (price cards + recursion budget) by cipher813 · Pull Request #137 · cipher813/alpha-engine-data

cipher813 · 2026-05-02T17:53:42Z

Summary

Today's incident chain proved the existing dry-run/stub-LLM gate catches structural issues but is by-design blind to runtime LLM behavior — bugs in cost-telemetry lookup and recursion budget only surface when real LLM calls happen.

Two new static checks (zero LLM cost) catch today's exact failure modes at preflight time:

`check_price_cards_cover_all_models`

Walks runtime model names from research config (per_stock_model, strategic_model, _FALLBACK_AGENT_MODEL_NAMES), normalizes via the same snapshot-suffix strip PR #77 added, and asserts each maps to a card in alpha-engine-config/cost/model_pricing.yaml. Pre-empts PriceCardLookupError.
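The coverage walk described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes a snapshot suffix is a trailing `-YYYYMMDD` date stamp and that price cards are keyed by base model name. `normalize_model_name`, `find_uncovered_models`, and `hypothetical-model-x` are names invented for the example.

```python
import re

# Assumption: a snapshot suffix is a trailing 8-digit date stamp (-YYYYMMDD).
_SNAPSHOT_SUFFIX = re.compile(r"-\d{8}$")


def normalize_model_name(name: str) -> str:
    """Strip a trailing snapshot date so dated names match their base card."""
    return _SNAPSHOT_SUFFIX.sub("", name)


def find_uncovered_models(runtime_models, price_cards):
    """Return sorted runtime model names whose normalized form has no price card."""
    known = set(price_cards)
    return sorted(
        name for name in set(runtime_models)
        if normalize_model_name(name) not in known
    )


# Hypothetical inputs; the real check reads these from the sibling repos.
runtime_models = ["claude-haiku-4-5-20251001", "hypothetical-model-x"]
price_cards = {"claude-haiku-4-5": {}}  # assumed shape: card name -> pricing fields
missing = find_uncovered_models(runtime_models, price_cards)
print(missing)
```

A non-empty result would correspond to a failing check; an empty list means every runtime model resolves to a card.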

`check_recursion_budget_for_response_format`

Static regex scan of agents/sector_teams/{quant,qual}_analyst.py. For every site using response_format=, asserts recursion_limit includes a +N buffer (not bare MAX_ITERATIONS * 2). Pre-empts GraphRecursionError from the structured-extraction call.
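A static scan of this kind might look like the sketch below. It is an approximation under stated assumptions, not the PR's implementation: the regexes and the helper name `unbuffered_recursion_limits` are hypothetical, and the scan works on source text only, so it never imports or runs the analyst modules.

```python
import re

# Hypothetical patterns; the real check may match differently.
RESPONSE_FORMAT = re.compile(r"\bresponse_format\s*=")
RECURSION_LIMIT = re.compile(
    r"""recursion_limit["']?\s*[:=]\s*(?P<expr>[^,}\n)]+)"""
)
BUFFER = re.compile(r"\+\s*\d+")  # a literal "+N" somewhere in the expression


def unbuffered_recursion_limits(source: str) -> list[str]:
    """Return recursion_limit expressions lacking a +N buffer.

    Only sources that use response_format= are inspected; others return [].
    """
    if not RESPONSE_FORMAT.search(source):
        return []
    return [
        m.group("expr").strip()
        for m in RECURSION_LIMIT.finditer(source)
        if not BUFFER.search(m.group("expr"))
    ]


# Example source snippets, held as plain strings (never executed).
bad = '''
agent = create_react_agent(llm, tools, response_format=Report)
result = agent.invoke(inputs, config={"recursion_limit": MAX_ITERATIONS * 2})
'''
good = bad.replace("MAX_ITERATIONS * 2", "MAX_ITERATIONS * 2 + 5")

print(unbuffered_recursion_limits(bad))
print(unbuffered_recursion_limits(good))
```

A regex scan like this is deliberately shallow: it can misjudge parenthesized or multi-line expressions, which is an acceptable trade-off for a zero-cost preflight but would need an AST walk to be exact.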

Both checks WARN (rather than FAIL) when sibling repos aren't checked out, which preserves the preflight's "useful even when partial" property.
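One way to get the WARN-on-missing-sibling behavior is a thin wrapper around each check, sketched below. The `Status` enum, `CheckResult` dataclass, and `run_sibling_check` helper are assumptions made for illustration; the preflight's actual result types are not shown in this PR description.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable


class Status(Enum):
    OK = "OK"
    WARN = "WARN"
    FAIL = "FAIL"


@dataclass
class CheckResult:
    name: str
    status: Status
    detail: str


def run_sibling_check(
    name: str,
    sibling_dir: Path,
    check: Callable[[Path], list],
) -> CheckResult:
    """Run `check` against a sibling repo, downgrading to WARN if it is absent."""
    if not sibling_dir.is_dir():
        # Absent sibling (e.g. CI): report it, but do not block the preflight.
        return CheckResult(name, Status.WARN, f"{sibling_dir} not checked out; skipped")
    problems = check(sibling_dir)
    if problems:
        return CheckResult(name, Status.FAIL, "; ".join(problems))
    return CheckResult(name, Status.OK, "no problems found")
```

Here `check` is any callable that takes the sibling directory and returns a list of problem strings, so the two static checks could share the same skip logic.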

Validation against current state

[OK]   price_cards_cover_all_models            3 runtime models map to cards
[OK]   recursion_budget_for_response_format    2 ReAct sites buffered

Both pass after PR #77 + #78 merged.

Test plan

- 8 new tests in test_sf_preflight.py: happy path, failure path (reproducing today's exact incidents in tmp sibling layout), absent-sibling skip, snapshot-suffix normalization round-trip
- Full suite: 403 passed
- Live run on Friday before next Saturday SF; should catch any drift before launch

🤖 Generated with Claude Code

…sion budget)

Today's incident chain proved the existing preflight catches structural issues but is by-design blind to runtime LLM behavior. Specifically:

PR #77 (PriceCardLookupError): runtime model 'claude-haiku-4-5-20251001' didn't normalize to any price card → Research SF crash.
PR #78 (GraphRecursionError): ReAct sites used response_format= but recursion_limit was bare MAX_ITERATIONS * 2 → Research SF crash.

Both are catchable by static config-walk preflight (zero LLM cost):

## check_price_cards_cover_all_models

Walks every runtime model name (universe.yaml's per_stock_model + strategic_model + research_graph.py's _FALLBACK_AGENT_MODEL_NAMES dict), normalizes via the same snapshot-suffix strip the production cost tracker uses (PR #77's _normalize_model_for_pricing, duplicated here to avoid heavy imports), and asserts each maps to a card in alpha-engine-config/cost/model_pricing.yaml.

## check_recursion_budget_for_response_format

Static regex scan of agents/sector_teams/{quant,qual}_analyst.py. For every file using response_format= in create_react_agent, asserts recursion_limit is NOT bare 'MAX_ITERATIONS * 2' (must include +N buffer for the post-loop structured-extraction call). Catches PR #78's exact failure mode at config-walk time.

Both checks WARN (don't FAIL) when sibling repos aren't checked out (CI / restricted environments), which preserves the preflight's "useful even when partial" property.

Validation against current state (post PR #77 + #78):

[OK] price_cards_cover_all_models 3 runtime models map to cards
[OK] recursion_budget_for_response_format 2 ReAct sites buffered

8 new tests in test_sf_preflight.py covering happy path, failure path (reproducing today's exact incidents in tmp sibling layout), absent-sibling skip, and the snapshot-suffix-normalization round-trip. 403 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(iam): take ownership of shared orchestration roles

Step Functions execution role + EventBridge cron role move here from alpha-engine/infrastructure/iam/. Their grants are derived from code that lives in this repo (SF JSON Lambda invoke targets, EC2 instances the SF SSMs, EventBridge SFN target ARNs); co-locating the codified IAM with the source of those grants tightens the coupling so a single PR can change SF behavior + matching IAM atomically.

Files added:
- infrastructure/iam/alpha-engine-step-functions-role.json
- infrastructure/iam/alpha-engine-eventbridge-sfn-role.json
- infrastructure/iam/check-drift.py (flat-layout variant of alpha-engine's directory-per-role drift checker)
- infrastructure/iam/README.md (documents which roles this repo owns + the single-writer rule)
- .github/workflows/iam-drift-check.yml (PR + daily 09:30 UTC + manual)

Files updated:
- infrastructure/deploy_step_function.sh: drops the surviving inline put-role-policy block against alpha-engine-step-functions-role (PR #170 dropped the EB-SFN twin; PR #151 dropped the daily-script twin; this completes the trio). The script kept a stale narrower policy that clobbered ssm:DescribeInstanceInformation + ec2:StopInstances + the trading-instance SSM ARN every Saturday deploy. Trust policy + create-role bootstrap stay (one-time setup).
- infrastructure/add-ssm-policy.sh: drops alpha-engine-executor-role + alpha-engine-predictor-role from the ROLES list. Both now have alpha-engine-ssm-read codified in their home repos (executor: already; predictor: covered by separate PR codifying its existing live grant). Script remains the writer for non-codified Lambda execution roles only.

OIDC trust policy on github-actions-iam-drift-check widened live to also permit repo:cipher813/alpha-engine-data so the new drift-check workflow here can authenticate with the existing OIDC role.

Companion PRs:
- alpha-engine #137: removes the codified directories, updates the cross-repo foreign-writer guard.
- alpha-engine-predictor (separate): codifies existing live alpha-engine-ssm-read grant on predictor-role.

Supersedes alpha-engine-data #171 (which only addressed the Saturday script's inline write).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: re-trigger after OIDC trust + scope widening

* ci: re-trigger pull_request workflow after OIDC widening

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cipher813 merged commit 79d95e1 into main May 2, 2026
1 check passed

cipher813 deleted the feat/sf-preflight-research-checks branch May 2, 2026 17:55
