feat(eventbridge): Option-D pipeline_role tags on all 3 SF cron rules #317
Merged
Tags every cron-triggered SF execution with a ``pipeline_role`` input
field per the Option-D execution-picker plan (Brian-approved 2026-05-25
evening, lib substrate in alpha-engine-lib#75). Page 25 / Slack / CLI
consumers filter by role so smoke / recovery / operator-replay
executions never displace the canonical cadence run as "most recent."
- SaturdayTrigger (cron 09:00 UTC SAT) → pipeline_role="weekly"
- FridayShellRunTrigger (cron 20:45 UTC FRI, shipped DISABLED)
→ pipeline_role="shell-run"
- WeekdayTrigger (cron 12:45 UTC MON-FRI) → pipeline_role="daily"
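For illustration, the mapping above can be written down as data. This is a minimal Python sketch, not code from the repo. `TRIGGERS` and `target_input` are hypothetical names. The cron strings are assumed EventBridge translations of the stated UTC times, not copied from the template. The payload is reduced to the single `pipeline_role` field, although the real execution input may carry more keys. EventBridge passes a target's `Input` to Step Functions as JSON text, which is why the helper returns a string.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical registry of the three cron rules. The cron strings are assumed
# EventBridge translations of the stated UTC times, not copied from the template.
TRIGGERS: Dict[str, Dict[str, Any]] = {
    "SaturdayTrigger": {"schedule": "cron(0 9 ? * SAT *)", "role": "weekly", "enabled": True},
    "FridayShellRunTrigger": {"schedule": "cron(45 20 ? * FRI *)", "role": "shell-run", "enabled": False},
    "WeekdayTrigger": {"schedule": "cron(45 12 ? * MON-FRI *)", "role": "daily", "enabled": True},
}


def target_input(role: str, extra: Optional[Dict[str, Any]] = None) -> str:
    """Return the JSON text an EventBridge target would pass as the SF execution input."""
    payload = dict(extra or {})
    payload["pipeline_role"] = role  # set last so a stray key in `extra` cannot override it
    return json.dumps(payload, sort_keys=True)


for name, cfg in TRIGGERS.items():
    state = "ENABLED" if cfg["enabled"] else "DISABLED"
    print(f"{name:<22} {cfg['schedule']:<28} {state:<8} {target_input(cfg['role'])}")
```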
Edits land in BOTH source-of-truth paths:
1. CFN orchestration template (``alpha-engine-orchestration.yaml``) —
the fresh-region/account bootstrap path; re-apply re-stamps the
live rule from this YAML.
2. ``deploy_step_function.sh`` + ``deploy_step_function_daily.sh`` —
the operator-applied path; running these scripts updates the live
rule directly. Both paths now carry pipeline_role to prevent
drift between CFN and live state.
Naming convention for ad-hoc operator launches (documented in PR body
+ pipeline-reporting-revamp plan doc follow-up):
pipeline_role      Triggered by                          Page 25 default
-----------------  ------------------------------------  -----------------
"weekly"           SaturdayTrigger EventBridge cron      shown
"daily"            WeekdayTrigger EventBridge cron       shown
"eod"              daemon._trigger_eod_pipeline          shown
"shell-run"        FridayShellRunTrigger (DISABLED)      hidden by default
"smoke"            operator smoke / debug                hidden by default
"recovery"         operator fix-and-rerun                hidden by default
"backfill"         operator historical backfill          hidden by default
"operator-replay"  operator ad-hoc replay (catch-all)    hidden by default
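A sketch of how a consumer might apply this convention when picking the "most recent" run. It assumes executions have already been listed and each `pipeline_role` parsed out of its input. `Execution`, `most_recent` and `DEFAULT_VISIBLE_ROLES` are hypothetical names. The real picker substrate (alpha-engine-lib#75) may treat untagged executions differently; this sketch simply skips them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Iterable, Optional

# Roles the convention table marks "shown"; everything else is hidden by default.
DEFAULT_VISIBLE_ROLES: FrozenSet[str] = frozenset({"weekly", "daily", "eod"})


@dataclass(frozen=True)
class Execution:
    name: str
    start_date: datetime
    pipeline_role: Optional[str]  # None models a run launched without the tag (assumption)


def most_recent(
    executions: Iterable[Execution],
    roles: FrozenSet[str] = DEFAULT_VISIBLE_ROLES,
) -> Optional[Execution]:
    """Newest execution whose pipeline_role is in `roles`, or None if none qualify."""
    eligible = [e for e in executions if e.pipeline_role in roles]
    return max(eligible, key=lambda e: e.start_date, default=None)


runs = [
    Execution("sat-cron", datetime(2026, 5, 23, 9, 0, tzinfo=timezone.utc), "weekly"),
    Execution("smoke-debug", datetime(2026, 5, 24, 15, 0, tzinfo=timezone.utc), "smoke"),
]
print(most_recent(runs).name)                        # sat-cron: the later smoke run is ignored
print(most_recent(runs, frozenset({"smoke"})).name)  # smoke-debug: hidden roles are opt-in
```

Because hidden roles are opt-in, a smoke run launched after the Saturday cron never becomes the default "most recent" execution.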
Chokepoint tests added (TestEventBridgeInput / TestWeekdayEventBridgeInput
/ TestOrchestrationCFNPipelineRoles) assert the expected pipeline_role
value on each trigger across all three sources of truth (the CFN template and both deploy scripts). Drift in
either the CFN OR a deploy script fails CI loudly with a named
remediation path.
Companion edit in alpha-engine repo:
``executor/daemon.py::_trigger_eod_pipeline`` adds
``"pipeline_role": "eod"`` to its start_execution input dict — ships
in a separate PR for that repo.
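On the daemon side the change amounts to one extra key in the execution input. This is a minimal sketch assuming boto3's Step Functions client, whose `start_execution` takes `stateMachineArn` and a JSON-string `input`. The helper name, placeholder ARN and `trade_date` field are hypothetical, and the real `_trigger_eod_pipeline` is not reproduced here.

```python
import json
from typing import Any, Dict


def eod_start_execution_kwargs(state_machine_arn: str, base_input: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: build start_execution kwargs with the EOD role tag added."""
    payload = {**base_input, "pipeline_role": "eod"}  # the tag wins over any caller-supplied role
    return {"stateMachineArn": state_machine_arn, "input": json.dumps(payload)}


kwargs = eod_start_execution_kwargs(
    "arn:aws:states:us-east-1:123456789012:stateMachine:example-eod",  # placeholder ARN
    {"trade_date": "2026-05-26"},  # placeholder field; the real input dict is not shown here
)
print(kwargs["input"])
# In the daemon this would be passed on as:
#   boto3.client("stepfunctions").start_execution(**kwargs)
```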
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 26, 2026
…add target-uniqueness chokepoint test (closes 2026-05-26 duplicate-target incident) (#322)
Root cause of the 2026-05-26 weekday-pipeline failure: PR #317 (33c3753, 2026-05-25 evening) added pipeline_role tagging to both the CFN template AND the deploy scripts, with different target IDs (Id=1 from scripts, Id=<rule>-pipeline from CFN). EventBridge couldn't dedupe (different IDs meant different targets), so the alpha-engine-weekday rule shipped with TWO targets pointing at the SAME state machine. Every weekday cron firing fanned out to two parallel SF executions; both ran MorningEnrich on the trading instance; both opened ArcticDB and reached daily_append's update_batch/write_batch; the C++ engine emitted 321 unique-symbol E_NON_INCREASING_INDEX_VERSION (code 5090) races; the 5%-threshold gate hard-failed both runs at 35.6% (n_err=322 of 905). The Saturday rule had the identical defect (first exposure Sat 5/30).
This PR makes EventBridge rules + targets CFN-canonical (deploy scripts no longer call put-rule or put-targets), migrates the enable_standalone_scanner=true flag onto the CFN Saturday target (it was script-only, so re-deploying CFN would have silently reverted L1995 Phase 3), and adds two substrate gates:
- TestDeployScriptsHaveNoEventBridgeWrites: both deploy scripts must not contain executable `aws events put-rule` or `aws events put-targets` lines. Closes the dual-write path that enabled the bug.
- TestCFNTargetUniqueness: each cron-triggered AWS::Events::Rule in the orchestration CFN has exactly 1 entry under Targets:. PR #317's existing tests validated input contents but not target count; this is the gap that let the defect through CI.
Operational state at PR-open time:
- Duplicate targets removed manually via `aws events remove-targets` (Id=weekday-pipeline + Id=saturday-pipeline) so Wed 5/27 + Sat 5/30 fire correctly without waiting for this PR to merge.
- Today's failed weekday SF redriven via an operator-launched execution (pipeline_role=recovery).
- EB IAM role bootstrap (trust policy + create-role) kept in the Saturday deploy script so a fresh region/account can still bootstrap via that script alone. The inline policy remains codified in the alpha-engine repo's infrastructure/iam/.
Follow-up (filed at wind-down):
- P1: SF-side MutualExclusionGuard (DynamoDB conditional PUT) so any future duplicate trigger (operator double-paste, EB bug, cross-region replay) hard-fails before any SSM SendCommand.
- P1: alpha-engine-lib producer-side universe-writer lock (S3 conditional PUT), covering manual `python -m builders.daily_append` runs that don't go through SF.
Test: `pytest tests/test_deploy_step_function_eventbridge_input.py` (11 passed) + full suite (1539 passed, 1 skipped).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
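The root cause comes down to how EventBridge identifies targets. Within a rule, targets are keyed by `Id`: `PutTargets` with an existing Id updates that target, while a new Id adds another one, even when both point at the same ARN. The toy model below is a hypothetical in-memory stand-in that makes no AWS calls. It replays the two writes and the count check described above.

```python
from typing import Dict

SF_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:example"  # placeholder


def put_targets(rule_targets: Dict[str, str], new_targets: Dict[str, str]) -> None:
    """Toy model of PutTargets: targets are keyed by Id (Id -> Arn).
    Re-using an Id updates that target; a new Id adds another one."""
    rule_targets.update(new_targets)


def assert_single_target(rule_name: str, rule_targets: Dict[str, str]) -> None:
    """Count check in the spirit of TestCFNTargetUniqueness: exactly one target per cron rule."""
    if len(rule_targets) != 1:
        raise AssertionError(f"{rule_name}: expected 1 target, found {len(rule_targets)} {sorted(rule_targets)}")


weekday: Dict[str, str] = {}
put_targets(weekday, {"1": SF_ARN})                 # deploy-script write (Id=1)
put_targets(weekday, {"1": SF_ARN})                 # re-running the script: same Id, still one target
put_targets(weekday, {"weekday-pipeline": SF_ARN})  # CFN write with a different Id: a second target

try:
    assert_single_target("alpha-engine-weekday", weekday)
except AssertionError as err:
    print(err)  # alpha-engine-weekday: expected 1 target, found 2 ['1', 'weekday-pipeline']
```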
cipher813 added a commit that referenced this pull request May 27, 2026
…oint module (#328)
Closes the remaining 6 findings from the 2026-05-27 wider L302 audit (P0 retrospective on PR #317's content-vs-uniqueness CI gap that caused the 2026-05-26 dup-EB-target trading-day miss). PR #322 closed the EventBridge target instance; this closes the meta-pattern across the rest of this repo's CI: tests pin WHAT was put, not HOW MANY were put or whether anything ELSE was put.
New module tests/test_sf_payload_uniqueness.py with 29 tests across 6 classes, one per audit finding:
| Class | L302 Finding | What it closes |
|---|---|---|
| TestSaturdaySFPayloadFieldSetsClosed | F2 eval_judge_wiring + F4 aggregate_costs + cross-cutting | Lambda Payload field-set drift across all 14 Saturday SF Lambda calls |
| TestWeekdaySFPayloadFieldSetsClosed | (extension) | Lambda Payload field-set drift across 6 weekday SF Lambda calls |
| TestSFRoleInvokeFunctionStatementCount | F3 iam_lambda_grants | exactly 1 lambda:InvokeFunction Statement in alpha-engine-step-functions-role.json (catches stale overlapping ARN statements from pre-2026 refactors) |
| TestWeekdaySSMFlowDoctorOrdering | F5 ssm_pipefail_wiring | FLOW_DOCTOR_ENABLED=1 in first 3 commands (closes 2026-05-11 ordering-incident recurrence path) |
| TestEODSFTopLevelFieldsClosed | F6 eod_substrate_check_wiring | top-level $.field namespace closed across input + intermediate ResultPath fields (catches silent collisions) |
| TestSaturdaySFSpotStateCount | F7 friday_shell_run_wiring | exactly 8 spot-launching states (catches orphaned legacy state from incomplete refactor) |
Shape per surface: pin a closed registry of expected keys/states, fail loud when actual diverges. Mirrors PR #322's TestCFNTargetUniqueness pattern; the same chokepoint shape is applied to 6 more surfaces.
Suite: 1567 → 1596 passed (+29 net).
Composes with #322, [[reference-eventbridge-target-uniqueness-invariant]], [[feedback-mocked-tests-dont-validate-external-api-contract]], [[feedback-audit-findings-become-roadmap-followups]] and the L302 P0-retrospective entry in ROADMAP.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
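The closed-registry shape can be sketched generically. This version uses hypothetical names and is not the repo's test module. It uses `collections.Counter` rather than sets so that it flags both names that should not be there and surplus copies of names that should, which covers the "HOW MANY" half of the gap. The payload keys in the usage lines are made up.

```python
from collections import Counter
from typing import Iterable, List


def registry_drift(expected: Iterable[str], actual: Iterable[str]) -> List[str]:
    """Compare actual names against a closed registry; an empty list means an exact match."""
    want, got = Counter(expected), Counter(actual)
    missing = want - got  # Counter subtraction keeps only positive counts
    extra = got - want    # unknown names AND surplus copies of known names
    problems = []
    if missing:
        problems.append(f"missing: {dict(sorted(missing.items()))}")
    if extra:
        problems.append(f"unexpected: {dict(sorted(extra.items()))}")
    return problems


# Hypothetical Lambda Payload keys; the real registries live in the test module.
EXPECTED_PAYLOAD_KEYS = ["pipeline_role", "run_date"]
print(registry_drift(EXPECTED_PAYLOAD_KEYS, ["pipeline_role", "run_date"]))           # []
print(registry_drift(EXPECTED_PAYLOAD_KEYS, ["pipeline_role", "run_date", "debug"]))  # ["unexpected: {'debug': 1}"]
```

The same function covers count-style invariants: passing eight copies of one state name as the registry makes a ninth copy show up as `unexpected`.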
Summary
Tags every cron-triggered SF execution with a `pipeline_role` input field per the Option-D execution-picker plan (Brian-approved 2026-05-25 evening; lib substrate in alpha-engine-lib#75; daemon companion in alpha-engine#214). Page 25 / Slack / CLI consumers filter by role so smoke / recovery / operator-replay executions never displace the canonical cadence run as "most recent."
- `SaturdayTrigger` (cron 09:00 UTC SAT) → `pipeline_role="weekly"`
- `FridayShellRunTrigger` (cron 20:45 UTC FRI, shipped DISABLED) → `pipeline_role="shell-run"`
- `WeekdayTrigger` (cron 12:45 UTC MON-FRI) → `pipeline_role="daily"`
Dual source-of-truth coverage
Edits land in BOTH the CFN orchestration template AND the live-EventBridge-applied deploy scripts so neither path goes stale:
- `infrastructure/cloudformation/alpha-engine-orchestration.yaml`
- `infrastructure/deploy_step_function.sh`
- `infrastructure/deploy_step_function_daily.sh`
Naming convention
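| `pipeline_role` | Triggered by | Page 25 default |
|---|---|---|
| `"weekly"` | SaturdayTrigger EventBridge cron | shown |
| `"daily"` | WeekdayTrigger EventBridge cron | shown |
| `"eod"` | `daemon._trigger_eod_pipeline` | shown |
| `"shell-run"` | FridayShellRunTrigger (DISABLED) | hidden by default |
| `"smoke"` | operator smoke / debug | hidden by default |
| `"recovery"` | operator fix-and-rerun | hidden by default |
| `"backfill"` | operator historical backfill | hidden by default |
| `"operator-replay"` | operator ad-hoc replay (catch-all) | hidden by default |
Test plan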
- `test_deploy_step_function_eventbridge_input.py`: assert the right `pipeline_role` on each trigger across all three sources of truth. Drift in either the CFN OR a deploy script fails CI loudly with a named remediation path.
- Full suite: `1532 passed, 1 skipped, 7 warnings in 8.15s`
- `InitializeInput` uses `States.JsonMerge`: the new `pipeline_role` field is preserved through to all downstream states without breaking any JSONPath reference.
- Run `infrastructure/deploy_step_function.sh` + `infrastructure/deploy_step_function_daily.sh` to update the live EventBridge rules. Verify via `aws events list-targets-by-rule --rule alpha-engine-saturday --region us-east-1`.
🤖 Generated with Claude Code