feat(sf): add DailySubstrateHealthCheck state at end of weekday EOD SF #178
Merged
Mirrors the Saturday-SF WeeklySubstrateHealthCheck (PR #175) into the weekday EOD SF, running ``python -m alpha_engine_lib.transparency --cadence daily --alert`` on the dashboard EC2 between EODReconcile success and StopTradingInstance.

Closes the Phase 2 → 3 gap where rows 4/5/6 of transparency_inventory (lineage, risk_events, residual_pct) only got checked once per week despite emitting daily: a bad emission Mon-Thu would otherwise sit undetected until Saturday's run.

The same per-row CloudWatch alarms from PR #176 cover daily emissions (SubstrateRowOK metric is cadence-agnostic). No new alarms needed.

Both new states are non-blocking (Catch routes to StopTradingInstance) so a substrate-check infra failure can never leave the trading EC2 running overnight (cost-guard requirement).

Refactors update_eod_pipeline_sf.sh to read the SF definition from infrastructure/step_function_eod.json instead of an inline heredoc, matching the deploy_step_function.sh pattern for the Saturday SF. The JSON file is now the single source of truth; wiring tests pin its contents. Eliminates the two-staleness-vectors antipattern that had the heredoc and the JSON file diverging silently.

17 new wiring tests pin chain ordering, Catch semantics, command shape (--cadence daily, --alert, dashboard EC2, git pull before run), ResultPath isolation, and instance targeting (dashboard EC2 not trading EC2). 519 total passing.

Requires: alpha-engine PR adding ec2_instance_id to the daemon's EOD SF input (DailySubstrateHealthCheck targets $.ec2_instance_id, which the daemon trigger now populates with the dashboard EC2 instance id).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
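To make the non-blocking wiring concrete, here is a minimal sketch of how the two states might be shaped, built as a Python dict and serialized to Amazon States Language JSON. The state names and the `python -m` command come from the commit message above. Everything else is an assumption for illustration and is not taken from `infrastructure/step_function_eod.json`: the SSM SDK integration ARNs, the `AWS-RunShellScript` document, the repo path, the `ResultPath` keys, and the idea that the wait state polls SSM.

```python
import json

# Command shape from the PR (git pull before run); the repo path is hypothetical.
CHECK_COMMANDS = [
    "cd /home/ec2-user/alpha-engine-dashboard && git pull",
    "python -m alpha_engine_lib.transparency --cadence daily --alert",
]

# Non-blocking Catch: a caught error routes straight to the cost-guard state.
# The ResultPath key is a hypothetical name chosen for this sketch.
NON_BLOCKING_CATCH = [
    {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.substrate_check_error",
        "Next": "StopTradingInstance",
    }
]


def build_daily_substrate_states() -> dict:
    """Return the two new states keyed by name (illustrative shape only)."""
    return {
        "DailySubstrateHealthCheck": {
            "Type": "Task",
            # Assumed integration: Step Functions AWS SDK call to SSM SendCommand.
            "Resource": "arn:aws:states:::aws-sdk:ssm:sendCommand",
            "Parameters": {
                "DocumentName": "AWS-RunShellScript",
                # Targets the dashboard EC2, not $.trading_instance_id.
                "InstanceIds.$": "States.Array($.ec2_instance_id)",
                "Parameters": {"commands": CHECK_COMMANDS},
            },
            # Isolated ResultPath so the rest of the SF input is left untouched.
            "ResultPath": "$.substrate_check",
            "Catch": NON_BLOCKING_CATCH,
            "Next": "WaitForDailySubstrateHealthCheck",
        },
        "WaitForDailySubstrateHealthCheck": {
            "Type": "Task",
            # Assumed: the wait step polls the SSM invocation for its status.
            "Resource": "arn:aws:states:::aws-sdk:ssm:getCommandInvocation",
            "Parameters": {
                "CommandId.$": "$.substrate_check.Command.CommandId",
                "InstanceId.$": "$.ec2_instance_id",
            },
            "ResultPath": "$.substrate_check_result",
            "Catch": NON_BLOCKING_CATCH,
            "Next": "StopTradingInstance",
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_daily_substrate_states(), indent=2))
```

The point of the shape is that every exit from either state, success or caught failure, ends at `StopTradingInstance`.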
cipher813 added a commit that referenced this pull request on May 7, 2026
… Comment (#179)

The Sat-SF substrate state's Comment promised "the weekday EOD SF will run --cadence daily in a follow-up PR." That follow-up shipped today (PR #178, the DailySubstrateHealthCheck SF state on the weekday EOD SF).

Comment-only edit; no behavioral change. Will land in live SF on next deploy_step_function.sh run (idempotent: AWS only bumps the revision when the definition actually changes).

Suite unchanged (15/15 substrate wiring tests still pass; comment text isn't pinned by the wiring tests).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

- Adds `DailySubstrateHealthCheck` + `WaitForDailySubstrateHealthCheck` between the existing `CheckEODStatus` (Success branch) and `StopTradingInstance` in the weekday EOD SF.
- Runs `python -m alpha_engine_lib.transparency --cadence daily --alert` on the dashboard EC2 (the SF dispatcher), invoking the row-driven substrate health checker shipped in alpha-engine-lib v0.5.0.
- Refactors `update_eod_pipeline_sf.sh` to read the SF definition from `infrastructure/step_function_eod.json` instead of an inline heredoc: single source of truth, mirrors the `deploy_step_function.sh` pattern.

Why
PR #175 shipped the Saturday-SF `WeeklySubstrateHealthCheck`, which sweeps weekly + daily rows of `transparency_inventory.yaml`. Rows 4/5/6 (lineage, risk_events, residual_pct) emit daily but only got checked once per week, so a bad emission Mon-Thu would sit undetected until Saturday's run. This PR closes that gap by running the daily-cadence subset on every weekday EOD SF, the same day the artifacts land.

The same per-row CloudWatch alarms PR #176 created cover both cadences (`SubstrateRowOK` metric is cadence-agnostic). No new alarms needed.

Wiring
Both new states are non-blocking (`Catch` routes to `StopTradingInstance`), preserving the cost-guard requirement that the trading EC2 always stops at end of day, regardless of substrate-check outcome. Per-row CloudWatch alarms own paging on row-level failures; the SF Catch only fires on infra-level failures (SSM unreachable, EC2 down).

The substrate state targets `$.ec2_instance_id` (dashboard EC2, where alpha-engine-dashboard + lib pin are installed) rather than `$.trading_instance_id`. The companion alpha-engine PR adds `ec2_instance_id` to the daemon's EOD SF input.

Single source of truth refactor
The old `update_eod_pipeline_sf.sh` had the SF definition inline as a heredoc, while `infrastructure/step_function_eod.json` had a stale separate copy from the original 2026-04-21 setup. This PR makes the JSON file authoritative and drops the heredoc; the deploy script now reads `--definition file://step_function_eod.json`. Mirrors the Sat-SF pattern (`deploy_step_function.sh` reads `step_function.json`). Eliminates the two-staleness-vectors antipattern.

Dependencies
- Companion alpha-engine PR (`_trigger_eod_pipeline` adds `ec2_instance_id` to SF input): must merge before this so the next daemon-triggered firing populates the new field. Without it, `DailySubstrateHealthCheck` would error at the SSM step with `States.Runtime` on the missing JSONPath and fall through the non-blocking Catch to `StopTradingInstance`; the trading EC2 still stops (cost-guard preserved) but the substrate check never runs.

Test plan
- `pytest tests/test_sf_eod_substrate_check_wiring.py`: all 17 new wiring tests pass.
- Deploy via `./infrastructure/update_eod_pipeline_sf.sh` (reads canonical JSON file).
- Verify metrics land in the `AlphaEngine/Substrate` namespace daily, with no false-alarm storm on rows already passing weekly.

🤖 Generated with Claude Code
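
The kind of wiring checks listed in the test plan could look roughly like the sketch below. It is not the contents of `tests/test_sf_eod_substrate_check_wiring.py`. The inline `DEFINITION` is a trimmed stand-in for `infrastructure/step_function_eod.json` that carries only the fields the checks read, and the helper names `success_chain` and `catch_targets` are hypothetical.

```python
# Hypothetical stand-in for the canonical JSON file. Only Next/Catch are kept,
# and the Success branch of CheckEODStatus is flattened to a plain Next.
_CATCH = [{"ErrorEquals": ["States.ALL"], "Next": "StopTradingInstance"}]

DEFINITION = {
    "StartAt": "CheckEODStatus",
    "States": {
        "CheckEODStatus": {"Next": "DailySubstrateHealthCheck"},
        "DailySubstrateHealthCheck": {
            "Next": "WaitForDailySubstrateHealthCheck",
            "Catch": _CATCH,
        },
        "WaitForDailySubstrateHealthCheck": {
            "Next": "StopTradingInstance",
            "Catch": _CATCH,
        },
        "StopTradingInstance": {"End": True},
    },
}


def success_chain(definition: dict) -> list:
    """Follow Next pointers from StartAt until a state has no Next."""
    chain, seen = [], set()
    name = definition["StartAt"]
    while name is not None and name not in seen:  # the seen set guards against loops
        seen.add(name)
        chain.append(name)
        name = definition["States"][name].get("Next")
    return chain


def catch_targets(definition: dict, state_name: str) -> set:
    """Return the set of states that a state's Catch clauses route to."""
    catches = definition["States"][state_name].get("Catch", [])
    return {clause["Next"] for clause in catches}


def test_chain_ordering():
    # The substrate check sits between CheckEODStatus and StopTradingInstance.
    assert success_chain(DEFINITION) == [
        "CheckEODStatus",
        "DailySubstrateHealthCheck",
        "WaitForDailySubstrateHealthCheck",
        "StopTradingInstance",
    ]


def test_catch_is_non_blocking():
    # Both new states must route caught failures to the cost-guard state.
    for name in ("DailySubstrateHealthCheck", "WaitForDailySubstrateHealthCheck"):
        assert catch_targets(DEFINITION, name) == {"StopTradingInstance"}
```

In a real suite the definition would presumably be loaded with `json.load` from the canonical file rather than declared inline, so the tests pin exactly what the deploy script ships.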