feat(sf): add DailySubstrateHealthCheck state at end of weekday EOD SF #178

Merged
cipher813 merged 1 commit into main from feat/eod-sf-daily-substrate-check
May 7, 2026

@cipher813
Owner

Summary

  • Inserts DailySubstrateHealthCheck + WaitForDailySubstrateHealthCheck between the existing CheckEODStatus (Success branch) and StopTradingInstance in the weekday EOD SF.
  • New states invoke python -m alpha_engine_lib.transparency --cadence daily --alert on the dashboard EC2 (the SF dispatcher), running the row-driven substrate health checker shipped in alpha-engine-lib v0.5.0.
  • 17 new wiring tests pin chain ordering, Catch semantics, command shape, ResultPath isolation, and instance targeting. 519 total passing.
  • Refactors update_eod_pipeline_sf.sh to read the SF definition from infrastructure/step_function_eod.json instead of an inline heredoc — single source of truth, mirrors deploy_step_function.sh pattern.

Why

PR #175 shipped the Saturday-SF WeeklySubstrateHealthCheck which sweeps weekly + daily rows of transparency_inventory.yaml. Rows 4/5/6 (lineage, risk_events, residual_pct) emit daily but only got checked once per week — a bad emission Mon-Thu would sit undetected until Saturday's run. This PR closes that gap by running the daily-cadence subset on every weekday EOD SF, the same day the artifacts land.

The same per-row CloudWatch alarms PR #176 created cover both cadences (SubstrateRowOK metric is cadence-agnostic). No new alarms needed.

Wiring

... → CheckEODStatus → (Success) → DailySubstrateHealthCheck
    → WaitForDailySubstrateHealthCheck → StopTradingInstance (End)

Both new states are non-blocking (Catch routes to StopTradingInstance), preserving the cost-guard requirement that the trading EC2 always stops at end of day, regardless of substrate-check outcome. Per-row CloudWatch alarms own paging on row-level failures; the SF Catch only fires on infra-level failures (SSM unreachable, EC2 down).

The substrate state targets $.ec2_instance_id (dashboard EC2 — where alpha-engine-dashboard + lib pin are installed) rather than $.trading_instance_id. The companion alpha-engine PR adds ec2_instance_id to the daemon's EOD SF input.
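
A minimal sketch of how the two states described above could be shaped, written as a Python dict rather than the real JSON. Only the state names, the command, the `$.ec2_instance_id` target, and the Catch destination come from this PR. The SSM SDK integrations, the `ResultPath` names, and the polling approach in the wait state are assumptions for illustration; the authoritative definition is infrastructure/step_function_eod.json.

```python
# Hypothetical shape of the two new states. The real definition lives in
# infrastructure/step_function_eod.json and may differ in every detail except
# the state names, the command, the instance JSONPath, and the Catch target.
SUBSTRATE_CMD = "python -m alpha_engine_lib.transparency --cadence daily --alert"


def non_blocking_catch():
    # Route any catchable error to StopTradingInstance so the trading EC2 stops.
    return [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.substrate_error",  # assumed path
        "Next": "StopTradingInstance",
    }]


NEW_STATES = {
    "DailySubstrateHealthCheck": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:ssm:sendCommand",  # assumed integration
        "Parameters": {
            "DocumentName": "AWS-RunShellScript",
            # Dashboard EC2, not $.trading_instance_id.
            "InstanceIds.$": "States.Array($.ec2_instance_id)",
            "Parameters": {"commands": [SUBSTRATE_CMD]},
        },
        "ResultPath": "$.substrate_check",  # assumed path, keeps input intact
        "Catch": non_blocking_catch(),
        "Next": "WaitForDailySubstrateHealthCheck",
    },
    "WaitForDailySubstrateHealthCheck": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:ssm:getCommandInvocation",  # assumed
        "Parameters": {
            "CommandId.$": "$.substrate_check.Command.CommandId",
            "InstanceId.$": "$.ec2_instance_id",
        },
        "ResultPath": "$.substrate_wait",  # assumed path
        "Catch": non_blocking_catch(),
        "Next": "StopTradingInstance",
    },
}


def exits(state):
    """Every state name this state can transition to (Next plus Catch targets)."""
    targets = {state["Next"]} if "Next" in state else set()
    targets.update(c["Next"] for c in state.get("Catch", []))
    return targets
```

With this shape, every exit from either new state leads to the wait state or to StopTradingInstance, which is the property the cost-guard requirement depends on.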

Single source of truth refactor

The old update_eod_pipeline_sf.sh had the SF definition inline as a heredoc, while infrastructure/step_function_eod.json had a stale separate copy from the original 2026-04-21 setup. This PR makes the JSON file authoritative and drops the heredoc — the deploy script now reads --definition file://step_function_eod.json. Mirrors the Sat-SF pattern (deploy_step_function.sh reads step_function.json). Eliminates the two-staleness-vectors antipattern.
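
For illustration, a Python analogue of the "read the canonical JSON file" step. The actual deploy path is the shell script with `--definition file://step_function_eod.json`; the function names and the sanity check below are hypothetical and only sketch the idea that the file is loaded and validated once, then passed through unchanged.

```python
import json
from pathlib import Path


def load_sf_definition(path):
    """Read the canonical definition and fail fast if it is not usable JSON."""
    text = Path(path).read_text(encoding="utf-8")
    definition = json.loads(text)  # raises ValueError on malformed JSON
    start = definition.get("StartAt")
    states = definition.get("States", {})
    if start not in states:
        raise ValueError(f"StartAt {start!r} is not a defined state")
    # Return the original text so the deployed definition matches the file.
    return text


def update_kwargs(state_machine_arn, path):
    """Keyword arguments in the shape boto3's update_state_machine accepts."""
    return {
        "stateMachineArn": state_machine_arn,
        "definition": load_sf_definition(path),
    }
```

If boto3 were used instead of the CLI, the call would look like `boto3.client("stepfunctions").update_state_machine(**update_kwargs(arn, "infrastructure/step_function_eod.json"))`, where `arn` is the EOD state machine's ARN.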

Dependencies

  • alpha-engine companion PR (daemon _trigger_eod_pipeline adds ec2_instance_id to SF input): must merge before this one so the next daemon-triggered firing populates the new field. Without it, DailySubstrateHealthCheck would error at the SSM step with States.Runtime on the missing JSONPath, and the substrate check would never run. The intent is that the non-blocking Catch falls through to StopTradingInstance so the trading EC2 still stops (cost-guard preserved). Note that Step Functions does not let a Catch on States.ALL handle States.Runtime, so that fallback should be confirmed against the deployed definition rather than assumed.
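
A rough sketch of the input contract this dependency implies. The two field names come from this PR; `build_eod_sf_input` and `missing_fields` are hypothetical helpers, not the daemon's actual `_trigger_eod_pipeline` code.

```python
import json


def build_eod_sf_input(trading_instance_id, dashboard_instance_id):
    """Hypothetical stand-in for the input the daemon passes to the EOD SF."""
    return json.dumps({
        "trading_instance_id": trading_instance_id,
        # Dashboard EC2; DailySubstrateHealthCheck reads this as $.ec2_instance_id.
        "ec2_instance_id": dashboard_instance_id,
    })


def missing_fields(sf_input, required=("trading_instance_id", "ec2_instance_id")):
    """Return the required top-level keys that are absent or empty."""
    payload = json.loads(sf_input)
    return [key for key in required if not payload.get(key)]
```

A pre-flight check such as `missing_fields(sf_input)` before starting the execution would surface the missing field at trigger time instead of at the SSM step.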

Test plan

  • pytest tests/test_sf_eod_substrate_check_wiring.py — all 17 new wiring tests pass.
  • Full alpha-engine-data suite — 502 passed.
  • After merge: deploy via ./infrastructure/update_eod_pipeline_sf.sh (reads canonical JSON file).
  • First Mon-Fri after deploy: verify substrate check fires in <3 min, populates AlphaEngine/Substrate namespace daily, no false-alarm storm on rows already passing weekly.
  • Sat 5/9 SF: confirm WeeklySubstrateHealthCheck still passes (refactor changed deploy plumbing, not Sat-SF state definitions).
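
As an illustration of what "pinning chain ordering and Catch semantics" can mean, here is a small pytest-style sketch. It runs against a toy definition, not the real infrastructure/step_function_eod.json, and `follow_next` is a hypothetical helper; the tests in tests/test_sf_eod_substrate_check_wiring.py may be structured differently.

```python
def follow_next(states, start):
    """Walk Next pointers from `start` until a state has no Next."""
    order, seen, name = [], set(), start
    while name is not None and name not in seen:
        seen.add(name)  # guard against accidental cycles
        order.append(name)
        name = states[name].get("Next")
    return order


# Toy definition for illustration only; state bodies are stripped to wiring.
TOY_STATES = {
    "DailySubstrateHealthCheck": {
        "Type": "Task",
        "Next": "WaitForDailySubstrateHealthCheck",
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "StopTradingInstance"}],
    },
    "WaitForDailySubstrateHealthCheck": {
        "Type": "Task",
        "Next": "StopTradingInstance",
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "StopTradingInstance"}],
    },
    "StopTradingInstance": {"Type": "Task", "End": True},
}


def test_chain_order_and_catch_targets():
    chain = follow_next(TOY_STATES, "DailySubstrateHealthCheck")
    assert chain == [
        "DailySubstrateHealthCheck",
        "WaitForDailySubstrateHealthCheck",
        "StopTradingInstance",
    ]
    # Every Catch on the new states must route to the cost-guard state.
    for name in chain[:-1]:
        for catch in TOY_STATES[name]["Catch"]:
            assert catch["Next"] == "StopTradingInstance"
    assert TOY_STATES[chain[-1]].get("End") is True
```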

🤖 Generated with Claude Code

Mirrors the Saturday-SF WeeklySubstrateHealthCheck (PR #175) into the
weekday EOD SF, running ``python -m alpha_engine_lib.transparency
--cadence daily --alert`` on the dashboard EC2 between EODReconcile
success and StopTradingInstance.

Closes the Phase 2 → 3 gap where rows 4/5/6 of transparency_inventory
(lineage, risk_events, residual_pct) only got checked once per week
despite emitting daily — a bad emission Mon-Thu would otherwise sit
undetected until Saturday's run.

The same per-row CloudWatch alarms from PR #176 cover daily emissions
(SubstrateRowOK metric is cadence-agnostic). No new alarms needed.

Both new states are non-blocking (Catch routes to StopTradingInstance)
so a substrate-check infra failure can never leave the trading EC2
running overnight (cost-guard requirement).

Refactors update_eod_pipeline_sf.sh to read the SF definition from
infrastructure/step_function_eod.json instead of an inline heredoc,
matching the deploy_step_function.sh pattern for the Saturday SF. The
JSON file is now the single source of truth; wiring tests pin its
contents. Eliminates the two-staleness-vectors antipattern that had
the heredoc and the JSON file diverging silently.

17 new wiring tests pin chain ordering, Catch semantics, command
shape (--cadence daily, --alert, dashboard EC2, git pull before run),
ResultPath isolation, and instance targeting (dashboard EC2 not
trading EC2). 519 total passing.

Requires: alpha-engine PR adding ec2_instance_id to the daemon's EOD
SF input (DailySubstrateHealthCheck targets $.ec2_instance_id, which
the daemon trigger now populates with the dashboard EC2 instance id).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit ea1eecb into main May 7, 2026
1 check passed
@cipher813 cipher813 deleted the feat/eod-sf-daily-substrate-check branch May 7, 2026 02:05
cipher813 added a commit that referenced this pull request May 7, 2026
… Comment (#179)

The Sat-SF substrate state's Comment promised "the weekday EOD SF will
run --cadence daily in a follow-up PR." That follow-up shipped today
(PR #178 — DailySubstrateHealthCheck SF state on the weekday EOD SF).

Comment-only edit; no behavioral change. Will land in live SF on next
deploy_step_function.sh run (idempotent — AWS only bumps the revision
when the definition actually changes).

Suite unchanged (15/15 substrate wiring tests still pass; comment
text isn't pinned by the wiring tests).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>