fix(sf-daily): set -eo pipefail on every SSM RunShellScript block #209

Merged
cipher813 merged 2 commits into main from fix/sf-pipefail-on-ssm-tee-commands
May 11, 2026

@cipher813
Owner

Summary

  • Add set -eo pipefail as the first command of every SSM RunShellScript block in infrastructure/step_function_daily.json: CheckTradingDay, MorningEnrich, RunMorningPlanner, RunDaemon.
  • New wiring test tests/test_sf_ssm_pipefail_wiring.py parametrizes over all three SF defns (Saturday + weekday + EOD) and asserts every SSM commands array begins with a set ... pipefail line. Catches future regressions where a state is added or rewritten without the flag.
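
The wiring check described above could look roughly like the sketch below. It is not the contents of tests/test_sf_ssm_pipefail_wiring.py: the helper names (`iter_command_arrays`, `missing_pipefail`), the regex, and the miniature state-machine dict are all hypothetical, and the walker simply treats any list stored under a `commands` key as an SSM command array rather than relying on the real layout of the SF definitions.

```python
import re

# Accepts `set -eo pipefail`, `set -o pipefail`, `set -euo pipefail`, etc.
# (assumed pattern; the real test may match differently)
PIPEFAIL_RE = re.compile(r"^set\s+-[a-z]*o\s+pipefail\b")


def iter_command_arrays(node, path="$"):
    """Yield (path, commands) for every list stored under a "commands" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "commands" and isinstance(value, list):
                yield child, value
            else:
                yield from iter_command_arrays(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_command_arrays(item, f"{path}[{index}]")


def missing_pipefail(definition):
    """Return the paths of command arrays whose first line does not set pipefail."""
    offenders = []
    for path, commands in iter_command_arrays(definition):
        first = commands[0] if commands else ""
        if not (isinstance(first, str) and PIPEFAIL_RE.match(first.strip())):
            offenders.append(path)
    return offenders


# Hypothetical miniature definition, not the real step_function_daily.json.
example = {
    "States": {
        "Good": {"Parameters": {"Parameters": {"commands": [
            "set -eo pipefail",
            "python job.py 2>&1 | tee -a /var/log/job.log",
        ]}}},
        "Bad": {"Parameters": {"Parameters": {"commands": [
            "python job.py 2>&1 | tee -a /var/log/job.log",
        ]}}},
    }
}

offenders = missing_pipefail(example)
print(offenders)
```

In a pytest file, the same helper could be called once per SF definition via `pytest.mark.parametrize`, loading each JSON file with `json.load` and asserting the offender list is empty.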

Why

Without pipefail, a pipeline's exit status is that of its last command, so python ... 2>&1 | tee -a /var/log/foo.log silently masks any non-zero exit from python: tee returns 0. SSM RunShellScript then reports ResponseCode: 0, Status: Success and the SF moves past the failure.
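
The masking can be reproduced locally without SSM. The sketch below is illustrative only: it assumes `bash` is on the PATH, uses `(exit 1)` as a stand-in for the failing python command, writes to /dev/null instead of a real log file, and `run_block` is a hypothetical helper that joins the lines into one script, which only approximates how a multi-line command block is executed.

```python
import subprocess


def run_block(lines):
    """Run the given shell lines as one bash script and return its exit code."""
    script = "\n".join(lines)
    return subprocess.run(["bash", "-c", script], capture_output=True).returncode


# `(exit 1)` stands in for the failing python command; tee appends to /dev/null.
failing_pipe = "(exit 1) 2>&1 | tee -a /dev/null"

# Without pipefail, the pipeline reports tee's exit status (0).
masked = run_block([failing_pipe])

# With pipefail, the pipeline reports the failing command's status (1),
# and -e stops the script at that point.
propagated = run_block(["set -eo pipefail", failing_pipe])

print(masked, propagated)
```

The first value is the exit code a caller sees today for a failed step piped to tee; the second is what it sees once the flag leads the block.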

This was the load-bearing cause of the 2026-05-11 silent-MorningEnrich cascade:

  1. python weekly_collector.py --morning-enrich exited 1 (constituents-preflight raise)
  2. | tee returned 0 → SSM Success → SF advanced through PredictorInference + RunMorningPlanner against stale ArcticDB / S3 daily_data
  3. Morning planner aborted with daily_data: 46h stale → no order book written → daemon Telegram alert

Adding set -eo pipefail makes any future MorningEnrich / planner failure propagate to SSM ResponseCode != 0, which the SF's existing Catch[States.ALL] already routes to HandleFailure / MorningEnrichFailed. Saturday SF already uses this convention in every state (8 instances); this PR brings the daily SF in line.

Test plan

  • pytest tests/test_sf_ssm_pipefail_wiring.py — 3/3 pass (parametrized over saturday/weekday/eod SF defns)
  • Full suite: pytest tests/ — 714 passed, 1 skipped
  • JSON syntax verified: python -c "import json; json.load(open('infrastructure/step_function_daily.json'))"
  • Post-merge: deploy-infrastructure workflow updates the live weekday SF defn; next weekday SF firing (or manual start-execution) will exercise the new flag end-to-end.

Independent of #207, #208

This PR touches only infrastructure/step_function_daily.json + a new test file. No conflict with #207 (constituents.py) or #208 (constituents.py + .gitignore). All three can land in any order.

🤖 Generated with Claude Code

cipher813 and others added 2 commits May 11, 2026 06:51
Without `pipefail`, a non-zero exit from `python ...` upstream of a pipe
to `tee` is silently masked by tee's exit 0. SSM RunShellScript then
reports `ResponseCode: 0, Status: Success` and the SF moves on with no
indication that the step actually failed.

This was the load-bearing cause of the 2026-05-11 silent-MorningEnrich
cascade: `python weekly_collector.py --morning-enrich 2>&1 | tee -a
/var/log/morning-enrich.log` exited 1 from the constituents-preflight
raise, the SF reported MorningEnrich Success, PredictorInference +
RunMorningPlanner ran against stale daily_data, and the planner
aborted minutes later with "daily_data: 46h stale".

Add `set -eo pipefail` as the first command of every SSM
RunShellScript block in step_function_daily.json:
- CheckTradingDay
- MorningEnrich  (pipe to tee — load-bearing)
- RunMorningPlanner  (pipe to tee — load-bearing)
- RunDaemon

New wiring test (`tests/test_sf_ssm_pipefail_wiring.py`) parametrizes
over all three SF defns (saturday + weekday + eod) and asserts every
SSM RunShellScript command array begins with a `set ... pipefail`
line. Accepts both `set -eo pipefail` (Saturday + weekday) and `set -o
pipefail` (EOD) conventions — the absence of `pipefail` is the bug
being prevented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit eeeeacb into main May 11, 2026
1 check passed
@cipher813 cipher813 deleted the fix/sf-pipefail-on-ssm-tee-commands branch May 11, 2026 14:07