fix(sf-daily): export FLOW_DOCTOR_ENABLED=1 in every SSM command block #210
Merged
`alpha_engine_lib.logging.setup_logging` only attaches FlowDoctorHandler
when FLOW_DOCTOR_ENABLED=1 is set in env. Otherwise ERROR-level logs
go only to stdout — they never enter flow-doctor's dispatch pipeline
(email + GitHub issue + S3 changelog).
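The gating described above can be illustrated with a minimal sketch. This is not the `alpha_engine_lib` implementation: `setup_logging_sketch` and `RecordingHandler` are hypothetical stand-ins, and the only behavior taken from the description is that the escalation handler is attached solely when `FLOW_DOCTOR_ENABLED=1` is present in the environment.

```python
import logging
import os
import sys


class RecordingHandler(logging.Handler):
    """Hypothetical stand-in for FlowDoctorHandler: it only records ERROR messages."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


def setup_logging_sketch(name, env=None):
    """Attach the escalation handler only when FLOW_DOCTOR_ENABLED=1 is in env."""
    env = os.environ if env is None else env
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    # Every log line always reaches stdout.
    logger.addHandler(logging.StreamHandler(sys.stdout))
    escalation = None
    if env.get("FLOW_DOCTOR_ENABLED") == "1":
        # Only with the flag set do ERROR logs reach the escalation path.
        escalation = RecordingHandler()
        logger.addHandler(escalation)
    return logger, escalation
```

With an environment that lacks the flag, `setup_logging_sketch` returns no escalation handler, so an ERROR log is printed and nothing else happens. That is the silent failure mode this change addresses.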
The trading instance's `/home/ec2-user/.alpha-engine.env` has
ANTHROPIC_API_KEY + FLOW_DOCTOR_GITHUB_TOKEN but does NOT have
FLOW_DOCTOR_ENABLED. This was a silent regression from the
2026-05-05 systemd→SSM migration: the disabled-but-retained systemd
units (`alpha-engine-morning.service`, `alpha-engine-daemon.service`)
had the flag baked in via `Environment=FLOW_DOCTOR_ENABLED=1`. SSM
`RunShellScript` only sources `.alpha-engine.env`, which never gained
the var. The daemon's systemd unit still applies (RunDaemon =
`systemctl restart …`), but MorningEnrich + RunMorningPlanner run as
direct SSM commands and ran without flow-doctor.
On 2026-05-11 MorningEnrich emitted two ERROR logs from
weekly_collector ('Pre-MorningEnrich constituents refresh failed' +
'Weekly collection finished with non-ok status=failed') and
flow-doctor never escalated. Zero flow-doctor entries exist in the
624-entry S3 changelog corpus since the s3 sink was wired ~2026-05-01,
which is the same regression hiding the past ~2 weeks of silent
MorningEnrich failures.
Fix: `export FLOW_DOCTOR_ENABLED=1` as the second line of every SSM
command array in step_function_daily.json (right after `set -eo
pipefail`). Keeps the contract version-controlled and survives
env-file drift or instance rebuilds.
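A check for this contract might look like the sketch below. It is an assumption-laden illustration, not the repository's wiring test: the helper names and the `sample` dictionary are hypothetical, and the walker simply treats any list stored under a `commands` key as an SSM command array rather than relying on the real layout of `step_function_daily.json`.

```python
FLAG_LINE = "export FLOW_DOCTOR_ENABLED=1"
PIPEFAIL_LINE = "set -eo pipefail"


def iter_command_arrays(node, path=""):
    """Yield (path, commands) for every list stored under a "commands" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "commands" and isinstance(value, list):
                yield child, value
            else:
                yield from iter_command_arrays(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_command_arrays(item, f"{path}[{index}]")


def blocks_missing_flag(definition):
    """Return paths of command arrays whose first two lines break the contract."""
    return [
        path
        for path, commands in iter_command_arrays(definition)
        if commands[:2] != [PIPEFAIL_LINE, FLAG_LINE]
    ]


# Hypothetical miniature of a state machine definition, not the real file.
sample = {
    "States": {
        "MorningEnrich": {
            "Parameters": {"Parameters": {"commands": [
                "set -eo pipefail",
                "export FLOW_DOCTOR_ENABLED=1",
                "source /home/ec2-user/.alpha-engine.env",
            ]}}
        },
        "RunMorningPlanner": {
            "Parameters": {"Parameters": {"commands": [
                "set -eo pipefail",
                "source /home/ec2-user/.alpha-engine.env",
            ]}}
        },
    }
}

print(blocks_missing_flag(sample))
```

In this sample only the `RunMorningPlanner` block is reported, because its second line is not the export. A test can assert that the returned list is empty for the real definition file.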
New wiring test pins this. Saturday + EOD SFs not modified here —
they run on different instances with different module wiring; audit
deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Add `"export FLOW_DOCTOR_ENABLED=1"` as the second line of every SSM `RunShellScript` block in `step_function_daily.json` (right after `set -eo pipefail` from "fix(sf-daily): set -eo pipefail on every SSM RunShellScript block" #209): `CheckTradingDay`, `MorningEnrich`, `RunMorningPlanner`, `RunDaemon`.
- Extend `test_sf_ssm_pipefail_wiring.py` with `test_weekday_sf_ssm_blocks_export_flow_doctor_enabled` to pin the flag in every command array.

Why
`alpha_engine_lib.logging.setup_logging` only attaches `FlowDoctorHandler` when `FLOW_DOCTOR_ENABLED=1` is set in env (lib `logging.py:138`). Otherwise ERROR-level logs go only to stdout and never enter flow-doctor's dispatch pipeline (email + GitHub issue + S3 changelog).
I just confirmed via SSM that the trading instance's `/home/ec2-user/.alpha-engine.env` has `ANTHROPIC_API_KEY` + `FLOW_DOCTOR_GITHUB_TOKEN` but does NOT have `FLOW_DOCTOR_ENABLED`.

Regression timeline
`FLOW_DOCTOR_ENABLED=1` was originally injected by two systemd units:
- `alpha-engine/infrastructure/systemd/alpha-engine-morning.service:19`
- `alpha-engine/infrastructure/systemd/alpha-engine-daemon.service:22`
Both via `Environment=FLOW_DOCTOR_ENABLED=1`.
Per `alpha-engine/CLAUDE.md` these units were disabled on 2026-05-05 when the morning planner + daemon migrated from boot-triggered systemd to SF-triggered SSM. The `alpha-engine-daemon` unit is still systemd-managed (the SF does `systemctl restart`), so the daemon process inherits `FLOW_DOCTOR_ENABLED=1` ✓. But MorningEnrich + RunMorningPlanner became direct SSM `RunShellScript` commands that only `source .alpha-engine.env`, and the env file never gained the flag ✗.

Smoking gun
On 2026-05-11 MorningEnrich emitted two ERROR-level logs from `weekly_collector`:
- `"Pre-MorningEnrich constituents refresh failed"` (RuntimeError sector-mapping-incomplete traceback)
- `"Weekly collection finished with non-ok status=failed — exiting 1 to halt the pipeline. Per-collector statuses: {'constituents_preflight': 'error'}"`
Neither fired flow-doctor. Zero flow-doctor entries exist in the 624-entry S3 changelog corpus since the S3 sink was wired ~2026-05-01: the same regression has been hiding the past ~2 weeks of silent MorningEnrich failures.
Test plan
- `pytest tests/test_sf_ssm_pipefail_wiring.py`: 4/4 pass (3 existing + 1 new)
- `pytest tests/`: 715 passed, 1 skipped

Stacked on #209
This PR's diff is on top of #209 (`set -eo pipefail`). If #209 merges first, this rebases cleanly. Both touch the same SF JSON and the same wiring test file. Recommended merge order: #207 → #208 → #209 → this PR (#210).

Scope deliberately narrow
Saturday + EOD SF SSM command arrays are NOT modified here. Those run on different instances (Saturday spot, dashboard) where module-side flow-doctor wiring may not be uniformly present: setting `FLOW_DOCTOR_ENABLED=1` for a module whose `setup_logging` call doesn't pass `flow_doctor_yaml` raises `RuntimeError` at startup. A separate audit PR can extend coverage once each module's `setup_logging` call site is confirmed.
🤖 Generated with Claude Code