Skip to content

feat(ssm_log_capture): v0.25.0 — Python CLI chokepoint for SSM-step log capture#57

Merged
cipher813 merged 1 commit into
mainfrom
feat/ssm-log-capture-chokepoint
May 22, 2026
Merged

feat(ssm_log_capture): v0.25.0 — Python CLI chokepoint for SSM-step log capture#57
cipher813 merged 1 commit into
mainfrom
feat/ssm-log-capture-chokepoint

Conversation

@cipher813
Copy link
Copy Markdown
Owner

Summary

Lifts the inline-bash trap 'aws s3 cp /var/log/X.log "s3://..." || true' EXIT + tee /var/log/X.log pattern that previously appeared as 8 mirrored copies across the Saturday SF spot states (alpha-engine-data PR #244, 2026-05-15) into a Python CLI under alpha_engine_lib.ssm_log_capture.

Why now

The 2026-05-22 Friday-PM shell-run dry-pass of the Saturday SF caught a latent regression introduced by alpha-engine-data PR #253 (merged 2026-05-17). #253 switched the Saturday spot states from plain commands JSON arrays to commands.$ States.Array(...) so they could splice $.run_date / $.preflight_args via States.Format.

Inside States.Array arg strings, ASL's documented escape for an inner single quote is \' — but in practice the AWS evaluator does NOT unescape \' to '. The trap line 'trap \'cmd\' EXIT' rendered into the SSM _script.sh with literal backslash-apostrophe pairs; bash word-split the line and passed every token after aws to trap as a signal name. Symptom: trap: s3: invalid signal specification, exit 127 at line 7 of the rendered script. The dry-pass fired exactly as designed — first execution under the broken pattern; no Saturday SF had run between the #253 merge and the dry-pass.

Per the ~/Development/CLAUDE.md SOTA / institutional-approach rule — sub-sub-rule:

when mirroring a pattern across repos, consider lifting it into alpha-engine-lib... Pure-Bash primitives can stay mirrored unless re-expressible as a Python CLI entry callable from Bash, in which case the CLI re-expression is the institutional path

The SF JSON can now spell a single States.Format template with {} for $.preflight_args and the entire trap+tee mechanism lives in tested Python — no bash trap, no inner single quotes, no ASL escape surface.

API

  • ssm_log_capture.run(slug, log_path, cmd, *, bucket=None, env=None) -> int — tees inner subprocess's merged stdout+stderr to a local log file AND parent stdout (so the SSM script's stdout-up-to-24KB preview still surfaces in CloudWatch), on exit ships the log to s3://{bucket}/_ssm_logs/{slug}/{YYYY-MM-DD}/{hostname}-{HHMMSSZ}.log, propagates inner exit code verbatim. Never raises — S3 errors logged at WARNING and swallowed so the SF Catch sees the true inner exit.
  • CLI: python -m alpha_engine_lib.ssm_log_capture run --slug X --log /var/log/X.log -- <cmd...>. Designed for SF JSON callers; mirrors the alpha_engine_lib.alerts CLI convention.

S3 layout

Unchanged from the pre-lift form:
s3://{bucket}/_ssm_logs/{slug}/{YYYY-MM-DD}/{hostname}-{HHMMSSZ}.log
(date/time/host computed at exit — multi-hour runs straddling UTC midnight key on the exit-side date)

Test coverage

20 new tests in tests/test_ssm_log_capture.py:

  • canonical S3 key layout pinned
  • inner exit propagation: success, non-zero, missing binary → 127
  • stdout AND stderr land in BOTH the log file AND the parent stdout
  • S3 upload called with correct bucket + key
  • S3 failure does NOT mask inner exit (SF Catch sees the true cause)
  • missing log file at ship time returns False without raising
  • CLI subcommand parsing, -- separator, help, missing-args errors

Full suite: 597 passed, 7 warnings in 3.04s.

Test plan

  • Lib unit tests pass (597/597)
  • Consumers bump pin to v0.25.0 (3 follow-up PRs: alpha-engine-data, alpha-engine-predictor, alpha-engine-backtester)
  • alpha-engine-data Saturday SF JSON converts 8 broken states to python -m alpha_engine_lib.ssm_log_capture run ... -- bash <launcher> {} (follow-up PR)
  • Friday-PM shell-run dry-pass redrive confirms all 8 states green before Saturday 09:00 UTC

🤖 Generated with Claude Code

…og capture

Lifts the inline-bash `trap 'aws s3 cp /var/log/X.log "s3://..." || true' EXIT`
+ `tee /var/log/X.log` pattern that previously appeared as 8 mirrored
copies across the Saturday SF spot states (alpha-engine-data PR #244,
2026-05-15) into a Python CLI under `alpha_engine_lib.ssm_log_capture`.

**Why now:** the 2026-05-22 Friday-PM shell-run dry-pass of the Saturday
SF caught a latent regression introduced by alpha-engine-data PR #253
(merged 2026-05-17). #253 switched the Saturday spot states from plain
`commands` JSON arrays to `commands.$ States.Array(...)` so they could
splice `$.run_date` / `$.preflight_args` via `States.Format`. Inside
`States.Array` arg strings, ASL's documented escape for an inner single
quote is `\'` — but in practice the AWS evaluator does NOT unescape
`\'` to `'`. The trap line `'trap \'cmd\' EXIT'` rendered into the SSM
`_script.sh` with literal backslash-apostrophe pairs; bash word-split
the line and passed every token after `aws` to `trap` as a signal
name. Symptom: `trap: s3: invalid signal specification`, exit 127 at
line 7 of the rendered script. The dry-pass fired exactly as designed
(first execution under the broken pattern; no Saturday SF had run
between the #253 merge and the dry-pass).

Per the `~/Development/CLAUDE.md` SOTA / institutional-approach rule —
sub-sub-rule "when mirroring a pattern across repos, consider lifting
it into `alpha-engine-lib`... Pure-Bash primitives can stay mirrored
unless re-expressible as a Python CLI entry callable from Bash, in
which case the CLI re-expression is the institutional path" — this is
the right shape: the SF JSON can now spell a single `States.Format`
template with `{}` for `$.preflight_args` and the entire trap+tee
mechanism lives in tested Python.

**Public API:**

- `ssm_log_capture.run(slug, log_path, cmd, *, bucket=None, env=None)
  -> int` — tees the inner subprocess's merged stdout+stderr to a local
  log file AND parent stdout (so the SSM script's stdout-up-to-24KB
  preview still surfaces in CloudWatch / StandardOutputContent), on
  exit ships the log to `s3://{bucket}/_ssm_logs/{slug}/{YYYY-MM-DD}/{hostname}-{HHMMSSZ}.log`,
  propagates the inner exit code verbatim. Never raises — S3 errors
  are logged at WARNING and swallowed so the SF Catch sees the true
  inner exit.
- CLI: `python -m alpha_engine_lib.ssm_log_capture run --slug X --log /var/log/X.log -- <cmd...>`.
  Designed for SF JSON callers; mirrors the `alerts` CLI convention.

**S3 layout** (unchanged from the pre-lift form):
`s3://{bucket}/_ssm_logs/{slug}/{YYYY-MM-DD}/{hostname}-{HHMMSSZ}.log`
with date/time/host computed at exit (a multi-hour run that straddles
UTC midnight keys on the exit-side date).

Test coverage (20 new tests):
- canonical key layout pinned
- inner exit propagation (success, non-zero, missing binary → 127)
- stdout AND stderr land in the log file AND the parent stdout
- S3 upload called with correct bucket + key
- S3 failure does NOT mask inner exit
- missing log file at ship time returns False without raising
- CLI subcommand parsing, `--` separator, help, missing-args errors

Consumers (next commits): alpha-engine-data, alpha-engine-predictor,
alpha-engine-backtester bump the lib pin to v0.25.0; the alpha-engine-data
Saturday SF JSON converts 8 broken states from inline-bash-trap to
`python -m alpha_engine_lib.ssm_log_capture run ... -- bash <launcher> {}`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant