
fix(spot): re-export AWS_REGION into spot shell — close #241 .env-removal regression (Sat SF DataPhase1 P0)#247

Merged: cipher813 merged 1 commit into main from fix/spot-export-aws-region on May 16, 2026

@cipher813
Owner

Root cause (P0 — 2026-05-16 Saturday SF DataPhase1 failure)

The Saturday pipeline aborted about 1m52s into DataPhase1, at the weekly_collector.py --morning-enrich preflight:

RuntimeError: Pre-flight: required env vars missing: ['AWS_REGION']

PR 9f (#241, 61253df) removed the spot-side .env sourcing in favor of runtime get_secret() SSM lookups. Diff line 118→119:

  • before: ENV_SOURCE="set -a; ... source .../.env; set +a; export XDG_CACHE_HOME=/tmp; ..."
  • after: ENV_SOURCE="export XDG_CACHE_HOME=/tmp; export PYTHON_BIN=$REMOTE_PYTHON;"

The audit correctly migrated secrets to get_secret(), but the same .env was also the only thing exporting AWS_REGION — a plain env var (not a secret) that alpha_engine_lib.preflight.check_env_vars() hard-requires and boto3 needs as a default region. This is the [shim-deletion / launch-mechanism] regression class.
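To illustrate the failure mode, here is a minimal sketch of an env-var preflight in the style of check_env_vars(). The real alpha_engine_lib implementation is not shown in this PR, so the `environ` parameter and the PYTHON_BIN value below are assumptions made only to keep the example self-contained; the error text follows the traceback quoted above.

```python
import os


def check_env_vars(*names, environ=None):
    """Raise RuntimeError listing every required variable that is unset or empty.

    `environ` is a hypothetical hook for this sketch; it defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Pre-flight: required env vars missing: {missing}")


# Spot shell after #241: the cache and python exports survive, AWS_REGION does not.
# The PYTHON_BIN value is a placeholder, not the project's real path.
spot_env = {"XDG_CACHE_HOME": "/tmp", "PYTHON_BIN": "/usr/bin/python3"}

try:
    check_env_vars("AWS_REGION", environ=spot_env)
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

With AWS_REGION present in the environment the same call returns silently, which is the state this PR restores.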

No stale-data risk: failure happened at preflight, before any S3/ArcticDB write; spot self-terminated.

Fix

ENV_SOURCE (interpolated into every run_remote bash -s <<HEREDOC workload) now also exports AWS_REGION + AWS_DEFAULT_REGION from the dispatcher-side $AWS_REGION (already defaulted to us-east-1). Applied to both spot_data_weekly.sh and spot_drift_detection.sh — the identical #241 regression also affects the Saturday DriftDetection state.

Test

tests/test_spot_env_source_aws_region.py pins both scripts' ENV_SOURCE region exports so a future edit can't silently drop them. Subset run: 278 passed, 1 skipped.
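A pinning test of this kind might look roughly like the sketch below. It is not the contents of tests/test_spot_env_source_aws_region.py: the helper name, the regular expressions, and the inline SAMPLE_SCRIPT are all assumptions, and a real test would read the two spot scripts from disk instead of using an inline string.

```python
import re

# Hypothetical stand-in for the relevant lines of a spot script; the real
# ENV_SOURCE assignment may be quoted or ordered differently.
SAMPLE_SCRIPT = '''
AWS_REGION="${AWS_REGION:-us-east-1}"
ENV_SOURCE="export AWS_REGION=$AWS_REGION; export AWS_DEFAULT_REGION=$AWS_REGION; export XDG_CACHE_HOME=/tmp; export PYTHON_BIN=$REMOTE_PYTHON;"
'''

REQUIRED_EXPORTS = ("AWS_REGION", "AWS_DEFAULT_REGION")


def missing_region_exports(script_text):
    """Return the required region variables that ENV_SOURCE fails to export."""
    match = re.search(r'^ENV_SOURCE="(.*)"\s*$', script_text, flags=re.MULTILINE)
    if match is None:
        # No ENV_SOURCE assignment at all counts as every export missing.
        return list(REQUIRED_EXPORTS)
    env_source = match.group(1)
    return [
        name
        for name in REQUIRED_EXPORTS
        if not re.search(rf"\bexport {name}=", env_source)
    ]


def test_env_source_exports_region():
    assert missing_region_exports(SAMPLE_SCRIPT) == []
```

Run against the post-#241 form of ENV_SOURCE (only XDG_CACHE_HOME and PYTHON_BIN exported), the helper reports both region variables as missing, so the test would fail loudly instead of letting the exports disappear unnoticed.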

Recovery

After merge, re-trigger the Saturday SF (dispatcher git-pulls main before running the spot script). DataPhase1 must clear before downstream Research/Predictor/Backtester.

🤖 Generated with Claude Code

…oval regression

PR 9f (#241) removed `.env` sourcing from the spot bootstrap in favor of
runtime get_secret() SSM lookups. That handled secrets, but the same
`.env` was also the only thing exporting AWS_REGION — a plain env var
(not a secret) that alpha_engine_lib.preflight.check_env_vars() hard-
requires and boto3 needs as a default region.

Result: 2026-05-16 Saturday SF DataPhase1 aborted ~1m52s in, at
`weekly_collector.py --morning-enrich` preflight:
  RuntimeError: Pre-flight: required env vars missing: ['AWS_REGION']
Whole Saturday pipeline aborted (no downstream stale-data risk — it
failed before any write).

Fix: ENV_SOURCE (interpolated into every remote `run_remote bash`
heredoc) now also exports AWS_REGION + AWS_DEFAULT_REGION from the
dispatcher-side $AWS_REGION (already defaulted to us-east-1). Applied to
both spot_data_weekly.sh and spot_drift_detection.sh — the identical
#241 regression affects the Saturday DriftDetection state too.

Regression test pins the ENV_SOURCE region exports in both scripts so a
future ENV_SOURCE edit can't silently drop them again
(shim-deletion launch-mechanism class).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit dd5a2c9 into main May 16, 2026
1 check passed
@cipher813 cipher813 deleted the fix/spot-export-aws-region branch May 16, 2026 12:50
cipher813 added a commit that referenced this pull request May 16, 2026
…nv-var check (#248)

Second facet of the #241/#242 .env-deprecation regression, surfaced by
the 2026-05-16 Saturday SF recovery run: AWS_REGION fix (#247) let
DataPhase1 clear preflight and MorningEnrich completed (polygon 913/921
+ FRED 4/4 fetched fine via get_secret()), but `weekly_collector.py
--phase 1` then aborted at preflight:

  RuntimeError: Pre-flight: required env vars missing:
  ['FRED_API_KEY', 'POLYGON_API_KEY']

Every collector AND both reachability probes in this file already
resolve these keys via get_secret() (SSM). The only stale code was
DataPreflight.run()'s `check_env_vars("FRED_API_KEY","POLYGON_API_KEY")`
(and the phase2 FMP/FINNHUB/EDGAR equivalent) — an os.environ assertion
the env-deprecation arc migrated every consumer away from but missed
here. MorningEnrich slipped through because its preflight only checks
AWS_REGION; phase1/phase2 hard-failed on the stale gate.

Fix: AWS_REGION stays an env-var check (plain boto3 region, not a
secret); the API keys now go through a new `_check_secrets()` helper
that calls get_secret(required=False) — same <1s fail-fast intent,
same RuntimeError shape, sourced from SSM (with get_secret's env
fallback) instead of os.environ. phase2 had the identical latent bug
and is fixed in the same change.

Tests updated to the get_secret() reality (patch preflight.get_secret
rather than os.environ); full suite 1050 passed, 1 skipped.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>