feat(sf): split DataPhase1 → MorningEnrich + DataPhase1(phase1) — preflight task split P0 by cipher813 · Pull Request #249 · cipher813/alpha-engine-data

cipher813 · 2026-05-16T14:12:43Z

Standing rule + origin

Per the plan doc alpha-engine-docs/private/preflight-task-split-260516.md (§3–4, authoritative): every preflight-bearing action is its own SF task; a downstream failure must never re-run a completed upstream task. The extra spot-launch cost of each split (~6–10 min, ~$0.005) was explicitly weighed and accepted: task isolation takes priority over launch economy.

Origin: on 2026-05-16 the Saturday SF's DataPhase1 task ran spot_data_weekly.sh --data-only. That mode runs morning-enrich (~28 min) and then phase1 on a single spot instance, so phase1's preflight sat 28 minutes behind an already-completed morning-enrich. Every phase1 recovery re-paid the 28-min morning-enrich. A fast-fail that fires 28 minutes deep is not a fast-fail. RAGIngestion is the canonical split precedent this PR mirrors.
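The following back-of-envelope sketch quantifies that trade-off. It uses the ~28 min morning-enrich from the incident and an assumed 8-minute midpoint of the ~6–10 min extra launch. The function name is illustrative. The sketch counts only overhead: phase1's own runtime and each recovery's spot launch are paid in both layouts, so they cancel out.

```python
# Figures from the incident above; EXTRA_LAUNCH_MIN is an ASSUMED midpoint
# of the quoted ~6-10 min extra spot launch per split.
MORNING_ENRICH_MIN = 28
EXTRA_LAUNCH_MIN = 8


def wasted_minutes(phase1_recoveries: int, split: bool) -> int:
    """Extra wall-clock minutes versus one clean morning-enrich + phase1 pass.

    Phase1's own runtime and the per-recovery spot launch are paid equally in
    both layouts, so they are left out.
    """
    if split:
        # Two spots instead of one on the first pass; recoveries re-run phase1 only.
        return EXTRA_LAUNCH_MIN
    # One combined task: every phase1 recovery re-pays morning-enrich.
    return MORNING_ENRICH_MIN * phase1_recoveries


for n in range(3):
    print(n, wasted_minutes(n, split=False), wasted_minutes(n, split=True))
```

Under these assumptions the split costs more only on a clean run (8 vs 0 minutes). It comes out ahead from the first phase1 recovery onward (8 vs 28, then 8 vs 56).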

Changes

1. infrastructure/spot_data_weekly.sh — added --morning-enrich-only / --phase1-only run modes (RUN_MODE morning-enrich-only / phase1-only). morning-enrich and phase1+prune are independently gated by DO_MORNING_ENRICH / DO_PHASE1 derived from RUN_MODE. --data-only preserved (runs both) for manual/adhoc backward-compat. Per-mode MODE_LABEL drives the spot-side S3 log key (health/morning_enrich_log/… vs health/data_phase1_log/…) and the heartbeat dimension so a morning-enrich-only run is not mislabeled data-phase1. Shared scaffolding (log capture, S3 EXIT-trap upload, watchdog, heartbeat) works for all three modes.

2. preflight.py + weekly_collector.py — new dedicated morning_enrich DataPreflight mode whose checks are the UNION of what _run_morning_enrich actually needs: AWS_REGION env, polygon + FRED secret presence (_check_secrets), polygon + FRED reachability probes, S3 bucket + writeable sentinel, ArcticDB libraries present. Deliberately NO check_arcticdb_fresh — morning-enrich is part of what makes ArcticDB fresh, so a freshness gate at its own entry would be circular. weekly_collector.main() now maps --morning-enrich → "morning_enrich" (was the dependency-blind "daily", which never probed polygon/FRED — a drifted key failed 28 min into the spot run). Mode whitelist + docstring updated.

3. infrastructure/step_function.json — new MorningEnrich quartet (CheckSkipMorningEnrich / MorningEnrich / WaitForMorningEnrich / CheckMorningEnrichStatus, plus MorningEnrichWait + ExtractMorningEnrichError) inserted before DataPhase1, mirroring the RAGIngestion/DataPhase1 quartets exactly: same Retry (States.TaskFailed, MaxAttempts 1), same States.ALL → HandleFailure Catch with ResultPath $.error, same HeartbeatSeconds/TimeoutSeconds (5400/5460), same skip-input Choice shape (skip_morning_enrich, the analogue of skip_data_phase1). MorningEnrich runs --morning-enrich-only; DataPhase1 switched --data-only → --phase1-only. Chain: InitializeInput → CheckSkipMorningEnrich → MorningEnrich → WaitForMorningEnrich → CheckMorningEnrichStatus (success) → CheckSkipDataPhase1 → DataPhase1 → (existing next, unchanged). Every existing downstream state untouched.

4. Tests — +44 tests:

test_sf_morning_enrich_split_wiring.py — quartet presence, happy-path reachability (MorningEnrich strictly before DataPhase1), --morning-enrich-only / --phase1-only command shapes, HandleFailure Catch, pipefail + S3-log-trap invariants, ResultPath isolation.
test_spot_data_weekly_run_modes.py — flag→RUN_MODE parsing, independent DO_* gating, SKIP_RAG_BLOCK, per-mode MODE_LABEL + heartbeat (grep-style, mirrors test_spot_env_source_aws_region.py).
test_weekly_collector_preflight_mode_mapping.py — pins --morning-enrich → "morning_enrich" (not "daily").
test_preflight.py — extended with TestMorningEnrichMode (probes polygon + FRED; verifies no ArcticDB-freshness check runs via an assertion on a patched check_arcticdb_fresh; fails fast on a missing secret).
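The happy-path reachability idea behind test_sf_morning_enrich_split_wiring.py can be illustrated with a small walker. It starts at StartAt, follows Next edges, and evaluates Choice rules against a fixed execution input. This is a sketch, not the PR's test. It supports only top-level BooleanEquals/StringEquals rules and treats a missing variable as a non-match. It runs over a hand-written toy subset of the item 3 chain. In that toy, the status-check Choice, the polling loop, the morning_enrich_status field and the terminal "Done" state are all assumptions.

```python
from __future__ import annotations

_MISSING = object()


def _lookup(data: dict, path: str):
    # Toy JSONPath: only top-level "$.key" paths are supported.
    return data.get(path.removeprefix("$."), _MISSING)


def _rule_matches(rule: dict, data: dict) -> bool:
    value = _lookup(data, rule["Variable"])
    if "BooleanEquals" in rule:
        return isinstance(value, bool) and value == rule["BooleanEquals"]
    if "StringEquals" in rule:
        return isinstance(value, str) and value == rule["StringEquals"]
    raise NotImplementedError(f"unsupported rule: {rule}")


def happy_path(definition: dict, data: dict, max_steps: int = 100) -> list[str]:
    """Follow Next/Choice edges from StartAt for a fixed input; return visited states."""
    visited: list[str] = []
    current = definition["StartAt"]
    while current is not None:
        if len(visited) >= max_steps:
            raise RuntimeError("no terminal state reached; likely a polling loop")
        visited.append(current)
        state = definition["States"][current]
        if state["Type"] == "Choice":
            current = next(
                (r["Next"] for r in state["Choices"] if _rule_matches(r, data)),
                state["Default"],
            )
        elif state["Type"] in ("Succeed", "Fail") or state.get("End"):
            current = None
        else:
            current = state["Next"]
    return visited


# Toy subset of the chain; status Choice, polling loop and "Done" are ASSUMED.
TOY = {
    "StartAt": "InitializeInput",
    "States": {
        "InitializeInput": {"Type": "Pass", "Next": "CheckSkipMorningEnrich"},
        "CheckSkipMorningEnrich": {"Type": "Choice", "Default": "MorningEnrich", "Choices": [
            {"Variable": "$.skip_morning_enrich", "BooleanEquals": True, "Next": "CheckSkipDataPhase1"}]},
        "MorningEnrich": {"Type": "Task", "Next": "WaitForMorningEnrich"},
        "WaitForMorningEnrich": {"Type": "Task", "Next": "CheckMorningEnrichStatus"},
        "CheckMorningEnrichStatus": {"Type": "Choice", "Default": "MorningEnrichWait", "Choices": [
            {"Variable": "$.morning_enrich_status", "StringEquals": "success", "Next": "CheckSkipDataPhase1"}]},
        "MorningEnrichWait": {"Type": "Wait", "Seconds": 60, "Next": "WaitForMorningEnrich"},
        "CheckSkipDataPhase1": {"Type": "Choice", "Default": "DataPhase1", "Choices": [
            {"Variable": "$.skip_data_phase1", "BooleanEquals": True, "Next": "Done"}]},
        "DataPhase1": {"Type": "Task", "Next": "Done"},
        "Done": {"Type": "Succeed"},
    },
}
```

The max_steps cap turns a never-satisfied polling loop into an error rather than a hang. With skip_morning_enrich set, MorningEnrich disappears from the path while DataPhase1 still runs.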

Validation

bash -n infrastructure/spot_data_weekly.sh — OK
python3 -c "import json; json.load(open('infrastructure/step_function.json'))" — OK
Full suite (pytest tests/ -q): 1094 passed, 1 skipped, zero failures (clean-main baseline ~1050; +44 new). The 5 pre-existing concat FutureWarnings from daily_append.py are unrelated to this change.
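json.load only proves that the file parses. A slightly stronger pre-merge check would also confirm that every StartAt, Next, Default, Choices[].Next and Catch[].Next target names a defined state. This is offered as a suggestion, not something this PR runs. The function name is hypothetical, and Parallel/Map branches are not walked.

```python
from __future__ import annotations


def dangling_targets(definition: dict) -> set[str]:
    """Names referenced as transition targets that are not defined in States.

    Only top-level states are inspected; Parallel/Map branches are not walked.
    """
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        targets.update(rule["Next"] for rule in state.get("Choices", []))
        targets.update(catcher["Next"] for catcher in state.get("Catch", []))
    return targets - set(states)
```

For example, if dangling_targets(json.load(open("infrastructure/step_function.json"))) returns an empty set, no transition points at a missing state. A pure JSON parse would accept a mistyped state name in the new MorningEnrich quartet; this check catches it.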

Deploy

DEPLOY IS HELD. This is review-ready only. The in-flight recovery Saturday SF run must complete green (proving the #247/#248 preflight fixes end-to-end) before any SF redeploy. The Saturday SF must NOT be redeployed while a recovery execution is live on it.
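The "no redeploy while a recovery execution is live" rule could be made mechanical with a guard that asks Step Functions for RUNNING executions first. The sketch below assumes a boto3 Step Functions client is passed in. list_executions with statusFilter="RUNNING" and nextToken pagination is the real API. Both function names, and where the guard would be wired into the deploy, are hypothetical.

```python
from __future__ import annotations


def running_executions(sfn_client, state_machine_arn: str) -> list[str]:
    """ARNs of RUNNING executions, following nextToken pagination."""
    kwargs = {"stateMachineArn": state_machine_arn, "statusFilter": "RUNNING"}
    arns: list[str] = []
    while True:
        resp = sfn_client.list_executions(**kwargs)
        arns.extend(e["executionArn"] for e in resp["executions"])
        token = resp.get("nextToken")
        if not token:
            return arns
        kwargs["nextToken"] = token


def assert_safe_to_redeploy(sfn_client, state_machine_arn: str) -> None:
    """Hypothetical pre-deploy guard: refuse while any execution is live."""
    live = running_executions(sfn_client, state_machine_arn)
    if live:
        raise RuntimeError(
            f"refusing to redeploy {state_machine_arn}: {len(live)} RUNNING execution(s): {live}"
        )
```

It would be called as assert_safe_to_redeploy(boto3.client("stepfunctions", region_name="us-east-1"), <Saturday SF ARN>) before the deploy step. The check is advisory: an execution can still start between the check and the update. It complements the manual hold rather than replacing it.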

🤖 Generated with Claude Code

…reflight task split P0

Standing rule (preflight-task-split-260516.md): every preflight-bearing action is its own SF task; a downstream failure must never re-run a completed upstream task. Accept the extra spot-launch cost.

Origin: 2026-05-16 Saturday SF DataPhase1 ran spot_data_weekly.sh --data-only = morning-enrich (~28 min) THEN phase1 on one spot, with phase1's preflight buried 28 minutes behind a completed morning-enrich. Every phase1 recovery re-paid the 28-min morning-enrich. A fast-fail that fires 28 minutes deep is not a fast-fail.

Changes:
- spot_data_weekly.sh: add --morning-enrich-only / --phase1-only run modes (RUN_MODE morning-enrich-only / phase1-only). morning-enrich and phase1+prune are now independently gated by DO_MORNING_ENRICH / DO_PHASE1 derived from RUN_MODE. --data-only preserved (runs both) for manual/adhoc backward-compat. Per-mode MODE_LABEL feeds the spot-side S3 log key + heartbeat dimension so a morning-enrich-only run is not mislabeled data-phase1. Shared scaffolding (log capture, S3 EXIT-trap upload, watchdog, heartbeat) works for all three modes.
- preflight.py: dedicated "morning_enrich" mode whose checks are the UNION of what _run_morning_enrich needs (AWS_REGION env, polygon + FRED secret presence + reachability probes, S3 bucket + writeable sentinel, ArcticDB libraries present). Deliberately NO ArcticDB-freshness check -- morning-enrich is part of what makes it fresh. weekly_collector.main() now maps --morning-enrich -> "morning_enrich" (was the dependency-blind "daily" which skipped polygon/FRED probes).
- step_function.json: new MorningEnrich quartet (CheckSkipMorningEnrich / MorningEnrich / WaitForMorningEnrich / CheckMorningEnrichStatus + MorningEnrichWait + ExtractMorningEnrichError) inserted BEFORE DataPhase1, mirroring the RAGIngestion/DataPhase1 quartets exactly (same Retry/Catch/Heartbeat/Timeout/HandleFailure wiring + a skip_morning_enrich Choice). MorningEnrich runs --morning-enrich-only; DataPhase1 switched --data-only -> --phase1-only. Chain: InitializeInput -> CheckSkipMorningEnrich -> MorningEnrich -> ... -> CheckSkipDataPhase1 -> DataPhase1 -> (existing next, unchanged). All downstream states untouched.
- tests: +44 tests across test_sf_morning_enrich_split_wiring.py, test_spot_data_weekly_run_modes.py, test_weekly_collector_preflight_mode_mapping.py, and extended test_preflight.py (morning_enrich mode: probes polygon+FRED, no arcticdb-freshness, fail-fast on missing secret).

Full suite: 1094 passed, 1 skipped (clean-main baseline ~1050; +44 new). bash -n + SF JSON parse validated.

DEPLOY HELD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
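The preflight.py change described above can be sketched as plain sets of check names plus the flag-to-mode mapping. That change is a morning_enrich mode built as the union of what _run_morning_enrich needs, with no ArcticDB-freshness gate. Apart from check_arcticdb_fresh, _check_secrets and the "morning_enrich" mode name, every identifier here is hypothetical. Only the one flag mapping that this PR changes is modelled.

```python
from __future__ import annotations

# Hypothetical check identifiers. Of these names, only _check_secrets and
# check_arcticdb_fresh come from the PR; labels and layout are illustrative.
ENV_CHECKS = {"env:AWS_REGION"}
SECRET_CHECKS = {"_check_secrets:polygon", "_check_secrets:fred"}
PROBE_CHECKS = {"probe:polygon", "probe:fred"}
S3_CHECKS = {"s3:bucket", "s3:writeable_sentinel"}
ARCTIC_CHECKS = {"arcticdb:libraries_present"}

# morning_enrich = union of what _run_morning_enrich depends on. There is no
# "check_arcticdb_fresh": morning-enrich is part of what makes ArcticDB fresh,
# so a freshness gate at its own entry would be circular.
PREFLIGHT_MODES: dict[str, frozenset[str]] = {
    "morning_enrich": frozenset(
        ENV_CHECKS | SECRET_CHECKS | PROBE_CHECKS | S3_CHECKS | ARCTIC_CHECKS
    ),
}

# Flag -> preflight mode. Only the mapping changed by this PR is shown.
FLAG_TO_MODE = {"--morning-enrich": "morning_enrich"}  # previously "daily"


def preflight_mode_for(argv: list[str]) -> str | None:
    """Hypothetical stand-in for the mapping done in weekly_collector.main()."""
    for flag, mode in FLAG_TO_MODE.items():
        if flag in argv:
            return mode
    return None
```

Modelling checks as sets makes the design rule easy to state as assertions. The mode must contain both reachability probes and must not contain the freshness check, mirroring what TestMorningEnrichMode asserts against the real preflight.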

…e; rule shipped disabled) (#258)

Foundational spine of ROADMAP "Scheduled Friday-PM 'shell run'" (P1, added 2026-05-16): the *prevention* half of Saturday-SF reliability (the *containment* half, preflight-task-split, shipped 2026-05-16 in data #249/#250). Surfaces a Saturday-fatal bootstrap break ~11.5h before the unattended Sat 02:00 PT firing, inside an operator-awake Friday-evening fix window, instead of as a Saturday-morning-after lost-week incident.

STRICT SUPERSET: shell_run absent/false ⇒ byte-identical to today's real Saturday run. Only two existing edges change, each routed through a new Choice whose Default is the pre-spine target:
- InitializeInput.Next: CheckSkipMorningEnrich -> CheckShellRun (Default -> CheckSkipMorningEnrich; unchanged for the real run)
- WaitForWeeklySubstrateHealthCheck.Next: NotifyComplete -> CheckShellRunNotify (Default -> NotifyComplete; the real Saturday SUCCESS email is untouched)

shell_run propagation (mirrors the existing skip_*/JsonMerge precedent exactly; no new mechanism invented):
- CheckShellRun (Choice): {"shell_run": true} -> ApplyShellRunDefaults
- ApplyShellRunDefaults (Pass): States.JsonMerge(<all 16 skip_*=true>, $, false) layers every skip flag = true UNDER the execution input so an explicit per-flag override still wins (e.g. {"shell_run":true,"skip_research":false} still runs Research).

Every workload state already has a Choice-gated skip_*, so the whole workload no-ops via the EXISTING skip mechanism.
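The user-input-wins ordering follows from how States.JsonMerge works. With the deep-merge argument set to false, it performs a shallow merge, and for a key present in both objects the value from the second object is kept. Below is a Python model of ApplyShellRunDefaults under that rule; dict unpacking is also last-wins. The function name is hypothetical. Only three of the 16 skip_* flag names are used, because the rest are not spelled out here.

```python
# The real ApplyShellRunDefaults blob sets all 16 skip_* flags; this subset
# uses only names that appear in the PR text.
SKIP_FLAGS = ("skip_morning_enrich", "skip_data_phase1", "skip_research")


def apply_shell_run_defaults(execution_input: dict, skip_flags=SKIP_FLAGS) -> dict:
    """Python model of States.JsonMerge(<skip defaults>, $, false).

    Shallow merge where the second operand wins, so any flag the caller set
    explicitly in the execution input overrides the skip=true default.
    """
    defaults = {flag: True for flag in skip_flags}
    return {**defaults, **execution_input}


print(apply_shell_run_defaults({"shell_run": True, "skip_research": False}))
```

Swapping the operand order, so that the defaults come second, would silently force every skip flag to true. The "JsonMerge user-input-wins order" test case guards against exactly that.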
Per-state dry-vs-skip inventory under shell_run (spine = pure-skip; per-module --preflight-only/--dry-run "spots boot + smoke" are SCOPED FOLLOW-ONS):
- SKIPPED via existing skip_* gate (16): MorningEnrich, DataPhase1, RAGIngestion, RegimeSubstrate, RegimeRetrospectiveEval, Research, DataPhase2, EvalJudge(+RollingMean), RationaleClustering, ReplayConcordance, Counterfactual, PredictorTraining, DriftDetection, Backtester, Parity, Evaluator
- STILL RUNS (read-only, no skip gate by design; exactly the bootstrap/transport smoke the shell run wants Friday PM): SaturdayHealthCheck, WeeklySubstrateHealthCheck. Their shell_run-aware missing-Friday-bar tolerance is ROADMAP owed-work item 5 (scoped follow-on).
- NOTIFY: NotifyShellRunComplete (shell-run-tagged Subject, reuses the exact NotifyComplete SNS substrate: alpha-engine-alerts topic, same Resource).

Friday EventBridge rule (CFN, the documented infra-as-code home for EventBridge rules; SaturdayTrigger/WeekdayTrigger live there): FridayShellRunTrigger, cron(30 21 ? * FRI *) = 21:30 UTC Fri = 14:30 PT (PDT, dominant season) / 13:30 PT (PST). Chosen AFTER the Friday EOD SF (~1:25 PT) so it never collides with PostMarketData/EODReconcile/StopTradingInstance on the trading instance, and ~11.5h BEFORE the real Sat 09:00 UTC firing. Targets the SAME alpha-engine-saturday-pipeline SF (NOT a parallel SF) with {"shell_run": true}, same EventBridgeSfnRoleArn; the existing states:StartExecution grant is SF-ARN-scoped so NO IAM change is needed. SHIPPED State: DISABLED (zero-risk merge). Additive observability, NOT a backstop (the "fail loud, no backstop" design decision stands).

Operator enable step:
aws events enable-rule --name alpha-engine-friday-shell-run --region us-east-1

Consolidated-notify decision: shell-run SUCCESS is delivered by reusing the existing NotifyComplete SNS pattern with a SHELL RUN-tagged Subject (zero new infra).
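The schedule arithmetic in the Friday EventBridge rule paragraph can be checked with Python's standard zoneinfo module. The same 21:30 UTC Friday instant reads 14:30 under PDT and 13:30 under PST. It also falls 11.5 hours before the Saturday 09:00 UTC firing (02:00 PDT). The sample Fridays (2026-05-22 and 2026-01-09) are chosen for illustration. zoneinfo relies on the system time-zone database or the tzdata package.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def shell_run_instant(friday: datetime) -> datetime:
    """21:30 UTC on the given Friday, matching cron(30 21 ? * FRI *)."""
    if friday.weekday() != 4:  # Monday == 0, Friday == 4
        raise ValueError(f"{friday.date()} is not a Friday")
    return friday.replace(hour=21, minute=30, second=0, microsecond=0, tzinfo=timezone.utc)


summer = shell_run_instant(datetime(2026, 5, 22))  # PDT in effect
winter = shell_run_instant(datetime(2026, 1, 9))   # PST in effect
saturday_firing = datetime(2026, 5, 23, 9, 0, tzinfo=timezone.utc)

print(summer.astimezone(PACIFIC).strftime("%a %H:%M %Z"))           # Fri 14:30 PDT
print(winter.astimezone(PACIFIC).strftime("%a %H:%M %Z"))           # Fri 13:30 PST
print(saturday_firing - summer)                                     # 11:30:00
print(saturday_firing.astimezone(PACIFIC).strftime("%a %H:%M %Z"))  # Sat 02:00 PDT
```

EventBridge cron expressions are evaluated in UTC, so the local-time label shifts by an hour at each DST change while the gap to the Saturday firing stays 11.5 hours.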
A shell-run FAILURE reuses the unchanged HandleFailure. Its 20 inbound error edges are deliberately NOT re-pointed: re-pointing means high churn for zero added operator value, and it would perturb the real Saturday failure path's risk surface. The FAILED alert's Friday execution timestamp/ID is the actionable signal. The richer per-state pass/fail report (ROADMAP design point 5) is a scoped follow-on.

Scoped per-module follow-on PRs (repo -> state -> dry mode needed; NOT done here; these convert "skipped" to "spots boot + smoke"):
- alpha-engine-data -> DataPhase1/MorningEnrich -> spot_data_weekly.sh --preflight-only (preflight + universe-freshness scan, no polygon/FMP writes); shell_run-aware tolerance for "Friday bar not yet present"
- alpha-engine-data -> RAGIngestion -> spot_data_weekly.sh --rag-only --preflight-only (corpus reachability + secrets, no SEC/embedding writes)
- alpha-engine-predictor -> PredictorTraining -> spot_train.sh --preflight-only (load + WF-gate-shape check, NO predictor/weights/promotion)
- alpha-engine-backtester -> Backtester/Parity/Evaluator -> spot_backtest.sh --mode=smoke + simulate-dry, NO config/*.json auto-apply (freeze_evaluator pattern is the model)
- alpha-engine-data -> SaturdayHealthCheck/WeeklySubstrateHealthCheck -> shell_run-aware missing-Friday-bar tolerance (ROADMAP owed-work item 5)

(Research/predictor-inference/executor already have --dry-run/--simulate; wiring those into the SF states is part of the per-state follow-ons above.)

Tests: tests/test_sf_friday_shell_run_wiring.py has 23 cases. They cover strict-superset edges, JsonMerge user-input-wins order, every skip-gate covered by the defaults blob, full happy-path traversal for shell_run true vs absent, and the Friday rule (DISABLED + same-SF + shell_run=true + cron). Updated two pre-spine wiring tests (morning_enrich_split, substrate_check) to assert through the new gates while pinning Default == pre-spine target. Full suite: 1242 passed, 1 skipped (pre-existing, unrelated). No new pip deps. No secrets.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
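The strict-superset claim can be checked mechanically by diffing the pre- and post-spine definitions: only two existing edges may change, and each must pass through a new Choice whose Default is the pre-spine target. The sketch below uses a hypothetical function name and compares only top-level Next edges. Its toy definitions model the InitializeInput edge from the commit message, with CheckSkipMorningEnrich stubbed as a terminal state.

```python
from __future__ import annotations


def superset_violations(before: dict, after: dict) -> list[str]:
    """Check that every re-pointed Next edge goes through a new Choice gate
    whose Default is the pre-spine target. Only Next edges are compared."""
    old, new = before["States"], after["States"]
    problems = []
    for name, old_state in old.items():
        if name not in new:
            problems.append(f"{name}: state removed")
            continue
        old_next, new_next = old_state.get("Next"), new[name].get("Next")
        if old_next == new_next:
            continue
        gate = new.get(new_next, {})
        if new_next in old:
            problems.append(f"{name}: re-pointed to pre-existing state {new_next}")
        elif gate.get("Type") != "Choice" or gate.get("Default") != old_next:
            problems.append(f"{name}: {new_next} is not a new Choice defaulting to {old_next}")
    return problems


# Toy before/after pair for the first changed edge; CheckSkipMorningEnrich is stubbed.
BEFORE = {"States": {
    "InitializeInput": {"Type": "Pass", "Next": "CheckSkipMorningEnrich"},
    "CheckSkipMorningEnrich": {"Type": "Succeed"},
}}
AFTER = {"States": {
    "InitializeInput": {"Type": "Pass", "Next": "CheckShellRun"},
    "CheckShellRun": {"Type": "Choice", "Default": "CheckSkipMorningEnrich", "Choices": [
        {"Variable": "$.shell_run", "BooleanEquals": True, "Next": "ApplyShellRunDefaults"}]},
    "ApplyShellRunDefaults": {"Type": "Pass", "Next": "CheckSkipMorningEnrich"},
    "CheckSkipMorningEnrich": {"Type": "Succeed"},
}}
```

Rejecting re-points to pre-existing states is the important part. It distinguishes a gated detour, which the real run still passes through via Default, from a silently changed route.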

cipher813 mentioned this pull request May 16, 2026

feat(sf): split Backtester → Backtest + Parity — preflight task split P1 #250

Merged

cipher813 merged commit fe9507f into main May 16, 2026
1 check passed

cipher813 deleted the feat/split-dataphase1-morningenrich branch May 16, 2026 14:53

cipher813 mentioned this pull request May 18, 2026

feat(sf): Friday-PM shell_run dry-pass of the Saturday pipeline (spine; rule shipped disabled) #258

Merged

cipher813 merged 1 commit into main from feat/split-dataphase1-morningenrich
