feat(sf): split Backtester → Backtest + Parity — preflight task split P1#250
Merged
…reflight task split P0
Standing rule (preflight-task-split-260516.md): every preflight-bearing
action is its own SF task; a downstream failure must never re-run a
completed upstream task. Accept the extra spot-launch cost.
Origin: 2026-05-16 Saturday SF DataPhase1 ran
spot_data_weekly.sh --data-only = morning-enrich (~28 min) THEN phase1 on
one spot, with phase1's preflight buried 28 minutes behind a completed
morning-enrich. Every phase1 recovery re-paid the 28-min morning-enrich.
A fast-fail that fires 28 minutes deep is not a fast-fail.
Changes:
- spot_data_weekly.sh: add --morning-enrich-only / --phase1-only run
  modes (RUN_MODE morning-enrich-only / phase1-only). morning-enrich and
  phase1+prune are now independently gated by DO_MORNING_ENRICH /
  DO_PHASE1 derived from RUN_MODE. --data-only preserved (runs both) for
  manual/adhoc backward-compat. Per-mode MODE_LABEL feeds the spot-side
  S3 log key + heartbeat dimension so a morning-enrich-only run is not
  mislabeled data-phase1. Shared scaffolding (log capture, S3 EXIT-trap
  upload, watchdog, heartbeat) works for all three modes.
- preflight.py: dedicated "morning_enrich" mode whose checks are the
  UNION of what _run_morning_enrich needs (AWS_REGION env, polygon + FRED
  secret presence + reachability probes, S3 bucket + writeable sentinel,
  ArcticDB libraries present). Deliberately NO ArcticDB-freshness check
  -- morning-enrich is part of what makes it fresh.
  weekly_collector.main() now maps --morning-enrich -> "morning_enrich"
  (was the dependency-blind "daily" which skipped polygon/FRED probes).
- step_function.json: new MorningEnrich quartet (CheckSkipMorningEnrich /
  MorningEnrich / WaitForMorningEnrich / CheckMorningEnrichStatus +
  MorningEnrichWait + ExtractMorningEnrichError) inserted BEFORE
  DataPhase1, mirroring the RAGIngestion/DataPhase1 quartets exactly
  (same Retry/Catch/Heartbeat/Timeout/HandleFailure wiring + a
  skip_morning_enrich Choice). MorningEnrich runs --morning-enrich-only;
  DataPhase1 switched --data-only -> --phase1-only. Chain:
  InitializeInput -> CheckSkipMorningEnrich -> MorningEnrich -> ... ->
  CheckSkipDataPhase1 -> DataPhase1 -> (existing next, unchanged). All
  downstream states untouched.
- tests: +44 tests across test_sf_morning_enrich_split_wiring.py,
  test_spot_data_weekly_run_modes.py,
  test_weekly_collector_preflight_mode_mapping.py, and extended
  test_preflight.py (morning_enrich mode: probes polygon+FRED, no
  arcticdb-freshness, fail-fast on missing secret).
Full suite: 1094 passed, 1 skipped (clean-main baseline ~1050; +44 new).
bash -n + SF JSON parse validated. DEPLOY HELD.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
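The run-mode gating described for spot_data_weekly.sh can be sketched roughly as follows. This is a Python illustration only (the real script is bash); the `RunPlan` type, the `resolve_run_mode` helper, and the `morning-enrich` label string are hypothetical names invented for the sketch. Only the three mode names, the two gates, and the `data-phase1` label come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunPlan:
    do_morning_enrich: bool  # stands in for DO_MORNING_ENRICH
    do_phase1: bool          # stands in for DO_PHASE1 (phase1 + prune)
    mode_label: str          # stands in for MODE_LABEL (log key / heartbeat dimension)


# Assumed mapping; the "morning-enrich" label value is a placeholder.
_PLANS = {
    "data-only": RunPlan(True, True, "data-phase1"),
    "morning-enrich-only": RunPlan(True, False, "morning-enrich"),
    "phase1-only": RunPlan(False, True, "data-phase1"),
}


def resolve_run_mode(run_mode: str) -> RunPlan:
    """Derive both gates and the label from a single run mode."""
    try:
        return _PLANS[run_mode]
    except KeyError:
        raise ValueError(f"unknown RUN_MODE: {run_mode!r}") from None
```

Deriving both gates from one mode value keeps the two actions from drifting out of sync, and an unknown mode fails before any work starts.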
…it P1
Standing rule (preflight-task-split-260516.md): every preflight-bearing
action is its own SF task; a downstream failure must never re-run a
completed upstream task. Accept the extra spot-launch cost.
Origin: the Saturday SF Backtester state ran
`spot_backtest.sh --skip-stages=evaluator` = backtest (~121 min, 10y
simulate + param sweep) THEN parity on one spot. Every parity recovery
re-paid the 121-min backtest because backtest and parity are independent
preflight-bearing actions bundled in one task.
SF-WIRING-ONLY -- no backtester-repo change. spot_backtest.sh's
--skip-stages already supports backtest/parity/evaluator independently
(validated stage vocabulary _KNOWN_STAGES="backtest parity evaluator").
Naming: LOWER-CHURN option chosen -- the existing `Backtester` state
name is KEPT for the backtest-stage state (its SSM command flips
--skip-stages=evaluator -> --skip-stages=parity,evaluator so it runs ONLY
the backtest stage); a NEW `Parity` quartet is added after it. Keeping
`Backtester` avoids rewiring DriftDetection's two Next/Catch edges and
all inbound references to CheckSkipBacktester (vs renaming to `Backtest`,
which would touch 3+ unrelated states).
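The stage selection that the --skip-stages flip relies on can be illustrated with a small sketch. It is written in Python for clarity and is not the actual spot_backtest.sh logic; `stages_to_run` is a hypothetical helper, and only the stage vocabulary and the flag values are taken from the text above.

```python
KNOWN_STAGES = ("backtest", "parity", "evaluator")


def stages_to_run(skip_stages: str) -> list[str]:
    """Return the stages left to run after applying a --skip-stages value."""
    skipped = [s for s in skip_stages.split(",") if s]
    unknown = [s for s in skipped if s not in KNOWN_STAGES]
    if unknown:
        # Reject typos up front instead of silently running extra stages.
        raise ValueError(f"unknown stage(s): {unknown}")
    return [s for s in KNOWN_STAGES if s not in skipped]


print(stages_to_run("evaluator"))           # old combined Backtester task
print(stages_to_run("parity,evaluator"))    # Backtester after the split
print(stages_to_run("backtest,evaluator"))  # new Parity task
```

Under this model the old flag value leaves two stages in one task, while each of the two new values leaves exactly one.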
Changes:
- step_function.json:
  - Backtester: --skip-stages=evaluator -> --skip-stages=parity,evaluator
    (backtest stage only); Comment updated. CheckBacktesterStatus
    Success: CheckSkipEvaluator -> CheckSkipParity.
  - New Parity quartet inserted after BacktesterWait, before
    CheckSkipEvaluator: CheckSkipParity / Parity / WaitForParity /
    CheckParityStatus / ParityWait + ExtractParityError. Parity runs
    --skip-stages=backtest,evaluator. Mirrors the Backtester quartet
    exactly -- same Retry/Catch/HeartbeatSeconds/TimeoutSeconds/
    HandleFailure-Catch/skip-Choice patterns; TimeoutSeconds 7260 /
    executionTimeout 7200 copied from the old combined Backtester (not
    under-sized). Parity success -> CheckSkipEvaluator (existing
    downstream chain UNCHANGED).
  - CheckSkipBacktester: {"skip_backtester": true} retains its original
    whole-pair semantics (skip BOTH backtest and parity -> route to
    CheckSkipEvaluator). New {"skip_parity": true} on CheckSkipParity
    skips ONLY parity.
  - Evaluator / MorningEnrich (P0) / DataPhase1 / all other states
    untouched.
- tests/test_sf_backtest_parity_split_wiring.py: +41 tests mirroring
  test_sf_morning_enrich_split_wiring.py conventions (quartet presence,
  chain ordering, happy-path reachability Backtester<Parity<Evaluator,
  skip-stages argument correctness, no combined --skip-stages=evaluator
  anywhere, budget parity Parity==Backtester, HandleFailure Catch on
  both, result-path isolation, lower-churn naming pinned).
- tests/test_sf_eval_judge_wiring.py: updated TestBacktesterTransition
  for the new transitive path (Backtester success -> CheckSkipParity ->
  ... -> CheckSkipEvaluator) -- the eval-judge-reachability invariant
  stays pinned.
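The two skip flags described above route differently, which a toy walk over a simplified state map can make concrete. This is a sketch under stated assumptions: the dict layout (`flag`, `skip_to`) is an invented shorthand and not Amazon States Language, the Wait/Status/Extract states of each quartet are collapsed into a single hop, and `walk` is a hypothetical helper.

```python
# Simplified routing model; only state names and flag names come from the PR.
STATES = {
    "CheckSkipBacktester": {"flag": "skip_backtester",
                            "skip_to": "CheckSkipEvaluator",
                            "Default": "Backtester"},
    "Backtester": {"Next": "CheckSkipParity"},
    "CheckSkipParity": {"flag": "skip_parity",
                        "skip_to": "CheckSkipEvaluator",
                        "Default": "Parity"},
    "Parity": {"Next": "CheckSkipEvaluator"},
    "CheckSkipEvaluator": {"End": True},
}


def walk(execution_input: dict, start: str = "CheckSkipBacktester") -> list[str]:
    """Follow the happy path and return the visited state names."""
    path, name = [], start
    while True:
        path.append(name)
        state = STATES[name]
        if state.get("End"):
            return path
        if "flag" in state:
            skip = execution_input.get(state["flag"]) is True
            name = state["skip_to"] if skip else state["Default"]
        else:
            name = state["Next"]
```

In this model `walk({"skip_backtester": True})` never visits `Parity`, matching the whole-pair semantics, while `walk({"skip_parity": True})` still visits `Backtester`.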
Full suite: 1135 passed, 1 skipped (post-P0 baseline 1094 passed; +41
new). SF JSON parses, no dangling Next/Default/Catch targets. DEPLOY
HELD until the in-flight recovery Saturday SF run is green; SF redeploy
must NOT happen while a recovery execution is live.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
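A "no dangling Next/Default/Catch targets" check like the one mentioned in the commit message could look something like the sketch below. It is not the repository's test code: `dangling_targets` and the `SAMPLE` definition are hypothetical, and the sketch only inspects top-level states (it does not descend into Parallel or Map branches).

```python
def dangling_targets(definition: dict) -> set[str]:
    """Return every transition target that is not a defined state name."""
    states = definition["States"]
    referenced = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                referenced.add(state[key])
        for rule in state.get("Choices", []):   # Choice rules carry their own Next
            if "Next" in rule:
                referenced.add(rule["Next"])
        for catcher in state.get("Catch", []):  # Catch entries carry a Next
            referenced.add(catcher["Next"])
    return referenced - set(states)


# Tiny illustrative definition; state shapes are abbreviated.
SAMPLE = {
    "StartAt": "CheckSkipParity",
    "States": {
        "CheckSkipParity": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.skip_parity", "BooleanEquals": True,
                         "Next": "CheckSkipEvaluator"}],
            "Default": "Parity",
        },
        "Parity": {"Type": "Task", "Next": "CheckSkipEvaluator",
                   "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}]},
        "CheckSkipEvaluator": {"Type": "Succeed"},
        "HandleFailure": {"Type": "Fail"},
    },
}

print(dangling_targets(SAMPLE))
```

An empty set means every `Next`, `Default`, and `Catch` edge lands on a state that exists, which is the property worth pinning whenever a new quartet is spliced into the chain.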
cipher813 added a commit that referenced this pull request on May 18, 2026
…e; rule shipped disabled) (#258)
Foundational spine of ROADMAP "Scheduled Friday-PM 'shell run'" (P1,
added 2026-05-16) -- the *prevention* half of Saturday-SF reliability
(the *containment* half, preflight-task-split, shipped 2026-05-16 in data
#249/#250). Surfaces a Saturday-fatal bootstrap break ~11.5h before the
unattended Sat 02:00 PT firing, inside an operator-awake Friday-evening
fix window, instead of as a Saturday-morning-after lost-week incident.
STRICT SUPERSET -- shell_run absent/false ⇒ byte-identical to today's
real Saturday run. Only two existing edges change, each routed through a
new Choice whose Default is the pre-spine target:
  InitializeInput.Next: CheckSkipMorningEnrich -> CheckShellRun
    (Default -> CheckSkipMorningEnrich; unchanged for the real run)
  WaitForWeeklySubstrateHealthCheck.Next: NotifyComplete ->
    CheckShellRunNotify (Default -> NotifyComplete; the real Saturday
    SUCCESS email is untouched)
shell_run propagation (mirrors the existing skip_*/JsonMerge precedent
exactly -- no new mechanism invented):
  CheckShellRun (Choice): {"shell_run": true} -> ApplyShellRunDefaults
  ApplyShellRunDefaults (Pass):
    States.JsonMerge(<all 16 skip_*=true>, $, false) layers every skip
    flag = true UNDER the execution input so an explicit per-flag
    override still wins (e.g. {"shell_run":true,"skip_research":false}
    still runs Research).
Every workload state already has a Choice-gated skip_*, so the whole
workload no-ops via the EXISTING skip mechanism.
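The "defaults UNDER the execution input" layering can be sketched with a plain dict merge. This is a Python analogy for the shallow `States.JsonMerge` call, not the state machine itself: `apply_shell_run_defaults` is a hypothetical helper, and the defaults blob below lists only four of the 16 flags (the four whose names appear in these commit messages).

```python
# Partial stand-in for the real 16-flag defaults blob.
SHELL_RUN_DEFAULTS = {
    "skip_morning_enrich": True,
    "skip_research": True,
    "skip_backtester": True,
    "skip_parity": True,
}


def apply_shell_run_defaults(execution_input: dict) -> dict:
    """Layer skip defaults under the input when shell_run is true."""
    if execution_input.get("shell_run") is not True:
        # Mirrors the Choice Default: the real run passes through unchanged.
        return dict(execution_input)
    # Later keys win, so anything the operator set explicitly overrides a default.
    return {**SHELL_RUN_DEFAULTS, **execution_input}


print(apply_shell_run_defaults({"shell_run": True, "skip_research": False}))
```

Putting the defaults first in the merge is what lets a single explicit `false` re-enable one state during a shell run.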
Per-state dry-vs-skip inventory under shell_run (spine = pure-skip;
per-module --preflight-only/--dry-run "spots boot + smoke" are SCOPED
FOLLOW-ONS):
  SKIPPED via existing skip_* gate (16): MorningEnrich, DataPhase1,
    RAGIngestion, RegimeSubstrate, RegimeRetrospectiveEval, Research,
    DataPhase2, EvalJudge(+RollingMean), RationaleClustering,
    ReplayConcordance, Counterfactual, PredictorTraining, DriftDetection,
    Backtester, Parity, Evaluator
  STILL RUNS (read-only, no skip gate by design -- exactly the
    bootstrap/transport smoke the shell run wants Friday PM):
    SaturdayHealthCheck, WeeklySubstrateHealthCheck. Their
    shell_run-aware missing-Friday-bar tolerance is ROADMAP owed-work
    item 5 (scoped follow-on).
  NOTIFY: NotifyShellRunComplete (shell-run-tagged Subject, reuses the
    exact NotifyComplete SNS substrate -- alpha-engine-alerts topic, same
    Resource).
Friday EventBridge rule (CFN, the documented infra-as-code home for
EventBridge rules -- SaturdayTrigger/WeekdayTrigger live there):
FridayShellRunTrigger, cron(30 21 ? * FRI *) = 21:30 UTC Fri = 14:30 PT
(PDT, dominant season) / 13:30 PT (PST). Chosen AFTER the Friday EOD SF
(~1:25 PT) so it never collides with
PostMarketData/EODReconcile/StopTradingInstance on the trading instance,
and ~11.5h BEFORE the real Sat 09:00 UTC firing. Targets the SAME
alpha-engine-saturday-pipeline SF (NOT a parallel SF) with
{"shell_run": true}, same EventBridgeSfnRoleArn -- the existing
states:StartExecution grant is SF-ARN-scoped so NO IAM change is needed.
SHIPPED State: DISABLED -- zero-risk merge. Additive observability, NOT a
backstop (the "fail loud, no backstop" design decision stands).
Operator enable step:
  aws events enable-rule --name alpha-engine-friday-shell-run --region us-east-1
Consolidated-notify decision: shell-run SUCCESS is delivered by reusing
the existing NotifyComplete SNS pattern with a SHELL RUN-tagged Subject
(zero new infra). A shell-run FAILURE reuses the unchanged HandleFailure
(its 20 inbound error edges deliberately NOT re-pointed: high churn, zero
added operator value, and would perturb the real Saturday failure path's
risk surface -- the FAILED alert's Friday execution timestamp/ID is the
actionable signal). The richer per-state pass/fail report (ROADMAP design
point 5) is a scoped follow-on.
Scoped per-module follow-on PRs (repo -> state -> dry mode needed; NOT
done here -- these convert "skipped" to "spots boot + smoke"):
- alpha-engine-data -> DataPhase1/MorningEnrich -> spot_data_weekly.sh
  --preflight-only (preflight + universe-freshness scan, no polygon/FMP
  writes); shell_run-aware tolerance for "Friday bar not yet present"
- alpha-engine-data -> RAGIngestion -> spot_data_weekly.sh --rag-only
  --preflight-only (corpus reachability + secrets, no SEC/embedding
  writes)
- alpha-engine-predictor -> PredictorTraining -> spot_train.sh
  --preflight-only (load + WF-gate-shape check, NO
  predictor/weights/promotion)
- alpha-engine-backtester -> Backtester/Parity/Evaluator ->
  spot_backtest.sh --mode=smoke + simulate-dry, NO config/*.json
  auto-apply (freeze_evaluator pattern is the model)
- alpha-engine-data -> SaturdayHealthCheck/WeeklySubstrateHealthCheck ->
  shell_run-aware missing-Friday-bar tolerance (ROADMAP owed-work item 5)
(Research/predictor-inference/executor already have --dry-run/--simulate;
wiring those into the SF states is part of the per-state follow-ons
above.)
Tests: tests/test_sf_friday_shell_run_wiring.py (23 cases --
strict-superset edges, JsonMerge user-input-wins order, every skip-gate
covered by the defaults blob, full happy-path traversal for shell_run
true vs absent, Friday rule DISABLED + same-SF + shell_run=true + cron).
Updated two pre-spine wiring tests (morning_enrich_split,
substrate_check) to assert through the new gates while pinning Default ==
pre-spine target.
Full suite: 1242 passed, 1 skipped (pre-existing, unrelated). No new pip
deps. No secrets.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
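The schedule arithmetic quoted for the Friday rule (21:30 UTC Friday, 14:30 or 13:30 Pacific, about 11.5h ahead of the Saturday 09:00 UTC firing) can be checked with a short sketch. The helper names are hypothetical, the two dates are arbitrary example Fridays, and fixed UTC offsets stand in for PDT and PST so that the snippet needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # daylight time
PST = timezone(timedelta(hours=-8), "PST")  # standard time


def friday_firing_utc(year: int, month: int, day: int) -> datetime:
    """cron(30 21 ? * FRI *) fires at 21:30 UTC on the given Friday."""
    return datetime(year, month, day, 21, 30, tzinfo=timezone.utc)


def lead_hours_to_saturday_run(fired: datetime) -> float:
    """Hours between the shell run and the next day's 09:00 UTC firing."""
    saturday = fired.replace(hour=9, minute=0) + timedelta(days=1)
    return (saturday - fired) / timedelta(hours=1)


summer = friday_firing_utc(2026, 5, 22)  # a Friday during daylight time
winter = friday_firing_utc(2026, 12, 4)  # a Friday during standard time
print(summer.astimezone(PDT), winter.astimezone(PST))
print(lead_hours_to_saturday_run(summer))
```

Because the cron expression is in UTC, the lead time to the Saturday firing stays constant across the year; only the local wall-clock time shifts by an hour.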
Stacked PR -- merge AFTER #249
This PR STACKS ON the P0 branch `feat/split-dataphase1-morningenrich` (PR #249). Base is the P0 branch, NOT `main` -- both PRs edit `infrastructure/step_function.json` and stacking avoids a merge conflict. Merge #249 first, then this PR (GitHub will auto-retarget this PR to `main` once #249 merges).

Rule
Per `alpha-engine-docs/private/preflight-task-split-260516.md` §5 (standing rule, stated twice by Brian with explicit irritation): every Step Function state that bundles more than one independent preflight-bearing action is split so each is its own SF task, and a failure in a later task never re-runs a completed earlier task. Accept the extra spot-launch cost.
Origin: the Saturday SF `Backtester` state ran `spot_backtest.sh --skip-stages=evaluator` = backtest (~121 min, 10y simulate + param sweep) then parity on one spot. Every parity recovery re-paid the 121-min backtest.

SF-wiring-only -- no backtester-repo change
`spot_backtest.sh`'s `--skip-stages` already supports `backtest`/`parity`/`evaluator` independently (validated stage vocabulary `_KNOWN_STAGES="backtest parity evaluator"`). This PR only re-wires the Step Function; no backtester-repo change is needed. `Evaluator` was already its own state (split 2026-05-07) and is untouched.

Naming: lower-churn option chosen
Kept the existing `Backtester` state name for the backtest-stage state (its SSM command flips `--skip-stages=evaluator` → `--skip-stages=parity,evaluator` so it runs ONLY the backtest stage) and added a new `Parity` quartet after it. Keeping `Backtester` avoids rewiring `DriftDetection`'s two `Next`/`Catch` edges and all inbound references to `CheckSkipBacktester` (renaming to `Backtest` would have touched 3+ unrelated states). The task spec explicitly permits this option.

Changes
- `infrastructure/step_function.json`:
  - `Backtester`: SSM command `--skip-stages=evaluator` → `--skip-stages=parity,evaluator` (backtest stage only); Comment updated. `CheckBacktesterStatus` Success: `CheckSkipEvaluator` → `CheckSkipParity`.
  - `Parity` quartet inserted after `BacktesterWait`, before `CheckSkipEvaluator`: `CheckSkipParity` / `Parity` / `WaitForParity` / `CheckParityStatus` / `ParityWait` + `ExtractParityError`. `Parity` runs `--skip-stages=backtest,evaluator`. Mirrors the `Backtester` quartet exactly: same Retry/Catch/`HeartbeatSeconds`/`TimeoutSeconds` (7260) / `executionTimeout` (7200, copied from the old combined `Backtester`, not under-sized) / `HandleFailure`-Catch / skip-Choice patterns. `Parity` success → `CheckSkipEvaluator` (existing downstream chain UNCHANGED).
  - `CheckSkipBacktester`: `{"skip_backtester": true}` retains its original whole-pair semantics (skip BOTH backtest and parity → `CheckSkipEvaluator`). New `{"skip_parity": true}` on `CheckSkipParity` skips ONLY parity.
  - `Evaluator` / `MorningEnrich` (P0) / `DataPhase1` / all other states untouched.

New chain:
  ... → CheckSkipBacktester
      → Backtester (--skip-stages=parity,evaluator) → WaitForBacktester → CheckBacktesterStatus(Success)
      → CheckSkipParity
      → Parity (--skip-stages=backtest,evaluator) → WaitForParity → CheckParityStatus(Success)
      → CheckSkipEvaluator → … (unchanged)

Tests:
- `tests/test_sf_backtest_parity_split_wiring.py`: +41 tests mirroring `test_sf_morning_enrich_split_wiring.py` conventions: quartet presence, chain ordering, happy-path reachability (`Backtester` < `Parity` < `Evaluator`), skip-stages argument correctness per state, no combined `--skip-stages=evaluator` anywhere, budget parity (`Parity` == `Backtester`), `HandleFailure` Catch on both, result-path isolation, lower-churn naming pinned.
- `tests/test_sf_eval_judge_wiring.py`: updated `TestBacktesterTransition` for the new transitive path (`Backtester` success → `CheckSkipParity` → … → `CheckSkipEvaluator`); the eval-judge-reachability invariant stays pinned.

Validation
`Next`/`Default`/`Catch.Next` targets (full state-graph walk).

DEPLOY IS HELD
DEPLOY IS HELD until the in-flight recovery Saturday SF run is green (proving #247/#248 end-to-end), and the SF redeploy must NOT happen while a recovery execution is live on the Saturday SF. No deploy script was run, no Step Function was redeployed, no SF was triggered.
🤖 Generated with Claude Code