feat(sf): split Backtester → Backtest + Parity — preflight task split P1 by cipher813 · Pull Request #250 · cipher813/alpha-engine-data

cipher813 · 2026-05-16T14:20:40Z

Stacked PR — merge AFTER #249

This PR STACKS ON the P0 branch feat/split-dataphase1-morningenrich (PR #249). Base is the P0 branch, NOT main — both PRs edit infrastructure/step_function.json and stacking avoids a merge conflict. Merge #249 first, then this PR (GitHub will auto-retarget this PR to main once #249 merges).

Rule

Per alpha-engine-docs/private/preflight-task-split-260516.md §5 (standing rule, stated twice by Brian with explicit irritation): every Step Function state that bundles more than one independent preflight-bearing action is split so each is its own SF task, and a failure in a later task never re-runs a completed earlier task. Accept the extra spot-launch cost.

Origin: the Saturday SF Backtester state ran spot_backtest.sh --skip-stages=evaluator, i.e. backtest (~121 min, 10y simulate + param sweep) followed by parity, both on a single spot instance. Because the two stages shared one SF task, every parity recovery re-paid the 121-min backtest.

SF-wiring-only — no backtester-repo change

spot_backtest.sh's --skip-stages already supports backtest/parity/evaluator independently (validated stage vocabulary _KNOWN_STAGES="backtest parity evaluator"). This PR only re-wires the Step Function — no backtester-repo change is needed. Evaluator was already its own state (split 2026-05-07) and is untouched.
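To make the flag arithmetic concrete, here is a minimal Python sketch of how a comma-separated skip list selects the remaining stages. Only the stage vocabulary and the three flag values come from this PR. The function name `stages_run` and the rejection of unknown stage names are assumptions for illustration, not the actual logic in spot_backtest.sh.

```python
# Stage vocabulary as quoted in the PR (_KNOWN_STAGES); everything else is illustrative.
KNOWN_STAGES = ("backtest", "parity", "evaluator")


def stages_run(skip_stages: str) -> list[str]:
    """Return the stages left to run after applying a comma-separated skip list."""
    skipped = [s for s in skip_stages.split(",") if s]
    unknown = [s for s in skipped if s not in KNOWN_STAGES]
    if unknown:
        # Assumption: an unrecognised stage name is treated as an error.
        raise ValueError(f"unknown stage(s): {unknown}")
    return [s for s in KNOWN_STAGES if s not in skipped]


print(stages_run("evaluator"))           # old combined Backtester state
print(stages_run("parity,evaluator"))    # Backtester state after this PR
print(stages_run("backtest,evaluator"))  # new Parity state
```

Under these assumptions the old flag leaves backtest and parity bundled together, while the two new flags each leave exactly one stage.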

Naming: lower-churn option chosen

Kept the existing Backtester state name for the backtest-stage state (its SSM command flips --skip-stages=evaluator → --skip-stages=parity,evaluator so it runs ONLY the backtest stage) and added a new Parity quartet after it. Keeping Backtester avoids rewiring DriftDetection's two Next/Catch edges and all inbound references to CheckSkipBacktester (renaming to Backtest would have touched 3+ unrelated states). The task spec explicitly permits this option.

Changes

infrastructure/step_function.json:

Backtester: SSM command --skip-stages=evaluator → --skip-stages=parity,evaluator (backtest stage only); Comment updated. CheckBacktesterStatus Success: CheckSkipEvaluator → CheckSkipParity.
New Parity quartet inserted after BacktesterWait, before CheckSkipEvaluator: CheckSkipParity / Parity / WaitForParity / CheckParityStatus / ParityWait + ExtractParityError. Parity runs --skip-stages=backtest,evaluator. Mirrors the Backtester quartet exactly — same Retry/Catch/HeartbeatSeconds/TimeoutSeconds (7260) / executionTimeout (7200, copied from the old combined Backtester — not under-sized) / HandleFailure-Catch / skip-Choice patterns. Parity success → CheckSkipEvaluator (existing downstream chain UNCHANGED).
CheckSkipBacktester: {"skip_backtester": true} retains its original whole-pair semantics (skip BOTH backtest and parity → CheckSkipEvaluator). New {"skip_parity": true} on CheckSkipParity skips ONLY parity.
Evaluator / MorningEnrich (P0) / DataPhase1 / all other states untouched.

New chain:
... → CheckSkipBacktester → Backtester (--skip-stages=parity,evaluator) → WaitForBacktester → CheckBacktesterStatus(Success) → CheckSkipParity → Parity (--skip-stages=backtest,evaluator) → WaitForParity → CheckParityStatus(Success) → CheckSkipEvaluator → … (unchanged)
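The routing above can be sketched as a small transition table. This is a simplified model, not the contents of step_function.json: the Wait/Extract/HandleFailure states are omitted, the table layout and `happy_path` helper are hypothetical, and the skip_parity target being CheckSkipEvaluator is an assumption inferred from "Parity success → CheckSkipEvaluator".

```python
# Hypothetical, simplified transition table covering only the states named in the chain.
CHAIN = {
    "CheckSkipBacktester": {
        "flag": "skip_backtester",
        "skip_to": "CheckSkipEvaluator",  # whole-pair skip, per the PR
        "default": "Backtester",
    },
    "Backtester": "WaitForBacktester",
    "WaitForBacktester": "CheckBacktesterStatus",
    "CheckBacktesterStatus": "CheckSkipParity",
    "CheckSkipParity": {
        "flag": "skip_parity",
        "skip_to": "CheckSkipEvaluator",  # assumed target
        "default": "Parity",
    },
    "Parity": "WaitForParity",
    "WaitForParity": "CheckParityStatus",
    "CheckParityStatus": "CheckSkipEvaluator",
}


def happy_path(execution_input: dict, start: str = "CheckSkipBacktester",
               stop: str = "CheckSkipEvaluator") -> list[str]:
    """Follow success edges from start to stop, honouring the skip flags."""
    path = [start]
    state = start
    while state != stop:
        edge = CHAIN[state]
        if isinstance(edge, dict):  # Choice state
            skip = execution_input.get(edge["flag"]) is True
            state = edge["skip_to"] if skip else edge["default"]
        else:  # plain Next edge
            state = edge
        path.append(state)
    return path


print(happy_path({}))
print(happy_path({"skip_backtester": True}))
print(happy_path({"skip_parity": True}))
```

With no flags the walk visits Backtester before Parity; skip_backtester jumps straight to CheckSkipEvaluator, and skip_parity still runs the backtest stage.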

Tests:

tests/test_sf_backtest_parity_split_wiring.py: +41 tests mirroring test_sf_morning_enrich_split_wiring.py conventions — quartet presence, chain ordering, happy-path reachability (Backtester < Parity < Evaluator), skip-stages argument correctness per state, no combined --skip-stages=evaluator anywhere, budget parity (Parity == Backtester), HandleFailure Catch on both, result-path isolation, lower-churn naming pinned.
tests/test_sf_eval_judge_wiring.py: updated TestBacktesterTransition for the new transitive path (Backtester success → CheckSkipParity → … → CheckSkipEvaluator) — the eval-judge-reachability invariant stays pinned.

Validation

SF JSON parses; no dangling Next/Default/Catch.Next targets (full state-graph walk).
Full suite: 1135 passed, 1 skipped (post-P0 baseline 1094 passed; +41 new; zero new failures — the one eval-judge assertion my reroute invalidated was updated in place to the new transitive path).
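For readers unfamiliar with the dangling-target check, a minimal sketch follows. It is not the repo's test code: `dangling_targets` and the toy definition are hypothetical, and it scans top-level states only (it does not descend into Parallel or Map branches).

```python
import copy


def dangling_targets(definition: dict) -> set[str]:
    """Return every Next/Default/Catch.Next target that is not a defined state."""
    states = definition["States"]
    referenced = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                referenced.add(state[key])
        for rule in state.get("Choices", []):   # top-level Choice rules carry Next
            referenced.add(rule["Next"])
        for catcher in state.get("Catch", []):  # each catcher carries Next
            referenced.add(catcher["Next"])
    return referenced - set(states)


# Toy definition for illustration only; not the real step_function.json.
good = {
    "StartAt": "CheckSkipParity",
    "States": {
        "CheckSkipParity": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.skip_parity", "BooleanEquals": True,
                         "Next": "CheckSkipEvaluator"}],
            "Default": "Parity",
        },
        "Parity": {
            "Type": "Task",
            "Next": "CheckSkipEvaluator",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}],
        },
        "CheckSkipEvaluator": {"Type": "Succeed"},
        "HandleFailure": {"Type": "Fail"},
    },
}

broken = copy.deepcopy(good)
broken["States"]["Parity"]["Next"] = "CheckSkipEvaluater"  # deliberate typo

print(dangling_targets(good))
print(dangling_targets(broken))
```

A check of this shape catches a mistyped state name at test time rather than at deploy time.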

DEPLOY IS HELD

DEPLOY IS HELD until the in-flight recovery Saturday SF run is green (proving #247/#248 end-to-end), and the SF redeploy must NOT happen while a recovery execution is live on the Saturday SF. No deploy script was run, no Step Function was redeployed, no SF was triggered.

🤖 Generated with Claude Code

…reflight task split P0

Standing rule (preflight-task-split-260516.md): every preflight-bearing action is its own SF task; a downstream failure must never re-run a completed upstream task. Accept the extra spot-launch cost.

Origin: 2026-05-16 Saturday SF DataPhase1 ran spot_data_weekly.sh --data-only = morning-enrich (~28 min) THEN phase1 on one spot, with phase1's preflight buried 28 minutes behind a completed morning-enrich. Every phase1 recovery re-paid the 28-min morning-enrich. A fast-fail that fires 28 minutes deep is not a fast-fail.

Changes:
- spot_data_weekly.sh: add --morning-enrich-only / --phase1-only run modes (RUN_MODE morning-enrich-only / phase1-only). morning-enrich and phase1+prune are now independently gated by DO_MORNING_ENRICH / DO_PHASE1 derived from RUN_MODE. --data-only preserved (runs both) for manual/adhoc backward-compat. Per-mode MODE_LABEL feeds the spot-side S3 log key + heartbeat dimension so a morning-enrich-only run is not mislabeled data-phase1. Shared scaffolding (log capture, S3 EXIT-trap upload, watchdog, heartbeat) works for all three modes.
- preflight.py: dedicated "morning_enrich" mode whose checks are the UNION of what _run_morning_enrich needs (AWS_REGION env, polygon + FRED secret presence + reachability probes, S3 bucket + writeable sentinel, ArcticDB libraries present). Deliberately NO ArcticDB-freshness check -- morning-enrich is part of what makes it fresh. weekly_collector.main() now maps --morning-enrich -> "morning_enrich" (was the dependency-blind "daily" which skipped polygon/FRED probes).
- step_function.json: new MorningEnrich quartet (CheckSkipMorningEnrich / MorningEnrich / WaitForMorningEnrich / CheckMorningEnrichStatus + MorningEnrichWait + ExtractMorningEnrichError) inserted BEFORE DataPhase1, mirroring the RAGIngestion/DataPhase1 quartets exactly (same Retry/Catch/Heartbeat/Timeout/HandleFailure wiring + a skip_morning_enrich Choice). MorningEnrich runs --morning-enrich-only; DataPhase1 switched --data-only -> --phase1-only. Chain: InitializeInput -> CheckSkipMorningEnrich -> MorningEnrich -> ... -> CheckSkipDataPhase1 -> DataPhase1 -> (existing next, unchanged). All downstream states untouched.
- tests: +44 tests across test_sf_morning_enrich_split_wiring.py, test_spot_data_weekly_run_modes.py, test_weekly_collector_preflight_mode_mapping.py, and extended test_preflight.py (morning_enrich mode: probes polygon+FRED, no arcticdb-freshness, fail-fast on missing secret).

Full suite: 1094 passed, 1 skipped (clean-main baseline ~1050; +44 new). bash -n + SF JSON parse validated. DEPLOY HELD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…it P1

Standing rule (preflight-task-split-260516.md): every preflight-bearing action is its own SF task; a downstream failure must never re-run a completed upstream task. Accept the extra spot-launch cost.

Origin: the Saturday SF Backtester state ran `spot_backtest.sh --skip-stages=evaluator` = backtest (~121 min, 10y simulate + param sweep) THEN parity on one spot. Every parity recovery re-paid the 121-min backtest because backtest and parity are independent preflight-bearing actions bundled in one task.

SF-WIRING-ONLY -- no backtester-repo change. spot_backtest.sh's --skip-stages already supports backtest/parity/evaluator independently (validated stage vocabulary _KNOWN_STAGES="backtest parity evaluator").

Naming: LOWER-CHURN option chosen -- the existing `Backtester` state name is KEPT for the backtest-stage state (its SSM command flips --skip-stages=evaluator -> --skip-stages=parity,evaluator so it runs ONLY the backtest stage); a NEW `Parity` quartet is added after it. Keeping `Backtester` avoids rewiring DriftDetection's two Next/Catch edges and all inbound references to CheckSkipBacktester (vs renaming to `Backtest`, which would touch 3+ unrelated states).

Changes:
- step_function.json:
  - Backtester: --skip-stages=evaluator -> --skip-stages=parity,evaluator (backtest stage only); Comment updated. CheckBacktesterStatus Success: CheckSkipEvaluator -> CheckSkipParity.
  - New Parity quartet inserted after BacktesterWait, before CheckSkipEvaluator: CheckSkipParity / Parity / WaitForParity / CheckParityStatus / ParityWait + ExtractParityError. Parity runs --skip-stages=backtest,evaluator. Mirrors the Backtester quartet exactly -- same Retry/Catch/HeartbeatSeconds/TimeoutSeconds/HandleFailure-Catch/skip-Choice patterns; TimeoutSeconds 7260 / executionTimeout 7200 copied from the old combined Backtester (not under-sized). Parity success -> CheckSkipEvaluator (existing downstream chain UNCHANGED).
  - CheckSkipBacktester: {"skip_backtester": true} retains its original whole-pair semantics (skip BOTH backtest and parity -> route to CheckSkipEvaluator). New {"skip_parity": true} on CheckSkipParity skips ONLY parity.
  - Evaluator / MorningEnrich (P0) / DataPhase1 / all other states untouched.
- tests/test_sf_backtest_parity_split_wiring.py: +41 tests mirroring test_sf_morning_enrich_split_wiring.py conventions (quartet presence, chain ordering, happy-path reachability Backtester<Parity<Evaluator, skip-stages argument correctness, no combined --skip-stages=evaluator anywhere, budget parity Parity==Backtester, HandleFailure Catch on both, result-path isolation, lower-churn naming pinned).
- tests/test_sf_eval_judge_wiring.py: updated TestBacktesterTransition for the new transitive path (Backtester success -> CheckSkipParity -> ... -> CheckSkipEvaluator) -- the eval-judge-reachability invariant stays pinned.

Full suite: 1135 passed, 1 skipped (post-P0 baseline 1094 passed; +41 new). SF JSON parses, no dangling Next/Default/Catch targets.

DEPLOY HELD until the in-flight recovery Saturday SF run is green; SF redeploy must NOT happen while a recovery execution is live.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e; rule shipped disabled) (#258)

Foundational spine of ROADMAP "Scheduled Friday-PM 'shell run'" (P1, added 2026-05-16) -- the *prevention* half of Saturday-SF reliability (the *containment* half, preflight-task-split, shipped 2026-05-16 in data #249/#250). Surfaces a Saturday-fatal bootstrap break ~11.5h before the unattended Sat 02:00 PT firing, inside an operator-awake Friday-evening fix window, instead of as a Saturday-morning-after lost-week incident.

STRICT SUPERSET -- shell_run absent/false ⇒ byte-identical to today's real Saturday run. Only two existing edges change, each routed through a new Choice whose Default is the pre-spine target:
- InitializeInput.Next: CheckSkipMorningEnrich -> CheckShellRun (Default -> CheckSkipMorningEnrich; unchanged for the real run)
- WaitForWeeklySubstrateHealthCheck.Next: NotifyComplete -> CheckShellRunNotify (Default -> NotifyComplete; the real Saturday SUCCESS email is untouched)

shell_run propagation (mirrors the existing skip_*/JsonMerge precedent exactly -- no new mechanism invented):
- CheckShellRun (Choice): {"shell_run": true} -> ApplyShellRunDefaults
- ApplyShellRunDefaults (Pass): States.JsonMerge(<all 16 skip_*=true>, $, false) layers every skip flag = true UNDER the execution input so an explicit per-flag override still wins (e.g. {"shell_run":true,"skip_research":false} still runs Research).

Every workload state already has a Choice-gated skip_*, so the whole workload no-ops via the EXISTING skip mechanism.

Per-state dry-vs-skip inventory under shell_run (spine = pure-skip; per-module --preflight-only/--dry-run "spots boot + smoke" are SCOPED FOLLOW-ONS):
- SKIPPED via existing skip_* gate (16): MorningEnrich, DataPhase1, RAGIngestion, RegimeSubstrate, RegimeRetrospectiveEval, Research, DataPhase2, EvalJudge(+RollingMean), RationaleClustering, ReplayConcordance, Counterfactual, PredictorTraining, DriftDetection, Backtester, Parity, Evaluator
- STILL RUNS (read-only, no skip gate by design -- exactly the bootstrap/transport smoke the shell run wants Friday PM): SaturdayHealthCheck, WeeklySubstrateHealthCheck. Their shell_run-aware missing-Friday-bar tolerance is ROADMAP owed-work item 5 (scoped follow-on).
- NOTIFY: NotifyShellRunComplete (shell-run-tagged Subject, reuses the exact NotifyComplete SNS substrate -- alpha-engine-alerts topic, same Resource).

Friday EventBridge rule (CFN, the documented infra-as-code home for EventBridge rules -- SaturdayTrigger/WeekdayTrigger live there): FridayShellRunTrigger, cron(30 21 ? * FRI *) = 21:30 UTC Fri = 14:30 PT (PDT, dominant season) / 13:30 PT (PST). Chosen AFTER the Friday EOD SF (~1:25 PT) so it never collides with PostMarketData/EODReconcile/StopTradingInstance on the trading instance, and ~11.5h BEFORE the real Sat 09:00 UTC firing. Targets the SAME alpha-engine-saturday-pipeline SF (NOT a parallel SF) with {"shell_run": true}, same EventBridgeSfnRoleArn -- the existing states:StartExecution grant is SF-ARN-scoped so NO IAM change is needed.

SHIPPED State: DISABLED -- zero-risk merge. Additive observability, NOT a backstop (the "fail loud, no backstop" design decision stands). Operator enable step:
  aws events enable-rule --name alpha-engine-friday-shell-run --region us-east-1

Consolidated-notify decision: shell-run SUCCESS is delivered by reusing the existing NotifyComplete SNS pattern with a SHELL RUN-tagged Subject (zero new infra). A shell-run FAILURE reuses the unchanged HandleFailure (its 20 inbound error edges deliberately NOT re-pointed: high churn, zero added operator value, and would perturb the real Saturday failure path's risk surface -- the FAILED alert's Friday execution timestamp/ID is the actionable signal). The richer per-state pass/fail report (ROADMAP design point 5) is a scoped follow-on.

Scoped per-module follow-on PRs (repo -> state -> dry mode needed; NOT done here -- these convert "skipped" to "spots boot + smoke"):
- alpha-engine-data -> DataPhase1/MorningEnrich -> spot_data_weekly.sh --preflight-only (preflight + universe-freshness scan, no polygon/FMP writes); shell_run-aware tolerance for "Friday bar not yet present"
- alpha-engine-data -> RAGIngestion -> spot_data_weekly.sh --rag-only --preflight-only (corpus reachability + secrets, no SEC/embedding writes)
- alpha-engine-predictor -> PredictorTraining -> spot_train.sh --preflight-only (load + WF-gate-shape check, NO predictor/weights/promotion)
- alpha-engine-backtester -> Backtester/Parity/Evaluator -> spot_backtest.sh --mode=smoke + simulate-dry, NO config/*.json auto-apply (freeze_evaluator pattern is the model)
- alpha-engine-data -> SaturdayHealthCheck/WeeklySubstrateHealthCheck -> shell_run-aware missing-Friday-bar tolerance (ROADMAP owed-work item 5)

(Research/predictor-inference/executor already have --dry-run/--simulate; wiring those into the SF states is part of the per-state follow-ons above.)

Tests: tests/test_sf_friday_shell_run_wiring.py (23 cases -- strict-superset edges, JsonMerge user-input-wins order, every skip-gate covered by the defaults blob, full happy-path traversal for shell_run true vs absent, Friday rule DISABLED + same-SF + shell_run=true + cron). Updated two pre-spine wiring tests (morning_enrich_split, substrate_check) to assert through the new gates while pinning Default == pre-spine target.

Full suite: 1242 passed, 1 skipped (pre-existing, unrelated). No new pip deps. No secrets.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
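The "defaults UNDER the execution input" ordering in the #258 commit message can be illustrated with a shallow merge in Python. This is a sketch of the semantics only: `json_merge_shallow` is a hypothetical stand-in for States.JsonMerge(json1, json2, false), where keys from the second object win on conflict, and the defaults dict shows just four flag names mentioned on this page rather than the full set of 16.

```python
def json_merge_shallow(json1: dict, json2: dict) -> dict:
    """Shallow merge: keys in json2 override the same keys in json1."""
    return {**json1, **json2}


# Illustrative subset of the skip_* defaults; the real blob sets all 16 to true.
SHELL_RUN_DEFAULTS = {
    "skip_morning_enrich": True,
    "skip_research": True,
    "skip_backtester": True,
    "skip_parity": True,
}

# Execution input from the commit message's example: explicit override wins.
execution_input = {"shell_run": True, "skip_research": False}

effective = json_merge_shallow(SHELL_RUN_DEFAULTS, execution_input)
print(effective)
```

Because the execution input is the second argument, the explicit `skip_research: false` survives the merge while every flag the operator did not mention stays skipped.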

cipher813 and others added 2 commits May 16, 2026 07:12

Base automatically changed from feat/split-dataphase1-morningenrich to main May 16, 2026 14:53

Merge branch 'main' into feat/split-backtester-parity

cd67002

cipher813 merged commit 117f8b3 into main May 16, 2026
1 check passed

cipher813 deleted the feat/split-backtester-parity branch May 16, 2026 15:25

cipher813 mentioned this pull request May 18, 2026

feat(sf): Friday-PM shell_run dry-pass of the Saturday pipeline (spine; rule shipped disabled) #258

Merged


cipher813 merged 3 commits into main from feat/split-backtester-parity
