feat(sf): split Backtester → Backtest + Parity — preflight task split P1#250

Merged
cipher813 merged 3 commits into main from feat/split-backtester-parity
May 16, 2026

@cipher813
Owner

Stacked PR — merge AFTER #249

This PR STACKS ON the P0 branch feat/split-dataphase1-morningenrich (PR #249). Base is the P0 branch, NOT main — both PRs edit infrastructure/step_function.json and stacking avoids a merge conflict. Merge #249 first, then this PR (GitHub will auto-retarget this PR to main once #249 merges).

Rule

Per alpha-engine-docs/private/preflight-task-split-260516.md §5 (standing rule, stated twice by Brian with explicit irritation): every Step Function state that bundles more than one independent preflight-bearing action is split so each is its own SF task, and a failure in a later task never re-runs a completed earlier task. Accept the extra spot-launch cost.

Origin: the Saturday SF Backtester state ran spot_backtest.sh --skip-stages=evaluator, which runs backtest (~121 min, 10y simulate + param sweep) and then parity on one spot. Every parity recovery therefore re-paid the 121-min backtest.

SF-wiring-only — no backtester-repo change

spot_backtest.sh's --skip-stages already supports backtest/parity/evaluator independently (validated stage vocabulary _KNOWN_STAGES="backtest parity evaluator"). This PR only re-wires the Step Function — no backtester-repo change is needed. Evaluator was already its own state (split 2026-05-07) and is untouched.
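
As a rough illustration of how a single flag can select each stage in isolation, the sketch below models the stage vocabulary in Python. The real logic lives in spot_backtest.sh and is not shown in this PR; the function name `stages_to_run` and the reject-unknown-names behavior are assumptions made for illustration.

```python
# Hypothetical Python model of the --skip-stages vocabulary. The real
# implementation is the shell script spot_backtest.sh, which is not shown here.
KNOWN_STAGES = ("backtest", "parity", "evaluator")  # mirrors _KNOWN_STAGES


def stages_to_run(skip_stages: str) -> list[str]:
    """Return the stages left to run after applying a --skip-stages value."""
    skipped = [s for s in skip_stages.split(",") if s]
    unknown = [s for s in skipped if s not in KNOWN_STAGES]
    if unknown:
        # Assumed behavior: names outside the validated vocabulary are rejected.
        raise ValueError(f"unknown stage(s): {unknown}")
    return [s for s in KNOWN_STAGES if s not in skipped]


print(stages_to_run("evaluator"))           # old combined Backtester state
print(stages_to_run("parity,evaluator"))    # new Backtester state
print(stages_to_run("backtest,evaluator"))  # new Parity state
```

Under this model the old flag value leaves two stages bundled in one task, while each of the two new values leaves exactly one.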

Naming: lower-churn option chosen

Kept the existing Backtester state name for the backtest-stage state (its SSM command flips --skip-stages=evaluator → --skip-stages=parity,evaluator so it runs ONLY the backtest stage) and added a new Parity quartet after it. Keeping Backtester avoids rewiring DriftDetection's two Next/Catch edges and all inbound references to CheckSkipBacktester (renaming to Backtest would have touched 3+ unrelated states). The task spec explicitly permits this option.

Changes

infrastructure/step_function.json:

  • Backtester: SSM command --skip-stages=evaluator → --skip-stages=parity,evaluator (backtest stage only); Comment updated. CheckBacktesterStatus Success: CheckSkipEvaluator → CheckSkipParity.
  • New Parity quartet inserted after BacktesterWait, before CheckSkipEvaluator: CheckSkipParity / Parity / WaitForParity / CheckParityStatus / ParityWait + ExtractParityError. Parity runs --skip-stages=backtest,evaluator. Mirrors the Backtester quartet exactly — same Retry/Catch/HeartbeatSeconds/TimeoutSeconds (7260) / executionTimeout (7200, copied from the old combined Backtester — not under-sized) / HandleFailure-Catch / skip-Choice patterns. Parity success → CheckSkipEvaluator (existing downstream chain UNCHANGED).
  • CheckSkipBacktester: {"skip_backtester": true} retains its original whole-pair semantics (skip BOTH backtest and parity → CheckSkipEvaluator). New {"skip_parity": true} on CheckSkipParity skips ONLY parity.
  • Evaluator / MorningEnrich (P0) / DataPhase1 / all other states untouched.

New chain:
... → CheckSkipBacktester → Backtester (--skip-stages=parity,evaluator) → WaitForBacktester → CheckBacktesterStatus(Success) → CheckSkipParity → Parity (--skip-stages=backtest,evaluator) → WaitForParity → CheckParityStatus(Success) → CheckSkipEvaluator → … (unchanged)
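
The chain and the two skip flags can be traced with a small model. This is a simplified sketch, not the Step Function definition: it keeps one happy-path successor per state and omits the Retry/Catch and wait-loop edges. The `walk` helper is hypothetical, and routing `skip_parity` to CheckSkipEvaluator is an assumption (the PR only states that the flag skips parity alone).

```python
# Simplified model of the rewired chain: one happy-path successor per state.
HAPPY_NEXT = {
    "CheckSkipBacktester": "Backtester",
    "Backtester": "WaitForBacktester",
    "WaitForBacktester": "CheckBacktesterStatus",
    "CheckBacktesterStatus": "CheckSkipParity",
    "CheckSkipParity": "Parity",
    "Parity": "WaitForParity",
    "WaitForParity": "CheckParityStatus",
    "CheckParityStatus": "CheckSkipEvaluator",
}

# Skip gates: (input flag, target when the flag is true).
# The skip_parity target is assumed, not quoted from the definition.
SKIP_GATES = {
    "CheckSkipBacktester": ("skip_backtester", "CheckSkipEvaluator"),
    "CheckSkipParity": ("skip_parity", "CheckSkipEvaluator"),
}


def walk(execution_input: dict) -> list[str]:
    """Follow the chain from CheckSkipBacktester until CheckSkipEvaluator."""
    state = "CheckSkipBacktester"
    path = [state]
    while state in HAPPY_NEXT:
        flag, skip_target = SKIP_GATES.get(state, (None, None))
        if flag is not None and execution_input.get(flag) is True:
            state = skip_target
        else:
            state = HAPPY_NEXT[state]
        path.append(state)
    return path


print(" -> ".join(walk({})))
print(" -> ".join(walk({"skip_backtester": True})))
print(" -> ".join(walk({"skip_parity": True})))
```

In this model `skip_backtester` bypasses both stages, while `skip_parity` still runs Backtester and bypasses only Parity.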

Tests:

  • tests/test_sf_backtest_parity_split_wiring.py: +41 tests mirroring test_sf_morning_enrich_split_wiring.py conventions — quartet presence, chain ordering, happy-path reachability (Backtester < Parity < Evaluator), skip-stages argument correctness per state, no combined --skip-stages=evaluator anywhere, budget parity (Parity == Backtester), HandleFailure Catch on both, result-path isolation, lower-churn naming pinned.
  • tests/test_sf_eval_judge_wiring.py: updated TestBacktesterTransition for the new transitive path (Backtester success → CheckSkipParity → … → CheckSkipEvaluator) — the eval-judge-reachability invariant stays pinned.
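
Two of the listed invariants (no combined `--skip-stages=evaluator`, and budget parity) could be pinned roughly as follows. This is a sketch against a hypothetical, heavily trimmed definition; the field layout and the helper names are assumptions, and the actual tests in tests/test_sf_backtest_parity_split_wiring.py are not reproduced here.

```python
import json
import re

# Hypothetical stand-in for infrastructure/step_function.json.
# The nesting of commands/executionTimeout is an assumed layout.
DEFINITION = {
    "States": {
        "Backtester": {
            "TimeoutSeconds": 7260,
            "Parameters": {"Parameters": {
                "commands": ["spot_backtest.sh --skip-stages=parity,evaluator"],
                "executionTimeout": ["7200"],
            }},
        },
        "Parity": {
            "TimeoutSeconds": 7260,
            "Parameters": {"Parameters": {
                "commands": ["spot_backtest.sh --skip-stages=backtest,evaluator"],
                "executionTimeout": ["7200"],
            }},
        },
    }
}


def skip_stage_values(definition: dict) -> list[str]:
    """Collect every --skip-stages value anywhere in the definition."""
    return re.findall(r"--skip-stages=([a-z,]+)", json.dumps(definition))


def budget(state: dict) -> tuple[int, str]:
    """Return (TimeoutSeconds, executionTimeout) for a task state."""
    return state["TimeoutSeconds"], state["Parameters"]["Parameters"]["executionTimeout"][0]


def test_no_combined_skip_stages() -> None:
    # The bare value "evaluator" would mean backtest and parity are bundled again.
    assert "evaluator" not in skip_stage_values(DEFINITION)


def test_budget_parity() -> None:
    states = DEFINITION["States"]
    assert budget(states["Parity"]) == budget(states["Backtester"])
```

Searching the serialized JSON rather than a fixed path keeps the first check independent of where the command string sits in the definition.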

Validation

  • SF JSON parses; no dangling Next/Default/Catch.Next targets (full state-graph walk).
  • Full suite: 1135 passed, 1 skipped (post-P0 baseline 1094 passed; +41 new; zero new failures — the one eval-judge assertion my reroute invalidated was updated in place to the new transitive path).
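
A dangling-target check of the kind described in the first bullet can be sketched as below. This is an assumption about how such a walk might look, not the validation actually run for this PR; it covers top-level states only (no Parallel or Map branches), and the example definition is deliberately incomplete so that it reports missing targets.

```python
def referenced_targets(state: dict) -> list[str]:
    """Collect every transition target named by one state."""
    targets = [state[key] for key in ("Next", "Default") if key in state]
    targets += [rule["Next"] for rule in state.get("Choices", []) if "Next" in rule]
    targets += [catcher["Next"] for catcher in state.get("Catch", [])]
    return targets


def dangling_targets(definition: dict) -> list[tuple[str, str]]:
    """Return (source state, missing target) pairs, sorted for stable output."""
    states = definition["States"]
    missing = set()
    if definition["StartAt"] not in states:
        missing.add(("StartAt", definition["StartAt"]))
    for name, state in states.items():
        for target in referenced_targets(state):
            if target not in states:
                missing.add((name, target))
    return sorted(missing)


# Deliberately incomplete fragment: WaitForParity and CheckSkipEvaluator are absent.
EXAMPLE = {
    "StartAt": "CheckSkipParity",
    "States": {
        "CheckSkipParity": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.skip_parity", "BooleanEquals": True,
                         "Next": "CheckSkipEvaluator"}],
            "Default": "Parity",
        },
        "Parity": {
            "Type": "Task",
            "Next": "WaitForParity",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}],
        },
        "HandleFailure": {"Type": "Fail"},
    },
}

for source, target in dangling_targets(EXAMPLE):
    print(f"{source} -> {target} (missing)")
```

An empty result from `dangling_targets` on the full definition would correspond to the "no dangling targets" claim above.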

DEPLOY IS HELD

DEPLOY IS HELD until the in-flight recovery Saturday SF run is green (proving #247/#248 end-to-end), and the SF redeploy must NOT happen while a recovery execution is live on the Saturday SF. No deploy script was run, no Step Function was redeployed, no SF was triggered.

🤖 Generated with Claude Code

cipher813 and others added 2 commits May 16, 2026 07:12
…reflight task split P0

Standing rule (preflight-task-split-260516.md): every preflight-bearing
action is its own SF task; a downstream failure must never re-run a
completed upstream task. Accept the extra spot-launch cost.

Origin: 2026-05-16 Saturday SF DataPhase1 ran spot_data_weekly.sh
--data-only = morning-enrich (~28 min) THEN phase1 on one spot, with
phase1's preflight buried 28 minutes behind a completed morning-enrich.
Every phase1 recovery re-paid the 28-min morning-enrich. A fast-fail
that fires 28 minutes deep is not a fast-fail.

Changes:
- spot_data_weekly.sh: add --morning-enrich-only / --phase1-only run
  modes (RUN_MODE morning-enrich-only / phase1-only). morning-enrich and
  phase1+prune are now independently gated by DO_MORNING_ENRICH /
  DO_PHASE1 derived from RUN_MODE. --data-only preserved (runs both) for
  manual/adhoc backward-compat. Per-mode MODE_LABEL feeds the spot-side
  S3 log key + heartbeat dimension so a morning-enrich-only run is not
  mislabeled data-phase1. Shared scaffolding (log capture, S3 EXIT-trap
  upload, watchdog, heartbeat) works for all three modes.
- preflight.py: dedicated "morning_enrich" mode whose checks are the
  UNION of what _run_morning_enrich needs (AWS_REGION env, polygon +
  FRED secret presence + reachability probes, S3 bucket + writeable
  sentinel, ArcticDB libraries present). Deliberately NO ArcticDB-
  freshness check -- morning-enrich is part of what makes it fresh.
  weekly_collector.main() now maps --morning-enrich -> "morning_enrich"
  (was the dependency-blind "daily" which skipped polygon/FRED probes).
- step_function.json: new MorningEnrich quartet (CheckSkipMorningEnrich
  / MorningEnrich / WaitForMorningEnrich / CheckMorningEnrichStatus +
  MorningEnrichWait + ExtractMorningEnrichError) inserted BEFORE
  DataPhase1, mirroring the RAGIngestion/DataPhase1 quartets exactly
  (same Retry/Catch/Heartbeat/Timeout/HandleFailure wiring + a
  skip_morning_enrich Choice). MorningEnrich runs
  --morning-enrich-only; DataPhase1 switched --data-only ->
  --phase1-only. Chain: InitializeInput -> CheckSkipMorningEnrich ->
  MorningEnrich -> ... -> CheckSkipDataPhase1 -> DataPhase1 ->
  (existing next, unchanged). All downstream states untouched.
- tests: +44 tests across test_sf_morning_enrich_split_wiring.py,
  test_spot_data_weekly_run_modes.py,
  test_weekly_collector_preflight_mode_mapping.py, and extended
  test_preflight.py (morning_enrich mode: probes polygon+FRED, no
  arcticdb-freshness, fail-fast on missing secret).
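
The run-mode gating described in the first bullet can be restated as a lookup table. This is a hypothetical Python rendering of shell logic that is not shown in this thread; the "data-only" RUN_MODE string is assumed from the --data-only flag name, and `gates_for` is an illustrative helper.

```python
# Hypothetical restatement of the RUN_MODE gating in spot_data_weekly.sh.
RUN_MODE_GATES = {
    "morning-enrich-only": {"DO_MORNING_ENRICH": True, "DO_PHASE1": False},
    "phase1-only": {"DO_MORNING_ENRICH": False, "DO_PHASE1": True},
    # Assumed RUN_MODE value for the preserved --data-only flag (runs both).
    "data-only": {"DO_MORNING_ENRICH": True, "DO_PHASE1": True},
}


def gates_for(run_mode: str) -> dict[str, bool]:
    """Return the two gate flags for a run mode, rejecting unknown modes."""
    try:
        return RUN_MODE_GATES[run_mode]
    except KeyError:
        raise ValueError(f"unknown RUN_MODE: {run_mode!r}") from None


for mode in RUN_MODE_GATES:
    print(mode, gates_for(mode))
```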

Full suite: 1094 passed, 1 skipped (clean-main baseline ~1050; +44 new).
bash -n + SF JSON parse validated. DEPLOY HELD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…it P1

Standing rule (preflight-task-split-260516.md): every preflight-bearing
action is its own SF task; a downstream failure must never re-run a
completed upstream task. Accept the extra spot-launch cost.

Origin: the Saturday SF Backtester state ran
`spot_backtest.sh --skip-stages=evaluator` = backtest (~121 min, 10y
simulate + param sweep) THEN parity on one spot. Every parity recovery
re-paid the 121-min backtest because backtest and parity are independent
preflight-bearing actions bundled in one task.

SF-WIRING-ONLY -- no backtester-repo change. spot_backtest.sh's
--skip-stages already supports backtest/parity/evaluator independently
(validated stage vocabulary _KNOWN_STAGES="backtest parity evaluator").

Naming: LOWER-CHURN option chosen -- the existing `Backtester` state
name is KEPT for the backtest-stage state (its SSM command flips
--skip-stages=evaluator -> --skip-stages=parity,evaluator so it runs ONLY
the backtest stage); a NEW `Parity` quartet is added after it. Keeping
`Backtester` avoids rewiring DriftDetection's two Next/Catch edges and
all inbound references to CheckSkipBacktester (vs renaming to `Backtest`,
which would touch 3+ unrelated states).

Changes:
- step_function.json:
  - Backtester: --skip-stages=evaluator -> --skip-stages=parity,evaluator
    (backtest stage only); Comment updated. CheckBacktesterStatus
    Success: CheckSkipEvaluator -> CheckSkipParity.
  - New Parity quartet inserted after BacktesterWait, before
    CheckSkipEvaluator: CheckSkipParity / Parity / WaitForParity /
    CheckParityStatus / ParityWait + ExtractParityError. Parity runs
    --skip-stages=backtest,evaluator. Mirrors the Backtester quartet
    exactly -- same Retry/Catch/HeartbeatSeconds/TimeoutSeconds/
    HandleFailure-Catch/skip-Choice patterns; TimeoutSeconds 7260 /
    executionTimeout 7200 copied from the old combined Backtester (not
    under-sized). Parity success -> CheckSkipEvaluator (existing
    downstream chain UNCHANGED).
  - CheckSkipBacktester: {"skip_backtester": true} retains its original
    whole-pair semantics (skip BOTH backtest and parity -> route to
    CheckSkipEvaluator). New {"skip_parity": true} on CheckSkipParity
    skips ONLY parity.
  - Evaluator / MorningEnrich (P0) / DataPhase1 / all other states
    untouched.
- tests/test_sf_backtest_parity_split_wiring.py: +41 tests mirroring
  test_sf_morning_enrich_split_wiring.py conventions (quartet presence,
  chain ordering, happy-path reachability Backtester<Parity<Evaluator,
  skip-stages argument correctness, no combined --skip-stages=evaluator
  anywhere, budget parity Parity==Backtester, HandleFailure Catch on
  both, result-path isolation, lower-churn naming pinned).
- tests/test_sf_eval_judge_wiring.py: updated TestBacktesterTransition
  for the new transitive path (Backtester success -> CheckSkipParity ->
  ... -> CheckSkipEvaluator) -- the eval-judge-reachability invariant
  stays pinned.

Full suite: 1135 passed, 1 skipped (post-P0 baseline 1094 passed; +41
new). SF JSON parses, no dangling Next/Default/Catch targets. DEPLOY
HELD until the in-flight recovery Saturday SF run is green; SF redeploy
must NOT happen while a recovery execution is live.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Base automatically changed from feat/split-dataphase1-morningenrich to main May 16, 2026 14:53
@cipher813 cipher813 merged commit 117f8b3 into main May 16, 2026
1 check passed
@cipher813 cipher813 deleted the feat/split-backtester-parity branch May 16, 2026 15:25
cipher813 added a commit that referenced this pull request May 18, 2026
…e; rule shipped disabled) (#258)

Foundational spine of ROADMAP "Scheduled Friday-PM 'shell run'" (P1, added
2026-05-16) — the *prevention* half of Saturday-SF reliability (the
*containment* half, preflight-task-split, shipped 2026-05-16 in data
#249/#250). Surfaces a Saturday-fatal bootstrap break ~11.5h before the
unattended Sat 02:00 PT firing, inside an operator-awake Friday-evening fix
window, instead of as a Saturday-morning-after lost-week incident.

STRICT SUPERSET — shell_run absent/false ⇒ byte-identical to today's real
Saturday run. Only two existing edges change, each routed through a new
Choice whose Default is the pre-spine target:
  InitializeInput.Next: CheckSkipMorningEnrich -> CheckShellRun
    (Default -> CheckSkipMorningEnrich; unchanged for the real run)
  WaitForWeeklySubstrateHealthCheck.Next: NotifyComplete -> CheckShellRunNotify
    (Default -> NotifyComplete; the real Saturday SUCCESS email is untouched)

shell_run propagation (mirrors the existing skip_*/JsonMerge precedent
exactly — no new mechanism invented):
  CheckShellRun (Choice): {"shell_run": true} -> ApplyShellRunDefaults
  ApplyShellRunDefaults (Pass): States.JsonMerge(<all 16 skip_*=true>, $, false)
    layers every skip flag = true UNDER the execution input so an explicit
    per-flag override still wins (e.g. {"shell_run":true,"skip_research":false}
    still runs Research). Every workload state already has a Choice-gated
    skip_*, so the whole workload no-ops via the EXISTING skip mechanism.
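
The merge order can be emulated in Python as a sketch. States.JsonMerge with its third argument false is a shallow merge in which keys from the second object win, which dict unpacking reproduces. Only four of the 16 skip flags are listed because only those names appear in this thread; `apply_shell_run_defaults` is a hypothetical helper name.

```python
# Subset of the 16 skip_* defaults; the remaining flag names are not listed here.
SHELL_RUN_DEFAULTS = {
    "skip_morning_enrich": True,
    "skip_research": True,
    "skip_backtester": True,
    "skip_parity": True,
}


def apply_shell_run_defaults(execution_input: dict) -> dict:
    """Emulate States.JsonMerge(defaults, $, false): shallow, right side wins."""
    return {**SHELL_RUN_DEFAULTS, **execution_input}


merged = apply_shell_run_defaults({"shell_run": True, "skip_research": False})
print(merged)
```

Because the execution input is the right-hand operand, the explicit `skip_research: false` survives while every flag the operator did not mention defaults to true.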

Per-state dry-vs-skip inventory under shell_run (spine = pure-skip; per-module
--preflight-only/--dry-run "spots boot + smoke" are SCOPED FOLLOW-ONS):
  SKIPPED via existing skip_* gate (16): MorningEnrich, DataPhase1,
    RAGIngestion, RegimeSubstrate, RegimeRetrospectiveEval, Research,
    DataPhase2, EvalJudge(+RollingMean), RationaleClustering,
    ReplayConcordance, Counterfactual, PredictorTraining, DriftDetection,
    Backtester, Parity, Evaluator
  STILL RUNS (read-only, no skip gate by design — exactly the bootstrap/
    transport smoke the shell run wants Friday PM): SaturdayHealthCheck,
    WeeklySubstrateHealthCheck. Their shell_run-aware missing-Friday-bar
    tolerance is ROADMAP owed-work item 5 (scoped follow-on).
  NOTIFY: NotifyShellRunComplete (shell-run-tagged Subject, reuses the exact
    NotifyComplete SNS substrate — alpha-engine-alerts topic, same Resource).

Friday EventBridge rule (CFN, the documented infra-as-code home for
EventBridge rules — SaturdayTrigger/WeekdayTrigger live there):
  FridayShellRunTrigger, cron(30 21 ? * FRI *) = 21:30 UTC Fri =
  14:30 PT (PDT, dominant season) / 13:30 PT (PST). Chosen AFTER the Friday
  EOD SF (~13:25 PT) so it never collides with PostMarketData/EODReconcile/
  StopTradingInstance on the trading instance, and ~11.5h BEFORE the real
  Sat 09:00 UTC firing. Targets the SAME alpha-engine-saturday-pipeline SF
  (NOT a parallel SF) with {"shell_run": true}, same EventBridgeSfnRoleArn —
  the existing states:StartExecution grant is SF-ARN-scoped so NO IAM change
  is needed.
  SHIPPED State: DISABLED — zero-risk merge. Additive observability, NOT a
  backstop (the "fail loud, no backstop" design decision stands).
  Operator enable step:
    aws events enable-rule --name alpha-engine-friday-shell-run --region us-east-1
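
The timing arithmetic above can be checked with a short sketch. Fixed UTC offsets stand in for Pacific time so the example needs no time zone database, and the specific Friday (2026-05-22) is an arbitrary choice for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets instead of a tz database lookup (an illustrative shortcut).
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")

# cron(30 21 ? * FRI *) fires Fridays at 21:30 UTC.
friday_fire = datetime(2026, 5, 22, 21, 30, tzinfo=timezone.utc)
# The real Saturday run fires at 09:00 UTC the next day.
saturday_fire = datetime(2026, 5, 23, 9, 0, tzinfo=timezone.utc)

lead_time = saturday_fire - friday_fire

print(friday_fire.astimezone(PDT).strftime("%a %H:%M %Z"))
print(friday_fire.astimezone(PST).strftime("%a %H:%M %Z"))
print(saturday_fire.astimezone(PDT).strftime("%a %H:%M %Z"))
print(lead_time)
```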

Consolidated-notify decision: shell-run SUCCESS is delivered by reusing the
existing NotifyComplete SNS pattern with a SHELL RUN-tagged Subject (zero new
infra). A shell-run FAILURE reuses the unchanged HandleFailure (its 20
inbound error edges deliberately NOT re-pointed: high churn, zero added
operator value, and would perturb the real Saturday failure path's risk
surface — the FAILED alert's Friday execution timestamp/ID is the actionable
signal). The richer per-state pass/fail report (ROADMAP design point 5) is a
scoped follow-on.

Scoped per-module follow-on PRs (repo -> state -> dry mode needed; NOT done
here — these convert "skipped" to "spots boot + smoke"):
  alpha-engine-data -> DataPhase1/MorningEnrich -> spot_data_weekly.sh
    --preflight-only (preflight + universe-freshness scan, no polygon/FMP
    writes); shell_run-aware tolerance for "Friday bar not yet present"
  alpha-engine-data -> RAGIngestion -> spot_data_weekly.sh --rag-only
    --preflight-only (corpus reachability + secrets, no SEC/embedding writes)
  alpha-engine-predictor -> PredictorTraining -> spot_train.sh --preflight-only
    (load + WF-gate-shape check, NO predictor/weights/ promotion)
  alpha-engine-backtester -> Backtester/Parity/Evaluator -> spot_backtest.sh
    --mode=smoke + simulate-dry, NO config/*.json auto-apply
    (freeze_evaluator pattern is the model)
  alpha-engine-data -> SaturdayHealthCheck/WeeklySubstrateHealthCheck ->
    shell_run-aware missing-Friday-bar tolerance (ROADMAP owed-work item 5)
  (Research/predictor-inference/executor already have --dry-run/--simulate;
   wiring those into the SF states is part of the per-state follow-ons above.)

Tests: tests/test_sf_friday_shell_run_wiring.py (23 cases — strict-superset
edges, JsonMerge user-input-wins order, every skip-gate covered by the
defaults blob, full happy-path traversal for shell_run true vs absent,
Friday rule DISABLED + same-SF + shell_run=true + cron). Updated two
pre-spine wiring tests (morning_enrich_split, substrate_check) to assert
through the new gates while pinning Default == pre-spine target. Full suite:
1242 passed, 1 skipped (pre-existing, unrelated). No new pip deps. No secrets.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>