feat(sf): shell-run keystone — spot --preflight-only + Lambda --dry-run instead of skip by cipher813 · Pull Request #260 · cipher813/alpha-engine-data

cipher813 · 2026-05-18T21:20:58Z

What

ROADMAP "Friday shell-run — per-module dry-path activation" (owed-item #4). Converts #258's pure-skip shell_run into actual boot + dry execution of the Saturday SF. One file: infrastructure/step_function.json (+ test updates).

ApplyShellRunDefaults no longer force-sets all 16 skip_*=true. It now sets a single preflight_args=" --preflight-only" suffix var (driving the 7 spot states' States.Format command), Lambda dry flags for the 4 verified-clean Lambdas, and hard-skips ONLY the 5 documented no-clean-dry-path exceptions. InitializeInput seeds the control vars at non-dry identity values.
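The following Python sketch models the blob that ApplyShellRunDefaults now force-sets. It assumes the blob is applied as a shallow `States.JsonMerge` over the seeded input, with user-supplied flags merged last so they win. The key names come from this PR and from #263. The `apply_shell_run_defaults` helper and the exact merge order are illustrative assumptions, not the SF's actual ASL.

```python
from typing import Any, Dict

# Hypothetical model of the #260 ApplyShellRunDefaults force-set blob.
SHELL_RUN_DEFAULTS: Dict[str, Any] = {
    # One suffix var drives all 7 spot States.Format commands.
    "preflight_args": " --preflight-only",
    # Dry flags for the 4 verified-clean Lambda states.
    "research_dry": True,         # Research -> dry_run_llm
    "data_phase2_dry": True,      # DataPhase2 -> dry_run
    "regime_action": "dry_run",   # RegimeSubstrate + RegimeRetrospectiveEval
    # Hard skips: ONLY the 5 documented no-clean-dry-path exceptions.
    "skip_drift_detection": True,
    "skip_eval_judge": True,
    "skip_rationale_clustering": True,
    "skip_replay_concordance": True,
    "skip_counterfactual": True,
}

def apply_shell_run_defaults(seeded: Dict[str, Any], user_input: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow-merge the blob over the seeded state, then let user flags win."""
    merged = {**seeded, **SHELL_RUN_DEFAULTS}
    return {**merged, **user_input}

# Under #258 this set had 16 members; under #260 it has 5.
forced_skips = {k for k, v in SHELL_RUN_DEFAULTS.items() if k.startswith("skip_") and v is True}

# An operator can still un-skip one exception explicitly.
state = apply_shell_run_defaults(
    {"preflight_args": "", "research_dry": False}, {"skip_counterfactual": False}
)
```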

Absolute invariant (preserved + test-proven)

shell_run absent OR false ⇒ the Saturday SF is byte-identical (spot commands) / behaviourally identical (Lambda payloads) to today's real run. The real Sat 02:00 PT firing passes no shell_run.

Byte-identical templating approach: preflight_args carries its leading space inside the variable (" --preflight-only" under shell_run, "" on the real run). The spot states' final command is a States.Format(...) whose {} is placed immediately after the mode token with NO literal space, e.g. 'bash infrastructure/spot_data_weekly.sh --morning-enrich-only{} 2>&1 | tee ...'. With preflight_args="" the formatted string is char-for-char the pre-change literal (proven: TestByteIdenticalAbsentPath resolves the States.Array/States.Format intrinsics for all 7 spot states at preflight_args="" and asserts equality against origin/main; under " --preflight-only" it asserts exactly one separating space + no double-space).
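The templating argument is easier to see with a concrete example. In this Python sketch, `states_format` is a hypothetical, simplified stand-in for the ASL `States.Format` intrinsic: it substitutes each `{}` in order and ignores ASL escape sequences. `LOGFILE` is a placeholder, because the real `tee` target is elided above.

```python
def states_format(template: str, *args: str) -> str:
    """Simplified stand-in for ASL States.Format: each '{}' is replaced,
    left to right, by the next argument (escape sequences ignored)."""
    pieces = template.split("{}")
    if len(pieces) - 1 != len(args):
        raise ValueError("placeholder count does not match argument count")
    out = pieces[0]
    for arg, tail in zip(args, pieces[1:]):
        out += arg + tail
    return out

# LOGFILE is a placeholder; the real tee target is elided in the PR text.
TEMPLATE = "bash infrastructure/spot_data_weekly.sh --morning-enrich-only{} 2>&1 | tee LOGFILE"
PRE_CHANGE = "bash infrastructure/spot_data_weekly.sh --morning-enrich-only 2>&1 | tee LOGFILE"

real_run = states_format(TEMPLATE, "")                    # preflight_args=""
shell_run = states_format(TEMPLATE, " --preflight-only")  # preflight_args=" --preflight-only"

assert real_run == PRE_CHANGE                   # byte-identical absent path
assert "only --preflight-only 2>&1" in shell_run
assert "  " not in shell_run                    # exactly one separating space

# Counter-example: a literal space before {} breaks byte-identity on the real run.
bad = states_format("bash x.sh --mode {} 2>&1", "")
assert bad == "bash x.sh --mode  2>&1"          # double space
```

Putting the separator inside the variable, rather than in the template, is what makes the empty string a true identity value.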

InitializeInput's nested JsonMerge is structurally unchanged (still 2-level, user input wins) — only 4 identity keys added to the innermost defaults blob.
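The Python sketch below shows the property the seeding relies on: defaults come first and user input comes last, so user input wins. It collapses the real 2-level nesting to a single merge, because the other operand of the outer `JsonMerge` is not described here. `json_merge` mirrors `States.JsonMerge(a, b, false)` (a shallow merge). `skip_morning_enrich` is a hypothetical stand-in for the pre-existing defaults. The `"produce"` identity value for `regime_action` is inferred from the old hardcoded `action: produce` payload.

```python
from typing import Any, Dict

def json_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge in the spirit of States.JsonMerge(base, override, false)."""
    return {**base, **override}

INNERMOST_DEFAULTS: Dict[str, Any] = {
    "skip_morning_enrich": False,   # hypothetical pre-existing key (illustrative)
    # The 4 identity keys #260 adds: non-dry values that reproduce today's run.
    "preflight_args": "",
    "research_dry": False,
    "data_phase2_dry": False,
    "regime_action": "produce",     # inferred from the old hardcoded action
}

def initialize_input(user_input: Dict[str, Any]) -> Dict[str, Any]:
    # Defaults first, user input last, so user input wins.
    return json_merge(INNERMOST_DEFAULTS, user_input)

real = initialize_input({})  # the Sat 02:00 PT firing passes no shell_run
```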

State classification table

| State | Spot/Lambda | Dry mechanism under shell_run |
|---|---|---|
| MorningEnrich | Spot | `spot_data_weekly.sh --morning-enrich-only --preflight-only` |
| DataPhase1 | Spot | `spot_data_weekly.sh --phase1-only --preflight-only` |
| RAGIngestion | Spot | `spot_data_weekly.sh --rag-only --preflight-only` |
| PredictorTraining | Spot | `spot_train.sh --full-only --preflight-only` (last-flag-wins → MODE=preflight-only) |
| Backtester | Spot | `spot_backtest.sh --skip-stages=parity,evaluator --preflight-only` |
| Parity | Spot | `spot_backtest.sh --skip-stages=backtest,evaluator --preflight-only` |
| Evaluator | Spot | `spot_backtest.sh --skip-stages=backtest,parity --preflight-only` |
| Research | Lambda | `dry_run_llm=true` (via `$.research_dry`) |
| DataPhase2 | Lambda | `dry_run=true` (via `$.data_phase2_dry`) |
| RegimeSubstrate | Lambda | `action=dry_run` (via `$.regime_action`) |
| RegimeRetrospectiveEval | Lambda | `action=dry_run` (via `$.regime_action`) |
| DriftDetection | Spot | KEPT SKIPPED: `spot_drift_detection.sh` has NO `--preflight-only` flag |
| EvalJudge chain | Lambda | KEPT SKIPPED: the submit handler always calls `_persist_client_side_skips` + Anthropic Batch create; no handler-level dry param |
| RationaleClustering | Lambda | KEPT SKIPPED: `_persist_analysis` S3 `put_object` is NOT gated by `dry_run` (only the CloudWatch metric is) |
| ReplayConcordance | Lambda | KEPT SKIPPED: `alpha-engine-replay-concordance` handler source not present in any cloned repo; cannot verify a clean dry path |
| Counterfactual | Lambda | KEPT SKIPPED: `alpha-engine-replay-counterfactual` handler source not present in any cloned repo; cannot verify |
| SaturdayHealthCheck / WeeklySubstrateHealthCheck | Spot | Already run under #258; left as-is (bootstrap smoke) |
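The PredictorTraining row depends on "last-flag-wins" argument parsing. The Python sketch below emulates a shell `case` loop that reassigns `MODE` on every mode flag. The flag-to-mode table and the `"full"` default are hypothetical; only the two flags from the table row are modelled.

```python
from typing import Dict, List

# Hypothetical flag -> MODE mapping modelled on the spot_train.sh note.
MODE_FLAGS: Dict[str, str] = {
    "--full-only": "full-only",
    "--preflight-only": "preflight-only",
}

def resolve_mode(argv: List[str], default: str = "full") -> str:
    """Emulate a shell case loop that overwrites MODE on every mode flag."""
    mode = default
    for arg in argv:
        if arg in MODE_FLAGS:
            mode = MODE_FLAGS[arg]  # a later flag overwrites an earlier one
    return mode
```

Ordering matters here. The appended suffix only takes effect because the `States.Format` placeholder sits after the existing mode token.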

Per-Lambda dry-path verification evidence

Research (alpha-engine-research-runner) — VERIFIED. lambda/handler.py dry_run_llm=True rebuilds the graph after install_dry_run_stubs; the post-#195 dry_run.py no-ops archive_writer, email_sender, upload_db, write_signals_json, save_sector_team_run and save_agent_run. No S3/email/DB write.
DataPhase2 (alpha-engine-data-collector, phase 2) — VERIFIED. lambda/handler.py health-marker write gated if not dry_run; collectors/alternative.collect(dry_run=True) returns ok_dry_run BEFORE any _fetch_all_alternative (external API) or S3 write.
RegimeSubstrate / RegimeRetrospectiveEval (alpha-engine-predictor-regime-*) — VERIFIED. action="dry_run" → produce_*(write=False) returns {"payload":…, "wrote": False} before any put_object; reads macro history + fits HMM in-memory only.
EvalJudge chain — NOT VERIFIED → KEPT SKIPPED. lambda/eval_judge_submit_handler.py always calls _persist_client_side_skips (S3) + submit_batch (Anthropic Batch create + S3 plan). The docstring's "dry_run smoke" is scripts/smoke_eval_judge.py, not a handler param.
RationaleClustering — NOT VERIFIED → KEPT SKIPPED. dry_run=True only sets emit_metrics=False, but compute_and_emit calls _persist_analysis (S3 put_object) unconditionally; only the CloudWatch metric is gated.
ReplayConcordance / Counterfactual — NOT VERIFIED → KEPT SKIPPED. Handler source for alpha-engine-replay-concordance/-counterfactual is not in any cloned repo (research/predictor/data/executor); routing to an unverified dry path is forbidden by the task discipline.
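The verified/not-verified split comes down to where the `dry_run` gate sits relative to the write. The Python sketch below contrasts the two shapes using a hypothetical recording fake with boto3-style keyword names. Both handlers are illustrative; neither is the real Lambda code.

```python
from typing import Dict, List

class RecordingS3:
    """Fake S3 client: records put_object targets instead of writing."""
    def __init__(self) -> None:
        self.puts: List[str] = []

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> None:
        self.puts.append(f"{Bucket}/{Key}")

def clean_dry_handler(s3: RecordingS3, dry_run: bool) -> Dict[str, object]:
    # Verified shape (e.g. DataPhase2, regime): return BEFORE any write.
    if dry_run:
        return {"status": "ok_dry_run"}
    s3.put_object(Bucket="example-bucket", Key="out.json", Body=b"{}")
    return {"status": "ok"}

def leaky_dry_handler(s3: RecordingS3, dry_run: bool) -> Dict[str, object]:
    # RationaleClustering shape: dry_run only disables the metric emit.
    emit_metrics = not dry_run
    s3.put_object(Bucket="example-bucket", Key="analysis.json", Body=b"{}")  # ungated
    return {"status": "ok", "emitted_metric": emit_metrics}

clean, leaky = RecordingS3(), RecordingS3()
clean_dry_handler(clean, dry_run=True)
leaky_dry_handler(leaky, dry_run=True)
```

A handler is only safe to route under shell_run when its dry call leaves the recorder empty.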

Owed-item #5 finding (universe-freshness shell_run-aware tolerance)

Non-issue for the SF; not in this PR. SaturdayHealthCheck + WeeklySubstrateHealthCheck are non-blocking (Catch: States.ALL → NotifyComplete). health_checker.py does sys.exit(1 if failures else 0), so under shell_run (spots ran preflight-only → no Friday data refresh) a stale/missing bar makes the SSM command exit 1 — but the non-blocking Catch absorbs it, so the SF does not spuriously fail. The only artifact is a clearly-Friday-timestamped alert email (cosmetic). The task says add tolerance ONLY if a real SF-fatal spurious-fail path exists; it does not. A --shell-run-aware staleness tolerance in alpha-engine-dashboard/health_checker.py is a scoped cross-repo follow-on, deliberately out of this single-file SF PR.
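The following Python sketch models why a non-zero exit from the health check does not fail the execution. `run_task_with_catch` is a hypothetical helper: a Task whose non-zero exit is routed to its Catch target when one exists, and otherwise fails the run. `NextState` is a placeholder name. The subprocess stands in for `health_checker.py` reporting one failure.

```python
import subprocess
import sys
from typing import List, Optional

def run_task_with_catch(cmd: List[str], next_state: str, catch_next: Optional[str]) -> str:
    """Return the next state name, modelling a Task whose failure is either
    caught (Catch: States.ALL -> catch_next) or propagated as a run failure."""
    result = subprocess.run(cmd)
    if result.returncode == 0:
        return next_state
    if catch_next is not None:
        return catch_next
    raise RuntimeError("execution FAILED")

# Stand-in for health_checker.py: sys.exit(1 if failures else 0), one stale bar.
stale_bar_check = [
    sys.executable, "-c",
    "import sys; failures = ['stale bar']; sys.exit(1 if failures else 0)",
]
healthy_check = [sys.executable, "-c", "pass"]
```

With the Catch present, the stale-bar exit routes to NotifyComplete instead of failing the run, which matches the "cosmetic email only" conclusion above.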

Tests

tests/test_sf_friday_shell_run_wiring.py extended (mirrors #258's; 43 tests in-file):

- TestByteIdenticalAbsentPath: ASL-intrinsic resolver; per-spot byte-identical proof at preflight_args="" vs origin/main; --preflight-only correctness at shell_run; Lambda payload $.var routing.
- TestApplyShellRunDefaults: dry control vars set; leading-space invariant; skip-set == the 5 documented exceptions ONLY; verified-clean states NOT skipped; InitializeInput identity seeding; #258 skip gates left intact.
- TestHappyPathTraversal: shell_run=true visits the dry workloads, skips only the exceptions and reaches NotifyShellRunComplete; the absent path traverses exactly as pre-keystone.
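A traversal test can be pictured as walking each Choice-gated `CheckSkip<State>` in order. The Python sketch below does this over a hypothetical four-state linear slice of the SF. The flag names `skip_morning_enrich` and `skip_research` are illustrative, and the real graph has 58 top-level states and parallel branches.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical linear slice of the Saturday SF: (workload state, its skip flag).
PIPELINE: List[Tuple[str, str]] = [
    ("MorningEnrich", "skip_morning_enrich"),
    ("DriftDetection", "skip_drift_detection"),
    ("Research", "skip_research"),
    ("Counterfactual", "skip_counterfactual"),
]

def traverse(state: Dict[str, bool]) -> Tuple[List[str], Set[str]]:
    """Walk each CheckSkip<State> Choice: jump past the workload if its flag
    is true, otherwise visit it. Returns (visited, skipped)."""
    visited: List[str] = []
    skipped: Set[str] = set()
    for name, flag in PIPELINE:
        if state.get(flag, False):
            skipped.add(name)
        else:
            visited.append(name)
    visited.append("NotifyShellRunComplete" if state.get("shell_run") else "NotifyComplete")
    return visited, skipped

# #260 shell_run: only documented exceptions are force-skipped.
visited, skipped = traverse(
    {"shell_run": True, "skip_drift_detection": True, "skip_counterfactual": True}
)
absent_visited, absent_skipped = traverse({})
```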

Also fixed 6 pre-existing tests in 2 other SF test files that asserted the old structure (literal commands array / hardcoded action: produce) — now keystone-aware via the shared extract_commands helper / $.regime_action routing.
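The shared helper has to read both the pre-keystone literal `commands` array and the new `commands.$` `States.Array(...)` expression. Below is a hypothetical version of such an `extract_commands` helper, not the repo's actual implementation. It takes the SSM `Parameters` dict directly and handles only quoted literals and `States.Format('...', $.var)` entries, without unescaping. The commands in the usage example are placeholders.

```python
import re
from typing import Any, Dict, List

_ENTRY = re.compile(
    r"States\.Format\('((?:[^'\\]|\\.)*)',\s*\$\.(\w+)\)"  # States.Format(tmpl, $.var)
    r"|'((?:[^'\\]|\\.)*)'"                                # plain quoted literal
)

def extract_commands(params: Dict[str, Any], ctx: Dict[str, str]) -> List[str]:
    """Hypothetical helper: resolved command list from either a literal
    `commands` array or a `commands.$` States.Array(...) expression."""
    if "commands" in params:  # pre-keystone literal form
        return list(params["commands"])
    expr = params["commands.$"]
    if not (expr.startswith("States.Array(") and expr.endswith(")")):
        raise ValueError(f"unsupported expression: {expr}")
    inner = expr[len("States.Array("):-1]
    out: List[str] = []
    for m in _ENTRY.finditer(inner):
        template, var, literal = m.group(1), m.group(2), m.group(3)
        out.append(template.replace("{}", ctx[var], 1) if template is not None else literal)
    return out

literal = {"commands": ["cd /opt/app", "bash run.sh --x 2>&1"]}
keystone = {"commands.$": "States.Array('cd /opt/app', "
                          "States.Format('bash run.sh --x{} 2>&1', $.preflight_args))"}
```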

Results: full pytest tests/ → 1337 passed, 1 skipped (skip + pandas FutureWarnings pre-existing/unrelated). infrastructure/step_function.json parses (58 states). Byte-identical proof: all 7 spot states BYTE-IDENTICAL on the absent path.

🤖 Generated with Claude Code

…un instead of skip

Converts #258's pure-skip shell_run into actual boot+dry execution of the Saturday SF workload. ApplyShellRunDefaults no longer force-sets all 16 skip_* true; it now sets a single preflight_args=" --preflight-only" suffix var (driving the 7 spot states' States.Format command), Lambda dry flags for the 4 verified-clean Lambda states, and hard-skips ONLY the 5 documented no-clean-dry-path exceptions. InitializeInput seeds the control vars at non-dry identity values so the shell_run-absent path is byte-identical (spots) / behaviourally identical (Lambdas) to today's real Saturday run.

Invariant preserved + test-proven: shell_run absent/false ⇒ every spot command string char-for-char unchanged (TestByteIdenticalAbsentPath resolves the States.Array/States.Format intrinsics with preflight_args="" and asserts equality against origin/main).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…in shallow PR checkout)

The keystone byte-identical proof shelled out to `git show origin/main:infrastructure/step_function.json` at test time. GitHub Actions' shallow PR checkout has no `origin/main` local ref → `subprocess.CalledProcessError ... exit status 128` → `test` check failed.

Replace the live-git `orig_sf` fixture with a committed frozen baseline `tests/fixtures/sf_prekeystone_spot_commands.json` (the RESOLVED pre-keystone spot command lists captured from origin/main; handles the states already on commands.$: Backtester/Parity/Evaluator). The proof is now hermetic and still a true regression guard against the strict-superset invariant. Docstring documents deliberate-regeneration.

Suite: 1337 passed, 1 skipped (unchanged). Keystone file 43/43.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
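The fix replaces a live-git read with a committed file, so the test no longer depends on which refs the CI checkout happens to have. The Python sketch below assumes the fixture maps state names to resolved command lists. That schema, the helper names and the single illustrative entry are hypothetical; a temporary file stands in for the committed fixture.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

Baseline = Dict[str, List[str]]

def load_baseline(path: Path) -> Baseline:
    """Read the committed frozen baseline; no `git show origin/main` needed."""
    return json.loads(path.read_text(encoding="utf-8"))

def drifted_states(resolved: Baseline, baseline: Baseline) -> List[str]:
    """States whose resolved absent-path commands differ from the baseline."""
    return sorted(s for s, cmds in baseline.items() if resolved.get(s) != cmds)

with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "sf_prekeystone_spot_commands.json"
    # Illustrative single-state content, not the real fixture's entries.
    fixture.write_text(
        json.dumps({"Parity": ["bash run.sh --skip-stages=backtest,evaluator"]}),
        encoding="utf-8",
    )
    baseline = load_baseline(fixture)

ok = {"Parity": ["bash run.sh --skip-stages=backtest,evaluator"]}
broken = {"Parity": ["bash run.sh --skip-stages=backtest,evaluator "]}  # stray space
```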

…n dry path — closes DriftDetection skip-exception) (#261)

Adds a `--preflight-only` modifier to infrastructure/spot_drift_detection.sh, mirroring the merged #259 (spot_data_weekly.sh) / predictor #175 / backtester #224 pattern. Closes the DriftDetection skip-exception in ROADMAP "Friday shell-run — per-module dry-path activation": the one per-module SF step still SKIPPED rather than dry-run on the Friday shell_run.

Insertion point
---------------
`PREFLIGHT_ONLY=0` modifier var initialised before the arg-parse loop (orthogonal to RUN_MODE, `set -u` safe); `--preflight-only) PREFLIGHT_ONLY=1` added to the case loop. The guard block is inserted AFTER the smoke-only block and strictly BEFORE the "# ── Full drift detection ──" section (the `run_remote bash -s <<DRIFT` heredoc) and before the trailing `aws cloudwatch put-metric-data` heartbeat.

No-scan / no-write proof
------------------------
`monitoring.drift_detector` (in alpha-engine-predictor, on the sibling-clone PYTHONPATH) is the SOLE code path that does any S3 get_object/put_object of the drift report or SNS publish on alert; the launcher's CloudWatch put-metric-data heartbeat trails it. The PREFLIGHT_ONLY guard `exit 0`s strictly before the `<<DRIFT` heredoc, so the scan, the SNS publish, the S3 put_object and the CloudWatch emit are all statically unreachable. The preflight itself runs only BasePreflight.check_env_vars (env read) + BasePreflight.check_s3_bucket (bucket HEAD) + an `importlib.import_module` of the drift module (import-only: boto3 clients + check_drift()/main() sit behind `if __name__ == "__main__"`, which an import does not trigger). Zero external API data fetch, zero S3/CW/SNS/config mutation; exit 0 because a passed preflight is a healthy outcome (SSM/SF report Success).

Preflight substrate reused
--------------------------
The drift workload binary lives in alpha-engine-predictor (no --preflight-only of its own; out of scope to modify here), and this repo's preflight.py DataPreflight modes (daily/morning_enrich/phase1/phase2) are data-collection scoped; none maps to drift. Per the canonical-lib fallback, the preflight composes `alpha_engine_lib.preflight.BasePreflight` DIRECTLY (env-vars + S3 HEAD); no bespoke preflight scaffolding is duplicated.

Verbatim flag name: `--preflight-only`

Tests
-----
New tests/test_spot_drift_detection_preflight_only.py (5 static greps/source-position assertions, mirroring tests/test_preflight_only_dry_path.py): flag parses as a modifier; guard precedes DRIFT + heartbeat; exit 0 before DRIFT; no scan/S3/CW/SNS in block; canonical BasePreflight reused (no scaffolding). `bash -n` clean. Full data suite: 1342 passed, 1 skipped (pre-existing), 5 pre-existing warnings.

Independent of #260: that PR touches spot_data_weekly.sh + the Lambda dry-run keystone (a different file). The Saturday/Friday SF rewire to route the DriftDetection state at this `--preflight-only` flag under the Friday shell_run is a separate follow-on (no step_function.json change here).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
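The "static greps/source-position assertions" style of test can be shown in a few lines. The Python sketch below runs such checks against an illustrative shell fragment, which is not the real `spot_drift_detection.sh`. The guard test string, the `guard_block` helper and the forbidden-pattern regex are hypothetical. The guard imports the drift module without invoking it, matching the import-only preflight described above.

```python
import re

# Illustrative fragment only; not the real spot_drift_detection.sh.
SCRIPT = """\
PREFLIGHT_ONLY=0
for arg in "$@"; do
  case "$arg" in
    --preflight-only) PREFLIGHT_ONLY=1 ;;
  esac
done
if [ "$PREFLIGHT_ONLY" = "1" ]; then
  python -c "import importlib; importlib.import_module('monitoring.drift_detector')"
  exit 0
fi
# ── Full drift detection ──
run_remote bash -s <<DRIFT
python -m monitoring.drift_detector
DRIFT
aws cloudwatch put-metric-data --namespace Example --metric-name Heartbeat --value 1
"""

GUARD = 'if [ "$PREFLIGHT_ONLY" = "1" ]'

def guard_block(src: str) -> str:
    """Text from the guard's `if` up to (not including) its closing `fi`."""
    start = src.index(GUARD)
    return src[start:src.index("\nfi\n", start)]

guard_pos = SCRIPT.index(GUARD)
block = guard_block(SCRIPT)
# Forbidden inside the guard: running the scan, S3 writes, CW emit, SNS publish.
FORBIDDEN = re.compile(r"python -m monitoring|check_drift\(|put_object|put-metric-data|aws sns")
```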

…ht + EvalJudge/Rationale/Replay/CF dry_run_llm) (#263)

Closes the keystone gap: the 5 documented shell-run skip-exceptions are flipped skip→dry. Under shell_run EVERY substantive workload now boots + runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were already MERGED on origin/main of their repos.

Per-state mechanism:

| State | Type | Mechanism (under shell_run) |
|---|---|---|
| DriftDetection | spot | commands.$ States.Format($.preflight_args) → ` --preflight-only` (data #261) |
| EvalJudgeSubmitFirstSaturday | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeSubmitWeekly | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgePoll | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeProcess | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| RationaleClustering | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| ReplayConcordance | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |
| Counterfactual | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |

Exact canonical dry var: $.research_dry. It is THE canonical shell-run LLM-dry signal: InitializeInput seeds it false on every run (so the absent path / real Sat 02:00 PT firing is unchanged), and ApplyShellRunDefaults already sets it true under shell_run (it backed Research from the keystone). No new var invented: research #202 / backtester #225 PR bodies specify dry_run_llm, and reusing $.research_dry keeps the absent-path guarantee automatic (no extra seeding needed; the seed already exists).

Changes:
- ApplyShellRunDefaults: removed skip_drift_detection / skip_eval_judge / skip_rationale_clustering / skip_replay_concordance / skip_counterfactual from the force-set JsonMerge blob. It now force-sets ZERO skip_*. Per-flag user overrides still win (merge order unchanged). The Choice-gated `CheckSkip<State>` gates are LEFT INTACT (still valid for targeted operator skips; verified by test_skip_gates_still_intact).
- DriftDetection: literal `commands` array → `commands.$` States.Array whose final entry is States.Format('bash infrastructure/spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log', $.preflight_args). {} sits immediately after the script token with no literal space; preflight_args carries its leading space inside the var, so preflight_args="" reproduces the origin/main command char-for-char and " --preflight-only" yields exactly one separating space.
- 7 eval Lambda Payloads: added "dry_run_llm.$": "$.research_dry". EvalRollingMean (alpha-engine-research-eval-rolling-mean) was NOT touched: it has no skip gate, was never a keystone exception, and is a pure historical-metric reader (out of scope).

Byte-identical proof approach:
- shell_run absent ⇒ CheckShellRun.Default = CheckSkipMorningEnrich (unchanged); InitializeInput seeds preflight_args="", research_dry=false. Every spot States.Format resolves char-for-char to the frozen origin/main literal; every eval Lambda dry_run_llm.$ resolves to false (handlers default it false ⇒ behaviourally identical to pre-rewire).
- The frozen baseline fixture tests/fixtures/sf_prekeystone_spot_commands.json now INCLUDES DriftDetection's pre-rewire origin/main literal command (regenerated via the established generator at preflight_args=""; the existing 7 entries are unchanged). The byte-identical test asserts DriftDetection's resolved command at preflight_args="" equals that frozen baseline and carries --preflight-only (single space) under shell_run.
- CI-safe: tests read only the committed fixture (no `git show origin/main` shell-out; that was the #260 CI failure).

Tests:
- _SPOT_STATES grew to 8 (added DriftDetection); _DRY_LAMBDA_STATES grew to 11 (added the 7 eval states); _KEYSTONE_SKIP_EXCEPTIONS = empty set.
- test_shell_defaults_force_set_ZERO_skip_exceptions asserts the blob force-sets no skip_* and that none of the 16 workload skips (incl. the 5 ex-exceptions) appear.
- TestHappyPathTraversal: under shell_run nothing is skipped (skipped == set()); DriftDetection is VISITED (runs dry), not jumped past.
- Module + class docstrings updated to the rewire semantics.

JSON valid (58 top-level states, 91 incl. parallel branches). Full alpha-engine-data suite: 1351 passed, 1 skipped, 0 failed. Zero skip-exceptions remain: every substantive task runs dry under shell_run (spots → --preflight-only, Lambdas → dry_run_llm).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
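The `"dry_run_llm.$": "$.research_dry"` routing relies on the ASL rule that a Parameters key ending in `.$` takes its value from a path into the state input. The Python sketch below is a simplified resolver that supports only top-level `$.name` paths. `resolve_payload` and the `"mode"` payload key are hypothetical.

```python
from typing import Any, Dict

def resolve_payload(payload: Dict[str, Any], state: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve ASL-style Parameters: a key ending in '.$' whose value is a
    top-level '$.name' path is replaced by that field of the state input
    (single-level paths only in this sketch)."""
    out: Dict[str, Any] = {}
    for key, value in payload.items():
        if key.endswith(".$"):
            if not (isinstance(value, str) and value.startswith("$.")):
                raise ValueError(f"unsupported path: {value!r}")
            out[key[:-2]] = state[value[2:]]
        else:
            out[key] = value
    return out

EVAL_PAYLOAD = {"mode": "weekly", "dry_run_llm.$": "$.research_dry"}  # "mode" is hypothetical

real = resolve_payload(EVAL_PAYLOAD, {"research_dry": False})   # InitializeInput seed
shell = resolve_payload(EVAL_PAYLOAD, {"research_dry": True})   # ApplyShellRunDefaults
```

Because the seed is `false` and the handlers default `dry_run_llm` to false, the absent path sends a payload that behaves like the pre-rewire one.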

cipher813 and others added 2 commits May 18, 2026 14:20

cipher813 merged commit 438d300 into main May 18, 2026
1 check passed

cipher813 deleted the feat/sf-shell-run-keystone branch May 18, 2026 21:29

cipher813 mentioned this pull request May 18, 2026

feat(data): spot_drift_detection.sh --preflight-only (Friday shell-run dry path — closes DriftDetection skip-exception) #261

Merged

cipher813 mentioned this pull request May 18, 2026

feat(sf): rewire last 5 skip-exceptions → dry (DriftDetection preflight + EvalJudge/Rationale/Replay/CF dry_run_llm) #263

Merged
