feat(sf): shell-run keystone — spot --preflight-only + Lambda --dry-run instead of skip #260
Merged
…un instead of skip

Converts #258's pure-skip shell_run into actual boot+dry execution of the Saturday SF workload. ApplyShellRunDefaults no longer force-sets all 16 skip_* true; it now sets a single preflight_args=" --preflight-only" suffix var (driving the 7 spot states' States.Format command), Lambda dry flags for the 4 verified-clean Lambda states, and hard-skips ONLY the 5 documented no-clean-dry-path exceptions. InitializeInput seeds the control vars at non-dry identity values so the shell_run-absent path is byte-identical (spots) / behaviourally identical (Lambdas) to today's real Saturday run.

Invariant preserved + test-proven: shell_run absent/false ⇒ every spot command string char-for-char unchanged (TestByteIdenticalAbsentPath resolves the States.Array/States.Format intrinsics with preflight_args="" and asserts equality against origin/main).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
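The suffix-variable templating described in this commit can be sketched as follows. This is a minimal illustration, not the Step Functions implementation: `states_format` is a hypothetical stand-in that only handles plain `{}` placeholders (no escaping), and the `tee` log path is an assumed placeholder because the real path is not shown here.

```python
def states_format(template: str, *args: str) -> str:
    """Approximate ASL States.Format: fill each '{}' left to right.

    Simplification (assumption): escaped braces are not handled.
    """
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("placeholder/argument count mismatch")
    out = parts[0]
    for arg, tail in zip(args, parts[1:]):
        out += arg + tail
    return out


# '{}' sits directly after the mode token, with no literal space before it.
# The log path below is hypothetical.
TEMPLATE = (
    "bash infrastructure/spot_data_weekly.sh --morning-enrich-only{}"
    " 2>&1 | tee /var/log/example.log"
)
# What the literal command would have looked like before templating.
BASELINE = (
    "bash infrastructure/spot_data_weekly.sh --morning-enrich-only"
    " 2>&1 | tee /var/log/example.log"
)

real_run = states_format(TEMPLATE, "")                    # shell_run absent
shell_run = states_format(TEMPLATE, " --preflight-only")  # shell_run=true
```

Because the leading space travels inside the variable, the empty value reproduces the baseline exactly and the dry value yields a single separating space.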
…in shallow PR checkout)

The keystone byte-identical proof shelled out to `git show origin/main:infrastructure/step_function.json` at test time. GitHub Actions' shallow PR checkout has no `origin/main` local ref → `subprocess.CalledProcessError ... exit status 128` → `test` check failed.

Replace the live-git `orig_sf` fixture with a committed frozen baseline `tests/fixtures/sf_prekeystone_spot_commands.json` (the RESOLVED pre-keystone spot command lists captured from origin/main; handles the states already on commands.$: Backtester/Parity/Evaluator). The proof is now hermetic and still a true regression guard against the strict-superset invariant. Docstring documents deliberate-regeneration.

Suite: 1337 passed, 1 skipped (unchanged). Keystone file 43/43.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
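A hermetic comparison against a committed baseline, as opposed to a live `git show`, might look roughly like the sketch below. The helper names (`load_frozen_baseline`, `find_mismatches`) and the fixture shape (state name mapped to a list of command strings) are assumptions for illustration; the demo writes its own temporary fixture so it runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def load_frozen_baseline(path: Path) -> dict:
    """Read a committed JSON fixture; no git access is needed."""
    return json.loads(path.read_text(encoding="utf-8"))


def find_mismatches(resolved: dict, baseline: dict) -> list:
    """Return the sorted state names whose command lists differ."""
    names = baseline.keys() | resolved.keys()
    return sorted(n for n in names if resolved.get(n) != baseline.get(n))


# Hypothetical one-state fixture standing in for the real baseline file.
command = "bash infrastructure/spot_backtest.sh --skip-stages=parity,evaluator"
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "spot_commands.json"
    fixture.write_text(json.dumps({"Backtester": [command]}), encoding="utf-8")
    baseline = load_frozen_baseline(fixture)

mismatches_ok = find_mismatches({"Backtester": [command]}, baseline)
mismatches_bad = find_mismatches(
    {"Backtester": [command + " --preflight-only"]}, baseline
)
```

The trade-off is that the fixture must be regenerated deliberately when the baseline is meant to change, which is why the commit documents regeneration in the docstring.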
cipher813 added a commit that referenced this pull request May 18, 2026
…n dry path — closes DriftDetection skip-exception) (#261)

Adds a `--preflight-only` modifier to infrastructure/spot_drift_detection.sh, mirroring the merged #259 (spot_data_weekly.sh) / predictor #175 / backtester #224 pattern. Closes the DriftDetection skip-exception in ROADMAP "Friday shell-run — per-module dry-path activation": the one per-module SF step still SKIPPED rather than dry-run on the Friday shell_run.

Insertion point
---------------
`PREFLIGHT_ONLY=0` modifier var initialised before the arg-parse loop (orthogonal to RUN_MODE, `set -u` safe); `--preflight-only) PREFLIGHT_ONLY=1` added to the case loop. The guard block is inserted AFTER the smoke-only block and strictly BEFORE the "# ── Full drift detection ──" section (the `run_remote bash -s <<DRIFT` heredoc) and before the trailing `aws cloudwatch put-metric-data` heartbeat.

No-scan / no-write proof
------------------------
`monitoring.drift_detector` (in alpha-engine-predictor, on the sibling-clone PYTHONPATH) is the SOLE code path that does any S3 get_object/put_object of the drift report or SNS publish on alert; the launcher's CloudWatch put-metric-data heartbeat trails it. The PREFLIGHT_ONLY guard `exit 0`s strictly before the `<<DRIFT` heredoc, so the scan, the SNS publish, the S3 put_object, and the CloudWatch emit are all statically unreachable. The preflight itself runs only BasePreflight.check_env_vars (env read) + BasePreflight.check_s3_bucket (bucket HEAD) + an `importlib.import_module` of the drift module (import-only: boto3 clients + check_drift()/main() sit behind `if __name__ == "__main__"`, which an import does not trigger). Zero external API data fetch, zero S3/CW/SNS/config mutation; exit 0 because a passed preflight is a healthy outcome (SSM/SF report Success).

Preflight substrate reused
--------------------------
The drift workload binary lives in alpha-engine-predictor (no --preflight-only of its own; out of scope to modify here) and this repo's preflight.py DataPreflight modes (daily/morning_enrich/phase1/phase2) are data-collection scoped; none maps to drift. Per the canonical-lib fallback the preflight composes `alpha_engine_lib.preflight.BasePreflight` DIRECTLY (env-vars + S3 HEAD); no bespoke preflight scaffolding duplicated.

Verbatim flag name: `--preflight-only`

Tests
-----
New tests/test_spot_drift_detection_preflight_only.py (5 static greps/source-position assertions, mirroring tests/test_preflight_only_dry_path.py): flag parses as a modifier; guard precedes DRIFT + heartbeat; exit 0 before DRIFT; no scan/S3/CW/SNS in block; canonical BasePreflight reused (no scaffolding). `bash -n` clean.

Full data suite: 1342 passed, 1 skipped (pre-existing), 5 pre-existing warnings.

Independent of #260: that PR touches spot_data_weekly.sh + the Lambda dry-run keystone (a different file); the Saturday/Friday SF rewire to route the DriftDetection state at this `--preflight-only` flag under the Friday shell_run is a separate follow-on (no step_function.json change here).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
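The static source-position assertions mentioned under Tests can be sketched as below. The embedded `SCRIPT` is a hypothetical miniature written for this illustration, not the contents of `spot_drift_detection.sh`; only the flag name, the `<<DRIFT` heredoc marker, and the `aws cloudwatch put-metric-data` heartbeat are taken from the commit message, and the metric arguments are made up.

```python
# Hypothetical miniature launcher; the real script is not reproduced here.
SCRIPT = '''\
#!/usr/bin/env bash
set -u
PREFLIGHT_ONLY=0
for arg in "$@"; do
  case "$arg" in
    --preflight-only) PREFLIGHT_ONLY=1 ;;
  esac
done
if [ "$PREFLIGHT_ONLY" -eq 1 ]; then
  echo "preflight passed"
  exit 0
fi
# ── Full drift detection ──
run_remote bash -s <<DRIFT
echo scanning
DRIFT
aws cloudwatch put-metric-data --namespace Example --metric-name Heartbeat --value 1
'''

GUARD_OPEN = 'if [ "$PREFLIGHT_ONLY" -eq 1 ]; then'


def guard_block(script: str) -> str:
    """Return the text of the preflight guard, up to its closing 'fi'."""
    start = script.index(GUARD_OPEN)
    end = script.index("\nfi", start)
    return script[start:end]


# Source positions: the guard must come before the heredoc and heartbeat.
guard_pos = SCRIPT.index(GUARD_OPEN)
heredoc_pos = SCRIPT.index("<<DRIFT")
heartbeat_pos = SCRIPT.index("aws cloudwatch put-metric-data")
block = guard_block(SCRIPT)
```

Position checks like these prove ordering in the source text only; they rely on the guard exiting unconditionally, which is why the block is also checked for `exit 0` and for the absence of any `aws` call.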
cipher813 added a commit that referenced this pull request May 18, 2026
…ht + EvalJudge/Rationale/Replay/CF dry_run_llm) (#263)

Closes the keystone gap: the 5 documented shell-run skip-exceptions are flipped skip→dry. Under shell_run EVERY substantive workload now boots + runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were already MERGED on origin/main of their repos.

Per-state mechanism:

| State | Type | Mechanism (under shell_run) |
|------------------------------|--------|------------------------------------------------------|
| DriftDetection | spot | commands.$ States.Format($.preflight_args) → ` --preflight-only` (data #261) |
| EvalJudgeSubmitFirstSaturday | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeSubmitWeekly | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgePoll | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeProcess | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| RationaleClustering | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| ReplayConcordance | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |
| Counterfactual | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |

Exact canonical dry var: $.research_dry. It is THE canonical shell-run LLM-dry signal: InitializeInput seeds it false on every run (so the absent path / real Sat 02:00 PT firing is unchanged); ApplyShellRunDefaults already sets it true under shell_run (it backed Research from the keystone). No new var invented: research #202 / backtester #225 PR bodies specify dry_run_llm, and reusing $.research_dry keeps the absent-path guarantee automatic (no extra seeding needed; the seed already exists).

Changes:
- ApplyShellRunDefaults: removed skip_drift_detection / skip_eval_judge / skip_rationale_clustering / skip_replay_concordance / skip_counterfactual from the force-set JsonMerge blob. It now force-sets ZERO skip_*. Per-flag user overrides still win (merge order unchanged). The Choice-gated `CheckSkip<State>` gates are LEFT INTACT (still valid for targeted operator skips; verified by test_skip_gates_still_intact).
- DriftDetection: literal `commands` array → `commands.$` States.Array whose final entry is States.Format('bash infrastructure/spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log', $.preflight_args). {} sits immediately after the script token with no literal space; preflight_args carries its leading space inside the var, so preflight_args="" reproduces the origin/main command char-for-char and " --preflight-only" yields exactly one separating space.
- 7 eval Lambda Payloads: added "dry_run_llm.$": "$.research_dry". EvalRollingMean (alpha-engine-research-eval-rolling-mean) was NOT touched: it has no skip gate, was never a keystone exception, and is a pure historical-metric reader (out of scope).

Byte-identical proof approach:
- shell_run absent ⇒ CheckShellRun.Default = CheckSkipMorningEnrich (unchanged); InitializeInput seeds preflight_args="", research_dry=false. Every spot States.Format resolves char-for-char to the frozen origin/main literal; every eval Lambda dry_run_llm.$ resolves to false (handlers default it false ⇒ behaviourally identical to pre-rewire).
- The frozen baseline fixture tests/fixtures/sf_prekeystone_spot_commands.json now INCLUDES DriftDetection's pre-rewire origin/main literal command (regenerated via the established generator at preflight_args=""; the existing 7 entries are unchanged). The byte-identical test asserts DriftDetection's resolved command at preflight_args="" equals that frozen baseline and carries --preflight-only (single space) under shell_run.
- CI-safe: tests read only the committed fixture (no `git show origin/main` shell-out; that was the #260 CI failure).

Tests:
- _SPOT_STATES grew to 8 (added DriftDetection); _DRY_LAMBDA_STATES grew to 11 (added the 7 eval states); _KEYSTONE_SKIP_EXCEPTIONS = empty set.
- test_shell_defaults_force_set_ZERO_skip_exceptions asserts the blob force-sets no skip_* and none of the 16 workload skips (incl. the 5 ex-exceptions) appear.
- TestHappyPathTraversal: under shell_run nothing is skipped (skipped == set()); DriftDetection is VISITED (runs dry), not jumped past.
- Module + class docstrings updated to the rewire semantics.

JSON valid (58 top-level states, 91 incl. parallel branches). Full alpha-engine-data suite: 1351 passed, 1 skipped, 0 failed.

Zero skip-exceptions remain: every substantive task runs dry under shell_run (spots → --preflight-only, Lambdas → dry_run_llm).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
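The merge order this commit relies on (identity seeds, then shell-run defaults, with user input winning) can be modelled as a shallow dictionary merge. This is a sketch under assumptions: `json_merge` and `effective_input` are hypothetical helpers, the exact nesting of the real `JsonMerge` calls is not reproduced, and only the two control variables whose values are stated above are included.

```python
# Identity seeds and shell-run defaults, using the values stated in the commit.
IDENTITY_DEFAULTS = {"preflight_args": "", "research_dry": False}
SHELL_RUN_DEFAULTS = {"preflight_args": " --preflight-only", "research_dry": True}


def json_merge(base: dict, override: dict) -> dict:
    """Shallow merge where the second object wins on key collisions."""
    return {**base, **override}


def effective_input(user_input: dict) -> dict:
    """Hypothetical model of the seeding order; user keys always win."""
    state = json_merge(IDENTITY_DEFAULTS, user_input)
    if state.get("shell_run") is True:
        # Apply shell-run defaults, then re-apply user input on top.
        state = json_merge(json_merge(state, SHELL_RUN_DEFAULTS), user_input)
    return state


absent = effective_input({})
shell = effective_input({"shell_run": True})
# Illustrative override only: a user-supplied key beats the shell default.
overridden = effective_input({"shell_run": True, "research_dry": False})
```

With no `shell_run` key the result equals the identity seeds, which is the property the absent-path guarantee depends on.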
## What

ROADMAP "Friday shell-run — per-module dry-path activation" (owed-item #4). Converts #258's pure-skip `shell_run` into actual boot + dry execution of the Saturday SF. One file: `infrastructure/step_function.json` (+ test updates).

`ApplyShellRunDefaults` no longer force-sets all 16 `skip_*=true`. It now sets a single `preflight_args=" --preflight-only"` suffix var (driving the 7 spot states' `States.Format` command), Lambda dry flags for the 4 verified-clean Lambdas, and hard-skips ONLY the 5 documented no-clean-dry-path exceptions. `InitializeInput` seeds the control vars at non-dry identity values.

## Absolute invariant (preserved + test-proven)

`shell_run` absent OR false ⇒ the Saturday SF is byte-identical (spot commands) / behaviourally identical (Lambda payloads) to today's real run. The real Sat 02:00 PT firing passes no `shell_run`.

**Byte-identical templating approach:** `preflight_args` carries its leading space inside the variable (`" --preflight-only"` under shell_run, `""` on the real run). The spot states' final command is a `States.Format(...)` whose `{}` is placed immediately after the mode token with NO literal space, e.g. `'bash infrastructure/spot_data_weekly.sh --morning-enrich-only{} 2>&1 | tee ...'`. With `preflight_args=""` the formatted string is char-for-char the pre-change literal (proven: `TestByteIdenticalAbsentPath` resolves the `States.Array`/`States.Format` intrinsics for all 7 spot states at `preflight_args=""` and asserts equality against `origin/main`; under `" --preflight-only"` it asserts exactly one separating space + no double-space).

`InitializeInput`'s nested `JsonMerge` is structurally unchanged (still 2-level, user input wins); only 4 identity keys added to the innermost defaults blob.

## State classification table
| Class | Command / flag under shell_run, or skip reason |
|---|---|
| Spot (dry) | `spot_data_weekly.sh --morning-enrich-only --preflight-only` |
| Spot (dry) | `spot_data_weekly.sh --phase1-only --preflight-only` |
| Spot (dry) | `spot_data_weekly.sh --rag-only --preflight-only` |
| Spot (dry) | `spot_train.sh --full-only --preflight-only` (last-flag-wins → MODE=preflight-only) |
| Spot (dry) | `spot_backtest.sh --skip-stages=parity,evaluator --preflight-only` |
| Spot (dry) | `spot_backtest.sh --skip-stages=backtest,evaluator --preflight-only` |
| Spot (dry) | `spot_backtest.sh --skip-stages=backtest,parity --preflight-only` |
| Lambda (dry) | `dry_run_llm=true` (via `$.research_dry`) |
| Lambda (dry) | `dry_run=true` (via `$.data_phase2_dry`) |
| Lambda (dry) | `action=dry_run` (via `$.regime_action`) |
| Lambda (dry) | `action=dry_run` (via `$.regime_action`) |
| Skip exception | `spot_drift_detection.sh` has NO `--preflight-only` flag |
| Skip exception | `_persist_client_side_skips` + Anthropic Batch create; no handler-level dry param |
| Skip exception | `_persist_analysis` S3 `put_object` is NOT gated by `dry_run` (only the CloudWatch metric is) |
| Skip exception | `alpha-engine-replay-concordance` handler source not present in any cloned repo; cannot verify a clean dry path |
| Skip exception | `alpha-engine-replay-counterfactual` handler source not present in any cloned repo; cannot verify |

## Per-Lambda dry-path verification evidence

- `alpha-engine-research-runner`: VERIFIED. `lambda/handler.py` `dry_run_llm=True` rebuilds the graph after `install_dry_run_stubs`; the post-#195 `dry_run.py` no-ops `archive_writer`, `email_sender`, `upload_db`, `write_signals_json`, `save_sector_team_run`, `save_agent_run`. No S3/email/DB write.
- `alpha-engine-data-collector` (phase 2): VERIFIED. `lambda/handler.py` health-marker write gated `if not dry_run`; `collectors/alternative.collect(dry_run=True)` returns `ok_dry_run` BEFORE any `_fetch_all_alternative` (external API) or S3 write.
- `alpha-engine-predictor-regime-*`: VERIFIED. `action="dry_run"` → `produce_*(write=False)` returns `{"payload":…, "wrote": False}` before any `put_object`; reads macro history + fits HMM in-memory only.
- `lambda/eval_judge_submit_handler.py` always calls `_persist_client_side_skips` (S3) + `submit_batch` (Anthropic Batch create + S3 plan). The docstring's "dry_run smoke" is `scripts/smoke_eval_judge.py`, not a handler param.
- `dry_run=True` only sets `emit_metrics=False`, but `compute_and_emit` calls `_persist_analysis` (S3 `put_object`) unconditionally; only the CloudWatch metric is gated.
- Handler source for `alpha-engine-replay-concordance` / `-counterfactual` is not in any cloned repo (research/predictor/data/executor); routing to an unverified dry path is forbidden by the task discipline.

## Owed-item #5 finding (universe-freshness shell_run-aware tolerance)
Non-issue for the SF; not in this PR.

`SaturdayHealthCheck` + `WeeklySubstrateHealthCheck` are non-blocking (`Catch: States.ALL → NotifyComplete`). `health_checker.py` does `sys.exit(1 if failures else 0)`, so under shell_run (spots ran preflight-only → no Friday data refresh) a stale/missing bar makes the SSM command exit 1, but the non-blocking Catch absorbs it, so the SF does not spuriously fail. The only artifact is a clearly-Friday-timestamped alert email (cosmetic). The task says add tolerance ONLY if a real SF-fatal spurious-fail path exists; it does not. A `--shell-run`-aware staleness tolerance in `alpha-engine-dashboard/health_checker.py` is a scoped cross-repo follow-on, deliberately out of this single-file SF PR.

## Tests
`tests/test_sf_friday_shell_run_wiring.py` extended (mirrors #258's; 43 tests in-file):

- `TestByteIdenticalAbsentPath`: ASL-intrinsic resolver; per-spot byte-identical proof at `preflight_args=""` vs `origin/main`; `--preflight-only` correctness at shell_run; Lambda payload `$.var` routing.
- `TestApplyShellRunDefaults`: dry control vars set; leading-space invariant; skip-set == the 5 documented exceptions ONLY; verified-clean states NOT skipped; InitializeInput identity seeding; #258 skip gates left intact.
- `TestHappyPathTraversal`: shell_run=true visits the dry workloads + skips only exceptions + reaches NotifyShellRunComplete; absent path is pre-keystone.

Also fixed 6 pre-existing tests in 2 other SF test files that asserted the old structure (literal `commands` array / hardcoded `action: produce`); they are now keystone-aware via the shared `extract_commands` helper / `$.regime_action` routing.

**Results:** full `pytest tests/` → 1337 passed, 1 skipped (skip + pandas FutureWarnings pre-existing/unrelated). `infrastructure/step_function.json` parses (58 states). Byte-identical proof: all 7 spot states BYTE-IDENTICAL on the absent path.

🤖 Generated with Claude Code