
feat(sf): shell-run keystone — spot --preflight-only + Lambda --dry-run instead of skip #260

Merged
cipher813 merged 2 commits into main from feat/sf-shell-run-keystone
May 18, 2026


@cipher813
Owner

What

ROADMAP "Friday shell-run — per-module dry-path activation" (owed-item #4). Converts #258's pure-skip shell_run into actual boot + dry execution of the Saturday SF. One file: infrastructure/step_function.json (+ test updates).

ApplyShellRunDefaults no longer force-sets all 16 skip_*=true. It now sets a single preflight_args=" --preflight-only" suffix var (driving the 7 spot states' States.Format command), Lambda dry flags for the 4 verified-clean Lambdas, and hard-skips ONLY the 5 documented no-clean-dry-path exceptions. InitializeInput seeds the control vars at non-dry identity values.

Absolute invariant (preserved + test-proven)

shell_run absent OR false ⇒ the Saturday SF is byte-identical (spot commands) / behaviourally identical (Lambda payloads) to today's real run. The real Sat 02:00 PT firing passes no shell_run.

Byte-identical templating approach: preflight_args carries its leading space inside the variable (" --preflight-only" under shell_run, "" on the real run). The spot states' final command is a States.Format(...) whose {} is placed immediately after the mode token with NO literal space, e.g. 'bash infrastructure/spot_data_weekly.sh --morning-enrich-only{} 2>&1 | tee ...'. With preflight_args="" the formatted string is char-for-char the pre-change literal (proven: TestByteIdenticalAbsentPath resolves the States.Array/States.Format intrinsics for all 7 spot states at preflight_args="" and asserts equality against origin/main; under " --preflight-only" it asserts exactly one separating space + no double-space).
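A minimal Python sketch of the leading-space trick described above. `states_format` is a simplified stand-in for the ASL `States.Format` intrinsic (it ignores escape handling), and the `tee` log path is a placeholder because the description elides the real one.

```python
def states_format(template: str, *args: str) -> str:
    """Simplified stand-in for States.Format: fill each {} in order."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("placeholder/argument count mismatch")
    out = parts[0]
    for arg, tail in zip(args, parts[1:]):
        out += arg + tail
    return out


# Placeholder log path (assumption): the real path is elided in the PR text.
TEMPLATE = (
    "bash infrastructure/spot_data_weekly.sh --morning-enrich-only{}"
    " 2>&1 | tee /tmp/example.log"
)
PRE_CHANGE = (
    "bash infrastructure/spot_data_weekly.sh --morning-enrich-only"
    " 2>&1 | tee /tmp/example.log"
)

# Real run: the variable is empty, so the {} vanishes without a trace.
real_run = states_format(TEMPLATE, "")
# shell_run: the variable carries its own leading space.
shell_run = states_format(TEMPLATE, " --preflight-only")
```

Because the separating space travels inside the variable rather than the template, the empty value cannot leave a stray space behind, and the non-empty value cannot produce a double space.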

InitializeInput's nested JsonMerge is structurally unchanged (still 2-level, user input wins) — only 4 identity keys added to the innermost defaults blob.
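The merge order can be sketched in Python as follows. This is an illustration, not the state machine's actual definition: `json_merge` mimics the shallow, right-wins behaviour of `States.JsonMerge(a, b, false)`, `OUTER_DEFAULTS` is a placeholder for whatever the second defaults level contains, and the identity values for `data_phase2_dry` and `regime_action` are assumptions inferred from the rest of this description.

```python
def json_merge(left: dict, right: dict) -> dict:
    """Shallow merge where keys in `right` win, like States.JsonMerge(left, right, false)."""
    return {**left, **right}


# Innermost defaults blob: the 4 identity keys seeded on every run.
INNER_DEFAULTS = {
    "preflight_args": "",        # stated in the PR: empty on the real run
    "research_dry": False,       # stated in the PR: false on the real run
    "data_phase2_dry": False,    # assumed identity value
    "regime_action": "produce",  # assumed identity value
}

# Placeholder for the second level of defaults (contents are hypothetical).
OUTER_DEFAULTS = {"example_flag": False}


def initialize_input(user_input: dict) -> dict:
    """Two nested merges; the user's execution input is applied last, so it wins."""
    return json_merge(json_merge(INNER_DEFAULTS, OUTER_DEFAULTS), user_input)


absent_path = initialize_input({})
overridden = initialize_input({"regime_action": "dry_run"})
```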

State classification table

| State | Spot/Lambda | Dry mechanism under shell_run |
|-------|-------------|-------------------------------|
| MorningEnrich | Spot | spot_data_weekly.sh --morning-enrich-only --preflight-only |
| DataPhase1 | Spot | spot_data_weekly.sh --phase1-only --preflight-only |
| RAGIngestion | Spot | spot_data_weekly.sh --rag-only --preflight-only |
| PredictorTraining | Spot | spot_train.sh --full-only --preflight-only (last-flag-wins → MODE=preflight-only) |
| Backtester | Spot | spot_backtest.sh --skip-stages=parity,evaluator --preflight-only |
| Parity | Spot | spot_backtest.sh --skip-stages=backtest,evaluator --preflight-only |
| Evaluator | Spot | spot_backtest.sh --skip-stages=backtest,parity --preflight-only |
| Research | Lambda | dry_run_llm=true (via $.research_dry) |
| DataPhase2 | Lambda | dry_run=true (via $.data_phase2_dry) |
| RegimeSubstrate | Lambda | action=dry_run (via $.regime_action) |
| RegimeRetrospectiveEval | Lambda | action=dry_run (via $.regime_action) |
| DriftDetection | Spot | KEPT SKIPPED: spot_drift_detection.sh has NO --preflight-only flag |
| EvalJudge chain | Lambda | KEPT SKIPPED: submit handler always _persist_client_side_skips + Anthropic Batch create; no handler-level dry param |
| RationaleClustering | Lambda | KEPT SKIPPED: _persist_analysis S3 put_object is NOT gated by dry_run (only the CloudWatch metric is) |
| ReplayConcordance | Lambda | KEPT SKIPPED: alpha-engine-replay-concordance handler source not present in any cloned repo; cannot verify a clean dry path |
| Counterfactual | Lambda | KEPT SKIPPED: alpha-engine-replay-counterfactual handler source not present in any cloned repo; cannot verify |
| SaturdayHealthCheck / WeeklySubstrateHealthCheck | Spot | Already run under #258; left as-is (bootstrap smoke) |

Per-Lambda dry-path verification evidence

  • Research (alpha-engine-research-runner) — VERIFIED. lambda/handler.py dry_run_llm=True rebuilds the graph after install_dry_run_stubs; the post-#195 dry_run.py no-ops archive_writer, email_sender, upload_db, write_signals_json, save_sector_team_run, save_agent_run. No S3/email/DB write.
  • DataPhase2 (alpha-engine-data-collector, phase 2) — VERIFIED. lambda/handler.py health-marker write gated if not dry_run; collectors/alternative.collect(dry_run=True) returns ok_dry_run BEFORE any _fetch_all_alternative (external API) or S3 write.
  • RegimeSubstrate / RegimeRetrospectiveEval (alpha-engine-predictor-regime-*) — VERIFIED. action="dry_run" → produce_*(write=False) returns {"payload":…, "wrote": False} before any put_object; reads macro history + fits HMM in-memory only.
  • EvalJudge chain — NOT VERIFIED → KEPT SKIPPED. lambda/eval_judge_submit_handler.py always calls _persist_client_side_skips (S3) + submit_batch (Anthropic Batch create + S3 plan). The docstring's "dry_run smoke" is scripts/smoke_eval_judge.py, not a handler param.
  • RationaleClustering — NOT VERIFIED → KEPT SKIPPED. dry_run=True only sets emit_metrics=False, but compute_and_emit calls _persist_analysis (S3 put_object) unconditionally; only the CloudWatch metric is gated.
  • ReplayConcordance / Counterfactual — NOT VERIFIED → KEPT SKIPPED. Handler source for alpha-engine-replay-concordance/-counterfactual is not in any cloned repo (research/predictor/data/executor); routing to an unverified dry path is forbidden by the task discipline.
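The difference between a verified-clean dry path and the RationaleClustering case can be illustrated with a small Python sketch. Everything here is hypothetical (the `FakeS3` recorder, both function names, bucket and key values); it only contrasts an early return before any write with a `dry_run` flag that gates the metric but not the S3 write.

```python
class FakeS3:
    """Records put_object calls instead of talking to AWS (test double)."""

    def __init__(self) -> None:
        self.puts: list[dict] = []

    def put_object(self, **kwargs) -> None:
        self.puts.append(kwargs)


def collect_gated(s3: FakeS3, dry_run: bool) -> dict:
    """Clean pattern: return before any external call or write when dry."""
    if dry_run:
        return {"status": "ok_dry_run"}
    s3.put_object(Bucket="example-bucket", Key="alt.json", Body=b"{}")
    return {"status": "ok"}


def compute_and_emit_ungated(s3: FakeS3, metrics: list, dry_run: bool) -> dict:
    """Leaky pattern: dry_run only disables the metric; the S3 write still happens."""
    emit_metrics = not dry_run
    s3.put_object(Bucket="example-bucket", Key="analysis.json", Body=b"{}")
    if emit_metrics:
        metrics.append("example_metric")
    return {"status": "ok"}


clean_s3, leaky_s3, leaky_metrics = FakeS3(), FakeS3(), []
clean_result = collect_gated(clean_s3, dry_run=True)
compute_and_emit_ungated(leaky_s3, leaky_metrics, dry_run=True)
```

A check of this shape (no recorded writes under `dry_run=True`) is the kind of evidence the VERIFIED entries above rely on; the ungated variant fails it even though its metric is suppressed.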

Owed-item #5 finding (universe-freshness shell_run-aware tolerance)

Non-issue for the SF; not in this PR. SaturdayHealthCheck + WeeklySubstrateHealthCheck are non-blocking (Catch: States.ALL → NotifyComplete). health_checker.py does sys.exit(1 if failures else 0), so under shell_run (spots ran preflight-only → no Friday data refresh) a stale/missing bar makes the SSM command exit 1 — but the non-blocking Catch absorbs it, so the SF does not spuriously fail. The only artifact is a clearly-Friday-timestamped alert email (cosmetic). The task says add tolerance ONLY if a real SF-fatal spurious-fail path exists; it does not. A --shell-run-aware staleness tolerance in alpha-engine-dashboard/health_checker.py is a scoped cross-repo follow-on, deliberately out of this single-file SF PR.
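As a rough Python model of why the non-zero exit is harmless here: a failing task with a catch-all handler simply transitions to the handler's target instead of failing the execution. The `TaskFailed` exception, `run_with_catch`, and the `"NextState"` label are illustrative names, not part of the state machine.

```python
class TaskFailed(Exception):
    """Stands in for a task error that a Catch on States.ALL would match."""


def health_check(failures: list[str]) -> None:
    """Mirrors sys.exit(1 if failures else 0): any failure surfaces as a task error."""
    if failures:
        raise TaskFailed(f"{len(failures)} check(s) failed")


def run_with_catch(task, on_success: str, on_error: str) -> str:
    """Return the next state name; the catch-all absorbs the failure."""
    try:
        task()
    except TaskFailed:
        return on_error
    return on_success


# Stale bar under shell_run: the check fails, the execution still proceeds.
next_state = run_with_catch(
    lambda: health_check(["stale bar"]), "NextState", "NotifyComplete"
)
healthy_state = run_with_catch(lambda: health_check([]), "NextState", "NotifyComplete")
```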

Tests

tests/test_sf_friday_shell_run_wiring.py extended (mirrors #258's; 43 tests in-file):

  • TestByteIdenticalAbsentPath — ASL-intrinsic resolver; per-spot byte-identical proof at preflight_args="" vs origin/main; --preflight-only correctness at shell_run; Lambda payload $.var routing.
  • TestApplyShellRunDefaults — dry control vars set; leading-space invariant; skip-set == the 5 documented exceptions ONLY; verified-clean states NOT skipped; InitializeInput identity seeding; #258 skip gates left intact.
  • TestHappyPathTraversal — shell_run=true visits the dry workloads + skips only exceptions + reaches NotifyShellRunComplete; absent path is pre-keystone.

Also fixed 6 pre-existing tests in 2 other SF test files that asserted the old structure (literal commands array / hardcoded action: produce) — now keystone-aware via the shared extract_commands helper / $.regime_action routing.

Results: full pytest tests/: 1337 passed, 1 skipped (skip + pandas FutureWarnings pre-existing/unrelated). infrastructure/step_function.json parses (58 states). Byte-identical proof: all 7 spot states BYTE-IDENTICAL on the absent path.

🤖 Generated with Claude Code

cipher813 and others added 2 commits May 18, 2026 14:20
…un instead of skip

Converts #258's pure-skip shell_run into actual boot+dry execution of the
Saturday SF workload. ApplyShellRunDefaults no longer force-sets all 16
skip_* true; it now sets a single preflight_args=" --preflight-only"
suffix var (driving the 7 spot states' States.Format command), Lambda dry
flags for the 4 verified-clean Lambda states, and hard-skips ONLY the 5
documented no-clean-dry-path exceptions. InitializeInput seeds the control
vars at non-dry identity values so the shell_run-absent path is
byte-identical (spots) / behaviourally identical (Lambdas) to today's real
Saturday run.

Invariant preserved + test-proven: shell_run absent/false ⇒ every spot
command string char-for-char unchanged (TestByteIdenticalAbsentPath
resolves the States.Array/States.Format intrinsics with preflight_args=""
and asserts equality against origin/main).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…in shallow PR checkout)

The keystone byte-identical proof shelled out to
`git show origin/main:infrastructure/step_function.json` at test time.
GitHub Actions' shallow PR checkout has no `origin/main` local ref →
`subprocess.CalledProcessError ... exit status 128` → `test` check failed.

Replace the live-git `orig_sf` fixture with a committed frozen baseline
`tests/fixtures/sf_prekeystone_spot_commands.json` (the RESOLVED
pre-keystone spot command lists captured from origin/main; handles the
states already on commands.$ — Backtester/Parity/Evaluator). The proof
is now hermetic and still a true regression guard against the
strict-superset invariant. Docstring documents deliberate-regeneration.
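
A hedged sketch of the hermetic approach: compare resolved commands against a committed JSON file rather than shelling out to git. The fixture schema (state name mapped to a command list), the helper names, and the example commands are assumptions for illustration; a temporary file stands in for the committed fixture so the sketch is self-contained.

```python
import json
import tempfile
from pathlib import Path


def load_baseline(path: Path) -> dict:
    """Read the frozen baseline; no git ref or network access is needed."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def diff_against_baseline(resolved: dict, baseline: dict) -> list:
    """Return the sorted state names whose resolved commands differ from the baseline."""
    return sorted(
        name for name, cmds in baseline.items() if resolved.get(name) != cmds
    )


# Hypothetical fixture content; the real file holds the pre-keystone command lists.
frozen = {"ExampleSpot": ["cd /opt/app", "bash run.sh 2>&1"]}

with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "baseline.json"
    fixture.write_text(json.dumps(frozen), encoding="utf-8")
    baseline = load_baseline(fixture)

unchanged = diff_against_baseline(
    {"ExampleSpot": ["cd /opt/app", "bash run.sh 2>&1"]}, baseline
)
# A single extra space is enough to trip the byte-identical guard.
drifted = diff_against_baseline(
    {"ExampleSpot": ["cd /opt/app", "bash run.sh  2>&1"]}, baseline
)
```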

Suite: 1337 passed, 1 skipped (unchanged). Keystone file 43/43.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 438d300 into main May 18, 2026
1 check passed
@cipher813 cipher813 deleted the feat/sf-shell-run-keystone branch May 18, 2026 21:29
cipher813 added a commit that referenced this pull request May 18, 2026
…n dry path — closes DriftDetection skip-exception) (#261)

Adds a `--preflight-only` modifier to infrastructure/spot_drift_detection.sh,
mirroring the merged #259 (spot_data_weekly.sh) / predictor #175 /
backtester #224 pattern. Closes the DriftDetection skip-exception in
ROADMAP "Friday shell-run — per-module dry-path activation" — the one
per-module SF step still SKIPPED rather than dry-run on the Friday shell_run.

Insertion point
---------------
`PREFLIGHT_ONLY=0` modifier var initialised before the arg-parse loop
(orthogonal to RUN_MODE, `set -u` safe); `--preflight-only) PREFLIGHT_ONLY=1`
added to the case loop. The guard block is inserted AFTER the smoke-only
block and strictly BEFORE the "# ── Full drift detection ──" section (the
`run_remote bash -s <<DRIFT` heredoc) and before the trailing
`aws cloudwatch put-metric-data` heartbeat.

No-scan / no-write proof
------------------------
`monitoring.drift_detector` (in alpha-engine-predictor, on the sibling-clone
PYTHONPATH) is the SOLE code path that does any S3 get_object/put_object of
the drift report or SNS publish on alert; the launcher's CloudWatch
put-metric-data heartbeat trails it. The PREFLIGHT_ONLY guard `exit 0`s
strictly before the `<<DRIFT` heredoc, so the scan, the SNS publish, the S3
put_object, and the CloudWatch emit are all statically unreachable. The
preflight itself runs only BasePreflight.check_env_vars (env read) +
BasePreflight.check_s3_bucket (bucket HEAD) + an `importlib.import_module`
of the drift module (import-only — boto3 clients + check_drift()/main()
sit behind `if __name__ == "__main__"`, which an import does not trigger).
Zero external API data fetch, zero S3/CW/SNS/config mutation; exit 0
because a passed preflight is a healthy outcome (SSM/SF report Success).

Preflight substrate reused
--------------------------
The drift workload binary lives in alpha-engine-predictor (no
--preflight-only of its own; out of scope to modify here) and this repo's
preflight.py DataPreflight modes (daily/morning_enrich/phase1/phase2) are
data-collection scoped — none maps to drift. Per the canonical-lib
fallback the preflight composes `alpha_engine_lib.preflight.BasePreflight`
DIRECTLY (env-vars + S3 HEAD) — no bespoke preflight scaffolding duplicated.

Verbatim flag name: `--preflight-only`

Tests
-----
New tests/test_spot_drift_detection_preflight_only.py (5 static
greps/source-position assertions, mirroring
tests/test_preflight_only_dry_path.py): flag parses as a modifier;
guard precedes DRIFT + heartbeat; exit 0 before DRIFT; no scan/S3/CW/SNS
in block; canonical BasePreflight reused (no scaffolding). `bash -n`
clean. Full data suite: 1342 passed, 1 skipped (pre-existing), 5
pre-existing warnings.
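
A sketch of what a static source-position assertion of this kind might look like. The embedded shell text is a simplified, hypothetical imitation of the launcher (the real script is not reproduced here), and the metric namespace and name are placeholders.

```python
# Hypothetical, simplified imitation of the launcher script.
SCRIPT = '''\
PREFLIGHT_ONLY=0
for arg in "$@"; do
  case "$arg" in
    --preflight-only) PREFLIGHT_ONLY=1 ;;
  esac
done
if [ "$PREFLIGHT_ONLY" -eq 1 ]; then
  echo "preflight ok"
  exit 0
fi
# ── Full drift detection ──
run_remote bash -s <<DRIFT
python -m monitoring.drift_detector
DRIFT
aws cloudwatch put-metric-data --namespace Example --metric-name Heartbeat --value 1
'''

# Locate the guard, the heredoc, and the trailing heartbeat by source offset.
guard = SCRIPT.index('if [ "$PREFLIGHT_ONLY" -eq 1 ]')
heredoc = SCRIPT.index("<<DRIFT")
heartbeat = SCRIPT.index("aws cloudwatch put-metric-data")

# Text of the guard block, from its `if` up to (not including) the closing `fi`.
guard_block = SCRIPT[guard:SCRIPT.index("\nfi\n", guard)]

# Tokens that must never appear inside the guard block.
FORBIDDEN = ("<<DRIFT", "put-metric-data", "drift_detector", "sns")
leaks = [tok for tok in FORBIDDEN if tok.lower() in guard_block.lower()]
```

Since the guard exits before the heredoc and the heartbeat in source order, and the block itself contains none of the forbidden tokens, the scan and all writes are unreachable whenever the flag is set.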

Independent of #260: that PR touches spot_data_weekly.sh + the Lambda
dry-run keystone (a different file); the Saturday/Friday SF rewire to
route the DriftDetection state at this `--preflight-only` flag under the
Friday shell_run is a separate follow-on (no step_function.json change here).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 18, 2026
…ht + EvalJudge/Rationale/Replay/CF dry_run_llm) (#263)

Closes the keystone gap: the 5 documented shell-run skip-exceptions are
flipped skip→dry. Under shell_run EVERY substantive workload now boots +
runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were
already MERGED on origin/main of their repos.

Per-state mechanism:

| State                       | Type   | Mechanism (under shell_run)                          |
|-----------------------------|--------|------------------------------------------------------|
| DriftDetection              | spot   | commands.$ States.Format($.preflight_args) → ` --preflight-only` (data #261) |
| EvalJudgeSubmitFirstSaturday| Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeSubmitWeekly       | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgePoll               | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeProcess            | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| RationaleClustering         | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| ReplayConcordance           | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |
| Counterfactual              | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |

Exact canonical dry var: $.research_dry. It is THE canonical shell-run
LLM-dry signal — InitializeInput seeds it false on every run (so the
absent path / real Sat 02:00 PT firing is unchanged); ApplyShellRunDefaults
already sets it true under shell_run (it backed Research from the
keystone). No new var invented — research #202 / backtester #225 PR bodies
specify dry_run_llm, and reusing $.research_dry keeps the absent-path
guarantee automatic (no extra seeding needed; the seed already exists).
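
To illustrate how a `"dry_run_llm.$": "$.research_dry"` payload entry resolves, here is a small Python sketch. It handles only top-level `$.name` paths, which is a simplification of the real ASL path rules, and the static `source` key is a made-up example.

```python
def resolve_payload(payload: dict, state_input: dict) -> dict:
    """Resolve keys ending in '.$' against the state input; copy other keys as-is."""
    resolved = {}
    for key, value in payload.items():
        if key.endswith(".$"):
            # "$.research_dry" -> "research_dry" (top-level paths only in this sketch)
            resolved[key[:-2]] = state_input[value.removeprefix("$.")]
        else:
            resolved[key] = value
    return resolved


# "source" is a hypothetical static key, included to show pass-through.
PAYLOAD = {"dry_run_llm.$": "$.research_dry", "source": "step_function"}

real_run = resolve_payload(PAYLOAD, {"research_dry": False})
shell_run = resolve_payload(PAYLOAD, {"research_dry": True})
```

With the seed fixed at false, the real run sends the same value the handlers already default to, which is why no extra seeding is required.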

Changes:
- ApplyShellRunDefaults: removed skip_drift_detection / skip_eval_judge /
  skip_rationale_clustering / skip_replay_concordance / skip_counterfactual
  from the force-set JsonMerge blob. It now force-sets ZERO skip_*.
  Per-flag user overrides still win (merge order unchanged). The
  Choice-gated CheckSkip<State> gates are LEFT INTACT (still valid for
  targeted operator skips — verified by test_skip_gates_still_intact).
- DriftDetection: literal `commands` array → `commands.$` States.Array
  whose final entry is States.Format('bash infrastructure/
  spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log',
  $.preflight_args). {} sits immediately after the script token with no
  literal space; preflight_args carries its leading space inside the var,
  so preflight_args="" reproduces the origin/main command char-for-char
  and " --preflight-only" yields exactly one separating space.
- 7 eval Lambda Payloads: added "dry_run_llm.$": "$.research_dry".
  EvalRollingMean (alpha-engine-research-eval-rolling-mean) was NOT touched
  — it has no skip gate, was never a keystone exception, and is a pure
  historical-metric reader (out of scope).

Byte-identical proof approach:
- shell_run absent ⇒ CheckShellRun.Default = CheckSkipMorningEnrich
  (unchanged); InitializeInput seeds preflight_args="", research_dry=false.
  Every spot States.Format resolves char-for-char to the frozen
  origin/main literal; every eval Lambda dry_run_llm.$ resolves to false
  (handlers default it false ⇒ behaviourally identical to pre-rewire).
- The frozen baseline fixture tests/fixtures/sf_prekeystone_spot_commands
  .json now INCLUDES DriftDetection's pre-rewire origin/main literal
  command (regenerated via the established generator at preflight_args="";
  the existing 7 entries are unchanged). The byte-identical test asserts
  DriftDetection's resolved command at preflight_args="" equals that
  frozen baseline and carries --preflight-only (single space) under
  shell_run.
- CI-safe: tests read only the committed fixture (no `git show
  origin/main` shell-out — that was the #260 CI failure).

Tests:
- _SPOT_STATES grew to 8 (added DriftDetection); _DRY_LAMBDA_STATES grew
  to 11 (added the 7 eval states); _KEYSTONE_SKIP_EXCEPTIONS = empty set.
- test_shell_defaults_force_set_ZERO_skip_exceptions asserts the blob
  force-sets no skip_* and none of the 16 workload skips (incl. the 5
  ex-exceptions) appear.
- TestHappyPathTraversal: under shell_run nothing is skipped (skipped ==
  set()); DriftDetection is VISITED (runs dry), not jumped past.
- Module + class docstrings updated to the rewire semantics.
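
The traversal property can be modelled in a few lines of Python. This is a toy walker, not the test's actual implementation: `GATES` lists only three of the workload states, and `traverse` is a hypothetical name. The skip flag names are the ones listed under Changes.

```python
# Workload state -> skip flag checked by its CheckSkip<State> Choice gate.
GATES = {
    "DriftDetection": "skip_drift_detection",
    "RationaleClustering": "skip_rationale_clustering",
    "ReplayConcordance": "skip_replay_concordance",
}


def traverse(state_input: dict) -> tuple:
    """Walk the gates in order; a state is skipped only if its flag is truthy."""
    visited, skipped = [], set()
    for state, flag in GATES.items():
        if state_input.get(flag, False):
            skipped.add(state)
        else:
            visited.append(state)
    return visited, skipped


# shell_run alone force-sets no skip flags, so every workload is visited.
visited, skipped = traverse({"shell_run": True})
# A targeted operator skip still works because the gates were left intact.
_, operator_skipped = traverse({"shell_run": True, "skip_drift_detection": True})
```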

JSON valid (58 top-level states, 91 incl. parallel branches). Full
alpha-engine-data suite: 1351 passed, 1 skipped, 0 failed.

Zero skip-exceptions remain — every substantive task runs dry under
shell_run (spots → --preflight-only, Lambdas → dry_run_llm).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>