feat(data): spot_data_weekly.sh --preflight-only (Friday shell-run dry path)#259
Merged
…y path)

ROADMAP "Friday shell-run — per-module dry-path activation" owed-item #1.

Under the Friday shell_run, the DataPhase1/MorningEnrich + RAGIngestion spot states now boot the spot for real, run their EXISTING preflight, then exit 0 with ZERO external API data fetch and ZERO S3/ArcticDB/config/email/SNS writes. This catches bootstrap-class breakage (lib-pin drift, sys.path collision, stale ArcticDB symbol, SSM timeout, Dockerfile/image gap) ~12h before the real Saturday run. Reuses the existing preflight substrate; no parallel preflight written.

Where the gate sits / zero-fetch zero-write proof:

- weekly_collector.py: new `--preflight-only` argparse flag. main() exits HERE: `raise SystemExit(0)` immediately after the existing `DataPreflight(config["bucket"], mode).run()` and strictly BEFORE `run_weekly(config, args)`. run_weekly() is the SOLE function in the module that performs ANY collector fetch (polygon/FMP/FRED/yfinance) or ANY S3/ArcticDB/parquet/config/module-health write, so gating in front of it makes every fetch/write code path statically unreachable. The preflight itself only does read-only/auth probes (S3 HEAD, polygon/FRED reference-data auth calls that fetch no collector data, ArcticDB list_libraries) plus a self-cleaning S3 PUT+DELETE sentinel under preflight/ (the preflight's own liveness probe, not a data write). Ordering pinned by an AST-source test.
- rag/pipelines/run_weekly_ingestion.sh: new `--preflight-only` flag. Exits 0 after Step 0 (`python -m rag.preflight`: check_env_vars + check_s3_bucket HEAD; read-only, zero fetch, zero write) and strictly BEFORE Step 1 (ingest_sec_filings). Every ingest_* pipeline, Voyage embedding call, and Postgres/pgvector + parquet write lives in Steps 1-9, all unreachable once the guard exits.
- infrastructure/spot_data_weekly.sh: new `--preflight-only` flag sets PREFLIGHT_ONLY=1, a MODIFIER orthogonal to RUN_MODE so it composes with the data path AND --rag-only. A dedicated data-path block runs `weekly_collector.py --morning-enrich --preflight-only` and/or `weekly_collector.py --phase 1 --preflight-only` (gated by the existing DO_MORNING_ENRICH/DO_PHASE1 split) then exit 0 before the real WORKLOADS heredoc: no prune (prune-audit JSON write), no RAG, no CloudWatch heartbeat, no S3 log upload.

--rag-only --preflight-only behavior: runs ONLY the RAG-path preflight (boot + SSM secret fetch so rag.preflight's check_env_vars sees them + `run_weekly_ingestion.sh --preflight-only` = step-0-only + exit 0). No real RAG ingestion, no rag-ingestion heartbeat. `--preflight-only` alone runs ONLY the DataPhase1/MorningEnrich preflight.

Universe-freshness tolerance note (ROADMAP owed-item #5): the Friday shell-run uses the phase1 / morning_enrich preflight modes. Per preflight.py::DataPreflight.run, NEITHER mode runs check_arcticdb_fresh; they only do _check_arcticdb_libraries_present (a presence read, not a freshness gate). morning_enrich deliberately omits freshness (it is part of what *makes* ArcticDB fresh); phase1 *populates* ArcticDB. The only freshness gate (check_arcticdb_fresh macro/SPY 4d) lives in the "daily" mode, which the Saturday/Friday data path never selects. So a Friday run predating Friday's settled polygon aggregate does NOT spuriously fail on a Thursday-last-bar, and no --preflight-only-scoped tolerance code is required for the data path. Documented inline so a future mode-mapping change re-audits this invariant.

Tests: new tests/test_preflight_only_dry_path.py (10 tests, static greps + AST-source assertions, matching the existing test_spot_data_weekly_run_modes.py / test_weekly_collector_preflight_mode_mapping.py convention) pins: flag parsing on all 3 files, the exit-0-after-preflight-before-fetch/write ordering invariant, --rag-only --preflight-only step-0-only behavior, and the no-prune/no-RAG/no-heartbeat/no-S3-upload hard invariant.

Full suite: 1229 passed, 1 skipped (pre-existing). bash -n clean on both shell scripts. No new deps, no secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
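The gate ordering described for weekly_collector.py can be illustrated with a minimal sketch. This is not the repository's code: the `DataPreflight` and `run_weekly` bodies are stubs, the `calls` list, the bucket value, and the mode mapping are assumptions made for illustration, and only the flag name and the preflight-then-exit-then-run ordering come from the commit message.

```python
import argparse

calls = []  # illustration only: records which stages actually ran


class DataPreflight:
    """Hypothetical stub standing in for the real preflight class."""

    def __init__(self, bucket, mode):
        self.bucket = bucket
        self.mode = mode

    def run(self):
        calls.append(f"preflight:{self.mode}")


def run_weekly(config, args):
    """Stub for the sole function that fetches or writes anything."""
    calls.append("run_weekly")


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--phase", type=int, default=None)
    parser.add_argument("--morning-enrich", action="store_true")
    parser.add_argument("--preflight-only", action="store_true")
    args = parser.parse_args(argv)

    config = {"bucket": "example-bucket"}  # assumed placeholder value
    # Assumed mode mapping, simplified for the sketch.
    mode = "morning_enrich" if args.morning_enrich else "phase1"

    DataPreflight(config["bucket"], mode).run()
    if args.preflight_only:
        # Exit after the preflight and before any fetch/write path.
        raise SystemExit(0)
    run_weekly(config, args)
```

Calling `main(["--phase", "1", "--preflight-only"])` raises `SystemExit` with code 0 after recording only the preflight stage, so `run_weekly` is never reached on that path.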
cipher813 added a commit that referenced this pull request on May 18, 2026
…n dry path — closes DriftDetection skip-exception) (#261)

Adds a `--preflight-only` modifier to infrastructure/spot_drift_detection.sh, mirroring the merged #259 (spot_data_weekly.sh) / predictor #175 / backtester #224 pattern. Closes the DriftDetection skip-exception in ROADMAP "Friday shell-run — per-module dry-path activation": the one per-module SF step still SKIPPED rather than dry-run on the Friday shell_run.

Insertion point
---------------
`PREFLIGHT_ONLY=0` modifier var initialised before the arg-parse loop (orthogonal to RUN_MODE, `set -u` safe); `--preflight-only) PREFLIGHT_ONLY=1` added to the case loop. The guard block is inserted AFTER the smoke-only block and strictly BEFORE the "# ── Full drift detection ──" section (the `run_remote bash -s <<DRIFT` heredoc) and before the trailing `aws cloudwatch put-metric-data` heartbeat.

No-scan / no-write proof
------------------------
`monitoring.drift_detector` (in alpha-engine-predictor, on the sibling-clone PYTHONPATH) is the SOLE code path that does any S3 get_object/put_object of the drift report or SNS publish on alert; the launcher's CloudWatch put-metric-data heartbeat trails it. The PREFLIGHT_ONLY guard `exit 0`s strictly before the `<<DRIFT` heredoc, so the scan, the SNS publish, the S3 put_object, and the CloudWatch emit are all statically unreachable. The preflight itself runs only BasePreflight.check_env_vars (env read) + BasePreflight.check_s3_bucket (bucket HEAD) + an `importlib.import_module` of the drift module (import-only: boto3 clients + check_drift()/main() sit behind `if __name__ == "__main__"`, which an import does not trigger). Zero external API data fetch, zero S3/CW/SNS/config mutation; exit 0 because a passed preflight is a healthy outcome (SSM/SF report Success).

Preflight substrate reused
--------------------------
The drift workload binary lives in alpha-engine-predictor (no --preflight-only of its own; out of scope to modify here) and this repo's preflight.py DataPreflight modes (daily/morning_enrich/phase1/phase2) are data-collection scoped; none maps to drift. Per the canonical-lib fallback the preflight composes `alpha_engine_lib.preflight.BasePreflight` DIRECTLY (env-vars + S3 HEAD), with no bespoke preflight scaffolding duplicated.

Verbatim flag name: `--preflight-only`

Tests
-----
New tests/test_spot_drift_detection_preflight_only.py (5 static greps/source-position assertions, mirroring tests/test_preflight_only_dry_path.py): flag parses as a modifier; guard precedes DRIFT + heartbeat; exit 0 before DRIFT; no scan/S3/CW/SNS in block; canonical BasePreflight reused (no scaffolding). `bash -n` clean. Full data suite: 1342 passed, 1 skipped (pre-existing), 5 pre-existing warnings.

Independent of #260: that PR touches spot_data_weekly.sh + the Lambda dry-run keystone (a different file); the Saturday/Friday SF rewire to route the DriftDetection state at this `--preflight-only` flag under the Friday shell_run is a separate follow-on (no step_function.json change here).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
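The import-only claim above (an `importlib.import_module` does not trigger code behind `if __name__ == "__main__"`) can be checked with a small self-contained sketch. The module name `fake_drift_detector`, its body, and the marker file are all hypothetical stand-ins; nothing here touches the real `monitoring.drift_detector`.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module body: the side effect sits behind the __main__ guard.
MODULE_SOURCE = textwrap.dedent(
    '''
    from pathlib import Path

    def check_drift(marker):
        Path(marker).write_text("scanned")

    if __name__ == "__main__":
        import sys
        check_drift(sys.argv[1])
    '''
)


def import_only_probe(module_name, search_dir):
    """Import a module by name; report whether check_drift is defined on it."""
    sys.path.insert(0, str(search_dir))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
        return hasattr(module, "check_drift")
    finally:
        sys.path.remove(str(search_dir))


def demo():
    """Return (import_succeeded, side_effect_happened) for the fake module."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        (tmp_path / "fake_drift_detector.py").write_text(MODULE_SOURCE)
        marker = tmp_path / "marker.txt"
        imported = import_only_probe("fake_drift_detector", tmp_path)
        # Drop the cached module so repeated demos start clean.
        sys.modules.pop("fake_drift_detector", None)
        return imported, marker.exists()
```

In this sketch `demo()` reports that the import succeeded while the marker file was never written, which is the property the preflight relies on: the import surfaces broken dependencies or path problems without running the scan.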
ROADMAP "Friday shell-run — per-module dry-path activation" owed-item #1.

Under the Friday shell_run, the DataPhase1/MorningEnrich + RAGIngestion spot states now boot the spot for real, run their EXISTING preflight + imports + AWS/SSM/ArcticDB connectivity checks, then exit 0 with ZERO external API data fetch and ZERO S3/ArcticDB/config/email/SNS writes. This catches bootstrap-class breakage (lib-pin drift, sys.path collision, stale ArcticDB symbol, SSM timeout, Dockerfile/image gap) ~12h before the real Saturday run. Reuses the existing preflight substrate; no parallel preflight written.

Where the gate sits + zero-fetch / zero-write proof
---------------------------------------------------
- `weekly_collector.py`: new `--preflight-only` argparse flag. `main()` exits immediately after the existing `DataPreflight(config["bucket"], mode).run()` and strictly before `run_weekly(config, args)` via `raise SystemExit(0)`. `run_weekly()` is the sole function in the module that performs any collector fetch (polygon/FMP/FRED/yfinance) or any S3/ArcticDB/parquet/config/module-health write, so gating in front of it makes every fetch/write code path statically unreachable. The preflight itself only does read-only/auth probes (S3 HEAD, polygon/FRED reference-data auth calls that fetch no collector data, ArcticDB `list_libraries`) plus a self-cleaning S3 PUT+DELETE sentinel under `preflight/` (the preflight's own liveness probe, not a data write). Ordering pinned by an AST-source test.
- `rag/pipelines/run_weekly_ingestion.sh`: new `--preflight-only` flag. Exits 0 after Step 0 (`python -m rag.preflight`: `check_env_vars` + `check_s3_bucket` HEAD, read-only) and strictly before Step 1 (`ingest_sec_filings`). Every `ingest_*` pipeline, Voyage embedding call, and Postgres/pgvector + parquet write lives in Steps 1-9, all unreachable once the guard exits.
- `infrastructure/spot_data_weekly.sh`: new `--preflight-only` flag sets `PREFLIGHT_ONLY=1`, a modifier orthogonal to `RUN_MODE` so it composes with the data path AND `--rag-only`. A dedicated data-path block runs `weekly_collector.py --morning-enrich --preflight-only` and/or `weekly_collector.py --phase 1 --preflight-only` (gated by the existing `DO_MORNING_ENRICH`/`DO_PHASE1` preflight-task-split) then `exit 0` before the real WORKLOADS heredoc: no prune (prune-audit JSON write), no RAG, no CloudWatch heartbeat, no S3 log upload.

`--rag-only --preflight-only` behavior
--------------------------------------
Runs ONLY the RAG-path preflight: boot + SSM secret fetch (so `rag.preflight`'s `check_env_vars` sees the 4 RAG secrets) + `run_weekly_ingestion.sh --preflight-only` (step-0-only + exit 0). No real RAG ingestion, no rag-ingestion heartbeat. `--preflight-only` alone → ONLY the DataPhase1/MorningEnrich preflight.

Universe-freshness tolerance note (ROADMAP owed-item #5)
--------------------------------------------------------
The Friday shell-run uses the phase1 / morning_enrich preflight modes. Per `preflight.py::DataPreflight.run`, neither runs `check_arcticdb_fresh`; they only do `_check_arcticdb_libraries_present` (a presence read, not a freshness gate). `morning_enrich` deliberately omits freshness (it is part of what makes ArcticDB fresh); `phase1` populates ArcticDB. The only freshness gate (`check_arcticdb_fresh` macro/SPY 4d) lives in the `daily` mode, which the Saturday/Friday data path never selects. So a Friday run predating Friday's settled polygon aggregate does not spuriously fail on a Thursday-last-bar, and no `--preflight-only`-scoped tolerance code is required for the data path. Documented inline so a future mode-mapping change re-audits this invariant.

Tests
-----
New `tests/test_preflight_only_dry_path.py` (10 tests, static greps + AST-source assertions, matching the existing `test_spot_data_weekly_run_modes.py` / `test_weekly_collector_preflight_mode_mapping.py` convention): flag parsing on all 3 files, the exit-0-after-preflight-before-fetch/write ordering invariant, `--rag-only --preflight-only` step-0-only behavior, and the no-prune/no-RAG/no-heartbeat/no-S3-upload hard invariant. `bash -n` clean on both shell scripts.

Flag name (verbatim, for the SF keystone follow-on): `--preflight-only`

🤖 Generated with Claude Code