fix(rag): resolve PYTHON_BIN on spot instances (AL2023 has no `python` symlink) #59
Merged
Saturday 2026-04-17 Step Function failed at RAG step 0 preflight:
rag/pipelines/run_weekly_ingestion.sh: line 56: python: command not found
Root cause: spot_data_weekly.sh bootstraps python3.12 explicitly and uses
$REMOTE_PYTHON to invoke weekly_collector.py, but the downstream RAG bash
script calls bare `python`. Amazon Linux 2023 has no `python` symlink by
default (only python3, python3.12 after our dnf install), so the RAG step
crashes immediately after a successful DataPhase1.
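One way to confirm this kind of root cause on a host is to check which interpreter names actually resolve on PATH. The helper below is a minimal sketch, not part of this PR: the function name `probe_python_names` is hypothetical, and it assumes only the POSIX `command -v` builtin.

```shell
# Hypothetical helper (not part of the PR): report which interpreter names
# resolve on PATH, to confirm whether a bare `python` exists on the host.
probe_python_names() {
    for _name in python python3 python3.12; do
        if _path="$(command -v "$_name" 2>/dev/null)"; then
            # Name resolved: print where it points.
            printf '%s -> %s\n' "$_name" "$_path"
        else
            # Name did not resolve on PATH.
            printf '%s -> not found\n' "$_name"
        fi
    done
}
```

On a host matching the description above, the `python` line would be expected to report `not found` while the other two resolve.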
Two-part fix:
1. rag/pipelines/run_weekly_ingestion.sh:
   - Resolve PYTHON_BIN at top of script: honor caller's export, else
     fall back through python3 → python3.12 → python → hard-fail.
   - Replace all 6 bare `python -m ...` invocations with $PYTHON_BIN -m.
   - Remove the duplicate inline default that was only set before the
     final completion-email python call.
2. infrastructure/spot_data_weekly.sh:
   - Export PYTHON_BIN=$REMOTE_PYTHON via ENV_SOURCE so the RAG heredoc
     inherits the interpreter we actually bootstrapped (python3.12, not
     the python3 default which is 3.9 on AL2023.0).
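The resolution order in part 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the function name `resolve_python_bin` and the error message are hypothetical, and only `PYTHON_BIN` and the candidate order come from the description above.

```shell
# Sketch of the interpreter resolution described above. The function name
# resolve_python_bin is hypothetical; only PYTHON_BIN and the candidate
# order (python3, python3.12, python) come from the PR description.
resolve_python_bin() {
    # Honor an interpreter the caller already exported.
    if [ -n "${PYTHON_BIN:-}" ]; then
        printf '%s\n' "$PYTHON_BIN"
        return 0
    fi
    # Otherwise take the first candidate found on PATH.
    for _candidate in python3 python3.12 python; do
        if command -v "$_candidate" >/dev/null 2>&1; then
            command -v "$_candidate"
            return 0
        fi
    done
    # Hard-fail: nothing usable was found.
    echo "ERROR: no Python interpreter found (tried python3, python3.12, python)" >&2
    return 1
}
```

At the top of a script this might be used as `PYTHON_BIN="$(resolve_python_bin)" || exit 1`, followed by `export PYTHON_BIN` and invocations of the form `"$PYTHON_BIN" -m ...`. Because a caller-supplied value wins, exporting `PYTHON_BIN` from the bootstrap script (part 2) would make the downstream script use the bootstrapped interpreter instead of the first name on PATH.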
Validated: DataPhase1 completed successfully at 35.5 min tonight (polygon
cache fix working). Only RAG failed — this is the fix.
Tests: 89/89 green.
cipher813 added a commit that referenced this pull request on May 24, 2026

…c (lib v0.27.0) (#297)

Bumps alpha-engine-lib pin v0.26.0 → v0.27.0 and migrates every data-module freshness-check site to use the new chokepoint helpers ``alpha_engine_lib.dates.{trading_days_stale, is_fresh_in_trading_days, expected_last_close}`` introduced in lib #59.

Sites migrated (5 in data-repo Python code + matching test fixtures):
- validators/postflight.py::_check_macro_spy_fresh: the gate that blocked the 2026-05-24 Sunday SF recovery. Threshold flips from `_MACRO_SPY_MAX_STALE_DAYS=1 (calendar)` to `_MACRO_SPY_MAX_STALE_TRADING_DAYS=0` via is_fresh_in_trading_days; error message updated to "is N trading-day(s) behind …".
- validators/postflight.py::_check_universe_sample_fresh: relative-to-SPY ticker staleness now uses trading_days_stale. Constant renamed `_UNIVERSE_MAX_STALE_VS_SPY_DAYS` → `_UNIVERSE_MAX_STALE_VS_SPY_TRADING_DAYS`.
- preflight.py::DataPreflight (daily mode): bypasses lib's calendar-day check_arcticdb_fresh helper and calls a new `_check_macro_spy_fresh_trading_days(max_stale=1)` method that delegates to the lib chokepoint. max_stale=1 tolerates polygon's T+1 publish latency (yesterday's close may not be in arctic at preflight time on a weekday morning).
- sf_preflight.py::_check_prune_safety (2 sites): both per-ticker "days_stale" calculations now use trading_days_stale. Threshold flipped from 5 calendar days to 3 trading days (≈ equivalent semantic; the calendar threshold absorbed a weekend buffer that trading-day arithmetic handles natively).
- builders/daily_append.py::_scan_universe_and_emit_freshness_receipt: `UNIVERSE_FRESHNESS_MAX_STALE_DAYS` renamed `UNIVERSE_FRESHNESS_MAX_STALE_TRADING_DAYS` (5 → 3). Receipt JSON schema updated: `max_stale_days_threshold` → `max_stale_trading_days_threshold`, `stalest_age_days` → `stalest_age_trading_days`. Downstream consumers reading the receipt under universe_freshness.json must update their field names (predictor / executor / backtester preflights, in separate PRs).

NOT migrated (deliberately, different semantic):
- collectors/prices.py::_find_stale_fast: checks S3 LastModified (wall-clock write timestamp), not data-freshness. Asks "have we re-written this parquet recently?", correctly calendar-based. weekly_collector.py:208 `staleness_threshold_days=3` retained.

Test suite (1432 → 1440 passing):
- test_postflight.py: 2 new cases (Sunday redrive with Friday macro, Memorial-Day-Monday redrive with Friday macro)
- test_daily_append_universe_freshness.py: threshold + receipt-field references updated; stalest_symbol assertion relaxed since trading-day arithmetic collapses adjacent calendar offsets to the same trading-day bucket on weekends (test no longer flaps by day-of-run).

Per [[feedback_lift_invariants_to_chokepoint_after_second_recurrence]] + [[feedback_sota_institutional_default_no_shortcuts]]: closes 5 of the 7 data/predictor/research calendar-day freshness sites surfaced by the 2026-05-24 audit. Predictor + research migrations follow in parallel PRs against the same lib v0.27.0 tag. Lib helper `check_arcticdb_fresh` remains calendar-day; tracked for retirement in a separate lib follow-up PR once all consumers are off it.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request on May 24, 2026

…ctions (#298)

AST-walk regression pin: every production function whose name matches fresh|stale|preflight|postflight must not contain ``.days`` calendar arithmetic. Closes the cross-repo defect class surfaced by the 2026-05-24 Sunday SF recovery: calendar-day gates trip on every post-Saturday redrive even when data carries the most recent NYSE close.

Escape hatch: inline ``# noqa: trading-day`` marker on the same line documents calendar-day correctness at that specific call site.

Explicit allowlist: ``collectors/prices._find_stale_fast`` checks S3 LastModified timestamp (write-recency, not data-freshness) and is correctly calendar-day; allowlist verified by a second test that ensures the named function actually exists in the named file.

Pin passes on current clean state (all 5 freshness sites migrated to ``alpha_engine_lib.dates.{trading_days_stale, is_fresh_in_trading_days}`` in PR #297). Would catch any future PR that adds calendar-day arithmetic to a freshness-named function.

Composes with the lib v0.27.0 chokepoint (lib #59) + the cross-repo migration arc (predictor #191, research #222).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>