fix(rag): resolve PYTHON_BIN on spot instances (AL2023 has no `python…#59

Merged: cipher813 merged 1 commit into main from fix/rag-python-bin-on-spot on Apr 18, 2026

@cipher813
Owner

…` symlink)

Saturday 2026-04-17 Step Function failed at RAG step 0 preflight:

  rag/pipelines/run_weekly_ingestion.sh: line 56: python: command not found

Root cause: `spot_data_weekly.sh` bootstraps python3.12 explicitly and uses `$REMOTE_PYTHON` to invoke `weekly_collector.py`, but the downstream RAG bash script calls bare `python`. Amazon Linux 2023 has no `python` symlink by default (only `python3`, plus `python3.12` after our dnf install), so the RAG step crashes immediately after a successful DataPhase1.
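
To confirm this failure mode on a host, a quick check of which interpreter names resolve on `PATH` can help. The helper below is a minimal sketch: `report_interpreters` is a hypothetical name and not part of either script. It relies only on the `command -v` builtin.

```shell
#!/usr/bin/env bash
# Report which Python interpreter names resolve on the current PATH.
# `command -v` is a shell builtin, so the check itself needs nothing external.
report_interpreters() {
    local name resolved
    for name in python python3 python3.12; do
        if resolved="$(command -v "$name" 2>/dev/null)"; then
            printf '%s: %s\n' "$name" "$resolved"
        else
            printf '%s: not found\n' "$name"
        fi
    done
}

report_interpreters
```

On a host matching the description above, the `python` line would be expected to read `not found` while the other two resolve.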

Two-part fix:

1. `rag/pipelines/run_weekly_ingestion.sh`:
   - Resolve `PYTHON_BIN` at the top of the script: honor the caller's export, else fall back through `python3` → `python3.12` → `python` → hard-fail.
   - Replace all 6 bare `python -m ...` invocations with `$PYTHON_BIN -m`.
   - Remove the duplicate inline default that was only set before the final completion-email `python` call.

2. `infrastructure/spot_data_weekly.sh`:
   - Export `PYTHON_BIN=$REMOTE_PYTHON` via `ENV_SOURCE` so the RAG heredoc inherits the interpreter we actually bootstrapped (python3.12, not the `python3` default, which is 3.9 on AL2023.0).
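
The resolution order in part 1 can be sketched roughly as follows. This is an illustration under assumptions, not the text of the merged script: the function name `resolve_python_bin` and the error wording are hypothetical, and only the precedence (caller's export, then `python3`, `python3.12`, `python`, then failure) comes from the description above.

```shell
#!/usr/bin/env bash
# Sketch of the resolution order from the PR description; the real script's
# structure and error message may differ.
resolve_python_bin() {
    # 1. Honor an interpreter the caller already exported.
    if [ -n "${PYTHON_BIN:-}" ]; then
        printf '%s\n' "$PYTHON_BIN"
        return 0
    fi
    # 2. Otherwise take the first candidate name found on PATH.
    local candidate
    for candidate in python3 python3.12 python; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # 3. Nothing usable: fail up front rather than at the first call site.
    echo "ERROR: no Python interpreter found (tried python3, python3.12, python)" >&2
    return 1
}

# Callee side: resolve once near the top, then call "$PYTHON_BIN" -m ... everywhere.
if PYTHON_BIN="$(resolve_python_bin)"; then
    export PYTHON_BIN
    echo "using interpreter: $PYTHON_BIN"
else
    echo "a pipeline script would exit non-zero here" >&2
fi

# Caller side: a parent that bootstrapped a specific interpreter exports it,
# and the resolver returns that value unchanged.
( export PYTHON_BIN="python3.12"; resolve_python_bin )   # prints: python3.12
```

Because the caller's export takes precedence, part 2 is what guarantees python3.12 on the spot instance. Without it, the fallback chain would settle on `python3` first.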

Validated: DataPhase1 completed successfully in 35.5 min tonight (polygon cache fix working). Only the RAG step failed, and this PR fixes it.

Tests: 89/89 green.

@cipher813 merged commit 73d3584 into main on Apr 18, 2026
1 check passed

@cipher813 deleted the fix/rag-python-bin-on-spot branch on April 18, 2026 03:25

cipher813 added a commit that referenced this pull request on May 24, 2026
…c (lib v0.27.0) (#297)

Bumps alpha-engine-lib pin v0.26.0 → v0.27.0 and migrates every
data-module freshness-check site to use the new chokepoint helpers
``alpha_engine_lib.dates.{trading_days_stale, is_fresh_in_trading_days,
expected_last_close}`` introduced in lib #59.

Sites migrated (5 in data-repo Python code + matching test fixtures):

  - validators/postflight.py::_check_macro_spy_fresh — the gate that
    blocked the 2026-05-24 Sunday SF recovery. Threshold flips from
    `_MACRO_SPY_MAX_STALE_DAYS=1 (calendar)` to
    `_MACRO_SPY_MAX_STALE_TRADING_DAYS=0` via is_fresh_in_trading_days;
    error message updated to "is N trading-day(s) behind …".

  - validators/postflight.py::_check_universe_sample_fresh — relative-
    to-SPY ticker staleness now uses trading_days_stale. Constant
    renamed `_UNIVERSE_MAX_STALE_VS_SPY_DAYS` →
    `_UNIVERSE_MAX_STALE_VS_SPY_TRADING_DAYS`.

  - preflight.py::DataPreflight (daily mode) — bypasses lib's calendar-
    day check_arcticdb_fresh helper and calls a new
    `_check_macro_spy_fresh_trading_days(max_stale=1)` method that
    delegates to the lib chokepoint. max_stale=1 tolerates polygon's
    T+1 publish latency (yesterday's close may not be in arctic at
    preflight time on a weekday morning).

  - sf_preflight.py::_check_prune_safety (2 sites) — both per-ticker
    "days_stale" calculations now use trading_days_stale. Threshold
    flipped from 5 calendar days to 3 trading days (≈ equivalent
    semantic; the calendar threshold absorbed a weekend buffer that
    trading-day arithmetic handles natively).

  - builders/daily_append.py::_scan_universe_and_emit_freshness_receipt —
    `UNIVERSE_FRESHNESS_MAX_STALE_DAYS` renamed
    `UNIVERSE_FRESHNESS_MAX_STALE_TRADING_DAYS` (5 → 3). Receipt JSON
    schema updated: `max_stale_days_threshold` →
    `max_stale_trading_days_threshold`, `stalest_age_days` →
    `stalest_age_trading_days`. Downstream consumers reading the
    receipt under universe_freshness.json must update their field
    names (predictor / executor / backtester preflights — separate PRs).

NOT migrated (deliberately — different semantic):

  - collectors/prices.py::_find_stale_fast — checks S3 LastModified
    (wall-clock write timestamp), not data-freshness. Asks "have we
    re-written this parquet recently?", correctly calendar-based.
    weekly_collector.py:208 `staleness_threshold_days=3` retained.

Test suite (1432 → 1440 passing):
  - test_postflight.py: 2 new cases (Sunday redrive with Friday macro,
    Memorial-Day-Monday redrive with Friday macro)
  - test_daily_append_universe_freshness.py: threshold + receipt-field
    references updated; stalest_symbol assertion relaxed since
    trading-day arithmetic collapses adjacent calendar offsets to the
    same trading-day bucket on weekends (test no longer flaps by
    day-of-run).

Per [[feedback_lift_invariants_to_chokepoint_after_second_recurrence]]
+ [[feedback_sota_institutional_default_no_shortcuts]] — closes 5 of
the 7 data/predictor/research calendar-day freshness sites surfaced by
the 2026-05-24 audit. Predictor + research migrations follow in
parallel PRs against the same lib v0.27.0 tag.

Lib helper `check_arcticdb_fresh` remains calendar-day; tracked for
retirement in a separate lib follow-up PR once all consumers are off it.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cipher813 added a commit that referenced this pull request on May 24, 2026
…ctions (#298)

AST-walk regression pin: every production function whose name matches
fresh|stale|preflight|postflight must not contain ``.days`` calendar
arithmetic. Closes the cross-repo defect class surfaced by the
2026-05-24 Sunday SF recovery: calendar-day gates trip on every
post-Saturday redrive even when data carries the most recent NYSE close.

Escape hatch: inline ``# noqa: trading-day`` marker on the same line
documents calendar-day correctness at that specific call site.
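
To show how the escape-hatch marker reads at a call site, here is a much coarser, line-based approximation in shell. It is not the AST walk this commit describes: it cannot scope the check to functions by name, and `find_unmarked_days` plus the sample file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Coarse line-based approximation, NOT the AST walk the commit describes.
# Prints "LINENO:TEXT" for each line that uses `.days` without the marker.
# Exit status is 0 when at least one such line is found, 1 when there are none.
find_unmarked_days() {
    # $1: path to a Python source file
    grep -nE '\.days([^A-Za-z0-9_]|$)' "$1" | grep -v '# noqa: trading-day'
}

# Hypothetical sample: one unmarked use, one marked use, one trading-day call.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
age = (today - last_close).days
written = (now - last_modified).days  # noqa: trading-day
stale = trading_days_stale(last_close, today)
EOF
find_unmarked_days "$demo"   # prints: 1:age = (today - last_close).days
rm -f "$demo"
```

Only the first sample line is reported. The second carries the marker, and the third calls the trading-day helper instead of using `.days`.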

Explicit allowlist: ``collectors/prices._find_stale_fast`` checks S3
LastModified timestamp (write-recency, not data-freshness) and is
correctly calendar-day; allowlist verified by a second test that
ensures the named function actually exists in the named file.

Pin passes on current clean state (all 5 freshness sites migrated to
``alpha_engine_lib.dates.{trading_days_stale, is_fresh_in_trading_days}``
in PR #297). Would catch any future PR that adds calendar-day
arithmetic to a freshness-named function.

Composes with the lib v0.27.0 chokepoint (lib #59) + the cross-repo
migration arc (predictor #191, research #222).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>