fix(backfill): apply daily_closes delta + regression preflight by cipher813 · Pull Request #130 · cipher813/alpha-engine-data

cipher813 · 2026-05-02T13:14:57Z

Summary

DataPhase1 step 8 (builders.backfill) silently regressed ArcticDB macro and universe last_date by one day on the 2026-05-02 Saturday SF run. Postflight caught it and the SF halted.
Root cause: backfill loaded the 10y price cache (which passed the mtime "current" check today and skipped its weekly refresh, so its data ended 4/30), computed features, and a full-series lib.write() clobbered the 5/1 row MorningEnrich had appended at 09:18.
The fix adds two layers: (1) merge staging/daily_closes/{date}.parquet on top of the cache before any compute (mirrors features/compute.py::_apply_daily_delta), and (2) a defense-in-depth _assert_no_arctic_regression preflight that hard-fails before compute if planned data is older than what's already in ArcticDB.
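The first layer can be illustrated with a minimal sketch. It is not the code in builders.backfill or features/compute.py: the `apply_daily_delta` name, the dict-based `Closes` stand-in for a price frame, and the prices are all assumptions made for illustration. It shows only the idea that rows from the daily delta are overlaid on the cached history, with the delta winning on any overlapping date.

```python
from datetime import date

# Hypothetical stand-in for a per-ticker price series: trading date -> close.
Closes = dict[date, float]


def apply_daily_delta(cache: dict[str, Closes], delta: dict[str, Closes]) -> dict[str, Closes]:
    """Overlay delta rows on cached history; delta wins on overlapping dates."""
    merged: dict[str, Closes] = {}
    for ticker, history in cache.items():
        combined = dict(history)                 # copy so the cache is not mutated
        combined.update(delta.get(ticker, {}))   # newer rows replace or extend
        merged[ticker] = dict(sorted(combined.items()))
    return merged


# Illustrative values only: the cache ends 4/30, the delta carries the 5/1 row.
cache = {"SPY": {date(2026, 4, 29): 500.0, date(2026, 4, 30): 501.0}}
delta = {"SPY": {date(2026, 5, 1): 502.0}}
merged = apply_daily_delta(cache, delta)
print(max(merged["SPY"]))  # last date of the merged source
```

With the delta applied, the source's last date matches what MorningEnrich appended, so a full-series write no longer drops the newest row.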

The 2026-04-22 SOLS regression was patched same-day with a ticker_filter-only skip_macro guard; this PR closes the full-universe path that the same bug class still ran through.

Test plan

tests/test_backfill_no_regression.py — 8 new tests cover the preflight (pass / macro-regression / universe-regression / first-write absent-from-arctic) and the wiring (delta call / preflight call / ticker_filter skip / dry_run skip)
tests/test_backfill_unified_and_macro_scoping.py — existing 9 tests still pass after extending mock setups with delta + preflight stubs
tests/test_daily_append_backfill_safe.py — 5 existing tests unaffected
Full suite green in CI (344 passed)
Live verification via Saturday SF redrive (after merge): MorningEnrich appends 5/1 → backfill applies delta + preflight passes → ArcticDB lands at 5/1 → postflight passes

🤖 Generated with Claude Code

DataPhase1 step 8 (builders.backfill) silently regressed ArcticDB macro and universe last_date by 1 day on the 2026-05-02 Saturday SF run.

Root cause: backfill loaded the 10y price cache (which passed the mtime "current" check and skipped refresh, so its data ended 4/30), computed features over it, and full-series lib.write() clobbered the 5/1 row MorningEnrich had appended at 09:18. Postflight rejected the regression at 09:53; the SF halted at DataPhase1.

Two layers gate this class of incident going forward:

1. Apply daily_closes delta before any compute. backfill now mirrors features/compute.py::_apply_daily_delta: staging/daily_closes/{date} parquets are merged on top of the cache so the source captures MorningEnrich's polygon-T+1 fill (and any other post-cache-refresh appends). This makes the source as fresh as ArcticDB.

2. Regression preflight against ArcticDB. _assert_no_arctic_regression reads SPY + a 20-symbol universe sample (matches postflight's _UNIVERSE_SAMPLE_SIZE) and refuses to run if planned data is older than what's already in ArcticDB. Hard-fails BEFORE the multi-minute feature compute with an actionable error message pointing at the recovery path (force a price-cache refresh).

The 2026-04-22 SOLS regression was patched the same day with a ticker_filter-only guard (skip_macro path); this PR closes the full-universe path that the same bug class still ran through.

8 new tests in tests/test_backfill_no_regression.py cover the preflight (pass / macro-regression / universe-regression / first-write absent-from-arctic) and the wiring (delta call / preflight call / ticker_filter skip / dry_run skip). Existing test_backfill_unified_and_macro_scoping.py mock setups extended with delta + preflight stubs.

Pre-existing failures in test_flow_doctor_wiring.py (test_enabled_attaches_flow_doctor_handler, test_exclude_patterns_plumbed_to_handler) are unrelated: flow-doctor 0.3.0 (pinned by alpha-engine-lib[flow_doctor]>=0.3.0,<0.4.0) doesn't support the s3 notifier type the production yaml + tests reference. That's a separate alpha-engine-lib version-coordination issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
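The regression preflight (layer 2 above) reduces to a comparison of last dates. The sketch below is a hedged illustration, not the actual _assert_no_arctic_regression. The function name, the `ArcticRegressionError` type, and the plain mappings of symbol to last date are assumptions; the real code reads those dates from ArcticDB for SPY plus a universe sample.

```python
from datetime import date
from typing import Mapping, Optional


class ArcticRegressionError(RuntimeError):
    """Hypothetical error type for a write that would move last_date backwards."""


def assert_no_regression(
    planned_last: Mapping[str, date],
    stored_last: Mapping[str, Optional[date]],
) -> None:
    """Raise before compute if any planned series ends earlier than the stored one."""
    regressions = []
    for symbol, planned in planned_last.items():
        stored = stored_last.get(symbol)
        if stored is None:
            continue  # first write: nothing in the store to regress
        if planned < stored:
            regressions.append(f"{symbol}: planned {planned} < stored {stored}")
    if regressions:
        raise ArcticRegressionError(
            "source is older than the store; force a price-cache refresh. "
            + "; ".join(regressions)
        )


# Passes: the source is as fresh as the store, and NEW has never been written.
assert_no_regression(
    {"SPY": date(2026, 5, 1), "NEW": date(2026, 5, 1)},
    {"SPY": date(2026, 5, 1)},
)
```

Running the comparison before feature compute means a stale source costs seconds rather than a multi-minute run followed by a postflight rejection.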

…t list (#132)

MorningEnrich's missing-from-closes hard-fail was tripping every S&P churn week. Today (2026-05-02): 8 tickers got dropped from the index this past week (ASGN, GTM, HOLX, KMPR, LW, MOH, MTCH, PAYC). They're still in ArcticDB universe (awaiting next prune cycle); they're absent from the new constituents.json Phase 1 wrote at 09:20; MorningEnrich no longer requests them from polygon. The pre-existing check (arctic_universe - closes) saw 8 churn-outs + 4 chronic polygon-coverage gaps = 12 missing > threshold of 5 → SF halt at MorningEnrich.

Add an optional ``expected_tickers`` parameter to ``daily_append``. When the caller passes its request list, the check scopes to ``arctic ∩ expected`` instead of the full ArcticDB universe. Tickers absent from the request (S&P churn-out stragglers) are excluded from the alarm and logged at INFO so operators see drift building up between prune cycles. Backward compatible: callers that don't pass it retain the prior whole-universe behavior.

Both call sites (``_run_morning_enrich`` and ``_run_daily``) now pass their constituents-derived ticker list. The ticker list was already in scope at both sites.

Net effect on the 2026-05-02 SF redrive: missing-from-closes count drops 12 → 4 (only the chronic BF-B/BRK-B/MOG-A/PSTG remain, well under the 5-threshold WARN-only path), MorningEnrich completes, Phase 1 runs, PR #130's backfill regression preflight passes, ArcticDB lands at 5/1, postflight passes, downstream Research/Predictor Training/Backtester all run.

5 new tests in tests/test_daily_append_missing_from_closes.py (360 total) cover: stragglers excluded, real constituents-gap still raises, caret-prefix stripping, None preserves legacy behavior, straggler-count INFO log fires.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
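The scoping change described above can be sketched with sets. This is a simplified illustration, not the real ``daily_append``: the `check_missing_from_closes` name, its signature, and the two "covered" tickers are assumptions, and the caret-prefix handling mentioned in the tests is left out. The churned and chronic ticker lists and the threshold of 5 come from the message above.

```python
import logging
from typing import Iterable, Optional

logger = logging.getLogger("daily_append_sketch")


def check_missing_from_closes(
    arctic_universe: Iterable[str],
    closes: Iterable[str],
    expected_tickers: Optional[Iterable[str]] = None,
    threshold: int = 5,
) -> set[str]:
    """Return tickers missing from closes; raise when the count exceeds threshold."""
    arctic = set(arctic_universe)
    if expected_tickers is None:
        scope = arctic  # legacy behavior: whole stored universe
    else:
        expected = set(expected_tickers)
        scope = arctic & expected
        stragglers = arctic - expected
        if stragglers:
            logger.info("%d stored tickers not requested: %s",
                        len(stragglers), sorted(stragglers))
    missing = scope - set(closes)
    if len(missing) > threshold:
        raise RuntimeError(f"{len(missing)} tickers missing from closes: {sorted(missing)}")
    if missing:
        logger.warning("missing from closes (under threshold): %s", sorted(missing))
    return missing


churned = ["ASGN", "GTM", "HOLX", "KMPR", "LW", "MOH", "MTCH", "PAYC"]
chronic = ["BF-B", "BRK-B", "MOG-A", "PSTG"]
covered = ["AAPL", "MSFT"]  # illustrative tickers that did receive closes

scoped = check_missing_from_closes(
    arctic_universe=churned + chronic + covered,
    closes=covered,
    expected_tickers=chronic + covered,
)
print(sorted(scoped))
```

Calling the same function without `expected_tickers` on these inputs counts all 12 absent tickers and raises, which is the behavior that halted MorningEnrich.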

* feat(preflight): add sf_preflight.py (Saturday SF dry-rehearsal)

Predicts whether the Saturday SF would succeed BEFORE launching a spot. Today's recovery cycle (5 SF redrives, ~5 polygon API calls each) burned free-tier quota and operator hours discovering bugs sequentially. This module simulates the critical pre-Phase-1 path against real S3 + ArcticDB state and reports per-step pass/fail in ~30s with 1 polygon call total.

Eight independent checks, mapped to today's incident stack:

PR #130 (backfill regression) → check_backfill_source_freshness
PR #131 (polygon coverage flake) → check_polygon_grouped_coverage
PR #132 (missing-from-closes scoping) → check_predicted_missing_from_closes
PR #133 (freshness scan scoping) → check_universe_sample_freshness
PR #134 (workflow ordering) → check_universe_drift
PR #135 (return shape) → check_constituents_fetch
Postflight contracts → check_postflight_contracts
ArcticDB reachability → check_arctic_connectivity

Each check is a pure function taking a PreflightContext, returning a CheckResult. The orchestrator runs them all (catching per-check exceptions so one fail doesn't abort the suite) and emits human or JSON output. Exit code 1 on any failure.

Two macOS-specific design notes:

1. ArcticDB libs are initialized once in check_arctic_connectivity and reused across downstream checks via the context, because re-initializing adb.Arctic() crashes Aws::S3::S3Client::S3Client on macOS.
2. Checks are ordered with arctic_connectivity FIRST so its bundled AWS SDK loads before boto3 (which gets pulled in by collectors imports).

Polygon check skips gracefully (WARN, not FAIL) when POLYGON_API_KEY is unset, which supports laptop-side preflight where the .env isn't loaded. On the spot the key is present and the check fires.

18 tests in tests/test_sf_preflight.py: happy path + each failure mode each check is designed to catch + orchestrator isolation. 394 tests total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(sf_preflight): set POLYGON_API_KEY in polygon-coverage tests

CI runs without POLYGON_API_KEY in env, so the no-key skip-to-WARN guard short-circuited the 3 polygon-coverage tests before they reached the mocked client. Set the env var via monkeypatch so the guard passes through to the polygon mock. Also add explicit test for the no-key path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
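The orchestrator pattern described above (pure checks, per-check exception isolation, exit code 1 on any failure) might look roughly like the following. Only the names PreflightContext, CheckResult, check_arctic_connectivity, and check_polygon_grouped_coverage come from the message; their fields, the `run_checks` helper, and the deliberately crashing check are assumptions for illustration, and the check bodies are placeholders rather than real S3, ArcticDB, or polygon calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CheckResult:
    name: str
    status: str  # "PASS", "WARN", or "FAIL"
    detail: str = ""


@dataclass
class PreflightContext:
    polygon_api_key: Optional[str] = None
    shared: dict = field(default_factory=dict)  # handles reused across checks


Check = Callable[[PreflightContext], CheckResult]


def run_checks(ctx: PreflightContext, checks: list[Check]) -> tuple[list[CheckResult], int]:
    """Run every check; a raised exception becomes a FAIL instead of aborting."""
    results = []
    for check in checks:
        try:
            results.append(check(ctx))
        except Exception as exc:
            results.append(CheckResult(check.__name__, "FAIL", f"{type(exc).__name__}: {exc}"))
    exit_code = 1 if any(r.status == "FAIL" for r in results) else 0
    return results, exit_code


def check_arctic_connectivity(ctx: PreflightContext) -> CheckResult:
    ctx.shared["libs"] = object()  # placeholder for handles opened once, reused later
    return CheckResult("check_arctic_connectivity", "PASS")


def check_polygon_grouped_coverage(ctx: PreflightContext) -> CheckResult:
    if not ctx.polygon_api_key:
        return CheckResult("check_polygon_grouped_coverage", "WARN", "POLYGON_API_KEY unset; skipped")
    return CheckResult("check_polygon_grouped_coverage", "PASS")


def check_that_crashes(ctx: PreflightContext) -> CheckResult:
    raise KeyError("constituents.json")  # simulated unexpected error


results, exit_code = run_checks(
    PreflightContext(),
    [check_arctic_connectivity, check_polygon_grouped_coverage, check_that_crashes],
)
for r in results:
    print(f"{r.status:4} {r.name} {r.detail}")
```

Listing the connectivity check first mirrors the ordering note above: later checks can read the shared handles from the context instead of opening their own.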

cipher813 merged commit 608b72d into main May 2, 2026
1 check passed

cipher813 deleted the fix/backfill-no-clobber-on-stale-source branch May 2, 2026 13:22

This was referenced May 2, 2026

fix(polygon): per-ticker fallback for grouped-daily coverage gaps #131

Merged

fix(daily_append): scope missing-from-closes check to caller's request list #132

Merged
