fix(backfill): apply daily_closes delta + regression preflight #130
Merged
DataPhase1 step 8 (builders.backfill) silently regressed ArcticDB macro
and universe last_date by 1 day on the 2026-05-02 Saturday SF run. Root
cause: backfill loaded the 10y price cache (which passed the mtime
"current" check and skipped refresh, so its data ended 4/30), computed
features over it, and full-series lib.write() clobbered the 5/1 row
MorningEnrich had appended at 09:18. Postflight rejected the regression
at 09:53; the SF halted at DataPhase1.
Two layers gate this class of incident going forward:
1. Apply daily_closes delta before any compute. backfill now mirrors
features/compute.py::_apply_daily_delta — staging/daily_closes/{date}
parquets are merged on top of the cache so the source captures
MorningEnrich's polygon-T+1 fill (and any other post-cache-refresh
appends). This makes the source as fresh as ArcticDB.
2. Regression preflight against ArcticDB. _assert_no_arctic_regression
reads SPY + a 20-symbol universe sample (matches postflight's
_UNIVERSE_SAMPLE_SIZE) and refuses to run if planned data is older
than what's already in ArcticDB. Hard-fails BEFORE the multi-minute
feature compute with an actionable error message pointing at the
recovery path (force a price-cache refresh).
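The two layers above can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: `apply_daily_delta`, `assert_no_regression`, `ArcticRegressionError`, and the plain-dict data shapes are hypothetical stand-ins for the real parquet and ArcticDB reads, chosen only to show the ordering (merge the delta first, then compare planned last dates against stored ones before any compute).

```python
from datetime import date


class ArcticRegressionError(RuntimeError):
    """The planned write would move a symbol's last_date backwards."""


def apply_daily_delta(cache, deltas):
    """Overlay per-day closes on top of the cached history.

    cache:  {ticker: {date: close}}, possibly stale (hypothetical shape)
    deltas: {date: {ticker: close}}, one entry per staged daily_closes file
    Returns a new mapping; the inputs are not modified.
    """
    merged = {ticker: dict(rows) for ticker, rows in cache.items()}
    for day in sorted(deltas):
        for ticker, close in deltas[day].items():
            # A staged close wins over a cached value for the same date.
            merged.setdefault(ticker, {})[day] = close
    return merged


def assert_no_regression(planned, stored_last_dates):
    """Raise if any sampled symbol would end earlier than what is stored.

    planned:           {ticker: {date: close}} about to be written
    stored_last_dates: {ticker: last stored date, or None if never written}
    """
    stale = {}
    for ticker, stored_last in stored_last_dates.items():
        rows = planned.get(ticker)
        if stored_last is None or not rows:
            continue  # first write, or symbol not part of this run
        planned_last = max(rows)
        if planned_last < stored_last:
            stale[ticker] = (planned_last, stored_last)
    if stale:
        detail = ", ".join(
            f"{t}: planned {p} < stored {s}" for t, (p, s) in sorted(stale.items())
        )
        raise ArcticRegressionError(
            f"refusing to write older data ({detail}); "
            "force a price-cache refresh and retry"
        )
```

In this sketch, a cache ending 4/30 checked against a stored last date of 5/1 raises immediately, while the same cache with the 5/1 delta merged in passes. That mirrors why the delta is applied before the preflight runs.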
The 2026-04-22 SOLS regression was patched the same day with a
ticker_filter-only guard (skip_macro path); this PR closes the
full-universe path that the same bug class still ran through.
8 new tests in tests/test_backfill_no_regression.py cover the
preflight (pass / macro-regression / universe-regression / first-write
absent-from-arctic) and the wiring (delta call / preflight call /
ticker_filter skip / dry_run skip). Existing
test_backfill_unified_and_macro_scoping.py mock setups extended with
delta + preflight stubs.
Pre-existing failures in test_flow_doctor_wiring.py
(test_enabled_attaches_flow_doctor_handler,
test_exclude_patterns_plumbed_to_handler) are unrelated — flow-doctor
0.3.0 (pinned by alpha-engine-lib[flow_doctor]>=0.3.0,<0.4.0) doesn't
support the s3 notifier type the production yaml + tests reference.
That's a separate alpha-engine-lib version-coordination issue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 2, 2026
cipher813 added a commit that referenced this pull request on May 2, 2026
…t list (#132)

MorningEnrich's missing-from-closes hard-fail was tripping every S&P churn week.

Today (2026-05-02): 8 tickers got dropped from the index this past week (ASGN, GTM, HOLX, KMPR, LW, MOH, MTCH, PAYC). They're still in ArcticDB universe (awaiting next prune cycle); they're absent from the new constituents.json Phase 1 wrote at 09:20; MorningEnrich no longer requests them from polygon. The pre-existing check (arctic_universe - closes) saw 8 churn-outs + 4 chronic polygon-coverage gaps = 12 missing > threshold of 5 → SF halt at MorningEnrich.

Add an optional ``expected_tickers`` parameter to ``daily_append``. When the caller passes its request list, the check scopes to ``arctic ∩ expected`` instead of the full ArcticDB universe. Tickers absent from the request (S&P churn-out stragglers) are excluded from the alarm and logged at INFO so operators see drift building up between prune cycles. Backward compatible: callers that don't pass it retain the prior whole-universe behavior.

Both call sites (``_run_morning_enrich`` and ``_run_daily``) now pass their constituents-derived ticker list. The ticker list was already in scope at both sites.

Net effect on the 2026-05-02 SF redrive: missing-from-closes count drops 12 → 4 (only the chronic BF-B/BRK-B/MOG-A/PSTG remain, well under the 5-threshold WARN-only path), MorningEnrich completes, Phase 1 runs, PR #130's backfill regression preflight passes, ArcticDB lands at 5/1, postflight passes, downstream Research/Predictor Training/Backtester all run.

5 new tests in tests/test_daily_append_missing_from_closes.py (360 total) cover: stragglers excluded, real constituents-gap still raises, caret-prefix stripping, None preserves legacy behavior, straggler-count INFO log fires.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
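The scoping rule described in this commit can be sketched as follows. This is an illustration under assumptions, not the body of ``daily_append``: the function name `check_missing_from_closes`, the exception class, and the log wording are hypothetical, and caret-prefix stripping is left out. Only the ``expected_tickers`` parameter name, the ``arctic ∩ expected`` scope, the threshold of 5, and the INFO/WARN behavior come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)

MISSING_THRESHOLD = 5  # per the commit message: more than 5 missing halts the run


class MissingFromClosesError(RuntimeError):
    """Too many in-scope tickers have no close for the day (hypothetical name)."""


def check_missing_from_closes(arctic_universe, closes, expected_tickers=None):
    """Return the sorted in-scope tickers that have no close today.

    With expected_tickers=None the whole ArcticDB universe is in scope
    (legacy behavior); otherwise only arctic ∩ expected is checked.
    """
    arctic = set(arctic_universe)
    if expected_tickers is None:
        scope = arctic
    else:
        expected = set(expected_tickers)
        scope = arctic & expected
        stragglers = sorted(arctic - expected)
        if stragglers:
            # Churn-outs awaiting the next prune cycle: visible, but not an alarm.
            logger.info("%d ArcticDB tickers not in request list: %s",
                        len(stragglers), stragglers)

    missing = sorted(scope - set(closes))
    if len(missing) > MISSING_THRESHOLD:
        raise MissingFromClosesError(
            f"{len(missing)} tickers missing from closes: {missing}"
        )
    if missing:
        logger.warning("%d tickers missing from closes: %s", len(missing), missing)
    return missing
```

Run against the 2026-05-02 numbers, the legacy call sees 8 churn-outs plus 4 chronic gaps and raises, while passing the constituents list leaves only the 4 chronic gaps, which stay on the warning path.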
This was referenced May 2, 2026
cipher813 added a commit that referenced this pull request on May 2, 2026
* feat(preflight): add sf_preflight.py (Saturday SF dry-rehearsal)

Predicts whether the Saturday SF would succeed BEFORE launching a spot. Today's recovery cycle (5 SF redrives, ~5 polygon API calls each) burned free-tier quota and operator hours discovering bugs sequentially. This module simulates the critical pre-Phase-1 path against real S3 + ArcticDB state and reports per-step pass/fail in ~30s with 1 polygon call total.

Eight independent checks, mapped to today's incident stack:

  PR #130 (backfill regression)          → check_backfill_source_freshness
  PR #131 (polygon coverage flake)       → check_polygon_grouped_coverage
  PR #132 (missing-from-closes scoping)  → check_predicted_missing_from_closes
  PR #133 (freshness scan scoping)       → check_universe_sample_freshness
  PR #134 (workflow ordering)            → check_universe_drift
  PR #135 (return shape)                 → check_constituents_fetch
  Postflight contracts                   → check_postflight_contracts
  ArcticDB reachability                  → check_arctic_connectivity

Each check is a pure function taking a PreflightContext, returning a CheckResult. The orchestrator runs them all (catching per-check exceptions so one fail doesn't abort the suite) and emits human or JSON output. Exit code 1 on any failure.

Two macOS-specific design notes:
1. ArcticDB libs are initialized once in check_arctic_connectivity and reused across downstream checks via the context; re-initializing adb.Arctic() crashes Aws::S3::S3Client::S3Client on macOS.
2. Checks are ordered with arctic_connectivity FIRST so its bundled AWS SDK loads before boto3 (which gets pulled in by collectors imports).

Polygon check skips gracefully (WARN, not FAIL) when POLYGON_API_KEY is unset; this supports laptop-side preflight where the .env isn't loaded. On the spot the key is present and the check fires.

18 tests in tests/test_sf_preflight.py: happy path + each failure mode each check is designed to catch + orchestrator isolation. 394 tests total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(sf_preflight): set POLYGON_API_KEY in polygon-coverage tests

CI runs without POLYGON_API_KEY in env, so the no-key skip-to-WARN guard short-circuited the 3 polygon-coverage tests before they reached the mocked client. Set the env var via monkeypatch so the guard passes through to the polygon mock. Also add explicit test for the no-key path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
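The orchestrator pattern this commit describes (checks taking a PreflightContext and returning a CheckResult, per-check exception isolation, exit code 1 on any failure) might look roughly like the sketch below. Only the names PreflightContext and CheckResult come from the commit message; their fields, the `run_checks` and `exit_code` helpers, and the status strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PreflightContext:
    # Hypothetical field: handles created by an early check (for example the
    # ArcticDB libraries) are stored here so later checks reuse them instead
    # of re-initializing.
    shared: dict = field(default_factory=dict)


@dataclass
class CheckResult:
    name: str
    status: str  # "PASS", "WARN" or "FAIL" (assumed vocabulary)
    detail: str = ""


def run_checks(checks, ctx):
    """Run every check in order; a crash in one becomes a FAIL result."""
    results = []
    for check in checks:
        try:
            results.append(check(ctx))
        except Exception as exc:  # isolate failures so the suite continues
            results.append(
                CheckResult(check.__name__, "FAIL", f"{type(exc).__name__}: {exc}")
            )
    return results


def exit_code(results):
    """1 if any check failed; WARN results alone do not fail the run."""
    return 1 if any(r.status == "FAIL" for r in results) else 0
```

Because the checks run in list order against one shared context, placing the connectivity check first lets it populate `ctx.shared` before any downstream check reads from it.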
Summary
- DataPhase1 step 8 (builders.backfill) silently regressed ArcticDB macro+universe last_date by 1d on the 2026-05-02 Saturday SF run. Postflight caught it; SF halted.
- Root cause: backfill loaded the 10y price cache, whose data ended 4/30, and its full-series lib.write() clobbered the 5/1 row MorningEnrich had appended at 09:18.
- Fix: (1) apply staging/daily_closes/{date}.parquet on top of the cache before any compute (mirrors features/compute.py::_apply_daily_delta), and (2) defense-in-depth _assert_no_arctic_regression preflight that hard-fails before compute if planned data is older than what's already in ArcticDB.

The 2026-04-22 SOLS regression was patched same-day with a ticker_filter-only skip_macro guard; this PR closes the full-universe path that the same bug class still ran through.

Test plan
- tests/test_backfill_no_regression.py: 8 new tests cover the preflight (pass / macro-regression / universe-regression / first-write absent-from-arctic) and the wiring (delta call / preflight call / ticker_filter skip / dry_run skip)
- tests/test_backfill_unified_and_macro_scoping.py: existing 9 tests still pass after extending mock setups with delta + preflight stubs
- tests/test_daily_append_backfill_safe.py: 5 existing tests unaffected

🤖 Generated with Claude Code