fix(backfill): use min(valid_dates) so single fresh ticker can't suppress delta load #192
Merged
`_apply_daily_delta` used `max(valid_dates)` to compute `slim_last_date`, so
when `prices.collect` flagged a single ticker as stale and refreshed it via
yfinance to the prior trading day, that one ticker's date became
`slim_last_date`. On a Saturday SF run that turned `bdate_range(slim_last_date+1,
today)` into an empty range — the loader returned `{}`, every other cache
parquet stayed stuck at its older date, and the backfill regression preflight
rejected the write because planned (5/6) < existing-in-ArcticDB (5/8) across
SPY/VIX/XL*/sampled-universe.
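
The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the code in `_apply_daily_delta`: `business_days` is a hypothetical stdlib stand-in for `pandas.bdate_range` (weekdays only, both endpoints inclusive), and the `valid_dates` mapping is an assumed shape filled with the dates from the incident.

```python
from datetime import date, timedelta


def business_days(start: date, end: date) -> list[date]:
    """Hypothetical stand-in for pandas.bdate_range: Mon-Fri dates in [start, end]."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            days.append(current)
        current += timedelta(days=1)
    return days


# Assumed cache state: one ticker refreshed to Friday, the rest still at Wednesday.
valid_dates = {
    "VEEV": date(2026, 5, 8),
    "SPY": date(2026, 5, 6),
    "XLK": date(2026, 5, 6),
}
today = date(2026, 5, 9)  # a Saturday

# Old behavior: the newest ticker sets the anchor for everyone.
slim_last_date = max(valid_dates.values())
delta_range = business_days(slim_last_date + timedelta(days=1), today)

print(slim_last_date)  # 2026-05-08
print(delta_range)     # [] because the window starts and ends on a Saturday
```

With an empty window there is nothing to load, so the tickers that were still at 5/6 never advance.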
Switched to `min(valid_dates)` so the delta load always covers the oldest
ticker; overlapping rows from any freshly-refreshed ticker dedupe via the
existing `keep="last"` step. Updated the preflight error message so the
recommended recovery actually addresses the underlying cause. New regression
test pins both the multi-mtime case and the all-tickers-current early-return.
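
A rough sketch of the fixed behavior, under the same assumptions as the description above: `delta_window` and `merge_keep_last` are hypothetical helpers, not functions from the repository, and the close prices are made-up placeholders. The dict overwrite in `merge_keep_last` only imitates what a pandas `keep="last"` dedupe would do; the real combine step may look different.

```python
from datetime import date, timedelta


def business_days(start: date, end: date) -> list[date]:
    """Hypothetical stand-in for pandas.bdate_range: Mon-Fri dates in [start, end]."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:
            days.append(current)
        current += timedelta(days=1)
    return days


def delta_window(valid_dates: dict[str, date], today: date) -> list[date]:
    """Business days missing for the OLDEST ticker; empty when all are current."""
    if not valid_dates:
        return []
    slim_last_date = min(valid_dates.values())  # the fix: min instead of max
    return business_days(slim_last_date + timedelta(days=1), today)


def merge_keep_last(cached: list[tuple], delta: list[tuple]) -> dict:
    """Combine (ticker, day, close) rows; a later row for the same key wins."""
    merged = {}
    for ticker, day, close in cached + delta:
        merged[(ticker, day)] = close
    return merged


today = date(2026, 5, 9)  # Saturday
mixed = {"VEEV": date(2026, 5, 8), "SPY": date(2026, 5, 6)}
all_current = {"VEEV": date(2026, 5, 8), "SPY": date(2026, 5, 8)}

print(delta_window(mixed, today))        # covers 5/7 and 5/8
print(delta_window(all_current, today))  # [] -> early return, nothing to load

# VEEV already has a 5/8 row from its refresh; the delta load brings it again.
cached = [("VEEV", date(2026, 5, 8), 100.0), ("SPY", date(2026, 5, 6), 500.0)]
delta = [("VEEV", date(2026, 5, 8), 101.0), ("SPY", date(2026, 5, 8), 502.0)]
merged = merge_keep_last(cached, delta)
print(merged[("VEEV", date(2026, 5, 8))])  # 101.0, the later row
```

The cost of anchoring on the oldest ticker is re-loading a few rows for tickers that were already fresh, which the dedupe absorbs.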
Origin: 2026-05-09 weekly SF DataPhase1 PARTIAL — VEEV got refreshed to 5/8,
every other parquet still ended at 5/6, max picked 5/8, today=5/9 Sat,
empty bdate_range, 38 symbols flagged for regression.
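
For readers unfamiliar with the preflight, the comparison it performs can be sketched roughly as follows. This is an assumption about its shape, not the code in `builders/backfill.py`: `find_regressions` is a hypothetical name, the dictionaries stand in for whatever the real check reads from the cache and from ArcticDB, and only three of the flagged symbols are shown.

```python
from datetime import date


def find_regressions(
    planned: dict[str, date], existing: dict[str, date]
) -> dict[str, tuple[date, date]]:
    """Symbols whose planned last date is older than the date already stored."""
    return {
        symbol: (planned_last, existing[symbol])
        for symbol, planned_last in planned.items()
        if symbol in existing and planned_last < existing[symbol]
    }


# Illustrative values matching the incident: the cache ends 5/6, the store has 5/8.
planned = {"SPY": date(2026, 5, 6), "VIX": date(2026, 5, 6), "XLK": date(2026, 5, 6)}
existing = {"SPY": date(2026, 5, 8), "VIX": date(2026, 5, 8), "XLK": date(2026, 5, 8)}

regressions = find_regressions(planned, existing)
if regressions:
    # Writing these symbols would move their stored history backwards.
    print(f"rejecting write, {len(regressions)} symbol(s) would regress: {sorted(regressions)}")
```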
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `_apply_daily_delta` used `max(valid_dates)` to pick `slim_last_date`, so a single freshly-refreshed ticker (VEEV, refreshed by `prices.collect`'s mtime check to 5/8 while every other parquet ended at 5/6) became the lookup anchor. On a Saturday run that turned `bdate_range(slim_last_date+1, today)` into an empty range: the delta loader returned `{}`, every other cache parquet stayed at 5/6, and the backfill regression preflight rejected the write because planned 5/6 < existing-in-ArcticDB 5/8 across 38 symbols (SPY/VIX/XL*/sampled-universe).
- Switched to `min(valid_dates)` so the load always covers the oldest ticker; the existing `keep="last"` dedupe in the combine step handles the overlap with any freshly-refreshed ticker.
- Updated the preflight error message in `builders/backfill.py` so the recovery hint actually addresses the underlying cause (the legacy text blamed the cache mtime check, which was working as designed; the bug was downstream in delta loading).
- New regression test `tests/test_apply_daily_delta_min_last_date.py` pins the multi-mtime case and the all-tickers-current early-return.

Test plan
- `pytest tests/test_apply_daily_delta_min_last_date.py -v` (2 new tests pass)
- `pytest` full suite (574 passed, 1 skipped; same as origin/main)
- `ce500327-cf08-6731-5d44-882f5b380a30_0ae54554-5564-587a-5947-5cb99bf1130f` and verify DataPhase1 → RAGIngestion → Research → ... chain completes
- `_apply_daily_delta` log line shows non-empty delta range on next Sat SF firing

🤖 Generated with Claude Code