fix(backfill): unified short-history path + scope macro rewrite to full-universe runs #85
Merged
cipher813 merged 1 commit on Apr 22, 2026
…ll-universe runs

Two ROADMAP P1 fixes landing together because they both live in builders/backfill.py and share regression-test scaffolding.

1. **Unified short-history path.** Dropped the `if ticker in tickers_with_features: <feature path> else: <OHLCV-only path>` fork. Post-PR-#78 `compute_features` returns rows with NaN for features whose rolling-window warmup exceeds available history, so the fork is unnecessary: every ticker writes the full OHLCV+FEATURE schema. Left in: an `n_short_history_in_scope` observability counter and a per-ticker `partial-features` log line when the last row has NaN features. The fork was a time-bomb for next Saturday's weekly backfill: it would regress PR #79's schema migration by writing stripped-column frames that daily_append's `lib.update()` then rejects with schema mismatch.

2. **Macro writes gated by --ticker.** A per-ticker backfill (`--ticker X`) now skips the macro library rewrite by default. The parquet price cache's macro series may be stale relative to what daily_append has been appending; rewriting from parquet during a per-ticker patch silently regresses SPY/VIX/XL* last_date. The 2026-04-22 SOLS patch knocked macro back from 4/20 to 4/17 by exactly this path and broke the predictor's macro-freshness preflight. Operators who genuinely want the macro rewrite during a ticker-scoped run can pass `--rebuild-macro` (explicit opt-in). Default full-universe backfill still rewrites macro as before.

8 regression tests added: source-text invariants plus functional tests covering the three macro-scoping modes and the short-history schema contract. 151/151 full suite passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
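As a rough illustration of item 1, here is a minimal stdlib-only sketch of the per-feature NaN contract and the `partial-features` log line. It is not the code in `builders/backfill.py`: the real `compute_features` presumably works on DataFrames, and `last_row_features`, `partial_features_line`, the `ret_5d` feature, and both window lengths are assumptions made up for this example. Only the `dist_from_52w_high` column name and the log-line fields come from this PR.

```python
import math
from typing import Dict, List, Optional

# Hypothetical warmup lengths in rows; the real values are not stated in this PR.
FEATURE_WINDOWS: Dict[str, int] = {"ret_5d": 5, "dist_from_52w_high": 252}


def last_row_features(closes: List[float]) -> Dict[str, float]:
    """Return the last row's features, using NaN where history is too short.

    Every feature key is always present, so short-history and long-history
    tickers share one schema (no OHLCV-only fork).
    """
    n = len(closes)
    row: Dict[str, float] = {}

    # 5-row return needs the close from 5 rows back, i.e. more than 5 rows.
    w = FEATURE_WINDOWS["ret_5d"]
    row["ret_5d"] = closes[-1] / closes[-1 - w] - 1.0 if n > w else math.nan

    # Distance from the rolling high needs a full window of rows.
    w = FEATURE_WINDOWS["dist_from_52w_high"]
    row["dist_from_52w_high"] = (
        closes[-1] / max(closes[-w:]) - 1.0 if n >= w else math.nan
    )
    return row


def partial_features_line(
    ticker: str, n_rows: int, row: Dict[str, float]
) -> Optional[str]:
    """Build the partial-features log line, or None if the row is complete."""
    nan_feats = sorted(name for name, value in row.items() if math.isnan(value))
    if not nan_feats:
        return None
    return (
        f"partial-features ticker={ticker} rows={n_rows} "
        f"nan_last_row={len(nan_feats)}/{len(row)} features={nan_feats}"
    )
```

With a 30-row history, `ret_5d` gets a value and `dist_from_52w_high` stays NaN, so the row keeps the full column set and the ticker would be counted as partial rather than routed to a separate write path.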
Summary

Two ROADMAP P1 fixes to `builders/backfill.py`:

1. **Unified short-history path.** The `if ticker in tickers_with_features: <features> else: <OHLCV-only>` fork is removed. Every ticker now goes through `compute_features`, which returns partial-NaN rows for features whose warmup exceeds available history (PR #78 contract, "fix(features): per-feature graceful degrade, no whole-row dropna"). The fork would otherwise regress the schema migration from PR #79 ("feat(data): one-shot migration for OHLCV-only ArcticDB symbols") on next Saturday's weekly backfill by writing stripped-column frames that `daily_append.update()` rejects.

2. **Macro writes gated by `--ticker`.** A per-ticker backfill now skips the macro library rewrite by default. New `--rebuild-macro` flag is the explicit opt-in for operators who actually want macro rewritten during a ticker-scoped run. Default full-universe backfill still rewrites macro as before.

Why
The SOLS patch on 2026-04-22 ran `--ticker SOLS`, which ran backfill's full side-effect macro rewrite from parquet. Parquet macro was stale (4/17), so this regressed ArcticDB macro SPY/VIX/XL* from 4/20 → 4/17 and broke the predictor's macro-freshness preflight. Same-shaped gap as the 2026-04-20 SOLS / Q universe-library gap that just got cleaned up.

Next Saturday's weekly backfill run would also regress PR #79's schema migration: the OHLCV-only fork writes column sets that `daily_append.update()` rejects with schema mismatch (same class as the PR #76/#77/#79 chain we just finished).

Observability
Per-ticker `partial-features ticker=X rows=N nan_last_row=M/total features=[...]` INFO log for any ticker whose last row has NaN features. Completion log reports `n_partial` in addition to `n_ok` / `n_skip` / `n_err`.

Test plan
- `n_ok_ohlcv_only` + `tickers_with_features = {` + "write raw OHLCV" all absent; `MIN_ROWS_FOR_FEATURES` not present inside the write loop; `skip_macro` + `ticker_filter is not None` + `rebuild_macro` present; `--rebuild-macro` CLI flag present
- `ticker_filter="AAPL", rebuild_macro=False` → `macro_lib.write` NOT called
- `ticker_filter="AAPL", rebuild_macro=True` → `macro_lib.write` IS called
- `ticker_filter=None` → `macro_lib.write` IS called (default preserved)
- `dist_from_52w_high` column with NaN preserved (unified schema)

🤖 Generated with Claude Code
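For readers who want to see the shape of the three macro-scoping checks, here is a minimal sketch using `unittest.mock`. It assumes the gate is `ticker_filter is not None and not rebuild_macro`, which matches the `skip_macro` / `ticker_filter is not None` / `rebuild_macro` names in the test plan but is not copied from `builders/backfill.py`. The helper names `write_macro_if_in_scope` and `count_macro_writes` and the placeholder frames are hypothetical.

```python
from typing import Dict, Optional
from unittest.mock import MagicMock


def write_macro_if_in_scope(
    macro_lib,
    macro_frames: Dict[str, object],
    ticker_filter: Optional[str],
    rebuild_macro: bool,
) -> bool:
    """Write macro series unless this is a ticker-scoped run without opt-in.

    Returns True if the macro library was rewritten.
    """
    # Assumed gate: a --ticker run skips macro unless --rebuild-macro is passed.
    skip_macro = ticker_filter is not None and not rebuild_macro
    if skip_macro:
        return False
    for symbol, frame in macro_frames.items():
        macro_lib.write(symbol, frame)
    return True


def count_macro_writes(ticker_filter: Optional[str], rebuild_macro: bool) -> int:
    """Run the gate against a mock library and count write() calls."""
    macro_lib = MagicMock()
    frames = {"SPY": object(), "VIX": object()}  # placeholder payloads
    write_macro_if_in_scope(macro_lib, frames, ticker_filter, rebuild_macro)
    return macro_lib.write.call_count


# The three modes from the test plan.
assert count_macro_writes("AAPL", rebuild_macro=False) == 0  # write NOT called
assert count_macro_writes("AAPL", rebuild_macro=True) == 2   # explicit opt-in
assert count_macro_writes(None, rebuild_macro=False) == 2    # default preserved
```

Keeping the gate as one boolean expression makes the default full-universe behaviour easy to verify: with `ticker_filter=None` the expression is false regardless of `rebuild_macro`.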