feat(data): one-shot migration for OHLCV-only ArcticDB symbols #79
Merged
Adds builders/promote_ohlcv_only_schema.py, which scans every symbol in the universe library and rewrites any whose stored schema lacks feature columns to the full OHLCV + FEATURE schema.

Context: PR #76 (short-history first-class) persisted some symbols as OHLCV-only. PR #78 (per-feature graceful degrade) now writes full schema on every daily_append pass. ArcticDB update() enforces schema match, so the transitional OHLCV-only symbols fail today's daily_append with n_err and their row never lands. 2026-04-21 post-#78 run reported n_err=2.

The migration reads each candidate's OHLCV history, runs compute_features (partial-feature semantics per PR #78), and calls lib.write() to replace the symbol. write() is authoritative for schema; update() is incremental and cannot widen columns.

One-shot, idempotent: symbols already at full schema are skipped. Supports --dry-run for plan review and --ticker X for targeted retries.

Regression tests lock:
- write() (not update()) for the rewrite
- exhaustive FEATURE-column detection (no heuristic subsets)
- explicit error reason on empty compute_features (no silent skip)
- --dry-run guards the write() call
- partial-features structured log matches daily_append convention

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request on Apr 21, 2026
PR #79's promote_ohlcv_only_schema.py passed None to _load_sector_map / _load_cached_fundamentals / _load_cached_alternative. Those loaders log a warning and return empty dicts when the client is None. compute_features then silently defaults ~15 features (every sector-relative, fundamental, and alternative feature) to 0 instead of their real values. Migration would have claimed success while persisting zeroed features for every candidate.

2026-04-21 dry-run caught it: Q and SOLS would have promoted to 55/56 "ok" features each, but several of those "ok" values were 0-defaults from empty sector_map / fundamentals / alt_data rather than real computations.

Fix: construct boto3.client("s3") once and pass it through. The loaders' existing behavior (log warning, return empty) was designed as a defensive fallback for the live weekly pipeline, not a silent escape hatch for offline scripts. The migration must always receive real data or hard-fail.

Regression test forbids passing None to any supporting-data loader and requires the boto3.client call.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
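The "real data or hard-fail" rule described in this commit can be sketched as a small guard in front of the loaders. This is a minimal illustration, not the script's actual code: `load_supporting_data` and the loader callables are hypothetical names, the empty-result check is an added assumption, and a plain `object()` stands in for the client that the real script would obtain from `boto3.client("s3")`.

```python
def load_supporting_data(s3_client, loaders):
    """Run every loader with a real client; raise instead of falling back to empties.

    `loaders` maps a label to a callable taking the client (hypothetical shape).
    """
    if s3_client is None:
        raise ValueError("supporting-data loaders require a real S3 client, got None")

    loaded = {}
    for name, loader in loaders.items():
        data = loader(s3_client)
        if not data:
            # Assumption for this sketch: an empty result is treated as a failure,
            # so features are never computed from silent 0-defaults.
            raise RuntimeError(f"{name} returned no data; refusing to continue")
        loaded[name] = data
    return loaded


# Stand-in client and loaders so the sketch runs without AWS access.
fake_client = object()
loaders = {
    "sector_map": lambda client: {"AAA": "Technology"},
    "fundamentals": lambda client: {"AAA": {"pe_ratio": 20.0}},
}

data = load_supporting_data(fake_client, loaders)

try:
    load_supporting_data(None, loaders)
    none_rejected = False
except ValueError:
    none_rejected = True

try:
    load_supporting_data(fake_client, {"alt_data": lambda client: {}})
    empty_rejected = False
except RuntimeError:
    empty_rejected = True
```

Raising at the call site keeps the loaders' lenient fallback available to the live pipeline while making the offline script strict.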
cipher813 added a commit that referenced this pull request on Apr 22, 2026
…ll-universe runs (#85)

Two ROADMAP P1 fixes landing together because they both live in builders/backfill.py and share regression-test scaffolding.

1. **Unified short-history path.** Dropped the `if ticker in tickers_with_features: <feature path> else: <OHLCV-only path>` fork. Post-PR-#78 `compute_features` returns rows with NaN for features whose rolling-window warmup exceeds available history, so the fork is unnecessary; every ticker writes the full OHLCV+FEATURE schema. Left in: an `n_short_history_in_scope` observability counter and a per-ticker `partial-features` log line when the last row has NaN features. The fork was a time-bomb for next Saturday's weekly backfill: it would regress PR #79's schema migration by writing stripped-column frames that daily_append's `lib.update()` then rejects with schema mismatch.

2. **Macro writes gated by --ticker.** A per-ticker backfill (`--ticker X`) now skips the macro library rewrite by default. The parquet price cache's macro series may be stale relative to what daily_append has been appending; rewriting from parquet during a per-ticker patch silently regresses SPY/VIX/XL* last_date. The 2026-04-22 SOLS patch knocked macro back from 4/20 to 4/17 by exactly this path and broke the predictor's macro-freshness preflight. Operators who genuinely want the macro rewrite during a ticker-scoped run can pass `--rebuild-macro` (explicit opt-in). Default full-universe backfill still rewrites macro as before.

8 regression tests added: source-text invariants + functional tests covering the three macro-scoping modes and the short-history schema contract. 151/151 full suite passes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
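The three macro-scoping modes mentioned in this commit reduce to a small decision function. The sketch below is illustrative only: `should_rewrite_macro`, `parse_args`, and `macro_decision` are hypothetical names, and only the `--ticker` and `--rebuild-macro` flags come from the commit message.

```python
import argparse


def should_rewrite_macro(ticker, rebuild_macro):
    """Decide whether a backfill run rewrites the macro library."""
    if ticker is None:
        # Full-universe run: macro is rewritten as before.
        return True
    # Ticker-scoped run: skip macro unless the operator opted in explicitly.
    return rebuild_macro


def parse_args(argv):
    # Toy parser exposing only the two flags named in the commit message.
    parser = argparse.ArgumentParser(description="toy backfill CLI")
    parser.add_argument("--ticker", default=None)
    parser.add_argument("--rebuild-macro", action="store_true")
    return parser.parse_args(argv)


def macro_decision(argv):
    args = parse_args(argv)
    return should_rewrite_macro(args.ticker, args.rebuild_macro)


full_universe = macro_decision([])
ticker_only = macro_decision(["--ticker", "SOLS"])
ticker_opt_in = macro_decision(["--ticker", "SOLS", "--rebuild-macro"])
```

Making the skip the default for ticker-scoped runs means a stale parquet cache can only regress macro data when an operator asks for the rewrite.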
Summary
Adds `builders/promote_ohlcv_only_schema.py` — scans every universe symbol and rewrites any whose stored schema lacks FEATURE columns to the full OHLCV + FEATURE schema.
Why
After PR #76 (short-history first-class) + PR #78 (per-feature graceful degrade), some symbols landed in a transitional state: PR #76 wrote them as OHLCV-only before PR #78 merged. PR #78 now writes the full FEATURE schema on every `daily_append` pass, and ArcticDB `update()` enforces schema match, so these transitional symbols fail with a column mismatch and today's row never lands. The 2026-04-21 post-#78 `daily_append` reported `n_err=2`.
One-shot migration via `lib.write()` (authoritative for schema) rewrites each affected symbol with the canonical full-schema frame (NaN for features that can't be computed from available history).
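The difference between the two calls can be illustrated with a toy in-memory store. This is a sketch of the behavior described above, not the ArcticDB API: `ToyLibrary`, `SchemaMismatch`, and the `rsi_14` feature name are invented for the example, and rows are plain lists rather than DataFrames.

```python
class SchemaMismatch(Exception):
    """Raised when an incremental update carries columns that differ from the stored ones."""


class ToyLibrary:
    """In-memory stand-in for a symbol store; NOT the ArcticDB API."""

    def __init__(self):
        self._symbols = {}  # symbol -> {"columns": [...], "rows": {date: row}}

    def write(self, symbol, columns, rows):
        # Full replace: the incoming frame defines the schema.
        self._symbols[symbol] = {"columns": list(columns), "rows": dict(rows)}

    def update(self, symbol, columns, rows):
        # Incremental: columns must match what is already stored.
        stored = self._symbols[symbol]
        if list(columns) != stored["columns"]:
            raise SchemaMismatch(f"{symbol}: stored {stored['columns']}, got {list(columns)}")
        stored["rows"].update(rows)

    def columns(self, symbol):
        return list(self._symbols[symbol]["columns"])

    def row_count(self, symbol):
        return len(self._symbols[symbol]["rows"])


OHLCV = ["open", "high", "low", "close", "volume"]
FULL = OHLCV + ["rsi_14"]  # "rsi_14" is a made-up feature name
nan = float("nan")

lib = ToyLibrary()
lib.write("AAA", OHLCV, {"2026-04-20": [10.0, 11.0, 9.5, 10.5, 1000]})

# A full-schema daily row cannot be appended to an OHLCV-only symbol.
try:
    lib.update("AAA", FULL, {"2026-04-21": [10.5, 11.2, 10.1, 11.0, 1200, 55.0]})
    update_rejected = False
except SchemaMismatch:
    update_rejected = True

# The migration step: rewrite the history at full schema, NaN where a feature is unavailable.
lib.write("AAA", FULL, {"2026-04-20": [10.0, 11.0, 9.5, 10.5, 1000, nan]})
# After the rewrite, the incremental path works again.
lib.update("AAA", FULL, {"2026-04-21": [10.5, 11.2, 10.1, 11.0, 1200, 55.0]})
```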
How it works
Idempotent: symbols already at full schema are skipped (detector returns False). Safe to re-run.
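A minimal sketch of the detect-then-rewrite loop, under stated assumptions: `needs_promotion`, `plan_and_apply`, the column names, and the tickers are all hypothetical, and a dict of column lists stands in for the library. It shows the exhaustive FEATURE-column check, the `--dry-run` guard, and why a second run is a no-op.

```python
OHLCV_COLUMNS = ("open", "high", "low", "close", "volume")
FEATURE_COLUMNS = ("rsi_14", "sector_rel_ret_20d")  # made-up feature names


def needs_promotion(stored_columns):
    """True if ANY feature column is missing (exhaustive check, no sampled subset)."""
    stored = set(stored_columns)
    return any(col not in stored for col in FEATURE_COLUMNS)


def plan_and_apply(schemas, rewrite, dry_run=False):
    """Return (promoted, skipped); call rewrite(symbol) only when not a dry run."""
    promoted, skipped = [], []
    for symbol in sorted(schemas):
        if not needs_promotion(schemas[symbol]):
            skipped.append(symbol)  # already at full schema
            continue
        promoted.append(symbol)
        if not dry_run:
            rewrite(symbol)
    return promoted, skipped


FULL = list(OHLCV_COLUMNS + FEATURE_COLUMNS)
schemas = {
    "AAA": list(FULL),                        # already migrated
    "BBB": list(OHLCV_COLUMNS),               # OHLCV-only
    "CCC": list(OHLCV_COLUMNS) + ["rsi_14"],  # partial feature set still counts
}


def rewrite(symbol):
    # Stand-in for "recompute features and write the full-schema frame".
    schemas[symbol] = list(FULL)


dry_plan = plan_and_apply(schemas, rewrite, dry_run=True)
columns_after_dry_run = list(schemas["BBB"])
first_run = plan_and_apply(schemas, rewrite)
second_run = plan_and_apply(schemas, rewrite)
```

Checking every FEATURE column, rather than a representative one, is what lets a symbol with only part of the feature set (like `CCC` here) still be caught.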
Deployment
One-shot on `ae-trading`:
```bash
# Plan review
python -m builders.promote_ohlcv_only_schema --dry-run

# Apply
python -m builders.promote_ohlcv_only_schema

# Verify: next daily_append should show n_err=0 for these symbols
python -m builders.daily_append --date 2026-04-21
```
Test plan
🤖 Generated with Claude Code