feat(data): one-shot migration for OHLCV-only ArcticDB symbols #79

Merged
cipher813 merged 1 commit into main from fix/promote-ohlcv-only-schema on Apr 21, 2026

@cipher813
Owner

Summary

Adds `builders/promote_ohlcv_only_schema.py`, which scans every symbol in the universe library and rewrites any whose stored schema lacks FEATURE columns to the full OHLCV + FEATURE schema.

Why

After PR #76 (short-history first-class) and PR #78 (per-feature graceful degrade), some symbols were left in a transitional state: PR #76 wrote them as OHLCV-only before PR #78 merged. PR #78 now writes the full FEATURE schema on every `daily_append` pass, and ArcticDB `update()` enforces a schema match, so these transitional symbols fail with a column mismatch and their row for today never lands. The 2026-04-21 post-#78 `daily_append` run reported `n_err=2`.

A one-shot migration via `lib.write()` (authoritative for schema) rewrites each affected symbol with the canonical full-schema frame, using NaN for features that can't be computed from the available history.
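To make the "NaN for features that can't be computed" behaviour concrete, here is a minimal sketch. It assumes a single `Close` column and two rolling-mean features; `FEATURE_WINDOWS` and `compute_features_sketch` are hypothetical names, not the repo's `compute_features`. The point is only that a short history still yields every feature column, with NaN where the window has not warmed up.

```python
import numpy as np
import pandas as pd

# Hypothetical feature set: names and window lengths are illustrative only.
FEATURE_WINDOWS = {"sma_5": 5, "sma_50": 50}


def compute_features_sketch(ohlcv: pd.DataFrame) -> pd.DataFrame:
    """Return the input frame plus one column per feature, NaN where history is short."""
    out = ohlcv.copy()
    for name, window in FEATURE_WINDOWS.items():
        # With an integer window, rolling() requires `window` observations by
        # default, so rows with less history come back as NaN.
        out[name] = out["Close"].rolling(window).mean()
    return out


# Ten business days of history: enough for sma_5, not enough for sma_50.
idx = pd.date_range("2026-04-01", periods=10, freq="B")
ohlcv = pd.DataFrame({"Close": np.arange(10, dtype=float) + 100.0}, index=idx)
full = compute_features_sketch(ohlcv)
```

Every feature column is present in `full`, so the stored schema is the same for short-history and long-history symbols; only the values differ.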

How it works

  1. `list_symbols()` on the universe library
  2. For each, read the stored frame; flag as candidate if `any(f not in df.columns for f in FEATURES)`
  3. For each candidate, recompute features against its stored OHLCV (using macro series from the macro library, same inputs `daily_append` uses)
  4. `lib.write(ticker, out)` to replace the symbol; `write()` replaces the stored frame, whereas `update()` couldn't widen columns
  5. Partial-feature coverage per ticker logged as `partial-features ticker=X nan=N/total features=[...]` (matches `daily_append`'s convention)

Idempotent: symbols already at full schema are skipped (detector returns False). Safe to re-run.
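The steps above can be sketched roughly as follows. This is not the module's code: `FakeLibrary` is an in-memory stand-in for the ArcticDB library so the example runs without a database, `FEATURES` is a placeholder list, and `recompute` simply fills missing columns with NaN where the real script recomputes features from OHLCV and macro inputs.

```python
from types import SimpleNamespace

import pandas as pd

FEATURES = ["feat_a", "feat_b"]  # hypothetical; the real list lives in the repo


class FakeLibrary:
    """In-memory stand-in for an ArcticDB library (illustration only)."""

    def __init__(self, frames):
        self._frames = dict(frames)
        self.writes = []

    def list_symbols(self):
        return sorted(self._frames)

    def read(self, symbol):
        # Mirrors the shape of an ArcticDB read: the frame is on `.data`.
        return SimpleNamespace(data=self._frames[symbol])

    def write(self, symbol, df):
        self._frames[symbol] = df
        self.writes.append(symbol)


def needs_promotion(df: pd.DataFrame) -> bool:
    """Exhaustive check: any missing FEATURE column marks the symbol as a candidate."""
    return any(f not in df.columns for f in FEATURES)


def recompute(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder for feature recomputation: adds missing columns as NaN."""
    out = df.copy()
    for f in FEATURES:
        if f not in out.columns:
            out[f] = float("nan")
    return out


def promote(lib) -> list:
    promoted = []
    for symbol in lib.list_symbols():
        df = lib.read(symbol).data
        if not needs_promotion(df):
            continue  # already at full schema
        lib.write(symbol, recompute(df))  # write() replaces the stored frame
        promoted.append(symbol)
    return promoted


lib = FakeLibrary({
    "AAA": pd.DataFrame({"Close": [1.0], "feat_a": [0.1], "feat_b": [0.2]}),
    "BBB": pd.DataFrame({"Close": [2.0]}),  # OHLCV-only
})
first_pass = promote(lib)
second_pass = promote(lib)  # nothing left to promote
```

Running `promote` twice shows the idempotency property: the second pass finds no candidates and issues no writes.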

Deployment

One-shot on `ae-trading`:

```bash
# Plan review
python -m builders.promote_ohlcv_only_schema --dry-run

# Apply
python -m builders.promote_ohlcv_only_schema

# Verify: next daily_append should show n_err=0 for these symbols
python -m builders.daily_append --date 2026-04-21
```

Test plan

  • `pytest tests/test_promote_ohlcv_only_schema.py -v` — 6/6 pass
  • Full repo suite: `pytest tests/` — 119/119 pass
  • Syntax check on the migration module
  • Post-merge: run `--dry-run` on ae-trading to identify affected symbols, confirm the expected ~2 candidates
  • Post-merge: apply + re-run daily_append for 2026-04-21; expect `n_err=0` for the previously-mismatched symbols

🤖 Generated with Claude Code

Adds builders/promote_ohlcv_only_schema.py — scans every symbol in the
universe library and rewrites any whose stored schema lacks feature
columns to the full OHLCV + FEATURE schema.

Context: PR #76 (short-history first-class) persisted some symbols as
OHLCV-only. PR #78 (per-feature graceful degrade) now writes full
schema on every daily_append pass. ArcticDB update() enforces schema
match, so the transitional OHLCV-only symbols fail today's daily_append
with n_err and their row never lands. 2026-04-21 post-#78 run reported
n_err=2.

The migration reads each candidate's OHLCV history, runs
compute_features (partial-feature semantics per PR #78), and calls
lib.write() to replace the symbol. write() is authoritative for schema;
update() is incremental and cannot widen columns.

One-shot, idempotent — symbols already at full schema are skipped.
Supports --dry-run for plan review and --ticker X for targeted retries.
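A minimal sketch of how the two flags might be wired, assuming an `argparse` front end. Only the flag names `--dry-run` and `--ticker` come from this commit message; `build_parser`, `run`, and the `write` callback are hypothetical.

```python
import argparse
from typing import Callable, Iterable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="promote_ohlcv_only_schema")
    parser.add_argument("--dry-run", action="store_true",
                        help="report candidates without writing")
    parser.add_argument("--ticker", default=None,
                        help="restrict the run to a single symbol")
    return parser


def run(candidates: Iterable[str], write: Callable[[str], None],
        dry_run: bool, ticker: Optional[str] = None) -> List[str]:
    """Return the planned symbols; call write() only when not in dry-run mode."""
    planned = [c for c in candidates if ticker is None or c == ticker]
    for symbol in planned:
        if not dry_run:
            write(symbol)
    return planned


written: List[str] = []
args = build_parser().parse_args(["--dry-run", "--ticker", "SOLS"])
planned = run(["Q", "SOLS"], written.append, args.dry_run, args.ticker)
```

Here `planned` lists what would be rewritten while `written` stays empty, which is the property the `--dry-run` regression test is described as locking.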

Regression tests lock:
- write() (not update()) for the rewrite
- exhaustive FEATURE-column detection (no heuristic subsets)
- explicit error reason on empty compute_features (no silent skip)
- --dry-run guards the write() call
- partial-features structured log matches daily_append convention
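For illustration, a log line in the `partial-features ticker=X nan=N/total features=[...]` shape could be built as below. The helper name and the exact rendering of the feature list are assumptions; the real formatting is whatever `daily_append` emits.

```python
import math
from typing import Dict, List, Optional


def partial_features_line(ticker: str, last_row: Dict[str, float],
                          features: List[str]) -> Optional[str]:
    """Return a structured log line if the last row has NaN features, else None."""
    missing = [f for f in features if math.isnan(last_row[f])]
    if not missing:
        return None
    return (f"partial-features ticker={ticker} "
            f"nan={len(missing)}/{len(features)} features={missing}")


# Hypothetical feature names and values.
features = ["sma_5", "sma_50", "rsi_14"]
line = partial_features_line(
    "SOLS", {"sma_5": 107.0, "sma_50": float("nan"), "rsi_14": 55.0}, features
)
quiet = partial_features_line(
    "Q", {"sma_5": 1.0, "sma_50": 2.0, "rsi_14": 3.0}, features
)
```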

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 merged commit 0d1a246 into main on Apr 21, 2026
1 check passed
@cipher813 deleted the fix/promote-ohlcv-only-schema branch on April 21, 2026 22:15
cipher813 added a commit that referenced this pull request Apr 21, 2026
PR #79's promote_ohlcv_only_schema.py passed None to
_load_sector_map / _load_cached_fundamentals / _load_cached_alternative.
Those loaders log a warning and return empty dicts when the client is
None. compute_features then silently defaults ~15 features (every
sector-relative, fundamental, and alternative feature) to 0 instead
of their real values. Migration would have claimed success while
persisting zeroed features for every candidate.

The 2026-04-21 dry run caught it: Q and SOLS would each have been promoted
with 55/56 "ok" features, but several of those "ok" values were 0-defaults
from an empty sector_map / fundamentals / alt_data rather than real
computations.

Fix: construct boto3.client("s3") once and pass it through. The
loaders' existing behavior (log warning, return empty) was designed
as a defensive fallback for the live weekly pipeline, not a silent
escape hatch for offline scripts. The migration must always receive
real data or hard-fail.

Regression test forbids passing None to any supporting-data loader and
requires the boto3.client call.
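One way to express the "real data or hard-fail" rule is a thin guard around each loader, sketched below. `load_or_fail` and `fallback_loader` are hypothetical; the stub only imitates the described fallback (empty dict when the client is None). Raising on an empty result is an extra assumption beyond what the fix itself states.

```python
from typing import Any, Callable, Dict


def load_or_fail(name: str, loader: Callable[[Any], Dict], client: Any) -> Dict:
    """Call a supporting-data loader, refusing a None client or an empty result."""
    if client is None:
        raise RuntimeError(f"{name}: refusing to run without an S3 client")
    data = loader(client)
    if not data:
        raise RuntimeError(f"{name}: loader returned no data")
    return data


def fallback_loader(client: Any) -> Dict:
    """Stub imitating the defensive fallback: empty dict when client is None."""
    return {} if client is None else {"Q": "Technology"}


# In the real script the client would be built once, e.g. boto3.client("s3"),
# and passed to every loader; a plain object stands in for it here.
client = object()
sector_map = load_or_fail("sector_map", fallback_loader, client)

try:
    load_or_fail("sector_map", fallback_loader, None)
    rejected = False
except RuntimeError:
    rejected = True
```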

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request Apr 22, 2026
…ll-universe runs (#85)

Two ROADMAP P1 fixes landing together because they both live in
builders/backfill.py and share regression-test scaffolding.

1. **Unified short-history path.** Dropped the `if ticker in
   tickers_with_features: <feature path> else: <OHLCV-only path>` fork.
   Post-PR-#78 `compute_features` returns rows with NaN for features
   whose rolling-window warmup exceeds available history, so the fork
   is unnecessary — every ticker writes the full OHLCV+FEATURE schema.
   Left in: an `n_short_history_in_scope` observability counter and a
   per-ticker `partial-features` log line when the last row has NaN
   features. The fork was a time-bomb for next Saturday's weekly
   backfill: it would regress PR #79's schema migration by writing
   stripped-column frames that daily_append's `lib.update()` then
   rejects with schema mismatch.

2. **Macro writes gated by --ticker.** A per-ticker backfill
   (`--ticker X`) now skips the macro library rewrite by default. The
   parquet price cache's macro series may be stale relative to what
   daily_append has been appending; rewriting from parquet during a
   per-ticker patch silently regresses SPY/VIX/XL* last_date. The
   2026-04-22 SOLS patch knocked macro back from 4/20 to 4/17 by
   exactly this path and broke the predictor's macro-freshness
   preflight. Operators who genuinely want the macro rewrite during a
   ticker-scoped run can pass `--rebuild-macro` (explicit opt-in).
   Default full-universe backfill still rewrites macro as before.
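The three macro-scoping modes in item 2 reduce to a small decision function. This is a sketch of the rule as described, not the code in builders/backfill.py; `should_rewrite_macro` is a hypothetical name.

```python
from typing import Optional


def should_rewrite_macro(ticker: Optional[str], rebuild_macro: bool) -> bool:
    """Decide whether a backfill run rewrites the macro library."""
    if ticker is None:
        # Full-universe run: macro is rewritten as before.
        return True
    # Ticker-scoped run: skip unless the operator explicitly opts in.
    return rebuild_macro


full_universe = should_rewrite_macro(None, rebuild_macro=False)
ticker_default = should_rewrite_macro("SOLS", rebuild_macro=False)
ticker_opt_in = should_rewrite_macro("SOLS", rebuild_macro=True)
```

Keeping the default for ticker-scoped runs at "skip" means a patch to one symbol cannot move the macro library's last_date backwards unless `--rebuild-macro` is passed.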

8 regression tests added — source-text invariants + functional tests
covering the three macro-scoping modes and the short-history schema
contract. 151/151 full suite passes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>