daily_append: use update() instead of append() (dedup at source) by cipher813 · Pull Request #35 · cipher813/alpha-engine-data

cipher813 · 2026-04-15T18:08:36Z

Summary

Root cause fix for the 2026-04-15 predictor retrain outage: 904/909 tickers in the production universe library had duplicate date rows when read back from ArcticDB.

The workaround landed in alpha-engine-predictor #26: defensive dedup in the training loader, mirroring the long-standing defensive dedup in inference at `inference/stages/load_prices.py:403`. This PR fixes the accumulation at the write site, so both defenses become unnecessary going forward.
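For reference, a read-side defensive dedup of this kind usually comes down to dropping repeated index entries after the read. The sketch below is illustrative only. The helper name `drop_duplicate_dates` and the keep-last policy are assumptions, not a copy of `load_prices.py` or `_load_ticker_parquet`.

```python
import pandas as pd


def drop_duplicate_dates(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical read-side guard: keep the last row seen for each index date, in date order.
    return df[~df.index.duplicated(keep="last")].sort_index()


prices = pd.DataFrame(
    {"close": [101.0, 102.0, 102.5]},
    index=pd.to_datetime(["2026-04-14", "2026-04-15", "2026-04-15"]),
)
clean = drop_duplicate_dates(prices)
print(clean.index.has_duplicates)  # False
print(clean["close"].tolist())     # [101.0, 102.5]
```

A guard like this hides the symptom on every read. That is why the PR moves the fix to the write path.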

Change: swap three `lib.append(symbol, today_row)` calls for `lib.update(symbol, today_row)` in `builders/daily_append.py` (universe, macro key, and macro sym-path for sector ETFs). `append()` adds rows without dedup. `update()` replaces any existing rows whose dates overlap with the input, which makes the write idempotent under re-runs, races, or concurrent pipeline invocations.
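The semantic difference can be modeled in plain pandas, without an ArcticDB instance. This is a sketch only. `append_model` and `update_model` are hypothetical helpers, not ArcticDB calls. `update_model` assumes ArcticDB's documented `update()` behavior of replacing everything between the input's first and last index values. For the one-row writes in this PR, that range is just the row's own date.

```python
import pandas as pd


def append_model(hist: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    # Models append(): new rows are added after the existing ones with no overlap check.
    return pd.concat([hist, new])


def update_model(hist: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    # Models update(): existing rows between new's first and last dates are replaced.
    start, end = new.index.min(), new.index.max()
    kept = hist[(hist.index < start) | (hist.index > end)]
    return pd.concat([kept, new]).sort_index()


hist = pd.DataFrame(
    {"close": [100.0, 101.0]},
    index=pd.to_datetime(["2026-04-13", "2026-04-14"]),
)
today_row = pd.DataFrame({"close": [102.0]}, index=pd.to_datetime(["2026-04-15"]))

# Run the daily write twice for the same date, as a retry or an overlapping run would.
appended = append_model(append_model(hist, today_row), today_row)
updated = update_model(update_model(hist, today_row), today_row)
print(len(appended), appended.index.has_duplicates)  # 4 True
print(len(updated), updated.index.has_duplicates)    # 3 False

# A re-run with a corrected value still leaves exactly one row for that date.
corrected = update_model(updated, today_row.assign(close=102.5))
print(corrected["close"].tolist())  # [100.0, 101.0, 102.5]
```

The second `append_model` call is where duplicates come from. The second `update_model` call is a no-op in effect, which is the idempotence property the change relies on.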

Why duplicates accumulated: the read-check at line 195 (`if today_ts in hist.index: skip`) was the only dedup guard. It fails under any of the following:

- Concurrent Saturday + Sunday pipeline invocations (already flagged in ROADMAP under Research: weekend-dated signals)
- Retries during partial failures
- A read that reflects a cache slightly behind the actual write state

`update()` removes all three failure modes by making the write itself idempotent.
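The stale-read failure can be reproduced deterministically with a toy store. No ArcticDB is required. In this sketch, `ToyStore`, `daily_write`, and `simulate` are hypothetical names. The store models `update()` only for the one-row writes this PR makes. The two snapshots stand in for any of the failure modes above: overlapping weekend runs, a retry, or a lagging read. In each case, two writers pass the read-check before either one has written.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for a single ArcticDB symbol, holding (date, close) rows."""

    rows: list = field(default_factory=list)

    def append(self, date: str, close: float) -> None:
        # Like append(): adds the row with no check for an existing date.
        self.rows.append((date, close))

    def update(self, date: str, close: float) -> None:
        # Like update() for a one-row input: drop any row on that date, then add it.
        self.rows = [r for r in self.rows if r[0] != date] + [(date, close)]


def daily_write(store: ToyStore, snapshot: list, date: str, close: float, use_update: bool) -> None:
    # Read-check in the style of line 195, evaluated against a snapshot that may be stale.
    if date in {d for d, _ in snapshot}:
        return
    write = store.update if use_update else store.append
    write(date, close)


def simulate(use_update: bool) -> list:
    store = ToyStore([("2026-04-16", 101.0)])
    # Two overlapping invocations both read before either one writes.
    snap_a = list(store.rows)
    snap_b = list(store.rows)
    daily_write(store, snap_a, "2026-04-17", 102.0, use_update)
    daily_write(store, snap_b, "2026-04-17", 102.0, use_update)
    return [d for d, _ in store.rows]


print(simulate(use_update=False))  # ['2026-04-16', '2026-04-17', '2026-04-17']
print(simulate(use_update=True))   # ['2026-04-16', '2026-04-17']
```

The read-check passes in both invocations either way. Only the write semantics decide whether the second pass leaves a duplicate.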

Test plan

- Source-level regression test: `tests/test_daily_append_semantics.py` locks in the `update()` semantics, so a future revert to `append()` at any of the three sites fails the test.
- Full suite: 43 passed.
- Next weekday pipeline run (Thu 2026-04-16): confirm no accumulation in universe_lib. Spot check: `lib.read('AAPL').data.index.has_duplicates` should be False.
- After 1-2 full Saturday cycles have cleaned the existing state, remove the defensive dedup in alpha-engine-predictor `data/dataset.py:_load_ticker_parquet` (tracked on ROADMAP).
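The single-ticker AAPL spot check can be widened to the whole library. This sketch assumes only ArcticDB's `Library.list_symbols()` and `Library.read(symbol).data`. `symbols_with_duplicate_dates` is a hypothetical helper. The in-memory `FakeLib` is a stand-in for `universe_lib` so the example runs offline.

```python
from types import SimpleNamespace

import pandas as pd


def symbols_with_duplicate_dates(lib) -> list:
    # Assumes the ArcticDB Library API: list_symbols() and read(symbol).data (a DataFrame).
    flagged = []
    for symbol in sorted(lib.list_symbols()):
        if lib.read(symbol).data.index.has_duplicates:
            flagged.append(symbol)
    return flagged


class FakeLib:
    """Hypothetical in-memory stand-in for universe_lib, for running this sketch offline."""

    def __init__(self, frames: dict):
        self.frames = frames

    def list_symbols(self) -> list:
        return list(self.frames)

    def read(self, symbol: str) -> SimpleNamespace:
        return SimpleNamespace(data=self.frames[symbol])


dates_ok = pd.to_datetime(["2026-04-15", "2026-04-16"])
dates_dup = pd.to_datetime(["2026-04-16", "2026-04-16"])
fake = FakeLib({
    "AAPL": pd.DataFrame({"close": [1.0, 2.0]}, index=dates_ok),
    "MSFT": pd.DataFrame({"close": [3.0, 3.0]}, index=dates_dup),
})
flagged = symbols_with_duplicate_dates(fake)
print(flagged)  # ['MSFT']
```

Against production you would pass the real library object instead of `fake`. Each `read` loads a symbol's full history, so this fits a one-off audit better than the daily path.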

…rce) Root cause fix for the 2026-04-15 predictor retrain outage where 904/909 tickers in the universe library had duplicate date rows when read back from ArcticDB. That failure was worked around defensively in the predictor loader (alpha-engine-predictor PR #26), and the inference loader has had equivalent defensive dedup for some time (load_prices.py line 403). This PR fixes the accumulation at the write site.

Change: swap `lib.append(symbol, today_row)` for `lib.update(symbol, today_row)` at the three daily write sites in builders/daily_append.py:

- universe_lib.update(ticker, today_row) (line 251)
- macro_lib.update(key, new_row) (line 269)
- macro_lib.update(sym, new_row) (line 286)

append() adds rows without dedup: if daily_append runs twice for the same date (race, retry, concurrent Saturday+Sunday pipelines), rows accumulate. update() is idempotent. ArcticDB replaces any existing rows whose dates overlap with the input DataFrame, so a re-run with the same or an updated row produces at most one row per date regardless of invocation count.

The read-check at line 195 (if today_ts in hist.index: skip) stays. It's an efficiency guard that avoids the write entirely when the row already exists. update() is the safety net when that check misses.

tests/test_daily_append_semantics.py: source-level regression guards against a future revert to append() on any of the three sites.

Follow-up: once this has been in place for 1-2 full Saturday cycles, remove the defensive dedup in alpha-engine-predictor data/dataset.py (`_load_ticker_parquet`). Track on ROADMAP under Data Platform.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
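The PR does not show the body of `tests/test_daily_append_semantics.py`. One way a source-level guard like it could work is to parse the builder's source with the standard-library `ast` module and count `*_lib.append(...)` versus `*_lib.update(...)` calls. The helper `lib_calls`, the "name ends in lib" heuristic, and the inline sample source are all assumptions. A real test would read `builders/daily_append.py` from disk.

```python
import ast


def lib_calls(source: str, method: str) -> list[int]:
    """Line numbers of `<name>.<method>(...)` calls where <name> ends in "lib"."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == method
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id.endswith("lib")
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Inline stand-in for builders/daily_append.py; a real test would read the file from disk.
FIXED = """
universe_lib.update(ticker, today_row)
macro_lib.update(key, new_row)
macro_lib.update(sym, new_row)
history.append(today_ts)
"""
REVERTED = FIXED.replace("macro_lib.update(key", "macro_lib.append(key")

print(lib_calls(FIXED, "append"))     # []
print(lib_calls(FIXED, "update"))     # [2, 3, 4]
print(lib_calls(REVERTED, "append"))  # [3]
```

A pytest version would assert `lib_calls(src, "append") == []` and `len(lib_calls(src, "update")) == 3` on the real file's source. Unrelated list appends such as `history.append(...)` are ignored because the receiver name does not end in `lib`.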

cipher813 merged commit 0fbd6d5 into main Apr 15, 2026
1 check passed

cipher813 deleted the fix/dedup-arcticdb-writes-at-source branch April 15, 2026 18:10

cipher813 mentioned this pull request Apr 15, 2026

daily_append: hard-fail on missing macro keys + verify writes landed #37

Merged

cipher813 merged 1 commit into main from fix/dedup-arcticdb-writes-at-source