feat(wave3): price_cache_read_prefixes helper + sf_preflight reader migration + sector_map.json write-both gap fix (ROADMAP L1401) #272
Merged
…igration + sector_map.json write-both gap fix (ROADMAP L1401)

Wave-3 reader-side follow-on to producer write-both PR1 (alpha-engine-data#270, shipped 2026-05-19). Adds the read-side companion helper, migrates the simplest reader site (sf_preflight), and patches a PR1 gap: sector_map.json was only being written to `data/` + `predictor/price_cache/`, so readers that hit `reference/price_cache/sector_map.json` first (e.g. alpha-engine-backtester#230) would see a stale snapshot after PR4 deletes legacy.

Added
- `builders/_price_cache_writeboth.price_cache_read_prefixes()`: companion to `price_cache_write_prefixes`. Read order = WRITE order REVERSED: `[reference/, predictor/]`, so the new prefix is consulted first and legacy is the soak-window fallback. PR4 cutover edits the helper + the write helper in lockstep.
- Re-exported alongside the existing helper.

Reader migrated
- `sf_preflight.py` backfill-source-freshness check: iterates the two prefixes via the helper instead of hardcoding the legacy key. Net behavior change: a missing legacy key alone no longer fails the check (post-PR4 state); only when BOTH prefixes are unreadable does the check report `fail`.

Producer-side gap fix
- `collectors/constituents.py`: sector_map.json now written to 3 keys: `data/` + `predictor/price_cache/` + (NEW) `reference/price_cache/`. PR1 #270's write-both helper scoped only the ticker-parquet writes (yfinance / FRED / chronic-gap), so the separately-emitted sector_map.json from this collector wasn't reaching the new prefix on an ongoing basis. The one-shot backfill 2026-05-19 22:13Z made `reference/price_cache/sector_map.json` fresh as of that moment; without this fix it would have gone stale after the next weekly producer run.
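To make the read-order contract concrete, here is a minimal Python sketch of the helper pair and a new-first read with legacy fallback. It is an illustration under stated assumptions, not the code merged in this PR. The two prefixes, the helper names, and the `fail` status come from the description above; the `prefix` parameter and its default, `check_source_freshness`, the `fetch` callable standing in for an S3 GET, the `"ok"` status, and the `example.parquet` key are hypothetical.

```python
from typing import Callable, List, Optional

# Prefix values as quoted in the PR text.
NEW_PREFIX = "reference/price_cache/"
LEGACY_PREFIX = "predictor/price_cache/"


def price_cache_write_prefixes(prefix: str = LEGACY_PREFIX) -> List[str]:
    """Write order during the soak window (signature is an assumption)."""
    if prefix != LEGACY_PREFIX:
        # Custom prefix: test/override opt-out, single target only.
        return [prefix]
    return [LEGACY_PREFIX, NEW_PREFIX]


def price_cache_read_prefixes(prefix: str = LEGACY_PREFIX) -> List[str]:
    """Read order is the write order reversed: new first, legacy fallback."""
    return list(reversed(price_cache_write_prefixes(prefix)))


def check_source_freshness(
    fetch: Callable[[str], Optional[bytes]], name: str
) -> str:
    """Hypothetical preflight check: 'fail' only if no prefix is readable.

    `fetch` stands in for an S3 GET and returns None when the key is missing.
    """
    for prefix in price_cache_read_prefixes():
        if fetch(prefix + name) is not None:
            return "ok"  # first hit wins; stop iterating
    return "fail"


if __name__ == "__main__":
    # Post-cutover state: only the new prefix holds the object.
    post_cutover = {NEW_PREFIX + "example.parquet": b"data"}
    print(price_cache_read_prefixes())
    print(check_source_freshness(post_cutover.get, "example.parquet"))
    print(check_source_freshness({}.get, "example.parquet"))
```

With this shape, a missing legacy key alone does not fail the check, which matches the behavior change described for `sf_preflight`. Deriving the read order from the write helper is one way to keep the two in lockstep; the real helpers may be structured differently.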
Sites NOT migrated in this PR (audit/scoping)
- `builders/backfill.py` (`_load_full_cache`: LIST + bulk read)
- `builders/daily_append.py` (`PRICE_CACHE_PREFIX` constant + many per-ticker GETs)
- `collectors/slim_cache.py` (default-arg propagation)

More invasive; warrant a focused follow-up PR with the helper already in place. Each site should iterate `price_cache_read_prefixes()` and break on first hit.

Audit miss caught
- `collectors/constituents.py:114` initially read as a Wave-3 reader site (per the L1401 list) but is actually a WRITER site; the L1401 entry was mislabeled. The catch surfaced the missing 3rd write target above; net better state than the original scope.

Tests (+4 new, suite 1387 → 1391 green)
- `test_read_helper_default_returns_new_first_legacy_second`: pins new-first read order.
- `test_read_helper_explicit_legacy_returns_new_first_legacy_second`: explicit-legacy is identical to default (production semantic).
- `test_read_helper_custom_prefix_returns_single`: test/override opt-out matches the write-side semantic.
- `test_sector_map_writes_to_all_three_paths`: pins the 3-key write contract on `constituents.collect()`; byte-equal bodies.

Composes with
- alpha-engine-data#270 (producer write-both, this PR's prerequisite).
- alpha-engine#197 (IAM ARN add for `reference/price_cache/`).
- alpha-engine-backtester#230 (Wave-3 backtester reader, sibling PR).
- Wave-3 PR4 cutover (drops the fallback branch in both helpers; the one-line edit at that time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
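The 3-key write contract pinned by `test_sector_map_writes_to_all_three_paths` can be sketched roughly as below. This is an assumption-laden illustration, not the collector's code: `write_sector_map`, `FakeBucket`, and the `put` callable are hypothetical stand-ins, and only `reference/price_cache/sector_map.json` is a key spelled out in the text; the other two full key strings are guessed from the prefixes named above.

```python
import json
from typing import Callable, Dict

# Only the reference/ key appears verbatim in the PR text; the other two
# full key strings are assumptions built from the quoted prefixes.
SECTOR_MAP_KEYS = (
    "data/sector_map.json",
    "predictor/price_cache/sector_map.json",
    "reference/price_cache/sector_map.json",
)


def write_sector_map(
    put: Callable[[str, bytes], None], sector_map: Dict[str, str]
) -> None:
    """Serialize once, then write the same bytes to every key."""
    body = json.dumps(sector_map, sort_keys=True).encode("utf-8")
    for key in SECTOR_MAP_KEYS:
        put(key, body)


class FakeBucket:
    """In-memory stand-in for an S3 bucket (hypothetical test double)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body


bucket = FakeBucket()
write_sector_map(bucket.put, {"AAPL": "Technology"})

# All three keys exist and carry byte-equal bodies.
assert set(bucket.objects) == set(SECTOR_MAP_KEYS)
assert len(set(bucket.objects.values())) == 1
```

Serializing once and reusing the same `body` is what makes the byte-equal property trivially hold; serializing separately per key would leave room for drift between the copies.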
ROADMAP: L1401, predictor/S3 namespace rationalization Wave 3: data-repo half of the PR3+ reader migrations + producer-side sector_map.json gap fix.

Three things in one PR

1. Read-side helper (new)

`builders/_price_cache_writeboth.price_cache_read_prefixes()` is the companion to the existing `price_cache_write_prefixes`. Read order is WRITE order REVERSED: `[reference/, predictor/]`, so the new prefix is consulted first and legacy is the soak-window fallback. PR4 cutover edits both helpers in lockstep, a one-line change each.

2. `sf_preflight` backfill-source-freshness check

Migrated to iterate the helper instead of hardcoding the legacy key. Behavior change: a missing legacy key alone no longer fails the check (post-PR4 state); only when BOTH prefixes are unreadable does it report `fail`.

3. `collectors/constituents.py`: sector_map.json write-both gap fix (caught by audit)

`sector_map.json` is now written to 3 keys: `data/` + `predictor/price_cache/` + (NEW) `reference/price_cache/`. PR1 #270's write-both helper scoped only the ticker-parquet writes (yfinance / FRED / chronic-gap); the separately-emitted sector_map.json from this collector wasn't reaching the new prefix on an ongoing basis. The 2026-05-19 22:13Z one-shot backfill made it fresh at that moment. Without this fix it would have gone stale after the next weekly producer run, and the read-new-first contract (already shipped in alpha-engine-backtester#230) would silently serve stale data.

Sites NOT migrated in this PR (deferred to a focused follow-up)

The remaining 3 reader sites are more invasive and warrant their own PR now that the helper is in place:

- `builders/backfill.py` (`_load_full_cache`)
- `builders/daily_append.py`: `PRICE_CACHE_PREFIX` constant threaded through many per-ticker GETs; mechanical but wide
- `collectors/slim_cache.py`

Each remaining site should iterate `price_cache_read_prefixes()` and break on first hit (mirrors `sf_preflight` here).

Audit miss caught

`collectors/constituents.py:114` was initially listed as a Wave-3 reader site (per the L1401 "data ×5" list) but is actually a writer site; the ROADMAP entry was mislabeled. Catching this surfaced the missing 3rd write target (the producer-side fix in §3 above). Net result: better than the original scope, because the read-new-first contract in alpha-engine-backtester#230 is now safe.

Tests (+4 new, suite 1387 → 1391 green)

- `test_read_helper_default_returns_new_first_legacy_second`: pins read order.
- `test_read_helper_explicit_legacy_returns_new_first_legacy_second`: explicit-legacy default semantic.
- `test_read_helper_custom_prefix_returns_single`: test/override opt-out (mirrors write-side).
- `test_sector_map_writes_to_all_three_paths`: locks the 3-key write contract with byte-equal bodies.

Composes with
- alpha-engine#197 (IAM ARN add for `reference/price_cache/`).

🤖 Generated with Claude Code