
fix(constituents): persist sector_map in local cache for outage resilience #208

Merged
cipher813 merged 3 commits into main from fix/constituents-cache-includes-sector-map on May 11, 2026


@cipher813
Owner

Summary

  • The local cache (`data/constituents_cache.csv`) now persists three columns: `ticker`, `gics_sector`, `sector_etf` (previously `ticker` only).
  • On a Wikipedia outage, `_load_from_cache` reconstructs a fully populated `sector_map` and `sector_etf_map`, so `collect()`'s coverage check passes instead of raising "Sector mapping incomplete".
  • Backwards-compatible reader: legacy ticker-only caches return empty sector dicts (matching the current behavior) and are rewritten to the new schema on the next successful fetch.
  • Add `data/constituents_cache.csv` to `.gitignore`: test runs write to the real cache path, so otherwise the file shows up as untracked test pollution.

Why

Defense-in-depth for the 2026-05-11 cascade. Stacked on top of #207 (which fixes the underlying Wikipedia table-selection bug); together they ensure:

Stacked on #207

This PR's diff is on top of #207. If #207 merges first, this rebases cleanly.

Test plan

  • pytest tests/test_constituents_sector_map.py — 13/13 pass (4 new cache tests)
  • Full suite: pytest tests/ — 720 passed, 1 skipped
  • Legacy ticker-only cache schema verified backwards-compatible (`test_cache_fallback_handles_legacy_ticker_only_schema`)
  • Post-merge: confirm next successful weekly run rewrites cache with new schema on EC2

🤖 Generated with Claude Code

cipher813 and others added 2 commits May 11, 2026 06:45
The S&P 400 Wikipedia page inserted a disambiguation-warning banner table
at position 0 on or around 2026-05-11. `tables[0]` then returned a 1-row
2-column banner (columns: [0, 1]) instead of the 400-row constituents
table at position 1. `_fetch_constituents` raised "GICS sector column
missing", fell through to the local cache (which only stores ticker
symbols, no sector mapping), and `constituents.collect()` then raised
"Sector mapping incomplete: 903 of 903 tickers missing GICS sector".

This silently aborted MorningEnrich on the 2026-05-11 weekday SF:
`weekly_collector` exited 1, but `python ... 2>&1 | tee ...` with no
`set -o pipefail` masked the exit code, SSM reported Success, the SF
moved on, and the morning planner aborted minutes later with
"daily_data: 46h stale".
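A minimal Python sketch of the exit-code masking described above, assuming `bash` and `tee` are available on the host. `false` stands in for the failing `weekly_collector` run and `tee /dev/null` for the log tee; neither is the actual SSM command, and `pipeline_status` is a hypothetical helper.

```python
import subprocess


def pipeline_status(command: str) -> int:
    """Hypothetical helper: run `command` with bash and return the status bash reports."""
    result = subprocess.run(["bash", "-c", command], capture_output=True, text=True)
    return result.returncode


# Without pipefail, a pipeline's status is that of its LAST command (tee),
# so the failure of the first command is lost.
masked = pipeline_status("false 2>&1 | tee /dev/null")

# With pipefail, the pipeline's status is that of the rightmost command that failed.
surfaced = pipeline_status("set -o pipefail; false 2>&1 | tee /dev/null")

print(masked, surfaced)  # 0 1
```

Under these assumptions, adding `set -o pipefail` (or inspecting `${PIPESTATUS[0]}`) in the real script would likely let SSM observe the collector's non-zero exit.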

Replace position-based selection with `_select_constituents_table`,
which scans every table on the page and picks the largest one that has
both a Symbol/Ticker column AND a GICS Sector (non-sub-industry) column.
It raises explicitly if no table matches, so Wikipedia layout drift surfaces loudly
instead of silently selecting the wrong table.
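A minimal sketch of column-based table selection along the lines described above. `pick_constituents_table`, its helpers, and `FakeTable` are hypothetical names, not the PR's `_select_constituents_table`; the function only assumes each table exposes `.columns` and `len()` (row count), which the `pandas.DataFrame` objects returned by `pandas.read_html` do.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List


def _norm(column: Any) -> str:
    # Banner tables can have integer column labels such as [0, 1], so stringify first.
    return str(column).strip().lower()


def _has_ticker_column(columns: Iterable[Any]) -> bool:
    return any(_norm(c) in ("symbol", "ticker", "ticker symbol") for c in columns)


def _has_gics_sector_column(columns: Iterable[Any]) -> bool:
    # Accept "GICS Sector" but reject anything mentioning a sub-industry.
    return any("gics sector" in _norm(c) and "sub" not in _norm(c) for c in columns)


def pick_constituents_table(tables: List[Any]) -> Any:
    """Hypothetical sketch: return the largest table with ticker and GICS sector columns."""
    candidates = [
        t for t in tables
        if _has_ticker_column(t.columns) and _has_gics_sector_column(t.columns)
    ]
    if not candidates:
        # Fail loudly on layout drift rather than trusting a fixed table position.
        raise ValueError("no table has both a Symbol/Ticker and a GICS Sector column")
    return max(candidates, key=len)


@dataclass
class FakeTable:
    """Hypothetical stand-in for a DataFrame: column labels plus rows."""
    columns: List[Any]
    rows: List[list] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.rows)


banner = FakeTable(columns=[0, 1], rows=[["Disambiguation notice", ""]])
constituents = FakeTable(
    columns=["Symbol", "Security", "GICS Sector", "GICS Sub-Industry"],
    rows=[["AAA", "Example Corp", "Industrials", "Machinery"]] * 400,
)
chosen = pick_constituents_table([banner, constituents])
print(len(chosen))  # 400
```

Taking the largest match rather than the first is one plausible guard against a small summary table that happens to share the same headers.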

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ience

The local CSV cache previously stored only ticker symbols. When the Wikipedia
fetch raised (e.g. 2026-05-11: S&P 400 page schema drift before PR #207),
the fallback returned (cached_tickers, {}, {}, 0, 0), i.e. empty sector maps.
`collect()`'s coverage check then raised "Sector mapping incomplete:
903 of 903 tickers missing GICS sector" and `weekly_collector` exited 1.
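To show why an empty sector map fails, here is a hypothetical `check_sector_coverage` modelled only on the error message quoted above; the actual check in `collect()` may differ in detail (for example, it may allow some tolerance).

```python
from typing import Dict, List, Optional


def check_sector_coverage(tickers: List[str], sector_map: Dict[str, str]) -> None:
    """Hypothetical stand-in for collect()'s check: raise if any ticker lacks a sector."""
    missing = [t for t in tickers if not sector_map.get(t)]
    if missing:
        raise ValueError(
            f"Sector mapping incomplete: {len(missing)} of {len(tickers)} "
            "tickers missing GICS sector"
        )


tickers = ["AAA", "BBB", "CCC"]

# Old fallback: tickers come back from the cache, but the sector map is empty.
old_fallback_error: Optional[str] = None
try:
    check_sector_coverage(tickers, {})
except ValueError as exc:
    old_fallback_error = str(exc)
print(old_fallback_error)  # Sector mapping incomplete: 3 of 3 tickers missing GICS sector

# New fallback: the cache also restores each ticker's sector, so the check passes.
check_sector_coverage(tickers, {"AAA": "Industrials", "BBB": "Energy", "CCC": "Utilities"})
```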

Cache now persists three columns: ticker, gics_sector, sector_etf. On a
successful Wikipedia fetch the full mapping is written; on the next
Wikipedia outage the fallback returns a fully-populated sector_map and
the coverage check passes.

Reader (`_load_from_cache`) tolerates the legacy ticker-only schema —
existing EC2 caches return empty sector dicts (same behavior as before),
and the next successful Wikipedia fetch upgrades the cache to the new
schema in place.
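A minimal sketch of the three-column cache round trip and the legacy fallback, using only the standard `csv` module. `write_cache` and `load_cache` are hypothetical names (not the PR's `_load_from_cache`); they return a 3-tuple rather than the fallback's 5-tuple, and the legacy file is assumed to have a `ticker` header row.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

FIELDS = ["ticker", "gics_sector", "sector_etf"]


def write_cache(path: Path, tickers: List[str],
                sector_map: Dict[str, str], sector_etf_map: Dict[str, str]) -> None:
    """Hypothetical writer: persist ticker, GICS sector and sector ETF per row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for t in tickers:
            writer.writerow({"ticker": t,
                             "gics_sector": sector_map.get(t, ""),
                             "sector_etf": sector_etf_map.get(t, "")})


def load_cache(path: Path) -> Tuple[List[str], Dict[str, str], Dict[str, str]]:
    """Hypothetical reader: rebuild sector maps, tolerating a ticker-only legacy file."""
    tickers: List[str] = []
    sector_map: Dict[str, str] = {}
    sector_etf_map: Dict[str, str] = {}
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Legacy caches only have a "ticker" column, so sector maps stay empty.
        has_sectors = {"gics_sector", "sector_etf"} <= set(reader.fieldnames or [])
        for row in reader:
            t = row["ticker"]
            tickers.append(t)
            if has_sectors and row["gics_sector"]:
                sector_map[t] = row["gics_sector"]
            if has_sectors and row["sector_etf"]:
                sector_etf_map[t] = row["sector_etf"]
    return tickers, sector_map, sector_etf_map


with tempfile.TemporaryDirectory() as d:
    new_path = Path(d) / "constituents_cache.csv"
    write_cache(new_path, ["AAA", "BBB"],
                {"AAA": "Industrials", "BBB": "Energy"}, {"AAA": "XLI", "BBB": "XLE"})
    tickers, sectors, etfs = load_cache(new_path)

    legacy_path = Path(d) / "legacy_cache.csv"
    legacy_path.write_text("ticker\nAAA\nBBB\n")
    legacy = load_cache(legacy_path)

print(sectors)  # {'AAA': 'Industrials', 'BBB': 'Energy'}
print(legacy)   # (['AAA', 'BBB'], {}, {})
```

Checking the header instead of the row shape keeps the legacy path explicit: an old file yields tickers with empty maps, and the next `write_cache` call replaces it with the three-column schema.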

Also gitignore `data/constituents_cache.csv` — test runs of
`_fetch_constituents` write to the real cache path, so the file
otherwise lands in `git status` as untracked test pollution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit 6c95a04 into main on May 11, 2026
1 check passed
cipher813 deleted the fix/constituents-cache-includes-sector-map branch on May 11, 2026 13:56
