
feat(morning-enrich): chronic-polygon-gap self-heal via yfinance backfill #193

Merged
cipher813 merged 2 commits into main from feat/chronic-polygon-gaps-self-heal
May 9, 2026


@cipher813
Owner

Summary

Root-cause fix for the 2026-05-09 weekly-SF DataPhase1 postflight failure. After alpha-engine-data PR #192 fixed the arcticdb backfill regression, postflight failed at PSTG = 3d stale vs SPY (>2d threshold) — exposing a deeper problem: polygon does not reliably serve 4 known chronic-gap tickers (BF-B / BRK-B / MOG-A / PSTG), so they accumulate multi-day rot whenever the EOD yfinance pass also drops a day.

The current design conflates two orthogonal concerns into a binary source=polygon_only:

  1. Source preference (polygon authoritative > yfinance fallback)
  2. Coverage expectation (which tickers polygon supplies vs doesn't)

This PR separates them: coverage expectation moves to config (chronic_polygon_gaps.tickers), and a new self-heal step in MorningEnrich yfinance-backfills any ArcticDB row gap for chronic-gap tickers. Tickers absent from the allowlist still hard-fail polygon_only collection — preserving the strict "no silent fails" default for the ~900 healthy tickers.

Companion config update in alpha-engine-config (private repo) at data/config.yaml.
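
A minimal sketch of a permissive loader for the `chronic_polygon_gaps.tickers` section, assuming the YAML has already been parsed into a dict. The function name `load_chronic_polygon_gaps` and the choice to accept either a list or a mapping keyed by symbol are assumptions for illustration, not the code in this PR.

```python
from typing import Any, List


def load_chronic_polygon_gaps(config: Any) -> List[str]:
    """Return the sorted chronic-gap tickers, or [] if the section is missing or malformed."""
    if not isinstance(config, dict):
        return []
    section = config.get("chronic_polygon_gaps")
    if not isinstance(section, dict):
        return []
    tickers = section.get("tickers")
    # Assumption: accept either a list of symbols or a mapping keyed by symbol.
    if isinstance(tickers, dict):
        tickers = list(tickers.keys())
    if not isinstance(tickers, (list, tuple)):
        return []
    # Sorted and de-duplicated so downstream logs are stable across runs.
    return sorted({t.strip() for t in tickers if isinstance(t, str) and t.strip()})


config = {"chronic_polygon_gaps": {"tickers": ["PSTG", "BF-B", "MOG-A", "BRK-B"]}}
print(load_chronic_polygon_gaps(config))
```

Returning an empty list on a missing or malformed section keeps the strict default intact: with nothing in the allowlist, every ticker is held to the polygon_only hard-fail.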

This is PR A of a 3-PR root-cause arc:

  • PR A (this) — chronic-gap config + self-heal in MorningEnrich
  • PR B — provenance source column on ArcticDB OHLCV rows so postflight + downstream consumers can distinguish polygon vs yfinance origin per row
  • PR C — postflight uniformity (drop ad-hoc allowlists; rely on freshness only) + drift alarm for chronic-gap tickers that start getting polygon coverage (signals the allowlist entry should be pruned)

Test plan

  • 8 new tests in tests/test_chronic_polygon_gap_self_heal.py cover the config loader (sorted/missing/malformed) + self-heal helper (already-fresh skip, yfinance + parquet patch + backfill invocation, dry-run, per-ticker error isolation, empty-list no-op)
  • pytest full suite (582 passed, 1 skipped — vs 574 on origin/main)
  • After merge, fresh Saturday SF execution: MorningEnrich self-heals PSTG to 5/8, daily_append succeeds, postflight passes 20/20 tickers fresh
  • Verify CW logs show chronic-gap self-heal: 1 healed, 3 already-fresh, 0 errors (or similar) on the rerun
  • Confirm subsequent weekday SF MorningEnrich logs show all 4 chronic tickers already_fresh (no work after the initial heal)

Followups (separate PRs)

  • PR B: provenance source column on ArcticDB rows (additive schema)
  • PR C: postflight uniformity + chronic-gap drift alarm

🤖 Generated with Claude Code

cipher813 and others added 2 commits May 9, 2026 07:16
…fill

Closes the 2026-05-09 weekly-SF DataPhase1 postflight failure. PSTG
ended at 5/5 in ArcticDB while SPY was at 5/8 (3d stale, > 2d threshold);
the other 3 chronic polygon gaps (BF-B / BRK-B / MOG-A) ended at 5/6
(2d, just under threshold today — could fail tomorrow). Polygon does
not reliably serve these 4 tickers (class B/A share dot-vs-dash naming
on 3 of them, intermittent coverage on PSTG since ~2026-04), so
MorningEnrich's polygon_only daily_append leaves them at whatever the
prior EOD yfinance pass landed; on days when EOD also dropped the
ticker the gap compounds and postflight catches them as stale.
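
The staleness arithmetic above can be sketched as follows. This assumes staleness is measured in calendar days against SPY's last date; the helper names are hypothetical and the real postflight check may count differently.

```python
from datetime import date

STALE_THRESHOLD_DAYS = 2  # from the text above: stale means more than 2 days behind


def days_stale(last_date: date, reference_date: date) -> int:
    """Calendar days by which a ticker trails the reference (assumed unit)."""
    return max((reference_date - last_date).days, 0)


def is_stale(last_date: date, reference_date: date) -> bool:
    """True when the ticker trails the reference by more than the threshold."""
    return days_stale(last_date, reference_date) > STALE_THRESHOLD_DAYS


spy_last = date(2026, 5, 8)
print(is_stale(date(2026, 5, 5), spy_last))  # PSTG, 3 days behind
print(is_stale(date(2026, 5, 6), spy_last))  # BF-B / BRK-B / MOG-A, 2 days behind
```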

This adds a `_self_heal_chronic_polygon_gaps` step that runs after
daily_append in MorningEnrich. For each ticker in
`config.chronic_polygon_gaps.tickers`:
  - Read ArcticDB universe last_date.
  - If last_date >= target_date, skip (idempotent).
  - Else yfinance-fetch [last_date+1, target_date], patch
    `predictor/price_cache/{ticker}.parquet` with the new rows
    (dedupe by date keep="last"), and invoke
    `builders.backfill(ticker_filter=ticker)` so the ArcticDB write
    goes through the same per-ticker compute_features path as every
    other ticker.

Best-effort by design: a yfinance hiccup on one chronic ticker logs
the error but doesn't halt MorningEnrich. Postflight remains the
load-bearing gate on freshness — if a ticker is still stale after
this step, postflight surfaces it.
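
The per-ticker flow and its error isolation can be sketched as below. This is an illustration under stated assumptions, not the `_self_heal_chronic_polygon_gaps` implementation: the ArcticDB read, yfinance fetch, parquet patch, and backfill are passed in as hypothetical callables, rows are plain dicts, and the dedupe is done with a dict rather than a DataFrame.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]  # one OHLCV row; must carry a "date" key


def merge_rows(cached: List[Row], fetched: List[Row]) -> List[Row]:
    """Dedupe by date with the later row winning, mirroring keep="last"."""
    by_date: Dict[object, Row] = {}
    for row in list(cached) + list(fetched):
        by_date[row["date"]] = row
    return [by_date[d] for d in sorted(by_date)]


def self_heal(
    tickers: Iterable[str],
    target_date: date,
    read_last_date: Callable[[str], date],
    fetch_rows: Callable[[str, date, date], List[Row]],
    patch_cache: Callable[[str, List[Row]], None],
    backfill: Callable[[str], None],
    dry_run: bool = False,
) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {
        "healed": [], "would_heal": [], "already_fresh": [], "errors": [],
    }
    for ticker in tickers:
        try:
            last_date = read_last_date(ticker)
            if last_date >= target_date:
                result["already_fresh"].append(ticker)  # idempotent skip
                continue
            if dry_run:
                result["would_heal"].append(ticker)  # no fetch, no writes
                continue
            rows = fetch_rows(ticker, last_date + timedelta(days=1), target_date)
            patch_cache(ticker, rows)
            backfill(ticker)
            result["healed"].append(ticker)
        except Exception as exc:  # best-effort: one ticker must not halt the rest
            print(f"chronic-gap self-heal failed for {ticker}: {exc}")
            result["errors"].append(ticker)
    return result


# Usage with in-memory fakes (placeholder rows, not real prices).
last_dates = {"PSTG": date(2026, 5, 5), "BRK-B": date(2026, 5, 8)}
cache: Dict[str, List[Row]] = {}
backfilled: List[str] = []


def fake_fetch(ticker: str, start: date, end: date) -> List[Row]:
    span = (end - start).days + 1
    return [{"date": start + timedelta(days=i), "close": None} for i in range(span)]


def fake_patch(ticker: str, rows: List[Row]) -> None:
    cache[ticker] = merge_rows(cache.get(ticker, []), rows)


result = self_heal(
    ["BRK-B", "MOG-A", "PSTG"],
    date(2026, 5, 8),
    read_last_date=lambda t: last_dates[t],  # MOG-A is missing, so it raises
    fetch_rows=fake_fetch,
    patch_cache=fake_patch,
    backfill=backfilled.append,
)
print(result)
```

In this run BRK-B is skipped as already fresh, the simulated read failure on MOG-A is recorded without stopping the loop, and PSTG is patched and backfilled.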

Tickers absent from `chronic_polygon_gaps.tickers` still hard-fail
polygon_only collection when missing, preserving the strict
"no silent fails" default for the ~900 healthy tickers.

Companion config update lives in alpha-engine-config (private repo)
under `data/config.yaml`. The example here documents the schema for
future operators.

Tests cover: config loader (sorted keys, missing/malformed
permissive), already-fresh skip, yfinance fetch + parquet patch +
backfill invocation, dry-run no-side-effect, per-ticker error
isolation, empty-list no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `git -C alpha-engine-config pull --ff-only origin main` to the
SF DataPhase1 task command list, alongside the existing alpha-engine-data
pull. Without this, a config change merged to alpha-engine-config
(e.g. adding/removing a chronic_polygon_gaps ticker) doesn't reach the
trading instance until something else triggers an external pull —
masking the change while polluting downstream behavior.

Triggered by the chronic_polygon_gaps allowlist landing in
alpha-engine-config #88: the dispatcher's local clone was 2 days
behind origin/main during the 2026-05-09 weekly-SF DataPhase1
recovery, and the new config section never reached weekly_collector
until I SSM-ran a manual pull. Closing the loop here so future config
changes are SF-pulled automatically.

Scope: DataPhase1 only. Other states (RAGIngestion, Predictor,
Backtester, Dashboard) don't read alpha-engine-config directly today;
adding the pull universally is a separate decision (cheap +30ms each
but expands the trust surface).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 945a0ba into main May 9, 2026
1 check passed
@cipher813 cipher813 deleted the feat/chronic-polygon-gaps-self-heal branch May 9, 2026 14:22
cipher813 added a commit that referenced this pull request May 9, 2026
…ha-engine-config (#194)

Closes the same staleness vector PR #193 closed for DataPhase1: the SF
PredictorTraining task pulls alpha-engine-predictor on every run but
relies on the dispatcher's local
``alpha-engine-predictor/config/predictor.yaml`` for training config.
That file is gitignored in the predictor repo and must be staged from
the alpha-engine-config sibling clone — but nothing in the SF flow was
keeping the staged copy in lockstep with origin/main of alpha-engine-config.

The 2026-05-09 horizon migration (alpha-engine-config #90: forward_days
5 → 21, output_distribution_gate_blocking false → true, purge_days bump)
would not have reached the next Saturday training without a manual
SSM-side intervention to copy the config from alpha-engine-config to
alpha-engine-predictor.

Adds two commands before the spot_train.sh invocation:
  - ``git -C alpha-engine-config pull --ff-only origin main``
  - ``cp alpha-engine-config/predictor/predictor.yaml alpha-engine-predictor/config/predictor.yaml``

Now any merged config change in alpha-engine-config reaches the next
PredictorTraining cycle automatically. Mirrors the symmetric DataPhase1
fix from PR #193.
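
The ordering matters: the config pull must precede the copy, and both must precede training. A rough sketch of the command list follows, with the two added commands copied from the text above; the function name is hypothetical and the training command is left as a caller-supplied placeholder because its exact invocation is not shown here.

```python
from typing import List


def predictor_training_commands(train_command: str) -> List[str]:
    """Return the task commands in the order they must run."""
    return [
        # 1. Bring the sibling config clone up to origin/main (fast-forward only).
        "git -C alpha-engine-config pull --ff-only origin main",
        # 2. Stage the gitignored training config from the refreshed clone.
        "cp alpha-engine-config/predictor/predictor.yaml "
        "alpha-engine-predictor/config/predictor.yaml",
        # 3. Only then run training (placeholder supplied by the caller).
        train_command,
    ]


commands = predictor_training_commands("<existing spot_train.sh invocation>")
for command in commands:
    print(command)
```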

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 9, 2026
…195)

Pairs with the chronic-gap self-heal step from PR #193. The
chronic_polygon_gaps allowlist was added because polygon doesn't
reliably serve BF-B / BRK-B / MOG-A / PSTG. If polygon coverage
RECOVERS for any of these — polygon adds a Berkshire B share class
CIK or fixes a flaky data feed — the allowlist entry becomes silent
operational debt: yfinance fallback would still happen even though
polygon now has the data.

Adds `_detect_chronic_gap_polygon_recovery` step in MorningEnrich,
runs BEFORE the self-heal so the signal is a clean read of what
polygon shipped today (not contaminated by our yfinance backfill).
Reads `staging/daily_closes/{date}.parquet` written by
`daily_closes.collect(source="polygon_only")` and checks each chronic
ticker for membership. Emits CW gauge
`AlphaEngine/Data/chronic_gap_polygon_recovery_count` (always — gauge
of 0 anchors the alarm baseline; CW missing-data is harder to alarm
on than a steady 0).

Operator action when count > 0 across multiple cycles: prune the
allowlist entry from alpha-engine-config data/config.yaml.

Best-effort by design — read errors / metric emit errors log a
warning but never raise. MorningEnrich is not blocked by drift
detection; postflight remains the load-bearing freshness gate.
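
The detection logic can be sketched as below, assuming the staged parquet read and the CloudWatch emit are injected as callables. The function name, the result keys other than `absent_as_expected`, and the `"ok"` / `"noop"` status strings are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List


def detect_polygon_recovery(
    chronic_tickers: Iterable[str],
    load_staged_tickers: Callable[[], Iterable[str]],
    emit_gauge: Callable[[int], None],
) -> Dict[str, object]:
    chronic = sorted(chronic_tickers)
    if not chronic:
        # Nothing to check: no read, no metric.
        return {"status": "noop", "recovered": [], "absent_as_expected": []}
    try:
        staged = set(load_staged_tickers())
    except Exception as exc:  # best-effort: a read failure never raises
        print(f"warning: could not read staged closes: {exc}")
        return {"status": "skipped", "recovered": [], "absent_as_expected": []}
    recovered: List[str] = [t for t in chronic if t in staged]
    absent: List[str] = [t for t in chronic if t not in staged]
    try:
        emit_gauge(len(recovered))  # emitted even when 0 to anchor the baseline
    except Exception as exc:  # metric failures are logged and swallowed
        print(f"warning: metric emit failed: {exc}")
    return {"status": "ok", "recovered": recovered, "absent_as_expected": absent}


# Usage with fakes: polygon shipped BRK-B and PSTG today.
emitted: List[int] = []
outcome = detect_polygon_recovery(
    ["BF-B", "BRK-B", "MOG-A", "PSTG"],
    load_staged_tickers=lambda: ["AAPL", "BRK-B", "PSTG", "SPY"],
    emit_gauge=emitted.append,
)
print(outcome, emitted)
```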

Tests: 5 new in test_chronic_gap_drift_detection.py
  - no recovery → metric emits 0 + absent_as_expected list
  - partial recovery (BRK-B + PSTG covered) → recovery list + metric=2
  - parquet read failure → status=skipped, no raise
  - empty chronic list → noop, no S3 read, no metric
  - CW emit failure → swallowed, result still records counts

Closes the chronic_polygon_gaps loop: self-heal (PR #193) backfills
the gap; drift detection (this PR) flags when the gap closes upstream.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 10, 2026
…econciliation (#199)

PR 1 of the windowed-data-reconciliation arc (plan doc:
alpha-engine-docs/private/windowed-data-reconciliation-260510.md).

**Origin:** 2026-05-09 evaluator email "121 tickers with >5d gap"
diagnostic. Investigation revealed 110 of the 121 are matrix-pivot
artifacts — front-of-history mismatches and tail drift from the
chronic-polygon-gap self-heal (data #193) widening the gap between
the 5 chronically-stale tickers (which heal daily) and the rest of
the universe (which only refresh on weekly DataPhase1).

**This PR's scope:** structural orchestration only.
- New ``window_days: int = 1`` parameter on ``collect()``. Default 1
  preserves all single-date legacy behavior — no consumer call sites
  change shape.
- New ``_previous_business_days(run_date, n)`` helper enumerating
  ``n`` BDays ending at ``run_date`` (inclusive), newest first.
  Saturday/Sunday run_date normalizes to the prior Friday so a Sat SF
  firing at 02:00 PT doesn't burn a slot on a non-trading day.
- New ``_collect_window()`` helper that iterates oldest → newest,
  calling the existing ``collect(window_days=1)`` per date so all the
  fetch / coverage-gate / write logic reuses unchanged. Per-date
  failures don't kill the rest of the window — the aggregate's
  ``status`` flips to ``"partial"`` and successful dates still write.
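
The business-day enumeration described above can be sketched with plain weekday arithmetic. This is an assumption-laden illustration rather than the `_previous_business_days` helper itself: it treats Monday through Friday as business days and does not account for market holidays.

```python
from datetime import date, timedelta
from typing import List


def previous_business_days(run_date: date, n: int) -> List[date]:
    """Return n weekdays ending at run_date (inclusive), newest first."""
    if n < 1:
        raise ValueError("n must be >= 1")
    day = run_date
    # Normalize a Saturday/Sunday run_date back to the prior Friday.
    while day.weekday() >= 5:
        day -= timedelta(days=1)
    days: List[date] = []
    while len(days) < n:
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            days.append(day)
        day -= timedelta(days=1)
    return days


# 2026-05-09 is a Saturday, so the window ends on Friday 2026-05-08.
print(previous_business_days(date(2026, 5, 9), 3))
```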

**Polygon free-tier rate-limit contract:** one ``grouped-daily`` call
per date in the window, total ``window_days`` polygon calls — the
only way to honor 14/day at the free tier. Test
``test_polygon_only_window_makes_one_grouped_daily_per_date`` pins
this invariant for the production default ``window_days=14``.
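
A rough sketch of the fan-out and the one-call-per-date invariant, assuming the per-date collector is injected as a callable. The function name, the `"ok"` status string, and the result shape are hypothetical; only the oldest-first order and the `"partial"` status come from the description above.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List


def collect_window(
    dates: Iterable[date],
    collect_one: Callable[[date], dict],
) -> Dict[str, object]:
    """Call collect_one once per date, oldest first, isolating per-date failures."""
    results: Dict[date, dict] = {}
    failed: List[date] = []
    for day in sorted(dates):  # oldest first
        try:
            results[day] = collect_one(day)
        except Exception as exc:  # one bad date must not kill the window
            print(f"collect failed for {day}: {exc}")
            failed.append(day)
    status = "partial" if failed else "ok"
    return {"status": status, "results": results, "failed": failed}


# Usage: a fake per-date collector that records calls and fails on one date.
calls: List[date] = []


def fake_grouped_daily(day: date) -> dict:
    calls.append(day)
    if day == date(2026, 5, 7):
        raise RuntimeError("simulated upstream failure")
    return {"date": day.isoformat()}


window = [date(2026, 5, 8), date(2026, 5, 7), date(2026, 5, 6)]
summary = collect_window(window, fake_grouped_daily)
print(summary["status"], len(calls))
```

Counting the recorded calls against the window length is the same shape of check the pinned test name suggests: exactly one upstream call per date, regardless of failures.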

**Out of scope (later PRs in the arc):**
- PR 2: per-cell skip-if-canonical optimization on the yfinance side
  (cells where ``source ∈ {"yfinance", "polygon"}`` skip the yfinance
  refetch, keeping yfinance batch cost near zero in steady state).
- PR 3: SF wiring + ``window_days=14`` config knob.
- PR 4: simulator gap-warning metric refactor reading the ``source``
  column.
- PR 5: ``chronic_polygon_gaps`` allowlist deprecation.

+14 tests pinning the legacy-parity contract (``window_days=1``
produces byte-identical single-date result shape), the orchestration
contract (window mode fans out to N per-date calls oldest first),
the polygon rate-limit invariant, and the per-date-failure
non-blocking behavior.

Suite: 633 passed (was 619; +14 new).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>