feat(daily_closes): skip_if_canonical optimization for windowed yfinance pass (PR 2/5) #200
Merged
…nce pass
PR 2 of the windowed-data-reconciliation arc (plan doc:
alpha-engine-docs/private/windowed-data-reconciliation-260510.md).
Builds on PR 1's structural window orchestration.
**What this adds**
New ``skip_if_canonical: bool = False`` parameter on ``collect()``.
When True (set by ``_collect_window`` automatically per the
windowed-arc design):
- ``yfinance_only`` / ``auto`` modes: read the existing parquet,
identify "canonical" rows (``source ∈ {"yfinance", "polygon"}`` AND
non-null ``Close``), skip yfinance fetch for those tickers, and
merge the preserved canonical rows into the output parquet. Net
effect: steady-state yfinance batch cost stays near zero across the
14-day window because most cells are already populated by prior
passes.
- ``polygon_only`` mode: flag is *ignored* per Brian's 2026-05-10
option (a) — polygon always re-overwrites within the window so
corporate-action backfills (where polygon's adjusted close shifts
retroactively) are picked up. ``grouped-daily`` call rate stays at
one per date in the window regardless, honoring the 14/day free-tier
contract.
- Legacy post-close-skip short-circuit is bypassed when
``skip_if_canonical=True`` because the whole point of windowed
reconciliation is to look INSIDE the existing parquet for NaN cells
in older window dates that the legacy "skip if file exists post-close"
semantic would otherwise skip.
- If reading the existing parquet fails (corrupt, network), fall back
to legacy refetch+overwrite for the date — don't take down the whole
window because of one unreadable parquet.
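The yfinance-side flow above can be sketched in plain Python. This is a minimal illustration, not the collector's code. `is_canonical`, `should_short_circuit` and `plan_yfinance_pass` are hypothetical names, and parquet rows are modeled as dicts so the skip, bypass and fallback rules can be shown without I/O.

```python
import math
from typing import Callable, Iterable, List, Tuple

# Assumption: rows of the existing daily_closes parquet are modeled as plain
# dicts with "ticker", "source" and "Close" keys; the real collector reads parquet.
CANONICAL_SOURCES = {"yfinance", "polygon"}


def is_canonical(row: dict) -> bool:
    """Canonical = source in {yfinance, polygon} AND a non-null Close."""
    close = row.get("Close")
    if close is None or (isinstance(close, float) and math.isnan(close)):
        return False
    # Legacy parquets have no source column, so row.get("source") is None.
    return row.get("source") in CANONICAL_SOURCES


def should_short_circuit(parquet_exists: bool, post_close: bool,
                         skip_if_canonical: bool) -> bool:
    """Legacy "file exists post-close" skip, bypassed in windowed mode."""
    return parquet_exists and post_close and not skip_if_canonical


def plan_yfinance_pass(
    tickers: Iterable[str],
    read_existing: Callable[[], List[dict]],
    skip_if_canonical: bool,
) -> Tuple[List[str], List[dict]]:
    """Return (tickers to fetch, canonical rows to carry into the output)."""
    tickers = list(tickers)
    if not skip_if_canonical:
        return tickers, []
    try:
        existing = read_existing()
    except Exception:
        # Unreadable parquet: legacy refetch+overwrite for this date only,
        # rather than failing the whole window.
        return tickers, []
    preserved = [row for row in existing if is_canonical(row)]
    already_done = {row["ticker"] for row in preserved}
    return [t for t in tickers if t not in already_done], preserved
```

In this sketch, the output for a date would be the preserved rows plus whatever yfinance returns for the remaining tickers. NaN-Close rows and rows without a `source` are refetched, which matches the behavior the tests below describe.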
**Source-precedence-ladder semantics**
``NaN < "yfinance" < "polygon"``. Each pass writes only "below itself":
- yfinance pass skips cells where source ∈ {yfinance, polygon} —
yfinance never demotes polygon, never re-fetches its own work.
- polygon pass overwrites cells where source ∈ {NULL, "yfinance"} —
polygon canonicalizes ahead of yfinance — AND overwrites polygon
cells too (option a, corporate-action handling). Polygon never
introduces NaN: a polygon-empty cell retains whatever was there.
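One way to read the ladder is as a per-cell update rule. The sketch below uses a hypothetical `apply_pass` helper, not the PR's implementation. It assumes the no-NaN rule applies to both passes, whereas the description states it explicitly only for polygon.

```python
import math
from typing import Optional, Tuple

# A cell is (source, Close); source None means no canonical writer yet.
Cell = Tuple[Optional[str], Optional[float]]


def _has_close(close: Optional[float]) -> bool:
    return close is not None and not math.isnan(close)


def apply_pass(cell: Cell, pass_source: str, incoming: Optional[float]) -> Cell:
    """Apply one pass's fetched Close to a cell under NaN < yfinance < polygon."""
    source, close = cell
    if not _has_close(incoming):
        # An empty fetch keeps whatever was there (stated for polygon;
        # assumed here for yfinance too).
        return cell
    if pass_source == "polygon":
        # Option (a): overwrite NULL, yfinance AND polygon cells so
        # retroactive corporate-action adjustments land.
        return ("polygon", incoming)
    if pass_source == "yfinance":
        canonical = source in ("yfinance", "polygon") and _has_close(close)
        # Never demote polygon, never redo yfinance's own work.
        return cell if canonical else ("yfinance", incoming)
    raise ValueError(f"unknown pass source: {pass_source!r}")
```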
**Out of scope (later PRs in the arc)**
- PR 3: SF wiring + ``window_days=14`` config knob. Today's default
``skip_if_canonical=False`` preserves legacy single-date behavior;
PR 3 flips ``window_days=14`` and the SF callers automatically pick
up the skip optimization.
- PR 4: simulator gap-warning metric refactor reading the ``source``
column.
- PR 5: ``chronic_polygon_gaps`` allowlist deprecation.
**Test coverage**
+9 new tests across two files:
- ``tests/test_daily_closes_skip_if_canonical.py`` (8 tests):
  - skips canonical yfinance + canonical polygon tickers
  - does not skip NaN-Close tickers (refetch fills the NaN)
  - does not skip when ``source`` column is missing (legacy parquet)
  - bypasses post-close-skip short-circuit
  - polygon_only mode IGNORES the flag (option a contract)
  - corrupt parquet falls back to legacy refetch
  - default (skip_if_canonical=False) preserves legacy short-circuit
- ``tests/test_daily_closes_window_days.py`` (+1 test): window-mode
call sets skip_if_canonical=True per design.
Suite: 642 passed (was 633 after PR 1; +9 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request on May 10, 2026
… both daily_closes call sites (#201)
PR 3 of the windowed-data-reconciliation arc (plan doc: alpha-engine-docs/private/windowed-data-reconciliation-260510.md). Builds on PR 1 (#199, window_days orchestration) + PR 2 (#200, skip_if_canonical optimization).
**What this adds**
Two SF call sites in ``weekly_collector.py`` now pass the windowed-reconciliation knobs through to ``daily_closes.collect``:
- **MorningEnrich** (line 961, ``polygon_only``): reads ``daily_cfg.get("window_days", 1)`` + ``daily_cfg.get("skip_if_canonical", False)`` and forwards. Polygon ignores the skip flag per option (a) but still benefits from windowed ``grouped-daily`` calls (one per BDay in the window, i.e. 14 calls/day when ``window_days=14``, the free-tier rate-limit ceiling).
- **EOD pass** (line 1196, ``yfinance_only``): same config read + forward. With ``skip_if_canonical=true`` the yfinance batch cost stays near zero in steady state because most cells are already canonical from prior pass days.
Adds ``window_days: 1`` and ``skip_if_canonical: false`` to ``config.yaml.example`` with documentation on the production-target values (``window_days: 14`` + ``skip_if_canonical: true``) + the staged-rollout protocol from the plan doc.
**Default behavior preserved**
When the new config keys are absent (current production state), both call sites pass ``window_days=1`` + ``skip_if_canonical=False``, which is byte-identical to legacy single-date behavior. No live behavior change from this PR landing: the cutover happens via a separate alpha-engine-config commit that flips the values once the wiring is observed clean.
**Out of scope (later PRs in the arc)**
- Cutover commit in alpha-engine-config (the actual ``window_days: 14`` flip), flag-gated rollout per the plan doc: 1 clean Sat SF + 5 clean weekday SFs at ``window_days=14`` before ``skip_if_canonical: true`` flips.
- PR 4: simulator gap-warning metric refactor reading the ``source`` column.
- PR 5: ``chronic_polygon_gaps`` allowlist deprecation.
**Test coverage**
+6 new tests in ``tests/test_weekly_collector_window_days_wiring.py``:
- absent config keys default to legacy ``window_days=1`` (both call sites)
- configured ``window_days=14`` + ``skip_if_canonical=true`` flow through
- string YAML coercion (``"14"`` → ``14``, ``"true"`` → ``True``)
- end-to-end roundtrip via ``daily_closes.collect`` mock
Suite: 648 passed (was 642 after PR 2; +6 new).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
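The config read at both call sites, including the string-YAML coercion the tests pin, might look like this. `window_kwargs` and `_as_bool` are hypothetical helpers. The `>= 1` check and the refusal of booleans other than "true"/"false" are choices of this sketch, not documented behavior.

```python
from typing import Any, Dict, Mapping


def _as_bool(value: Any) -> bool:
    """Coerce YAML booleans that may arrive as strings ("true"/"false")."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    raise ValueError(f"expected a boolean, got {value!r}")


def window_kwargs(daily_cfg: Mapping[str, Any]) -> Dict[str, Any]:
    """Keyword arguments to forward to daily_closes.collect.

    Absent keys fall back to the legacy single-date behavior.
    """
    window_days = int(daily_cfg.get("window_days", 1))  # "14" -> 14
    if window_days < 1:
        raise ValueError(f"window_days must be >= 1, got {window_days}")
    return {
        "window_days": window_days,
        "skip_if_canonical": _as_bool(daily_cfg.get("skip_if_canonical", False)),
    }
```

Each call site would then pass `**window_kwargs(daily_cfg)` alongside its mode. With no keys configured, this yields `window_days=1, skip_if_canonical=False`, which is the legacy single-date call.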
cipher813 added a commit that referenced this pull request on May 12, 2026
…clobber (#219)
The windowed-reconciliation cutover (PRs #199/#200/#201 + alpha-engine-config flip to daily_closes:{window_days:14, skip_if_canonical:true}, activated 2026-05-11) amplified a latent bug in _fetch_fred_closes: the FRED query used sort_order=desc + limit=5 with no upper bound, so per-date calls across the rolling window all returned today's most-recent observation. Every historical date's parquet got today's VIX/VIX3M/TNX/IRX/TWO/HYOAS/BAA10Y stamped on it, clobbering correct historical closes.
FlowDoctor surfaced the regression 2026-05-12 ~13:01/13:04 UTC with paired "polygon_only OVERWRITE VIX" ERROR alerts for 2026-04-22 and 2026-04-28. Both alerts showed the same pre (18.36) and post (17.19) closes, the signature of "every per-date stamp got today's latest".
Fix:
- _fetch_fred_closes sends observation_end=date_str so per-date calls return that date's actual FRED observation (or the most recent on-or-before observation in the same-day case where FRED hasn't published yet, which preserves the legacy "today's parquet carries yesterday's FRED close" semantic).
- Defensive guard refuses to write a future-dated observation if FRED somehow returns one despite observation_end.
Repair tool (collectors/daily_closes_fred_repair.py) re-fetches correct FRED values across an operator-specified window and rewrites only the FRED-ticker rows of each affected daily_closes parquet. Polygon stock rows are untouched (their fetcher was always per-date-correct). Idempotent.
Tests: +7 per-date regression tests pinning observation_end + same-day fallback + future-date refusal + missing-value skip; +11 repair tests covering business-day enumeration + on-or-before lookup + idempotent no-op + dry-run + missing-parquet skip. Suite 774 → 792.
Operator follow-up: after merge, run
    python -m collectors.daily_closes_fred_repair \
      --bucket alpha-engine-research \
      --start 2026-04-22 --end 2026-05-12 [--dry-run]
to repair the clobbered window before tomorrow's MorningEnrich (which now writes correct per-date FRED values going forward).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
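Below is a sketch of the per-date fix and of the repair tool's business-day enumeration. It uses the parameters of FRED's `series/observations` endpoint. The helper names are hypothetical, exchange holidays are deliberately not excluded, and the HTTP fetch (which needs an API key) is left out so the selection logic stands on its own.

```python
import datetime as dt
from typing import Dict, List, Optional
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def fred_request_url(series_id: str, api_key: str, date_str: str) -> str:
    """Per-date query: same desc/limit=5 shape, plus an upper bound."""
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "sort_order": "desc",
        "limit": 5,
        # Without observation_end every per-date call returns today's latest.
        "observation_end": date_str,
    }
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def close_on_or_before(observations: List[Dict[str, str]],
                       date_str: str) -> Optional[float]:
    """Most recent usable observation dated on or before date_str."""
    target = dt.date.fromisoformat(date_str)
    for obs in sorted(observations, key=lambda o: o["date"], reverse=True):
        if dt.date.fromisoformat(obs["date"]) > target:
            continue  # defensive guard: never stamp a future-dated value
        if obs["value"] == ".":
            continue  # FRED reports missing values as "."
        return float(obs["value"])
    return None


def business_days(start: str, end: str) -> List[str]:
    """Mon-Fri dates in [start, end]; exchange holidays are NOT excluded."""
    day, last = dt.date.fromisoformat(start), dt.date.fromisoformat(end)
    out = []
    while day <= last:
        if day.weekday() < 5:
            out.append(day.isoformat())
        day += dt.timedelta(days=1)
    return out
```

Because the lookup is "on or before" rather than "exactly on", a same-day call made before FRED publishes falls back to the previous observation. That mirrors the legacy semantic described above.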
cipher813 added a commit that referenced this pull request on May 20, 2026
…ine_lib.alerts CLI (#277)
Per ROADMAP L146 (SOTA / institutional-approach sub-sub-rule). Second inline-alerts site named in the ROADMAP; sibling PR alpha-engine #200 migrates the alpha-engine/infrastructure/health_checker.sh half.
infrastructure/lambdas/changelog-incident-mirror/deploy.sh: raw `aws sns publish` → `python -m alpha_engine_lib.alerts publish`. SNS target stays identical (default `alpha-engine-alerts` topic resolution), so the changelog-incident-mirror Lambda still receives the message and the smoke test still verifies end-to-end. `--no-telegram` keeps the deliberate per-deploy noise off the operator channel; `severity=info` matches the smoke-test semantics.
Lib pin v0.20.0 → v0.21.0 in BOTH requirements.txt AND Dockerfile (lockstep, per the test_lib_pin_lockstep regression test). v0.21.0 is the alerts-module floor; v0.20 → v0.21 is additive (just the alerts module).
Suite: 1401 passed (vs 1400 baseline; lib v0.21 doesn't break any existing consumer).
Closes the alpha-engine-data half of ROADMAP L146 (P2). After both PRs merge, `grep -rE "aws sns publish.*alpha-engine-alerts|api.telegram.org/bot" infrastructure/ deploy/ --include="*.sh"` returns zero hits across both alpha-engine and alpha-engine-data.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
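The lockstep rule can be illustrated with a small check in the spirit of `test_lib_pin_lockstep`. This is not that test. The pin format (a git tag such as `@v0.21.0`) and the helper names are assumptions.

```python
import re
from typing import Optional

# Assumption: the lib is pinned as a git tag, e.g. "...alpha-engine-lib.git@v0.21.0".
_PIN = re.compile(r"alpha[-_]engine[-_]lib\S*?@v(\d+\.\d+\.\d+)")


def lib_pin(text: str) -> Optional[str]:
    """Return the pinned alpha-engine-lib version in a file's text, if any."""
    match = _PIN.search(text)
    return match.group(1) if match else None


def check_lockstep(requirements_txt: str, dockerfile: str) -> str:
    """Fail if requirements.txt and the Dockerfile pin different versions."""
    req, docker = lib_pin(requirements_txt), lib_pin(dockerfile)
    if req is None or docker is None:
        raise AssertionError("alpha-engine-lib pin not found in both files")
    if req != docker:
        raise AssertionError(
            f"pin drift: requirements.txt v{req} vs Dockerfile v{docker}")
    return req
```

A bump like v0.20.0 → v0.21.0 that touches only one of the two files would then fail the check instead of shipping a container whose lib differs from the one the test suite ran against.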
Summary
PR 2 of the windowed-data-reconciliation arc. Plan doc: alpha-engine-docs/private/windowed-data-reconciliation-260510.md. Builds on PR 1's structural window orchestration (#199).
Adds a `skip_if_canonical: bool = False` parameter to `collect()`. When True (set automatically by `_collect_window`):
- `yfinance_only` / `auto`: identify canonical rows (`source ∈ {"yfinance", "polygon"}` AND non-null `Close`), skip yfinance fetch for those, and merge the preserved canonical rows into the output. Net effect: steady-state yfinance batch cost stays near zero across the 14-day window because most cells are already populated by prior passes.
- `polygon_only`: flag ignored per option (a); `grouped-daily` call rate stays at 1 per date.
- Legacy post-close-skip short-circuit is bypassed when `skip_if_canonical=True`; the whole point is to fill NaN cells in older window dates that legacy logic would skip.
Source-precedence ladder enforced
`NaN < "yfinance" < "polygon"`. Each pass writes only "below itself":
- yfinance pass skips cells where `source ∈ {yfinance, polygon}`.
- polygon pass overwrites cells where `source ∈ {NULL, "yfinance"}`, and polygon cells too (option a). Polygon never introduces NaN.
Out of scope (later PRs in the arc)
- PR 3: SF wiring + `window_days=14` config knob (default flag-gated OFF for first cycle's observation).
- PR 4: simulator gap-warning metric refactor reading the `source` column.
- PR 5: `chronic_polygon_gaps` allowlist deprecation.
Test plan
- `pytest tests/test_daily_closes_skip_if_canonical.py`: 8 new tests pinning skip semantics, NaN-Close still refetches, legacy parquet without source column doesn't skip, post-close-skip bypass, polygon_only ignores flag, corrupt-parquet fallback, default-False preserves legacy
- `pytest tests/test_daily_closes_window_days.py`: +1 test pinning that window mode propagates `skip_if_canonical=True`
🤖 Generated with Claude Code