fix(daily_append): scope missing-from-closes check to caller's request list #132

Merged
cipher813 merged 1 commit into main from fix/daily-append-constituents-aware-threshold
May 2, 2026

@cipher813
Owner

Summary

  • MorningEnrich's missing-from-closes hard-fail was tripping on every S&P churn week. Today (2026-05-02), 8 tickers had been dropped from the index in the past week (ASGN, GTM, HOLX, KMPR, LW, MOH, MTCH, PAYC). They are still in the ArcticDB universe (awaiting the prune cycle) but absent from the new constituents.json that Phase 1 wrote at 09:20, so MorningEnrich no longer requests them from polygon. The check saw 8 churn-outs + 4 chronic polygon gaps = 12 missing > threshold 5 → SF halt.
  • The fix adds an optional expected_tickers parameter to daily_append. When the caller passes its request list, the check scopes to arctic ∩ expected instead of the full ArcticDB universe. Stragglers are excluded from the alarm but logged at INFO so operators see drift building up between prune cycles.
  • Backward compatible: callers not passing expected_tickers retain the prior whole-universe behavior.
  • Both call sites (_run_morning_enrich, _run_daily) updated — ticker list was already in scope at both.

Net effect on the 2026-05-02 redrive: missing-from-closes count drops 12 → 4 (only the chronic BF-B/BRK-B/MOG-A/PSTG remain, which is under the threshold of 5 and so takes the WARN-only path).
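The scoping described above can be illustrated with a minimal sketch. It is not the code from this PR: the function name `check_missing_from_closes`, the `_strip_caret` helper, the constant name, and the choice to raise only when the count is strictly greater than 5 are all assumptions made for illustration. The caret handling in particular is a guess at what the "caret-prefix stripping" test exercises.

```python
import logging

logger = logging.getLogger("daily_append_sketch")

# Assumption: more than 5 missing tickers halts; 5 or fewer only warns.
MISSING_THRESHOLD = 5


def _strip_caret(ticker):
    # Assumption: index-style symbols may carry a leading caret ("^VIX").
    return ticker.lstrip("^")


def check_missing_from_closes(arctic_universe, closes, expected_tickers=None):
    """Return in-scope tickers absent from closes; raise if over threshold."""
    universe = {_strip_caret(t) for t in arctic_universe}
    present = {_strip_caret(t) for t in closes}

    if expected_tickers is None:
        scope = universe  # legacy whole-universe behavior
    else:
        expected = {_strip_caret(t) for t in expected_tickers}
        scope = universe & expected  # arctic ∩ expected
        stragglers = sorted(universe - expected)
        if stragglers:
            logger.info("%d stragglers awaiting prune: %s",
                        len(stragglers), stragglers)

    missing = sorted(scope - present)
    if len(missing) > MISSING_THRESHOLD:
        raise RuntimeError(
            f"{len(missing)} tickers missing from closes: {missing}")
    if missing:
        logger.warning("%d tickers missing from closes: %s",
                       len(missing), missing)
    return missing
```

With the 8 churn-outs and 4 chronic gaps from today's run, the legacy call (no `expected_tickers`) sees 12 missing and raises, while the scoped call sees only the 4 chronic gaps and returns them after a warning.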

Test plan

  • 5 new tests in tests/test_daily_append_missing_from_closes.py cover: stragglers excluded; real constituents-gap still raises; caret-prefix stripping; None preserves legacy behavior; straggler-count INFO log fires
  • 13 existing tests still pass (whole-universe legacy path)
  • Full suite green: 360 passed
  • Live verification via Saturday SF redrive (after merge): MorningEnrich completes (4 missing < threshold) → Phase 1 → PR #130's backfill regression preflight passes → ArcticDB at 5/1 → postflight passes → Research / Predictor Training / Backtester run

🤖 Generated with Claude Code

…t list

MorningEnrich's missing-from-closes hard-fail was tripping every S&P
churn week. Today (2026-05-02): 8 tickers got dropped from the index
this past week (ASGN, GTM, HOLX, KMPR, LW, MOH, MTCH, PAYC). They're
still in ArcticDB universe (awaiting next prune cycle); they're absent
from the new constituents.json Phase 1 wrote at 09:20; MorningEnrich
no longer requests them from polygon. The pre-existing check
(arctic_universe - closes) saw 8 churn-outs + 4 chronic polygon-
coverage gaps = 12 missing > threshold of 5 → SF halt at MorningEnrich.

Add an optional ``expected_tickers`` parameter to ``daily_append``. When
the caller passes its request list, the check scopes to
``arctic ∩ expected`` instead of the full ArcticDB universe. Tickers
absent from the request (S&P churn-out stragglers) are excluded from
the alarm and logged at INFO so operators see drift building up between
prune cycles. Backward compatible — callers that don't pass it retain
the prior whole-universe behavior.

Both call sites (``_run_morning_enrich`` and ``_run_daily``) now pass
their constituents-derived ticker list. The ticker list was already
in scope at both sites.

Net effect on the 2026-05-02 SF redrive: missing-from-closes count
drops 12 → 4 (only the chronic BF-B/BRK-B/MOG-A/PSTG remain — well
under the 5-threshold WARN-only path), MorningEnrich completes, Phase 1
runs, PR #130's backfill regression preflight passes, ArcticDB lands
at 5/1, postflight passes, downstream Research/Predictor Training/
Backtester all run.

5 new tests in tests/test_daily_append_missing_from_closes.py (360
total) cover: stragglers excluded, real constituents-gap still raises,
caret-prefix stripping, None preserves legacy behavior, straggler-count
INFO log fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 3bccf98 into main May 2, 2026
1 check passed
@cipher813 cipher813 deleted the fix/daily-append-constituents-aware-threshold branch May 2, 2026 14:03
cipher813 added a commit that referenced this pull request May 2, 2026
#133)

Companion to PR #132. The pre-write missing-from-closes check correctly
excluded 8 S&P churn-out stragglers (ASGN, GTM, HOLX, KMPR, LW, MOH,
MTCH, PAYC) on today's redrive, daily writes completed cleanly
(n_ok=898), and then ``_scan_universe_and_emit_freshness_receipt`` —
the post-write scan that audits every ArcticDB universe symbol's
last-row date — re-tripped on the same 8 stragglers (HOLX 25d stale,
the rest 8d stale) and halted the SF a 4th time.

Plumb ``expected_tickers`` through to the freshness scan with the same
semantics as the pre-write check: scope to ``arctic ∩ expected``,
exclude stragglers, log them at INFO so operators see drift building
up between prune cycles. A genuinely-stale symbol that IS in
expected_tickers still raises (silent-fail rule preserved). Empty
intersection raises loudly so a misconfigured caller can't silently
emit a meaningless all-fresh receipt. Backward compatible:
expected_tickers=None preserves the prior whole-library scan.

The call site (only one — ``daily_append`` line 1013) passes its
own expected_tickers parameter through. No new wiring needed in
weekly_collector.py.
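
A rough sketch of the scoped freshness scan described above, under
stated assumptions: the function name ``scan_freshness``, the
``last_dates`` mapping, the receipt shape, and the "stale at >= 5 days"
rule are illustrative, not taken from the real
``_scan_universe_and_emit_freshness_receipt``.

```python
import logging
from datetime import date

logger = logging.getLogger("freshness_sketch")


def scan_freshness(last_dates, today, expected_tickers=None, max_stale_days=5):
    """last_dates maps symbol -> date of its last row (hypothetical shape)."""
    symbols = set(last_dates)

    if expected_tickers is None:
        scope = symbols  # legacy whole-library scan
    else:
        expected = {t.lstrip("^") for t in expected_tickers}
        scope = symbols & expected
        if not scope:
            # A misconfigured caller must not get an all-fresh receipt.
            raise ValueError("expected_tickers does not overlap the library")
        stragglers = sorted(symbols - expected)
        if stragglers:
            logger.info("%d stragglers excluded from freshness scan: %s",
                        len(stragglers), stragglers)

    stale = {}
    for symbol in sorted(scope):
        age_days = (today - last_dates[symbol]).days
        if age_days >= max_stale_days:
            stale[symbol] = age_days
    if stale:
        raise RuntimeError(f"stale symbols: {stale}")
    return {"n_scanned": len(scope), "n_stale": 0}
```

Using today's numbers (HOLX 25d stale, MTCH 8d stale), the unscoped
scan raises, the scan scoped to tickers that were actually requested
passes, and a stale symbol that is in ``expected_tickers`` still raises.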

6 new tests in tests/test_daily_append_universe_freshness.py cover:
stragglers excluded; stale-in-expected still raises; INFO log fires;
caret-prefix stripping; None preserves legacy; empty-intersection
raises loudly. 366 tests total.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 2, 2026
…rite (#134)

Architectural fix for the 2026-05-02 SF-halt class. PR #132 + PR #133
shipped earlier today scope the missing-from-closes + freshness checks
to the caller's expected_tickers list, gracefully tolerating S&P
churn-out stragglers that linger in ArcticDB awaiting the post-Phase-1
prune cycle. Those bandages are sound but they're now the load-bearing
path for every MorningEnrich invocation.

Reorder MorningEnrich to make the universe coherent BEFORE any check
fires:

1. constituents.collect() — pulls fresh S&P 500/400 from Wikipedia,
   writes the new constituents.json + sector_map.json. Hard-fails the
   step on any error (Wikipedia outage / sector-mapping completeness)
   per feedback_no_silent_fails — daily_closes can't proceed against
   stale tickers.

2. prune_delisted_tickers(constituents_override=fresh_set, absent_days=5,
   apply=True) — drops ArcticDB stragglers absent from the fresh
   constituents and ≥5 days stale (matching the freshness scan
   threshold). Best-effort: a prune failure logs ERROR and lands a
   ``prune_preflight_warning`` entry on the result, but does NOT
   block the rest of MorningEnrich (PR #132/#133 still tolerate
   stragglers as fallback).

3. Existing daily_closes + daily_append flow runs against a coherent
   universe. The bandage scoping in PR #132/#133 becomes a quiet no-op
   for the happy path.
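
The ordering above could be sketched roughly as follows. The three
callables are injected stand-ins (hypothetical, for illustration only)
rather than the real collector functions, and the shape of the result
dict is assumed apart from the ``prune_preflight_warning`` key named in
step 2.

```python
import logging

logger = logging.getLogger("morning_enrich_sketch")


def run_morning_enrich(collect_constituents, prune, run_daily):
    """Refresh constituents, prune best-effort, then run the daily flow."""
    result = {}

    # Step 1: hard-fail. Any exception here propagates and aborts the step.
    fresh_set = set(collect_constituents())

    # Step 2: best-effort. A prune failure is logged and recorded, not raised.
    try:
        result["pruned"] = prune(
            constituents_override=fresh_set, absent_days=5, apply=True)
    except Exception as exc:
        logger.error("prune preflight failed: %s", exc)
        result["prune_preflight_warning"] = str(exc)

    # Step 3: existing flow, still scoped to the fresh request list so the
    # PR #132/#133 fallback covers any stragglers the prune left behind.
    result["daily"] = run_daily(expected_tickers=sorted(fresh_set))
    return result
```

The asymmetry is the point of the design: stale constituents make every
later check meaningless, so step 1 must abort, whereas a failed prune
only means the scoping fallback does the work it did before this change.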

The new ``constituents_override`` parameter on prune_delisted_tickers
swaps the freshness reference without updating the public
``latest_weekly.json`` pointer (which has cross-module read fan-out:
alternative.py, macro.py, features/compute.py all depend on it).
Mutually exclusive with ``tickers_override``.
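
A small sketch of how the override and the last-date gate might
interact. Everything here is an assumption for illustration: the helper
name ``select_prune_candidates``, the ``published_constituents``
argument standing in for whatever ``latest_weekly.json`` resolves to,
and the ``last_dates`` mapping. The semantics of ``tickers_override``
are not described above, so the sketch only enforces the exclusivity.

```python
from datetime import date


def select_prune_candidates(last_dates, today, published_constituents,
                            absent_days=14, constituents_override=None,
                            tickers_override=None):
    """Return symbols absent from the reference set and stale enough to drop."""
    if constituents_override is not None and tickers_override is not None:
        raise ValueError(
            "constituents_override and tickers_override are mutually exclusive")

    # The override replaces the reference set in-process only; nothing is
    # written back to the published pointer. Lists and sets both work.
    if constituents_override is not None:
        reference = set(constituents_override)
    else:
        reference = set(published_constituents)

    # Absence alone is not enough: the symbol must also be stale.
    return sorted(
        symbol for symbol, last in last_dates.items()
        if symbol not in reference and (today - last).days >= absent_days
    )
```

With ``absent_days=5`` both HOLX (25d) and MTCH (8d) from today's run
would qualify; with the 14d default only HOLX would.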

prune_delisted_tickers also runs at its existing post-Phase-1 site
with the conservative 14d default, which catches any newcomers the SF
picked up between MorningEnrich and Phase 1.

9 new tests:
- 5 in tests/test_weekly_collector_morning_enrich.py: refreshes
  constituents before collect / prunes before daily_append /
  aborts on constituents failure / continues on prune failure /
  dry-run skips preflight writes
- 4 in tests/test_prune_delisted_tickers.py: constituents_override
  uses in-process set / accepts list-or-set / still gates on
  last_date / mutually-exclusive with tickers_override

369 tests total.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 2, 2026
* feat(preflight): add sf_preflight.py — Saturday SF dry-rehearsal

Predicts whether the Saturday SF would succeed BEFORE launching a spot.
Today's recovery cycle (5 SF redrives, ~5 polygon API calls each) burned
free-tier quota and operator hours discovering bugs sequentially. This
module simulates the critical pre-Phase-1 path against real S3 + ArcticDB
state and reports per-step pass/fail in ~30s with 1 polygon call total.

Eight independent checks, mapped to today's incident stack:

  PR #130 (backfill regression)         → check_backfill_source_freshness
  PR #131 (polygon coverage flake)      → check_polygon_grouped_coverage
  PR #132 (missing-from-closes scoping) → check_predicted_missing_from_closes
  PR #133 (freshness scan scoping)      → check_universe_sample_freshness
  PR #134 (workflow ordering)           → check_universe_drift
  PR #135 (return shape)                → check_constituents_fetch
  Postflight contracts                  → check_postflight_contracts
  ArcticDB reachability                 → check_arctic_connectivity

Each check is a pure function taking a PreflightContext, returning a
CheckResult. The orchestrator runs them all (catching per-check
exceptions so one fail doesn't abort the suite) and emits human or
JSON output. Exit code 1 on any failure.
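
The orchestrator pattern described above might look roughly like this.
The fields on ``PreflightContext`` and ``CheckResult``, the status
strings, and the helper names ``run_checks`` / ``exit_code`` are all
assumptions; only the two class names and the isolate-then-exit-1
behavior come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class PreflightContext:
    # Hypothetical: shared state (e.g. initialized libraries) for checks.
    state: dict = field(default_factory=dict)


@dataclass
class CheckResult:
    name: str
    status: str  # assumed values: "PASS", "WARN", "FAIL"
    detail: str = ""


def run_checks(checks, ctx):
    """Run every check; convert an exception into a FAIL for that check."""
    results = []
    for check in checks:
        name = getattr(check, "__name__", repr(check))
        try:
            results.append(check(ctx))
        except Exception as exc:
            results.append(
                CheckResult(name, "FAIL", f"{type(exc).__name__}: {exc}"))
    return results


def exit_code(results):
    """Exit 1 on any failure; WARN alone does not fail the run."""
    return 1 if any(r.status == "FAIL" for r in results) else 0
```

A check that raises is reported as a failure of that check only, so the
remaining checks still run and the operator sees the whole picture in
one pass.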

Two macOS-specific design notes:

1. ArcticDB libs are initialized once in check_arctic_connectivity and
   reused across downstream checks via the context — re-initializing
   adb.Arctic() crashes Aws::S3::S3Client::S3Client on macOS.
2. Checks are ordered with arctic_connectivity FIRST so its bundled AWS
   SDK loads before boto3 (which gets pulled in by collectors imports).

Polygon check skips gracefully (WARN, not FAIL) when POLYGON_API_KEY
is unset — supports laptop-side preflight where the .env isn't loaded.
On the spot the key is present and the check fires.
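
A sketch of the skip-to-WARN guard, with assumptions labeled: the
``fetch_grouped`` callable stands in for the single polygon call, the
``env`` argument and the 95% ``min_coverage`` cut-off are invented for
illustration, and the real check returns a CheckResult rather than a
tuple.

```python
import os


def check_polygon_grouped_coverage(fetch_grouped, expected_tickers,
                                   env=None, min_coverage=0.95):
    """Return (status, detail); WARN without calling polygon if no key."""
    env = os.environ if env is None else env
    if not env.get("POLYGON_API_KEY"):
        return ("WARN", "POLYGON_API_KEY unset; polygon coverage check skipped")

    returned = set(fetch_grouped())  # the one polygon call
    expected = set(expected_tickers)
    covered = len(expected & returned) / len(expected) if expected else 0.0
    status = "PASS" if covered >= min_coverage else "FAIL"
    return (status, f"coverage {covered:.1%}")
```

Because the guard returns before the fetch, a test that wants to reach a
mocked client has to supply the key, even a dummy one.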

18 tests in tests/test_sf_preflight.py cover the happy path, the
failure mode each check is designed to catch, and orchestrator
isolation.

394 tests total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(sf_preflight): set POLYGON_API_KEY in polygon-coverage tests

CI runs without POLYGON_API_KEY in env, so the no-key skip-to-WARN
guard short-circuited the 3 polygon-coverage tests before they
reached the mocked client. Set the env var via monkeypatch so the
guard passes through to the polygon mock. Also add an explicit test
for the no-key path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>