Split daily collection by source: yfinance EOD + polygon morning enrichment #90
Merged
…ng empty dict
Prior behavior swallowed 403 responses (notably free-tier "before end of
day" rejections) by logging a warning + returning {"results": [],
"status": "FORBIDDEN"}. The status field was never checked by callers,
so daily_closes.collect silently fell through to its yfinance fallback,
which writes VWAP=None for every stock — producing the 2026-04-17 →
2026-04-23 outage where ArcticDB's VWAP column stayed universally null
across the entire universe despite daily_append running successfully
each weekday.
New behavior raises PolygonForbiddenError with the polygon-supplied
message + the failed path. Callers that want to fall back to a different
source must do so explicitly (see PR 1 follow-up changes to
collectors/daily_closes.py).
Per feedback_no_silent_fails. Tested behavior:
- 403 with standard polygon message raises with the message preserved
- 403 with malformed/non-JSON body still raises (no AttributeError on .json())
- 403 outcomes are not cached — retries re-hit the API
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
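The 403 contract described in this commit can be sketched as follows. This is a minimal illustration, not the repository's `polygon_client._get`: the helper name `raise_if_forbidden`, its signature, and the example path are assumptions made here; only the `PolygonForbiddenError` name and the tested behaviors (message preserved, non-JSON body still raises) come from the commit message.

```python
import json


class PolygonForbiddenError(RuntimeError):
    """Raised on a 403 from polygon; keeps the supplied message and failed path."""

    def __init__(self, message: str, path: str):
        super().__init__(f"polygon 403 on {path}: {message}")
        self.message = message
        self.path = path


def raise_if_forbidden(status_code: int, body_text: str, path: str) -> None:
    """Hypothetical helper: turn a 403 into an exception, never an empty result."""
    if status_code != 403:
        return
    try:
        payload = json.loads(body_text)
        # Prefer the message field when the body is a JSON object.
        if isinstance(payload, dict):
            message = payload.get("message", body_text)
        else:
            message = body_text
    except ValueError:
        # Malformed or non-JSON body: still raise, using the raw text.
        message = body_text
    raise PolygonForbiddenError(str(message), path)
```

A caller that wants a different source would then catch `PolygonForbiddenError` explicitly at the call site, so the fallback is visible in the code rather than implied by an unchecked status field.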
…uto)
Adds explicit per-mode failure semantics so operational pipelines can no
longer mask polygon outages by silently substituting yfinance.
* yfinance_only — EOD pass. Polygon skipped entirely (free tier 403's
same-day, deferring to morning enrichment is canonical). Hard-fails
if yfinance returns < 95% of stocks.
* polygon_only — morning pass. Polygon required, PolygonForbiddenError
propagates to the SF, NO yfinance fallback for stocks. FRED still
serves the 4 indices polygon never provides. When overwriting an
existing parquet (the yfinance EOD wrote first), per-ticker Close
discrepancy is logged: WARN > 1%, ERROR > 5%, summary at the end.
* auto — legacy chain (polygon → FRED → yfinance) preserved for
backfill scripts. New operational code paths must specify a mode
explicitly.
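The Close-discrepancy logging for polygon_only (WARN > 1%, ERROR > 5%, summary at the end) might look roughly like the sketch below. The function names, the logger name, and the choice of the prior (yfinance) Close as the denominator are assumptions for illustration; the commit message only states the thresholds.

```python
import logging
from typing import Optional

logger = logging.getLogger("daily_closes_sketch")  # hypothetical logger name

WARN_THRESHOLD = 0.01   # relative difference above 1% logs a warning
ERROR_THRESHOLD = 0.05  # relative difference above 5% logs an error


def classify_close_discrepancy(prior_close: float, new_close: float) -> Optional[int]:
    """Return a logging level for the relative Close change, or None if within 1%."""
    if prior_close == 0:
        # Assumption: an unusable baseline is reported as an error.
        return logging.ERROR
    rel = abs(new_close - prior_close) / abs(prior_close)
    if rel > ERROR_THRESHOLD:
        return logging.ERROR
    if rel > WARN_THRESHOLD:
        return logging.WARNING
    return None


def log_discrepancies(prior: dict, new: dict) -> dict:
    """Log per-ticker discrepancies for shared tickers and return summary counts."""
    summary = {"checked": 0, "warn": 0, "error": 0}
    for ticker in sorted(set(prior) & set(new)):
        summary["checked"] += 1
        level = classify_close_discrepancy(prior[ticker], new[ticker])
        if level is None:
            continue
        logger.log(level, "%s Close %.4f -> %.4f", ticker, prior[ticker], new[ticker])
        summary["error" if level == logging.ERROR else "warn"] += 1
    logger.info("Close discrepancy summary: %s", summary)
    return summary
```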
The skip-on-exists short-circuit is mode-aware:
- yfinance_only / auto keep the post-close guard (re-running yfinance
for an already-collected date is wasteful)
- polygon_only never skips — overwrite is the design intent
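A compact sketch of the mode rules above, under stated assumptions: `should_skip_existing` and `check_coverage` are hypothetical names, the post-close guard is reduced to a simple "parquet already exists" flag, and the choice of `ValueError` and `RuntimeError` is illustrative. The mode names and the 95% coverage floor are taken from the commit message.

```python
VALID_SOURCES = ("yfinance_only", "polygon_only", "auto")
MIN_YFINANCE_COVERAGE = 0.95  # yfinance_only hard-fails below this fraction


def validate_source(source: str) -> str:
    """Reject unknown modes up front."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source {source!r}; expected one of {VALID_SOURCES}")
    return source


def should_skip_existing(source: str, parquet_exists: bool) -> bool:
    """polygon_only always overwrites; the other modes keep the skip guard."""
    validate_source(source)
    if source == "polygon_only":
        return False
    return parquet_exists


def check_coverage(returned: int, expected: int) -> None:
    """Hard-fail when fewer than 95% of the expected stocks came back."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    coverage = returned / expected
    if coverage < MIN_YFINANCE_COVERAGE:
        raise RuntimeError(
            f"coverage {coverage:.1%} is below {MIN_YFINANCE_COVERAGE:.0%} "
            f"({returned}/{expected} stocks)"
        )
```

Keeping the skip decision in one function makes the overwrite intent of polygon_only explicit instead of scattering mode checks through the collector.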
10 new tests in tests/test_daily_closes_source_modes.py cover:
- source validation (rejects unknown modes)
- yfinance_only: polygon never called, sub-coverage hard-fails
- polygon_only: PolygonForbiddenError propagates, yfinance never called,
empty polygon response hard-fails, VWAP from polygon lands in parquet,
existing parquet always overwritten, Close discrepancies logged
- auto: legacy silent-fallback chain preserved
Full suite: 144/144 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h mode
Operational wiring for the split-by-source design from the prior commit.
* `--daily` now passes `source="yfinance_only"` to `daily_closes.collect`.
No more polygon attempt at EOD — same-day polygon free-tier 403's were
silently masked by yfinance for a week (4/17 → 4/23 incident). VWAP
lands as None and gets backfilled by morning enrichment.
* New `--morning-enrich` flag invokes a new `_run_morning_enrich` flow:
- Resolves the previous trading day via `alpha_engine_lib.trading_calendar`
(correctly skips weekends + holidays; 10-day runaway guard)
- Runs `daily_closes.collect(source="polygon_only")` for that date —
hard-fails on `PolygonForbiddenError` instead of silent yfinance
fallback
- Runs `daily_append` for the same date — `universe_lib.update()` is
idempotent for same-date overwrites (existing design intent
documented at daily_append.py:232-242), so the polygon row replaces
the yfinance row in ArcticDB cleanly
- Skips the feature_store snapshot step (already ran at EOD; polygon
delta on OHLCV is typically <1% and per-ticker features get
recomputed inside daily_append against the polygon row)
* `--date` overrides which trading day to enrich (used for backfill of
2026-04-17 → 2026-04-23 once the morning Lambda is deployed).
* builders/daily_append.py: `_load_daily_closes` docstring updated —
the prior text incorrectly claimed VWAP falls back to (H+L+C)/3
proxy. The collector explicitly refuses that proxy per the 2026-04-17
decision; new docstring describes the actual two-source semantics.
Tests (9 new, in tests/test_weekly_collector_morning_enrich.py):
- _previous_trading_day walks back over weekend (Mon → Fri)
- _previous_trading_day walks back over holiday (Christmas test)
- _previous_trading_day strict inequality (today ≠ result)
- Runaway guard raises after 10 calendar days of !is_trading_day
- --morning-enrich calls daily_closes with polygon_only source
- --morning-enrich propagates PolygonForbiddenError as failed status
- --morning-enrich runs daily_append after polygon succeeds
- --morning-enrich defaults to previous trading day when --date omitted
- --daily routes through yfinance_only source
Full suite: 153/153 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
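The previous-trading-day resolution with its 10-day runaway guard can be sketched as below. This is not the repository's `_previous_trading_day`: the real code uses `alpha_engine_lib.trading_calendar`, whereas this sketch takes the calendar as an injected `is_trading_day` callable and uses a hypothetical weekday-plus-holidays stand-in so the example is self-contained.

```python
from datetime import date, timedelta
from typing import Callable, FrozenSet

MAX_LOOKBACK_DAYS = 10  # runaway guard from the commit message


def previous_trading_day(today: date, is_trading_day: Callable[[date], bool]) -> date:
    """Walk back from today (exclusive) to the most recent trading day."""
    candidate = today
    for _ in range(MAX_LOOKBACK_DAYS):
        candidate -= timedelta(days=1)  # strictly before today
        if is_trading_day(candidate):
            return candidate
    raise RuntimeError(
        f"no trading day found in the {MAX_LOOKBACK_DAYS} days before {today}"
    )


def weekday_calendar(holidays: FrozenSet[date] = frozenset()) -> Callable[[date], bool]:
    """Hypothetical stand-in calendar: Mon-Fri minus the given holidays."""
    def is_trading_day(d: date) -> bool:
        return d.weekday() < 5 and d not in holidays
    return is_trading_day
```

For example, `previous_trading_day(date(2026, 4, 20), weekday_calendar())` walks back from Monday over the weekend to Friday 2026-04-17.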
cipher813 added a commit that referenced this pull request on Apr 24, 2026
…r inference) (#91)

* SF: add MorningEnrich step + move EOD PostMarketData to ae-trading

Operational wiring for the split-by-source design from PR 1 (alpha-engine-data #90).

step_function_daily.json (weekday SF, Mon-Fri 6:05 AM PT):
Insert MorningEnrich SSM step on ae-trading between the trading-day check and PredictorInference. Runs:
    python weekly_collector.py --morning-enrich
which finds the previous trading day, fetches polygon grouped-daily for it (hard-fails on PolygonForbiddenError, no yfinance fallback), and re-runs daily_append to overwrite the prior day's ArcticDB row with polygon's authoritative OHLCV+VWAP. PredictorInference is gated on this succeeding: failure routes to HandleFailure, not silent inference on uncorrected data (per feedback_no_silent_fails).

This closes the operational loop on the 2026-04-17→2026-04-23 silent VWAP outage where the EOD yfinance pass was the only source and ArcticDB's VWAP column stayed universally null across the window.

step_function_eod.json (EOD SF, daemon-shutdown trigger):
Move PostMarketData from micro to ae-trading (InstanceIds.$ now uses $.trading_instance_id; same change for WaitForPostMarketData polling). Avoids the OOM regression that originally moved DailyData off micro on 2026-04-16. Bumps executionTimeout 180→720 to match observed ~7 min runtime + safety margin (15-min window between daemon shutdown at 1:15 PM PT and EC2 stop at 1:30 PM PT; 8-10 min usage, comfortable margin). Simplified the two-command pattern (--only daily_closes + builders.daily_append) to a single `python weekly_collector.py --daily` since PR 1 unified --daily under source=yfinance_only and the full _run_daily flow now does closes + features + append together.

Comment updated: this SF is now the sole canonical EOD path. The alpha-engine-daily-data systemd timer that was racing this SF gets deleted in the paired alpha-engine PR.

Validation:
- Both SF JSONs parse cleanly (json.load smoke check)
- 153/153 unit tests pass
- Production validation gated on:
  1. Deploy via infrastructure/deploy_step_function_daily.sh + deploy_step_function.sh (or equivalent)
  2. Paired alpha-engine PR deletes systemd timer + retargets daemon._trigger_eod_pipeline (or accepts ec2_instance_id field being unused)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert step_function_eod.json changes: EOD SF is no longer operationally triggered

alpha-engine PR #94 (merged 2026-04-22) removed _trigger_eod_pipeline from executor/daemon.py, so the EOD SF (alpha-engine-eod-pipeline) is no longer fired from daemon shutdown. The canonical EOD path is now:
* 1:05 PM PT: alpha-engine-daily-data.timer (systemd, ae-trading) runs `python weekly_collector.py --daily` (post-PR-1 = yfinance_only)
* 1:20 PM PT: alpha-engine-eod.timer (systemd, ae-trading) runs `python executor/eod_reconcile.py`

The EOD SF JSON exists only for manual disaster recovery. Modifying it (moving from micro→trading + simplifying commands) was based on stale context from earlier in this session: the original "EOD SF as canonical path" framing was true a week ago but no longer holds. Reverting keeps the EOD SF unchanged so this PR's scope stays minimal: just the MorningEnrich SSM step in step_function_daily.json.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request on Apr 27, 2026
…ma (#105)

Background: 2026-04-27 EOD-email blackout investigation
========================================================
The structural fix in PR #104 decoupled macro/SPY freshness from stock-coverage correctness. Validation today exposed a second, latent issue: with the universe-coverage guard now passing, daily_append's per-stock writes finally execute, and 100% of them fail with an ArcticDB schema-mismatch error.

Schema audit (2026-04-27 22:14 UTC) revealed heterogeneous universe state:
- 816 symbols (~90%): 64 cols, no VWAP at all
- 88 symbols (~10%): 65 cols, VWAP at idx=64 (appended at end)

daily_append writes via OHLCV_COLS = [Open, High, Low, Close, Volume, VWAP, ...features], which puts VWAP at idx=5. ArcticDB update() requires column order match, so both schema variants fail. Per-stock universe writes have therefore been failing since the polygon-VWAP work landed on 2026-04-24 (PRs #90/#91/#92), masked until today by the macro-coupled universe-coverage guard.

Operational design (yfinance EOD → polygon morning)
====================================================
- yfinance EOD post-close hook writes daily_closes parquet with VWAP=NaN (yfinance does not expose true volume-weighted VWAP).
- polygon morning enrichment overwrites the parquet with real VWAP values from polygon grouped-daily.
- daily_append runs end-of-day and writes whatever VWAP is in the parquet to ArcticDB universe: NaN initially, real values after the morning enrichment re-runs daily_append.

For that flow to work, VWAP must be a first-class column in the universe schema with a stable position. This migration normalizes every symbol to the canonical layout:

    [Open, High, Low, Close, Volume, VWAP] + FEATURES

NaN-fills VWAP historically for the 816 symbols that didn't have it. Repositions VWAP for the 88 symbols that had it appended at idx=64. Existing FEATURES block keeps its relative order. Idempotent: symbols already in canonical order are skipped.

Per-symbol error isolation: one symbol's write failure does not abort the batch (records into errors[], continues with the rest).

Tests
=====
- _canonical_column_order: VWAP inserted at idx=5, feature block preserved in relative order, drops nothing.
- _is_canonical: recognizes correct layout, rejects appended-VWAP and missing-VWAP variants.
- migrate_universe_vwap apply path:
  - Inserts VWAP at idx=5 with FLOAT64 NaN when absent.
  - Relocates VWAP from idx=last when appended (preserving values).
  - Skips already-canonical symbols (idempotent).
  - Honors --tickers override for canary / subset runs.
  - Per-symbol error isolation: partial-status return on partial failure.
- All 275 tests pass (261 existing + 14 new).

Operational follow-up (not in this PR)
======================================
After merge, deploy + run:

    python -m builders.migrate_universe_vwap --apply

on ae-trading. Expected: 904 symbols migrated (816 + 88), audit JSON written to s3://alpha-engine-research/builders/migrate_universe_vwap_audit/. Then rerun alpha-engine-daily-data.service (per-stock writes succeed) and alpha-engine-eod.service (held-stock close lookups succeed; EOD email + 2026-04-27 eod_pnl row land).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
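The canonical layout rule from the migration commit can be illustrated with plain column-name lists. This is a sketch under assumptions, not the repository's `_canonical_column_order` / `_is_canonical`: the names `BASE_COLS`, `canonical_column_order`, and `is_canonical` are introduced here, and the example feature names are made up. It only covers ordering; NaN-filling a missing VWAP column and writing back to ArcticDB are out of scope.

```python
from typing import List, Sequence

# Canonical leading block; VWAP sits at index 5.
BASE_COLS = ["Open", "High", "Low", "Close", "Volume", "VWAP"]


def canonical_column_order(columns: Sequence[str]) -> List[str]:
    """Return BASE_COLS followed by every other column in its original order."""
    features = [c for c in columns if c not in BASE_COLS]
    return BASE_COLS + features


def is_canonical(columns: Sequence[str]) -> bool:
    """True only when the columns already match the canonical layout exactly."""
    return list(columns) == canonical_column_order(columns)


# Hypothetical feature names, for illustration only.
appended = ["Open", "High", "Low", "Close", "Volume", "feat_a", "feat_b", "VWAP"]
missing = ["Open", "High", "Low", "Close", "Volume", "feat_a", "feat_b"]
```

Both `appended` and `missing` map to the same target order, which is what lets a migration treat the two schema variants uniformly and skip symbols where `is_canonical` is already true.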
This was referenced May 5, 2026
cipher813 added a commit that referenced this pull request on May 9, 2026
…ha-engine-config (#194)

Closes the same staleness vector PR #193 closed for DataPhase1: the SF PredictorTraining task pulls alpha-engine-predictor on every run but relies on the dispatcher's local ``alpha-engine-predictor/config/predictor.yaml`` for training config. That file is gitignored in the predictor repo and must be staged from the alpha-engine-config sibling clone, but nothing in the SF flow was keeping the staged copy in lockstep with origin/main of alpha-engine-config.

The 2026-05-09 horizon migration (alpha-engine-config #90: forward_days 5 → 21, output_distribution_gate_blocking false → true, purge_days bump) would not have reached the next Saturday training without a manual SSM-side intervention to copy the config from alpha-engine-config to alpha-engine-predictor.

Adds two commands before the spot_train.sh invocation:
- ``git -C alpha-engine-config pull --ff-only origin main``
- ``cp alpha-engine-config/predictor/predictor.yaml alpha-engine-predictor/config/predictor.yaml``

Now any merged config change in alpha-engine-config reaches the next PredictorTraining cycle automatically. Mirrors the symmetric DataPhase1 fix from PR #193.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Splits the current daily OHLCV collection into two passes by data source, fixing the 2026-04-17 → 2026-04-23 silent VWAP outage where every weekday wrote `VWAP=None` for every stock in ArcticDB.

Root cause (proven from `/var/log/daily-data.log` on ae-trading): polygon free-tier returns 403 "before end of day" when called at 1:05 PM PT, the collector logged a warning and silently fell through to yfinance, yfinance writes `VWAP=None` per `feedback_no_silent_fails` → ArcticDB's VWAP column was universally null across the affected window even though daily_append reported success each day.

This PR (Phase 1 of the fix):
- `polygon_client._get`: raise `PolygonForbiddenError` on 403 instead of returning empty dict (silent-fail violation)

Plan (subsequent commits land here as draft → flipped to ready)
- `polygon_client` raises on 403 + tests
- `collectors/daily_closes.py` adds `--source {yfinance_only,polygon_only,auto}` mode. yfinance_only (EOD pass) skips polygon entirely. polygon_only (morning pass) hard-fails on `PolygonForbiddenError`, no yfinance fallback for stocks. polygon_only also overwrites existing parquets and logs Close-discrepancy vs prior yfinance row.
- `weekly_collector.py` routes `--daily` to `--source yfinance_only` and adds new `--morning-enrich` mode that runs polygon-only daily_closes + daily_append for the previous trading day.
- `builders/daily_append.py` docstring fix (lines 82-86 incorrectly claim VWAP falls back to (H+L+C)/3 proxy; collector explicitly refuses that proxy per the 2026-04-17 decision).

Sequencing with PR 2 (separate)
PR 2 in this repo (forthcoming) will wire the two modes into the existing Step Functions: EOD SF runs `--daily` (yfinance), weekday SF gets a new `MorningEnrich` Lambda step that runs `--morning-enrich` before `PredictorInference`. Once both PRs are deployed:
- `alpha-engine` `alpha-engine-daily-data.timer` systemd unit gets removed (no more EC2 timer racing the SF; see task Add PredictorHealthCheck to weekday pipeline #6 closure)
- `--morning-enrich --date <D>` per weekday

Test plan
- `tests/test_polygon_client.py`: 7 tests pass, including 3 new ones covering the 403 raise contract
- `tests/test_daily_closes_*` regression + new `--source` mode tests
- `tests/test_weekly_collector_morning_enrich.py` (new): finds previous trading day correctly across weekends + holidays
- `python weekly_collector.py --morning-enrich --date 2026-04-23 --dry-run` should make exactly one polygon call (no yfinance fallthrough) and report a discrepancy summary vs the existing yfinance parquet

🤖 Generated with Claude Code