fix(collectors): daily_closes skip only if existing parquet is post-close #83
Merged
…en post-close

The 2026-04-20 incident had the morning DailyData Step Function writing `predictor/daily_closes/2026-04-20.parquet` at 06:07 PT with Friday's polygon aggregate stamped under Monday's key. The 16:14 PT post-close rerun hit the existing `head_object → skip` short-circuit and propagated the stale data through daily_append into ArcticDB for every ticker, producing a bogus α = −1.33% on the EOD email vs the real +0.08%.

Fix: skip only if `LastModified >= NYSE_close(run_date)`. Pre-close writes log a warning and fall through to re-collect the authoritative post-close data. NYSE close = 16:00 America/New_York; zoneinfo resolves EST/EDT automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
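The close-time comparison in the fix can be sketched as follows. This is a minimal illustration, not the code from `collectors/daily_closes.py`: the helper names `nyse_close` and `is_post_close_write` are assumptions (the latter borrowed from the test names in the test plan), and the sketch models only the regular 16:00 close, not early-close half days. It assumes `last_modified` is a timezone-aware datetime, as S3 `LastModified` values normally are.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

NYSE_TZ = ZoneInfo("America/New_York")
NYSE_CLOSE_LOCAL = time(16, 0)  # regular-session close; half days not modeled


def nyse_close(run_date: date) -> datetime:
    """Return 16:00 America/New_York on run_date as an aware UTC datetime."""
    local_close = datetime.combine(run_date, NYSE_CLOSE_LOCAL, tzinfo=NYSE_TZ)
    # zoneinfo applies the EST or EDT offset for that calendar date.
    return local_close.astimezone(timezone.utc)


def is_post_close_write(last_modified: datetime, run_date: date) -> bool:
    """True if the object was written at or after the close for run_date."""
    return last_modified >= nyse_close(run_date)


if __name__ == "__main__":
    run_date = date(2026, 4, 20)
    morning = datetime(2026, 4, 20, 13, 7, tzinfo=timezone.utc)   # 06:07 PT
    evening = datetime(2026, 4, 20, 23, 14, tzinfo=timezone.utc)  # 16:14 PT
    print(is_post_close_write(morning, run_date))  # False: re-collect
    print(is_post_close_write(evening, run_date))  # True: safe to skip
```

With these inputs the 06:07 PT write from the incident falls before the 20:00 UTC close for 2026-04-20, so it would no longer qualify for the skip.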
cipher813 added a commit that referenced this pull request on May 27, 2026
…artifact probe (#335)

Phase 3 of the artifact-freshness-monitor arc (plan doc at ~/Development/alpha-engine-docs/private/artifact-freshness-monitor-260527.md; ROADMAP P1 entry in alpha-engine-config #342 / #344). The absence-driven complement to flow-doctor / SF Catch / substrate-health-check (all event-driven). Closes the silent absence-of-artifact bug class (2026-05-17→27 pit_parity.json + the sibling factor-profiles orphan + missing-signals.json incidents).

Surface (infrastructure/lambdas/freshness-monitor/):

- index.py: EventBridge cron handler (every 15min):
  1. load_registry(s3, REGISTRY_BUCKET, REGISTRY_KEY) fetches ARTIFACT_REGISTRY.yaml from S3, merges defaults block into each entry, instantiates ArtifactSpec (per alpha-engine-config #344).
  2. Walks every spec, calls alpha_engine_lib.artifact_freshness.check_freshness per row, isolates per-spec exceptions so a bad row doesn't sink the whole pass.
  3. Emits _freshness_monitor/check_results.json (dashboard surface in Phase 5) + _freshness_monitor/heartbeat.json (self-heartbeat per plan §3 invariant 9: the monitor monitors itself; substrate-health-check daily watches the heartbeat).
  4. For misses past SLA (state ∈ {missing, stale, probe_failed}), routes to alpha_engine_lib.alerts.publish with dedup_key=resolve_dedup_key(spec, now), which collapses 4×/hour retries to one alert per cycle per artifact.
  5. probe_failed always escalates to severity=critical regardless of spec: the monitor itself is broken; operator must know (plan §3 invariant 6).
  6. OBSERVE-mode gate: MNEMON_FRESHNESS_MONITOR_ENABLED env var (default unset = false) suppresses alerts but emits results. Phase 6 cutover flips via aws lambda update-function-configuration without redeploying; mirrors mnemon 0.7.0rc4 pattern from 2026-05-24.
- requirements.txt: pinned to alpha-engine-lib@v0.40.0 (substrate introduced in lib #83) + pyyaml.
- iam-policy.json: Logs + Telegram SSM params + alpha-engine-alerts SNS publish + S3 HeadObject/GetObject on alpha-engine-research + S3 PutObject scoped to _freshness_monitor/ + _alerts/_dedup/.
- deploy.sh: bootstrap (IAM role + Lambda + EventBridge cron rule + cron permission), code-update path, registry upload from local alpha-engine-config clone to S3. Validates registry locally via alpha-engine-config/scripts/validate_artifact_registry.py BEFORE upload, so malformed YAML never reaches S3. Mirrors the sf-telegram-notifier deploy.sh shape. Managed outside CFN per same rationale as sf-telegram-notifier / spot-orphan-reaper / changelog-cloudwatch-mirror.
- test_handler.py: 12 unit tests covering:
  * load_registry (defaults merge, per-entry override, ISO-string date coercion, missing-artifacts raise)
  * Handler OBSERVE-mode does not alert but emits heartbeat
  * Handler alerts-enabled fires alerts with resolved dedup_key
  * probe_failed routes to critical severity regardless of spec
  * Per-spec exception (e.g., unsupported placeholder) classified as probe_failed without sinking the rest of the pass
  * Env-flip cutover (OBSERVE → production) without code change
  * _maybe_alert per-state coverage (fresh skip, within-SLA-grace skip, missing-past-SLA fires, probe_failed bumps severity)

Phase 6 cutover: ≥2 weekly cycles in OBSERVE mode (earliest cutover ~2026-06-13 if this PR + #344 land before Sat 5/30; more realistically ~2026-06-20).

Acceptance criteria per plan §7: simulated pit_parity-class silent failure fires Telegram with correct dedup key within ~15min; NYSE-holiday Monday produces zero alerts despite cron firing; failed Saturday SF + successful recovery-SF in same window produces zero alerts (substitution working); per plan §11 risk register.

Composes with alpha-engine-lib #83 (substrate, v0.40.0) + alpha-engine-config #344 (registry SoT + PR-time validator). Phase 4 (CI guards across 4 producing repos) + Phase 5 (dashboard surface) ship in follow-up PRs.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
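The alert-routing rules in steps 4 to 6 of the commit message above can be sketched roughly as below. This is only an illustration of the described behavior, not the handler's code: `alerts_enabled` and `resolve_alert_severity` are hypothetical names, the set of values that count as "enabled" for `MNEMON_FRESHNESS_MONITOR_ENABLED` is an assumption, and so is the choice to apply the past-SLA check to `probe_failed` as well.

```python
import os
from typing import Mapping, Optional

ENV_FLAG = "MNEMON_FRESHNESS_MONITOR_ENABLED"
ALERTABLE_STATES = {"missing", "stale", "probe_failed"}


def alerts_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """OBSERVE mode unless the flag is set; unset defaults to disabled."""
    # Assumption: these spellings enable alerting; the real parsing may differ.
    return environ.get(ENV_FLAG, "").strip().lower() in {"1", "true", "yes"}


def resolve_alert_severity(
    state: str, past_sla: bool, spec_severity: str, enabled: bool
) -> Optional[str]:
    """Return the severity to alert with, or None when no alert should fire."""
    if state not in ALERTABLE_STATES or not past_sla:
        return None  # fresh, or still within the SLA grace window
    if not enabled:
        return None  # OBSERVE mode: results are still emitted, alerts are not
    if state == "probe_failed":
        return "critical"  # the monitor itself is broken; override the spec
    return spec_severity


if __name__ == "__main__":
    print(resolve_alert_severity("missing", True, "warning", enabled=False))      # None
    print(resolve_alert_severity("missing", True, "warning", enabled=True))       # warning
    print(resolve_alert_severity("probe_failed", True, "warning", enabled=True))  # critical
```

Keeping the gate as a plain environment lookup is what would let the cutover happen through a configuration update rather than a redeploy, as the commit message describes.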
Summary
- `head_object → skip` short-circuit in `collectors/daily_closes.py` now only fires when `LastModified >= NYSE_close(run_date)`.
- NYSE close is resolved via `zoneinfo.ZoneInfo("America/New_York")`, so EST/EDT resolve automatically; no explicit DST handling.

Why
The 2026-04-20 incident had a morning DailyData SF run writing the parquet at 06:07 PT with polygon's T-1 aggregate under today's key; my 16:14 PT post-close rerun skipped the file (it existed!) and daily_append propagated Friday's closes into ArcticDB for every ticker. Forward-fix (moving DailyData to post-close systemd timer) isn't sufficient — anything that writes pre-close permanently shadows the authoritative post-close data.
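To make the shadowing problem concrete, here is a hedged sketch of the skip decision after the fix. `should_skip` and `FakeS3` are hypothetical names for illustration; the real collector presumably calls boto3's `head_object`, whose `ClientError` carries the error code under `exc.response["Error"]["Code"]`. The sketch duck-types that attribute so it runs without AWS dependencies.

```python
import logging
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

log = logging.getLogger("daily_closes_sketch")


def nyse_close(run_date: date) -> datetime:
    """16:00 America/New_York on run_date, as an aware datetime."""
    return datetime.combine(run_date, time(16, 0), tzinfo=ZoneInfo("America/New_York"))


def should_skip(s3, bucket: str, key: str, run_date: date) -> bool:
    """Return True only when an existing object was written post-close."""
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        # Duck-typed lookup of the error code carried by the exception.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "404":
            return False  # nothing written yet: proceed to fetch
        raise  # auth and other failures still propagate

    if head["LastModified"] >= nyse_close(run_date):
        return True  # authoritative post-close data already present
    log.warning("pre-close write at %s for %s; re-collecting", head["LastModified"], key)
    return False


class FakeS3:
    """Hypothetical stand-in for an S3 client, for illustration only."""

    def __init__(self, last_modified=None, error_code=None):
        self.last_modified = last_modified
        self.error_code = error_code

    def head_object(self, Bucket, Key):
        if self.error_code is not None:
            err = RuntimeError(f"head_object failed: {self.error_code}")
            err.response = {"Error": {"Code": self.error_code}}
            raise err
        return {"LastModified": self.last_modified}


if __name__ == "__main__":
    run_date = date(2026, 4, 20)
    key = "predictor/daily_closes/2026-04-20.parquet"
    morning = datetime(2026, 4, 20, 13, 7, tzinfo=timezone.utc)  # 06:07 PT write
    # The pre-close file exists, but it no longer blocks the post-close rerun.
    print(should_skip(FakeS3(last_modified=morning), "example-bucket", key, run_date))  # False
```

Under the old behavior the mere existence of the object would have returned a skip here; comparing `LastModified` against the close is what lets the 16:14 PT rerun overwrite the stale morning file.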
Test plan
- `test_post_close_write_is_skipped`: post-close LastModified short-circuits
- `test_pre_close_write_forces_refetch`: pre-close LastModified falls through to fetch
- `test_is_post_close_write_edt`: 2026-04-20 EDT, 20:00 UTC boundary
- `test_is_post_close_write_est`: 2026-01-15 EST, 21:00 UTC boundary
- `test_missing_object_proceeds_to_fetch`: 404 path unchanged
- `test_head_object_auth_failure_propagates`: non-404 errors still raise

🤖 Generated with Claude Code