
fix(collectors): daily_closes skip only if existing parquet is post-close #83

Merged
cipher813 merged 1 commit into main from fix/daily-closes-pre-close-guard
Apr 22, 2026

Conversation

@cipher813
Owner

Summary

  • ROADMAP P0: close the stale-parquet recurrence window. The head_object → skip short-circuit in collectors/daily_closes.py now only fires when LastModified >= NYSE_close(run_date).
  • Pre-close writes log a warning and fall through to re-collect the authoritative post-close data. Eliminates the silent α-attribution corruption class behind the 2026-04-20 EOD email (reported α = −1.33% vs real +0.08%).
  • Uses zoneinfo.ZoneInfo("America/New_York") so EST/EDT resolve automatically — no explicit DST handling.
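
The boundary check in the summary can be sketched as follows. This is a minimal illustration under assumptions, not the code in collectors/daily_closes.py: the names `nyse_close` and `is_post_close_write` are hypothetical (the latter is inferred from the test names in the test plan), and the sketch uses a fixed 16:00 close without modelling early-close sessions.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

NYSE_TZ = ZoneInfo("America/New_York")
NYSE_CLOSE_LOCAL = time(16, 0)  # regular-session close; early closes are not modelled


def nyse_close(run_date: date) -> datetime:
    """Return the 16:00 New York close for run_date as a UTC datetime (hypothetical helper)."""
    local_close = datetime.combine(run_date, NYSE_CLOSE_LOCAL, tzinfo=NYSE_TZ)
    # zoneinfo picks EST or EDT from the date, so no explicit DST branch is needed.
    return local_close.astimezone(timezone.utc)


def is_post_close_write(last_modified: datetime, run_date: date) -> bool:
    """True if a timezone-aware LastModified is at or after the close for run_date."""
    return last_modified >= nyse_close(run_date)


# The two writes from the 2026-04-20 incident, converted from PT (UTC-7) to UTC.
morning_write = datetime(2026, 4, 20, 13, 7, tzinfo=timezone.utc)   # 06:07 PT
evening_rerun = datetime(2026, 4, 20, 23, 14, tzinfo=timezone.utc)  # 16:14 PT

print(nyse_close(date(2026, 4, 20)))                            # 2026-04-20 20:00:00+00:00
print(is_post_close_write(morning_write, date(2026, 4, 20)))    # False
print(is_post_close_write(evening_rerun, date(2026, 4, 20)))    # True
```

Comparing in UTC keeps the check independent of the host timezone; S3 returns LastModified as a timezone-aware datetime, so the comparison needs no further conversion.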

Why

In the 2026-04-20 incident, a morning DailyData SF run wrote the parquet at 06:07 PT with polygon's T-1 aggregate under today's key. My 16:14 PT post-close rerun skipped the file because it already existed, and daily_append propagated Friday's closes into ArcticDB for every ticker. The forward-fix (moving DailyData to a post-close systemd timer) isn't sufficient on its own: anything that writes pre-close permanently shadows the authoritative post-close data.

Test plan

  • test_post_close_write_is_skipped — post-close LastModified short-circuits
  • test_pre_close_write_forces_refetch — pre-close LastModified falls through to fetch
  • test_is_post_close_write_edt — 2026-04-20 EDT: 20:00 UTC boundary
  • test_is_post_close_write_est — 2026-01-15 EST: 21:00 UTC boundary
  • test_missing_object_proceeds_to_fetch — 404 path unchanged
  • test_head_object_auth_failure_propagates — non-404 errors still raise
  • Full suite: 139/139 pass
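
The four behaviours these tests cover (post-close skip, pre-close refetch, 404 proceeds, other errors raise) can be sketched in one decision function. This is an illustrative sketch only: `should_skip_collection` and its parameters are hypothetical names, and the error handling inspects the `response` attribute that botocore's `ClientError` carries rather than importing botocore, so the example runs without AWS dependencies. Real code would more likely catch `ClientError` directly.

```python
import logging
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

logger = logging.getLogger(__name__)
NYSE_TZ = ZoneInfo("America/New_York")


def should_skip_collection(s3_client, bucket: str, key: str, run_date: date) -> bool:
    """Return True only when the existing object was written at or after the close."""
    try:
        head = s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        # Assumption: the client raises botocore-style errors whose `response`
        # dict holds the error code; HEAD on a missing key reports "404".
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "404":
            return False  # nothing stored yet, so collect
        raise  # auth and other failures must not be swallowed

    close_utc = datetime.combine(run_date, time(16, 0), tzinfo=NYSE_TZ).astimezone(timezone.utc)
    if head["LastModified"] >= close_utc:
        return True  # authoritative post-close data already stored

    logger.warning(
        "%s was written pre-close (%s < %s); re-collecting",
        key, head["LastModified"].isoformat(), close_utc.isoformat(),
    )
    return False
```

Returning a boolean rather than fetching inside the function keeps the guard easy to unit-test with a stub client.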

🤖 Generated with Claude Code

…en post-close

The 2026-04-20 incident had the morning DailyData Step Function writing
`predictor/daily_closes/2026-04-20.parquet` at 06:07 PT with Friday's
polygon aggregate stamped under Monday's key. The 16:14 PT post-close
rerun hit the existing `head_object → skip` short-circuit and propagated
the stale data through daily_append into ArcticDB for every ticker,
producing a bogus α = −1.33% on the EOD email vs the real +0.08%.

Fix: skip only if `LastModified >= NYSE_close(run_date)`. Pre-close
writes log a warning and fall through to re-collect the authoritative
post-close data. NYSE close = 16:00 America/New_York; zoneinfo resolves
EST/EDT automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit 416eff3 into main Apr 22, 2026
1 check passed
cipher813 deleted the fix/daily-closes-pre-close-guard branch April 22, 2026 15:36
cipher813 added a commit that referenced this pull request May 27, 2026
…artifact probe (#335)

Phase 3 of the artifact-freshness-monitor arc (plan doc at
~/Development/alpha-engine-docs/private/artifact-freshness-monitor-260527.md;
ROADMAP P1 entry in alpha-engine-config #342 / #344). The
absence-driven complement to flow-doctor / SF Catch /
substrate-health-check (all event-driven). Closes the silent
absence-of-artifact bug class — 2026-05-17→27 pit_parity.json + the
sibling factor-profiles orphan + missing-signals.json incidents.

Surface (infrastructure/lambdas/freshness-monitor/):

- index.py — EventBridge cron handler (every 15min):
  1. load_registry(s3, REGISTRY_BUCKET, REGISTRY_KEY) — fetches
     ARTIFACT_REGISTRY.yaml from S3, merges defaults block into each
     entry, instantiates ArtifactSpec (per alpha-engine-config #344).
  2. Walks every spec, calls alpha_engine_lib.artifact_freshness.
     check_freshness per row, isolates per-spec exceptions so a bad
     row doesn't sink the whole pass.
  3. Emits _freshness_monitor/check_results.json (dashboard surface
     in Phase 5) + _freshness_monitor/heartbeat.json (self-heartbeat
     per plan §3 invariant 9 — the monitor monitors itself;
     substrate-health-check daily watches the heartbeat).
  4. For misses past SLA (state ∈ {missing, stale, probe_failed}),
     routes to alpha_engine_lib.alerts.publish with
     dedup_key=resolve_dedup_key(spec, now) — collapses 4×/hour
     retries to one alert per cycle per artifact.
  5. probe_failed always escalates to severity=critical regardless
     of spec — the monitor itself is broken; operator must know
     (plan §3 invariant 6).
  6. OBSERVE-mode gate: MNEMON_FRESHNESS_MONITOR_ENABLED env var
     (default unset = false) suppresses alerts but emits results.
     Phase 6 cutover flips via aws lambda update-function-configuration
     without redeploying — mirrors mnemon 0.7.0rc4 pattern from
     2026-05-24.

- requirements.txt — pinned to alpha-engine-lib@v0.40.0 (substrate
  introduced in lib #83) + pyyaml.

- iam-policy.json — Logs + Telegram SSM params + alpha-engine-alerts
  SNS publish + S3 HeadObject/GetObject on alpha-engine-research +
  S3 PutObject scoped to _freshness_monitor/ + _alerts/_dedup/.

- deploy.sh — bootstrap (IAM role + Lambda + EventBridge cron
  rule + cron permission), code-update path, registry upload from
  local alpha-engine-config clone to S3. Validates registry locally
  via alpha-engine-config/scripts/validate_artifact_registry.py
  BEFORE upload — malformed YAML never reaches S3. Mirrors the
  sf-telegram-notifier deploy.sh shape. Managed outside CFN per
  same rationale as sf-telegram-notifier / spot-orphan-reaper /
  changelog-cloudwatch-mirror.

- test_handler.py — 12 unit tests covering:
  * load_registry (defaults merge, per-entry override, ISO-string
    date coercion, missing-artifacts raise)
  * Handler OBSERVE-mode does not alert but emits heartbeat
  * Handler alerts-enabled fires alerts with resolved dedup_key
  * probe_failed routes to critical severity regardless of spec
  * Per-spec exception (e.g., unsupported placeholder) classified as
    probe_failed without sinking the rest of the pass
  * Env-flip cutover (OBSERVE → production) without code change
  * _maybe_alert per-state coverage (fresh skip, within-SLA-grace
    skip, missing-past-SLA fires, probe_failed bumps severity)
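
The alert gating described for index.py (steps 4 to 6) can be sketched as a small pure function. This is a hedged illustration, not the handler's actual code: `alerts_enabled` and `plan_alert` are hypothetical names, the set of accepted truthy strings for the env var is an assumption, and the real handler routes through alpha_engine_lib.alerts.publish, which is not reproduced here.

```python
import os

# States named in the commit message as alertable once past SLA.
ALERTABLE_STATES = {"missing", "stale", "probe_failed"}


def alerts_enabled(env=os.environ) -> bool:
    """OBSERVE mode unless the flag is explicitly truthy; unset means disabled."""
    raw = env.get("MNEMON_FRESHNESS_MONITOR_ENABLED", "")
    # Assumption: these spellings count as "enabled"; the source only says unset = false.
    return raw.strip().lower() in {"1", "true", "yes"}


def plan_alert(state: str, past_sla: bool, spec_severity: str, env=os.environ):
    """Return the severity to alert at, or None when no alert should be sent."""
    if state not in ALERTABLE_STATES or not past_sla:
        return None  # fresh, or still within the SLA grace window
    if not alerts_enabled(env):
        return None  # OBSERVE mode: results are still emitted, alerts are suppressed
    # A failed probe means the monitor itself is broken, so it always escalates.
    return "critical" if state == "probe_failed" else spec_severity


enabled = {"MNEMON_FRESHNESS_MONITOR_ENABLED": "true"}
print(plan_alert("missing", True, "warning", env={}))            # None
print(plan_alert("missing", True, "warning", env=enabled))       # warning
print(plan_alert("probe_failed", True, "warning", env=enabled))  # critical
```

Reading the flag on every invocation is what lets a configuration-only change flip OBSERVE mode to production without a redeploy.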

Phase 6 cutover: ≥2 weekly cycles in OBSERVE mode (earliest cutover
~2026-06-13 if this PR + #344 land before Sat 5/30; more realistically
~2026-06-20). Acceptance criteria per plan §7: simulated pit_parity-
class silent failure fires Telegram with correct dedup key within
~15min; NYSE-holiday Monday produces zero alerts despite cron firing;
failed Saturday SF + successful recovery-SF in same window produces
zero alerts (substitution working); per plan §11 risk register.

Composes with alpha-engine-lib #83 (substrate, v0.40.0) +
alpha-engine-config #344 (registry SoT + PR-time validator).
Phase 4 (CI guards across 4 producing repos) + Phase 5 (dashboard
surface) ship in follow-up PRs.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>