feat(daily_append): source-aware skip_if_exists for EOD re-runs#128
Merged
Conversation
The 2026-04-18 fix made daily_append always overwrite same-date rows to recover from the polygon-relabel incident. That's the right default for MorningEnrich (polygon must overwrite yfinance's NaN VWAP), but it makes EOD post-market re-runs catastrophically slow: target_ts == existing.index.max() routes every ticker through ``_write_row_backfill_safe``'s backfill branch (full-series ``lib.write(combined, prune_previous_versions=True)`` per ticker — 904 × ~1.5s ≈ 22 min, over the SSM 1200s ceiling). This was the secondary failure mode in the 2026-05-01 EOD incident: after the manual recovery already wrote today's rows, the SF rerun of PostMarketData timed out exactly here. Adds a ``skip_if_exists`` parameter (default False, preserving the 2026-04-18 contract). EOD post-market path in ``weekly_collector._run_daily`` opts in (``skip_if_exists=True``); MorningEnrich path leaves it False so polygon's true VWAP keeps overwriting yfinance's NaN. The CLI exposes ``--skip-if-exists`` for direct invocations. Tests: - 8 new tests in test_daily_append_skip_if_exists.py covering the function-level skip behavior, default value, EOD/MorningEnrich call-site wiring, and CLI flag presence. - Updates the 2026-04-18 invariant test to allow the gated form ``if skip_if_exists and today_ts in hist.index:`` while still forbidding the unconditional skip pattern that triggered the polygon-relabel bug. SF JSON wiring (passing --skip-if-exists in the deployed SSM command) is intentionally out of scope; that requires a separate SF redeploy via ``infrastructure/update_eod_pipeline_sf.sh``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
cipher813
added a commit
that referenced
this pull request
May 1, 2026
…ation (#129) Follow-up to PR #128. The PostMarketData step in alpha-engine-eod-pipeline ran two python commands back-to-back: python weekly_collector.py --daily --only daily_closes python -m builders.daily_append The second one was a no-op duplicate. ``_run_daily`` runs daily_closes + feature_store + daily_append regardless of ``--only`` (the flag is only wired into ``_run_phase1``), so the second invocation just ran daily_append again — and as a bare CLI call it didn't get the new ``skip_if_exists`` flag PR #128 added, so on re-runs every ticker took the slow lib.write backfill path (904 × ~1.5s ≈ 22 min, over the SSM 1200s timeout — exactly what blew up the 2026-05-01 EOD SF rerun). Also drop the misleading ``--only daily_closes`` qualifier on the remaining invocation since it doesn't actually filter. Adds a regression test locking the SSM script to a single ``weekly_collector --daily`` invocation. Both infrastructure/step_function_eod.json and infrastructure/update_eod_pipeline_sf.sh updated; the deploy script needs to be run on ae-trading after merge to take effect. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
cipher813
added a commit
that referenced
this pull request
May 3, 2026
* fix(sf): skip_backtester preserves eval-judge skip-gate path Caught 2026-05-03 in SF eval-pipeline-validation-5: Research succeeded and wrote new-format captures to S3, but the eval-judge state silently never fired because the operator had passed skip_backtester=true to skip the long-running backtester for validation purposes. PR 4c (#140) wired the eval-pipeline states between Backtester success and SaturdayHealthCheck: CheckBacktesterStatus.Success → CheckSkipEvalJudge → ComputeEvalCadence → CheckMonthlyCadence → EvalJudgeFirstSaturday or EvalJudgeWeekly → EvalRollingMean → SaturdayHealthCheck But CheckSkipBacktester.skip routed directly to SaturdayHealthCheck, bypassing the eval-pipeline entirely. Production Sat 5/9 won't hit this (skip_backtester defaults false; Backtester runs and routes through eval-judge correctly), but operator manual skips for any non-eval validation purpose silently dropped the eval state. Fix: route skip_backtester=true → CheckSkipEvalJudge instead of SaturdayHealthCheck. Eval pipeline now fires on every SF execution where the operator hasn't explicitly skip_eval_judge'd it. tests/test_sf_eval_judge_wiring.py — TestSkipBacktesterPreservesEvalJudge: pins the routing so a future "simplification" can't re-introduce the silent bypass. Tests 433 → 434 (+1 wiring assertion). Pairs with alpha-engine-research PR #104 (RubricEvalLLMOutput defense + judge max_tokens to strategic tier — closes the 5/32 remaining failure class observed in this same SF run). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: drop dead ALPHA_ENGINE_LIB_TOKEN PAT plumbing alpha-engine-lib was flipped public 2026-05-03; PAT auth machinery that existed to install from a private repo is now dead weight. Removed across 6 files (net −87 lines). CI: - .github/workflows/ci.yml — drop "Configure git auth" step - .github/workflows/deploy.yml — drop the secondary actions/checkout for cipher813/alpha-engine-lib + the LIB_REPO_DIR env on the deploy step Docker / deploy: - Dockerfile — replace `COPY vendor/alpha-engine-lib` + local pip install with `pip install "alpha-engine-lib[flow_doctor] @ git+https://github.com/cipher813/alpha-engine-lib@v0.3.0"`. The [flow_doctor]-only install for Lambda is preserved (Lambda doesn't need [arcticdb] or [rag]); requirements.txt's [arcticdb,flow_doctor,rag] extras still apply for the EC2 install path. - infrastructure/deploy.sh — drop the vendor/alpha-engine-lib staging block + cleanup_lib_staging trap. Replace with one-line comment explaining lib comes from public git+https now. EC2 spot scripts: - infrastructure/spot_data_weekly.sh — drop SSM PAT fetch + insteadOf rewrite from the DEPS step. Update inline comments referencing the old mechanism (3 spots). - infrastructure/spot_drift_detection.sh — same removal. Companion follow-ups (not in this PR): - Delete ALPHA_ENGINE_LIB_TOKEN GitHub Actions secret on this repo - Delete /alpha-engine/lib-token SSM SecureString (us-east-1) - vendor/alpha-engine-lib local checkout can be removed (gitignored, not in any commit) Per ROADMAP follow-up "P3 Drop ALPHA_ENGINE_LIB_TOKEN PAT plumbing" added 2026-05-03. Second of 6 consumer-repo PRs in this cleanup arc; prototype landed in alpha-engine PR #128. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_write_row_backfill_safe's backfill branch and does a fulllib.write(prune_previous_versions=True)(904 × ~1.5s ≈ 22 min, over the 1200s SSM ceiling).skip_if_existsopt-in. EOD post-market path turns it on; MorningEnrich keeps the always-overwrite default.Incident link
This was the secondary failure mode in the 2026-05-01 EOD incident. After PR #126 fixed the schema regression and the universe was re-migrated, the SF rerun of
PostMarketDatastill timed out — because the manual recovery had already written today's rows, so every ticker took the slow backfill path. PR #126 fixed the column-order bug; this PR fixes the redundant-rewrite bug.Behavior matrix
skip_if_existsweekly_collector._run_daily(EOD post-market)_run_morning_enrichpython -m builders.daily_appendCLI--skip-if-existsflagTests
tests/test_daily_append_skip_if_exists.py:skip_if_exists=True+ today in hist → skip, no write callskip_if_exists=True+ today missing from hist → write proceeds (no regression to 2026-04-18 silent-skip bug)skip_if_exists=False(default) + today in hist → overwrite (preserves polygon contract)Falseweekly_collectorpassesTrueTrue--skip-if-existsflag is present and wires through to function calltest_no_unconditional_skip_guard_on_existing_today_row) to allow the gated form while still forbidding the unconditional skip pattern that triggered the polygon-relabel bug.Out of scope
SF-JSON wiring (passing
--skip-if-existsin the deployed SSM command) is intentionally not in this PR. The EOD SF definition lives ininfrastructure/step_function_eod.json+infrastructure/update_eod_pipeline_sf.sh, and updating them needs a separate redeploy viaupdate_eod_pipeline_sf.sh. That can be sequenced after this lands. Until the SF is redeployed, the EOD path benefits fromskip_if_existsviaweekly_collector --daily(the first command in the SSM script); the secondpython -m builders.daily_appendinvocation continues to default toFalseuntil the SF is updated.Test plan
pytest tests/test_daily_append_skip_if_exists.py— 8/8 passpytest tests/— 334/334 passweekly_collector._run_dailyskipping today's already-written rows on re-run🤖 Generated with Claude Code