Skip to content

feat(daily_append): source-aware skip_if_exists for EOD re-runs#128

Merged
cipher813 merged 1 commit into
mainfrom
feat/daily-append-skip-if-exists
May 1, 2026
Merged

feat(daily_append): source-aware skip_if_exists for EOD re-runs#128
cipher813 merged 1 commit into
mainfrom
feat/daily-append-skip-if-exists

Conversation

@cipher813
Copy link
Copy Markdown
Owner

Summary

  • The 2026-04-18 always-overwrite contract is correct for MorningEnrich (polygon must overwrite yfinance's NaN VWAP) but makes EOD post-market re-runs catastrophically slow: every ticker hits _write_row_backfill_safe's backfill branch and does a full lib.write(prune_previous_versions=True) (904 × ~1.5s ≈ 22 min, over the 1200s SSM ceiling).
  • Adds a source-aware skip_if_exists opt-in. EOD post-market path turns it on; MorningEnrich keeps the always-overwrite default.

Incident link

This was the secondary failure mode in the 2026-05-01 EOD incident. After PR #126 fixed the schema regression and the universe was re-migrated, the SF rerun of PostMarketData still timed out — because the manual recovery had already written today's rows, so every ticker took the slow backfill path. PR #126 fixed the column-order bug; this PR fixes the redundant-rewrite bug.

Behavior matrix

Caller Source skip_if_exists Behavior on re-run
weekly_collector._run_daily (EOD post-market) yfinance True Microsecond skip (today's row already final)
_run_morning_enrich polygon False (default) Always overwrite (apply real VWAP)
python -m builders.daily_append CLI (caller-driven) new --skip-if-exists flag Default False for safety

Tests

  • 8 new tests in tests/test_daily_append_skip_if_exists.py:
    • skip_if_exists=True + today in hist → skip, no write call
    • skip_if_exists=True + today missing from hist → write proceeds (no regression to 2026-04-18 silent-skip bug)
    • skip_if_exists=False (default) + today in hist → overwrite (preserves polygon contract)
    • default value is False
    • EOD call site in weekly_collector passes True
    • MorningEnrich call site does NOT pass True
    • CLI --skip-if-exists flag is present and wires through to function call
  • Updates the 2026-04-18 invariant test (test_no_unconditional_skip_guard_on_existing_today_row) to allow the gated form while still forbidding the unconditional skip pattern that triggered the polygon-relabel bug.
  • Full suite: 334/334 pass locally.

Out of scope

SF-JSON wiring (passing --skip-if-exists in the deployed SSM command) is intentionally not in this PR. The EOD SF definition lives in infrastructure/step_function_eod.json + infrastructure/update_eod_pipeline_sf.sh, and updating them needs a separate redeploy via update_eod_pipeline_sf.sh. That can be sequenced after this lands. Until the SF is redeployed, the EOD path benefits from skip_if_exists via weekly_collector --daily (the first command in the SSM script); the second python -m builders.daily_append invocation continues to default to False until the SF is updated.

Test plan

  • pytest tests/test_daily_append_skip_if_exists.py — 8/8 pass
  • pytest tests/ — 334/334 pass
  • Monday 2026-05-04 EOD SF runs cleanly with weekly_collector._run_daily skipping today's already-written rows on re-run

🤖 Generated with Claude Code

The 2026-04-18 fix made daily_append always overwrite same-date rows
to recover from the polygon-relabel incident. That's the right
default for MorningEnrich (polygon must overwrite yfinance's NaN
VWAP), but it makes EOD post-market re-runs catastrophically slow:
target_ts == existing.index.max() routes every ticker through
``_write_row_backfill_safe``'s backfill branch (full-series
``lib.write(combined, prune_previous_versions=True)`` per ticker —
904 × ~1.5s ≈ 22 min, over the SSM 1200s ceiling).

This was the secondary failure mode in the 2026-05-01 EOD incident:
after the manual recovery already wrote today's rows, the SF rerun
of PostMarketData timed out exactly here.

Adds a ``skip_if_exists`` parameter (default False, preserving the
2026-04-18 contract). EOD post-market path in
``weekly_collector._run_daily`` opts in (``skip_if_exists=True``);
MorningEnrich path leaves it False so polygon's true VWAP keeps
overwriting yfinance's NaN. The CLI exposes ``--skip-if-exists``
for direct invocations.

Tests:
- 8 new tests in test_daily_append_skip_if_exists.py covering the
  function-level skip behavior, default value, EOD/MorningEnrich
  call-site wiring, and CLI flag presence.
- Updates the 2026-04-18 invariant test to allow the gated form
  ``if skip_if_exists and today_ts in hist.index:`` while still
  forbidding the unconditional skip pattern that triggered the
  polygon-relabel bug.

SF JSON wiring (passing --skip-if-exists in the deployed SSM
command) is intentionally out of scope; that requires a separate
SF redeploy via ``infrastructure/update_eod_pipeline_sf.sh``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 75ebccf into main May 1, 2026
1 check passed
@cipher813 cipher813 deleted the feat/daily-append-skip-if-exists branch May 1, 2026 22:37
cipher813 added a commit that referenced this pull request May 1, 2026
…ation (#129)

Follow-up to PR #128. The PostMarketData step in alpha-engine-eod-pipeline
ran two python commands back-to-back:

    python weekly_collector.py --daily --only daily_closes
    python -m builders.daily_append

The second one was a no-op duplicate. ``_run_daily`` runs daily_closes
+ feature_store + daily_append regardless of ``--only`` (the flag is
only wired into ``_run_phase1``), so the second invocation just ran
daily_append again — and as a bare CLI call it didn't get the new
``skip_if_exists`` flag PR #128 added, so on re-runs every ticker took
the slow lib.write backfill path (904 × ~1.5s ≈ 22 min, over the SSM
1200s timeout — exactly what blew up the 2026-05-01 EOD SF rerun).

Also drop the misleading ``--only daily_closes`` qualifier on the
remaining invocation since it doesn't actually filter.

Adds a regression test locking the SSM script to a single
``weekly_collector --daily`` invocation. Both
infrastructure/step_function_eod.json and
infrastructure/update_eod_pipeline_sf.sh updated; the deploy script
needs to be run on ae-trading after merge to take effect.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request May 3, 2026
* fix(sf): skip_backtester preserves eval-judge skip-gate path

Caught 2026-05-03 in SF eval-pipeline-validation-5: Research succeeded
and wrote new-format captures to S3, but the eval-judge state silently
never fired because the operator had passed skip_backtester=true to
skip the long-running backtester for validation purposes.

PR 4c (#140) wired the eval-pipeline states between Backtester success
and SaturdayHealthCheck:

  CheckBacktesterStatus.Success
    → CheckSkipEvalJudge → ComputeEvalCadence → CheckMonthlyCadence
        → EvalJudgeFirstSaturday or EvalJudgeWeekly → EvalRollingMean
    → SaturdayHealthCheck

But CheckSkipBacktester.skip routed directly to SaturdayHealthCheck,
bypassing the eval-pipeline entirely. Production Sat 5/9 won't hit
this (skip_backtester defaults false; Backtester runs and routes
through eval-judge correctly), but operator manual skips for any
non-eval validation purpose silently dropped the eval state.

Fix: route skip_backtester=true → CheckSkipEvalJudge instead of
SaturdayHealthCheck. Eval pipeline now fires on every SF execution
where the operator hasn't explicitly skip_eval_judge'd it.

tests/test_sf_eval_judge_wiring.py — TestSkipBacktesterPreservesEvalJudge:
  pins the routing so a future "simplification" can't re-introduce
  the silent bypass.

Tests 433 → 434 (+1 wiring assertion).

Pairs with alpha-engine-research PR #104 (RubricEvalLLMOutput
defense + judge max_tokens to strategic tier — closes the 5/32
remaining failure class observed in this same SF run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: drop dead ALPHA_ENGINE_LIB_TOKEN PAT plumbing

alpha-engine-lib was flipped public 2026-05-03; PAT auth machinery
that existed to install from a private repo is now dead weight.
Removed across 6 files (net −87 lines).

CI:
- .github/workflows/ci.yml — drop "Configure git auth" step
- .github/workflows/deploy.yml — drop the secondary
  actions/checkout for cipher813/alpha-engine-lib + the LIB_REPO_DIR
  env on the deploy step

Docker / deploy:
- Dockerfile — replace `COPY vendor/alpha-engine-lib` + local pip
  install with `pip install "alpha-engine-lib[flow_doctor] @
  git+https://github.com/cipher813/alpha-engine-lib@v0.3.0"`. The
  [flow_doctor]-only install for Lambda is preserved (Lambda doesn't
  need [arcticdb] or [rag]); requirements.txt's
  [arcticdb,flow_doctor,rag] extras still apply for the EC2 install
  path.
- infrastructure/deploy.sh — drop the vendor/alpha-engine-lib
  staging block + cleanup_lib_staging trap. Replace with one-line
  comment explaining lib comes from public git+https now.

EC2 spot scripts:
- infrastructure/spot_data_weekly.sh — drop SSM PAT fetch + insteadOf
  rewrite from the DEPS step. Update inline comments referencing the
  old mechanism (3 spots).
- infrastructure/spot_drift_detection.sh — same removal.

Companion follow-ups (not in this PR):
- Delete ALPHA_ENGINE_LIB_TOKEN GitHub Actions secret on this repo
- Delete /alpha-engine/lib-token SSM SecureString (us-east-1)
- vendor/alpha-engine-lib local checkout can be removed (gitignored,
  not in any commit)

Per ROADMAP follow-up "P3 Drop ALPHA_ENGINE_LIB_TOKEN PAT plumbing"
added 2026-05-03. Second of 6 consumer-repo PRs in this cleanup arc;
prototype landed in alpha-engine PR #128.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant