Add PredictorHealthCheck to weekday pipeline by cipher813 · Pull Request #6 · cipher813/alpha-engine-data

cipher813 · 2026-04-08T02:15:25Z

Summary

- Insert a PredictorHealthCheck Lambda invoke between PredictorInference and HealthCheck in the weekday Step Function
- Non-blocking: on failure, the Catch routes execution to HealthCheck, so trading is not halted
- IAM policy updated with the alpha-engine-predictor-health-check* Lambda ARN
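
A minimal sketch of what such a non-blocking state could look like in Amazon States Language, built here as a Python dict. The state names PredictorHealthCheck and HealthCheck come from the summary above; the account ID, region, `ResultPath` values, and the helper name `predictor_health_check_state` are assumptions for illustration, not the deployed definition.

```python
import json

# Hypothetical ARN: the PR only gives the function-name prefix
# "alpha-engine-predictor-health-check*"; region and account are placeholders.
PREDICTOR_HEALTH_CHECK_ARN = (
    "arn:aws:lambda:us-east-1:123456789012:function:"
    "alpha-engine-predictor-health-check"
)


def predictor_health_check_state(function_arn: str) -> dict:
    """Build a Task state whose success and failure paths both reach HealthCheck."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
        # Assumed result locations; the real definition may differ.
        "ResultPath": "$.predictor_health_check",
        "Catch": [
            {
                # Any error is captured as data and the pipeline moves on.
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.predictor_health_check_error",
                "Next": "HealthCheck",
            }
        ],
        "Next": "HealthCheck",
    }


state = predictor_health_check_state(PREDICTOR_HEALTH_CHECK_ARN)
print(json.dumps({"PredictorHealthCheck": state}, indent=2))
```

Because both `Next` and the Catch target point at HealthCheck, a failing health check is recorded in the execution state but never stops the downstream steps.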

Companion PR: cipher813/alpha-engine-backtester#5 (Lambda code + deploy script)

Test plan

- Step Function already redeployed and live
- Lambda canary passed (dry_run=true)
- Monitor first live run tomorrow 6:05 AM PT

🤖 Generated with Claude Code

Insert daily predictor health check Lambda between PredictorInference and HealthCheck. Non-blocking: failure continues to data health check and executor start. IAM policy updated with new Lambda ARN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ate preflight (#138)

Closes the prune+backfill loop that recreated 7 S&P churn-out stragglers on every Saturday SF run.

2026-05-02 redrive #6 surfaced the loop: pre-MorningEnrich prune (PR #134, absent_days=5) drops stragglers ✓; Phase 1 step 8 (builders.backfill) loads ALL predictor/price_cache/*.parquet files and writes EVERY ticker back to ArcticDB universe, including the ones we just pruned, because their parquet files still exist (kept for historical lookup). Loop closes; Backtester preflight (~2 hours later) trips on the 8-day-stale rows.

## Fix 1: backfill respects current constituents

In ``builders.backfill``, load current constituents via the ``market_data/latest_weekly.json`` pointer and filter ``universe_tickers`` against it. Tickers absent from constituents (churn-outs) get a price_cache parquet preserved (history kept) but NO arctic row written. If a ticker comes back to S&P later, it appears in constituents and backfill picks it up automatically.

Hard-fails on constituents-load failure (vs silently writing everything) per feedback_no_silent_fails. Skipped in dry_run so local smoke tests don't need S3 access.

## Fix 2: sf_preflight escalates straggler detection

``check_universe_drift`` now returns FAIL (not OK) when any straggler is "old enough to prune" (>5 days stale). Forces operators to drop stragglers BEFORE launching recovery SFs that skip MorningEnrich (would otherwise burn a 120-min Backtester spot to re-discover them). Result includes a remediation hint pointing at the prune CLI.

Validation against current state (post manual prune of 7):

[OK] universe_drift 1 arctic stragglers; 0 would be pruned

3 new tests in test_backfill_no_regression.py:
- backfill_skips_tickers_absent_from_constituents (the loop closure)
- backfill_hard_fails_when_constituents_load_fails (no silent recreate-everything fallback)
- backfill_dry_run_does_not_filter_by_constituents (CI / smoke doesn't need S3)

Existing test scaffolding updated to mock _load_current_constituents across both backfill test files. 406 tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
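
The two fixes described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `select_backfill_tickers`, `check_universe_drift_sketch`, `ConstituentsLoadError`, and `DriftResult` are hypothetical names, and the real `builders.backfill` and `check_universe_drift` signatures are not shown in the source. Only the 5-day threshold, the dry_run skip, the hard-fail behaviour, and the OK/FAIL statuses come from the message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable

PRUNE_AFTER_DAYS = 5  # "old enough to prune" means more than 5 days stale


class ConstituentsLoadError(RuntimeError):
    """Hypothetical error type: constituents could not be loaded."""


def select_backfill_tickers(
    universe_tickers: Iterable[str],
    load_constituents: Callable[[], Iterable[str]],
    dry_run: bool = False,
) -> list[str]:
    """Return the tickers that should get an ArcticDB row written."""
    tickers = sorted(universe_tickers)
    if dry_run:
        # Dry runs skip the filter so local smoke tests need no S3 access.
        return tickers
    try:
        constituents = set(load_constituents())
    except Exception as exc:
        # Hard-fail instead of silently writing every ticker back.
        raise ConstituentsLoadError("could not load current constituents") from exc
    return [t for t in tickers if t in constituents]


@dataclass
class DriftResult:
    status: str
    message: str


def check_universe_drift_sketch(staleness_days: dict[str, int]) -> DriftResult:
    """FAIL when any straggler is stale enough to be pruned, else OK."""
    prunable = [t for t, days in staleness_days.items() if days > PRUNE_AFTER_DAYS]
    message = (
        f"{len(staleness_days)} arctic stragglers; {len(prunable)} would be pruned"
    )
    if prunable:
        return DriftResult("FAIL", message + "; run the prune CLI first")
    return DriftResult("OK", message)
```

With one straggler that is only 3 days stale, `check_universe_drift_sketch({"XYZ": 3})` reports OK with "1 arctic stragglers; 0 would be pruned", mirroring the validation line above.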

…ndent; per-branch error isolation) (#251)

Research and PredictorTraining are data-independent (CLAUDE.md Architecture: "no data flows between them"). They ran sequentially only to "spread API load", a now-stale rationale: predictor training (alpha-engine-predictor/training/train_handler.py) reads ArcticDB + CPU LightGBM and makes NO Anthropic calls (yfinance fallback removed by predictor PR #6; train_handler yfinance docstrings are stale). Research's only heavy load is Anthropic. They do not contend on the rate-limited API.

Restructure the sequential Research…→PredictorTraining run into an SF Parallel state (ResearchPredictorParallel):
- Branch A: CheckSkipResearch → Research → DataPhase2 → EvalJudge chain → EvalRollingMean → RationaleClustering → ReplayConcordance → Counterfactual (everything that consumes Research output, current order, all CheckSkip*/quartets/fail-soft Catches intact).
- Branch B: PredictorTraining quartet + skip-gate intact.
- Join → AggregateBranchOutcomes → CheckBranchOutcomes → CheckSkipDriftDetection → Backtester → Parity → Evaluator (unchanged).

Per-branch error isolation (the correctness-critical requirement): SF Parallel's default cancels siblings when one branch errors. To prevent a strict-Research hard-fail from aborting/wasting an in-flight or completed+S3-promoted PredictorTraining, each branch ends in a branch-local Pass terminal (End:true) recording OK/FAILED as data; a branch NEVER throws. The SF is failed AFTER the join (post-aggregation) if either branch recorded FAILED, so the other branch's completed work (incl. already-promoted predictor weights in S3) persists and the recovery re-run's skip-set can skip whichever branch genuinely completed (Research-fail + Predictor-done → re-run with skip_predictor_training). Parallel-level Catch → existing shared HandleFailure (no new error channel); Parallel Retry is a documented no-op (MaxAttempts:0) so a completed PredictorTraining is never re-run.

Inbound edges (RegimeRetrospectiveEval Next+Catch, CheckSkipRegimeRetrospectiveEval skip choice) re-pointed to ResearchPredictorParallel.

Tests: new tests/test_sf_research_predictor_parallel_wiring.py (72 tests: sibling branches; Branch-A/B contents; per-branch isolation incl. no in-branch escape to HandleFailure; post-join fail-if-either-FAILED; ec2_instance_id reaches Branch B; Backtester after join; no dangling targets anywhere). Updated test_sf_eval_judge_wiring.py (flattened state fixture + old cross-boundary edge assertions retargeted to BranchAComplete) and test_sf_regime_substrate_wiring.py (inbound edge → Parallel). Full suite green: 1207 passed, 1 skipped (pre-existing pandas FutureWarnings in daily_append.py, unrelated).

DEPLOY HELD: prod SF-topology change; do not merge/redeploy/trigger until the user directs. CLAUDE.md:100 "spread API load" rationale is stale and must be corrected on merge (flagged, not edited; that file is outside this repo).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
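
The per-branch isolation idea in this commit message can be illustrated with a plain-Python analogy. This is not Amazon States Language and not the repository's code: `run_branch_isolated` and `aggregate_branch_outcomes` are hypothetical helpers, and the branches run one after the other here, whereas a Step Functions Parallel state runs them concurrently. The point it shows is that each branch records OK or FAILED as data instead of raising, and the overall failure decision is made only after both outcomes are collected.

```python
from __future__ import annotations

from typing import Callable


def run_branch_isolated(name: str, work: Callable[[], None]) -> dict:
    """Run one branch and record its outcome as data; never re-raise."""
    try:
        work()
    except Exception as exc:
        return {"branch": name, "status": "FAILED", "error": repr(exc)}
    return {"branch": name, "status": "OK"}


def aggregate_branch_outcomes(outcomes: list[dict]) -> dict:
    """Post-join step: fail overall if any branch recorded FAILED."""
    failed = [o["branch"] for o in outcomes if o["status"] == "FAILED"]
    completed = [o["branch"] for o in outcomes if o["status"] == "OK"]
    return {"failed": failed, "completed": completed, "should_fail": bool(failed)}


def research() -> None:
    # Stand-in for a strict Research hard-fail.
    raise RuntimeError("research hard-fail")


def predictor_training() -> None:
    # Stand-in for a training run that completes and promotes weights.
    return None


outcomes = [
    run_branch_isolated("Research", research),
    run_branch_isolated("PredictorTraining", predictor_training),
]
summary = aggregate_branch_outcomes(outcomes)
print(summary)
```

In this run the Research failure does not prevent PredictorTraining from completing, and `summary["completed"]` identifies the branch a recovery re-run could skip.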

cipher813 merged commit 1ee368c into main Apr 8, 2026
1 check passed

cipher813 deleted the feat/daily-health-check branch April 8, 2026 02:18

cipher813 mentioned this pull request Apr 24, 2026

Split daily collection by source: yfinance EOD + polygon morning enrichment #90

Merged

9 tasks

cipher813 mentioned this pull request May 2, 2026

fix(backfill): filter universe writes by current constituents + escalate preflight #138

Merged

5 tasks

cipher813 mentioned this pull request May 17, 2026

feat(sf): run Research and PredictorTraining in parallel (data-independent; per-branch error isolation) #251

Merged

cipher813 merged 1 commit into main from feat/daily-health-check
