fix(sf): skip_backtester preserves eval-judge skip-gate path #147

Merged
cipher813 merged 1 commit into main from fix/sf-skip-backtester-preserve-eval
May 3, 2026
@cipher813
Owner

Summary

Caught 2026-05-03 in SF `eval-pipeline-validation-5`: Research succeeded and wrote new-format captures to S3, but the eval-judge state silently never fired because the operator had passed `skip_backtester=true` to skip the long-running backtester for validation purposes.

`CheckSkipBacktester.skip` routed directly to `SaturdayHealthCheck`, bypassing the eval-pipeline entirely.

Fix

`skip_backtester=true` now routes to `CheckSkipEvalJudge` instead of `SaturdayHealthCheck`.
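For illustration only, the routing change can be sketched as an Amazon States Language Choice state written as a Python dict, with a tiny evaluator to show which state follows. Only the state names `CheckSkipBacktester`, `CheckSkipEvalJudge`, and `SaturdayHealthCheck` come from this PR; the input path `$.skip_backtester`, the `Default` target name `Backtester`, and the helper `next_state` are assumptions made for the sketch, not the repo's actual definition.

```python
# Hypothetical fragment of the state machine definition.
# Assumed: the "$.skip_backtester" input path and the "Backtester" default target.
check_skip_backtester = {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.skip_backtester",
            "BooleanEquals": True,
            # Before this PR the skip branch pointed at "SaturdayHealthCheck".
            "Next": "CheckSkipEvalJudge",
        }
    ],
    "Default": "Backtester",
}


def next_state(choice_state, execution_input):
    """Simplified Choice evaluation: supports only top-level BooleanEquals rules.

    Unlike real Step Functions, a missing variable falls through to Default
    instead of raising an error.
    """
    for rule in choice_state["Choices"]:
        key = rule["Variable"].split(".", 1)[1]  # "$.skip_backtester" -> "skip_backtester"
        if execution_input.get(key) is rule["BooleanEquals"]:
            return rule["Next"]
    return choice_state["Default"]


print(next_state(check_skip_backtester, {"skip_backtester": True}))
print(next_state(check_skip_backtester, {"skip_backtester": False}))
```

With the skip flag set, the evaluator returns the eval-judge gate rather than the health check, which is the behavior this PR restores.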

Production Sat 5/9 impact

None — `skip_backtester` defaults false on scheduled runs.

Test plan

  • `pytest tests/ -q` → 434 passed (was 433; +1 `TestSkipBacktesterPreservesEvalJudge`).
  • Pin asserts `CheckSkipBacktester.Choices[0].Next == "CheckSkipEvalJudge"` AND `!= "SaturdayHealthCheck"`.
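A minimal sketch of what such a pin could look like, assuming the definition is available as a parsed dict. The inline `DEFINITION` below is a stand-in; the real test presumably loads the repo's state machine file, and the method name `test_skip_routes_to_eval_judge` and the `Default` target are hypothetical.

```python
import json

# Stand-in definition for illustration; not the repo's actual file.
DEFINITION = json.loads(
    """
    {
      "States": {
        "CheckSkipBacktester": {
          "Type": "Choice",
          "Choices": [
            {
              "Variable": "$.skip_backtester",
              "BooleanEquals": true,
              "Next": "CheckSkipEvalJudge"
            }
          ],
          "Default": "Backtester"
        }
      }
    }
    """
)


class TestSkipBacktesterPreservesEvalJudge:
    def test_skip_routes_to_eval_judge(self):
        skip_choice = DEFINITION["States"]["CheckSkipBacktester"]["Choices"][0]
        # Positive pin: the skip branch must enter the eval-judge gate.
        assert skip_choice["Next"] == "CheckSkipEvalJudge"
        # Negative pin: guards against restoring the old direct route.
        assert skip_choice["Next"] != "SaturdayHealthCheck"
```

The negative assertion is logically implied by the positive one, but stating both names the regression explicitly in the failure output.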

Pairs with

`alpha-engine-research` PR #104.

🤖 Generated with Claude Code

Caught 2026-05-03 in SF eval-pipeline-validation-5: Research succeeded
and wrote new-format captures to S3, but the eval-judge state silently
never fired because the operator had passed skip_backtester=true to
skip the long-running backtester for validation purposes.

PR 4c (#140) wired the eval-pipeline states between Backtester success
and SaturdayHealthCheck:

  CheckBacktesterStatus.Success
    → CheckSkipEvalJudge → ComputeEvalCadence → CheckMonthlyCadence
        → EvalJudgeFirstSaturday or EvalJudgeWeekly → EvalRollingMean
    → SaturdayHealthCheck

But CheckSkipBacktester.skip routed directly to SaturdayHealthCheck,
bypassing the eval-pipeline entirely. Production Sat 5/9 won't hit
this (skip_backtester defaults false; Backtester runs and routes
through eval-judge correctly), but operator manual skips for any
non-eval validation purpose silently dropped the eval state.

Fix: route skip_backtester=true → CheckSkipEvalJudge instead of
SaturdayHealthCheck. Eval pipeline now fires on every SF execution
where the operator hasn't explicitly skip_eval_judge'd it.
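As a rough illustration of the property described above, the states named in this message can be written as a small graph and checked for bypasses. The edge table and the `paths` helper are a sketch, not code from the repo; the explicit skip branch of CheckSkipEvalJudge is omitted because its target is not stated here.

```python
# Edges transcribed from the flow described in this commit message.
EDGES_FIXED = {
    "CheckSkipBacktester": ["CheckSkipEvalJudge"],  # skip branch after this fix
    "CheckSkipEvalJudge": ["ComputeEvalCadence"],
    "ComputeEvalCadence": ["CheckMonthlyCadence"],
    "CheckMonthlyCadence": ["EvalJudgeFirstSaturday", "EvalJudgeWeekly"],
    "EvalJudgeFirstSaturday": ["EvalRollingMean"],
    "EvalJudgeWeekly": ["EvalRollingMean"],
    "EvalRollingMean": ["SaturdayHealthCheck"],
    "SaturdayHealthCheck": [],
}

# Same graph with the pre-fix skip branch going straight to the health check.
EDGES_BEFORE = dict(EDGES_FIXED, CheckSkipBacktester=["SaturdayHealthCheck"])


def paths(start, end, edges):
    """Return every path from start to end (the graph is acyclic)."""
    if start == end:
        return [[start]]
    found = []
    for nxt in edges.get(start, []):
        for tail in paths(nxt, end, edges):
            found.append([start] + tail)
    return found


fixed = paths("CheckSkipBacktester", "SaturdayHealthCheck", EDGES_FIXED)
before = paths("CheckSkipBacktester", "SaturdayHealthCheck", EDGES_BEFORE)

# After the fix, every route to the health check passes through the eval states.
assert fixed and all("EvalRollingMean" in p for p in fixed)
# Before the fix, the only route skipped them entirely.
assert before == [["CheckSkipBacktester", "SaturdayHealthCheck"]]
```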

tests/test_sf_eval_judge_wiring.py — TestSkipBacktesterPreservesEvalJudge:
  pins the routing so a future "simplification" can't re-introduce
  the silent bypass.

Tests 433 → 434 (+1 wiring assertion).

Pairs with alpha-engine-research PR #104 (RubricEvalLLMOutput
defense + judge max_tokens to strategic tier — closes the 5/32
remaining failure class observed in this same SF run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit f1050c7 into main May 3, 2026
1 check passed
@cipher813 cipher813 deleted the fix/sf-skip-backtester-preserve-eval branch May 3, 2026 22:35