refactor(judge-batch): delegate build_batch_request to lib chokepoint (L334 part 2/2)#249
Merged
Merged
Conversation
… (L334)
`evals/judge.py::build_batch_request` now delegates payload construction
to `alpha_engine_lib.anthropic_payload.build_batches_request_params`
(L334 second-consumer chokepoint, shipped in lib v0.41.0). Drops the
inline `{custom_id, params}` dict construction; same wire shape, same
behavior.
The chokepoint enforces the server-tool ⊥ assistant-prefill invariant
on the embedded `params` dict — so a future RubricEval extension that
adds a server-side tool (e.g. `web_search` for citation lookup) can't
silently reach Anthropic's HTTP 400 the way morning-signal did in May.
Lib pin v0.34.0 → v0.41.0 in lockstep across:
- requirements.txt
- Dockerfile (main research Lambda image)
- Dockerfile.alerts (research-alerts Lambda image)
ROADMAP: **L334** part 2/2 — consumer migration. Part 1 = alpha-engine-lib
PR #85 (build_batches_request_params + v0.40.1 → v0.41.0).
**Merge-blocked on lib PR #85 landing + the v0.41.0 git tag.**
Suite: 1609 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
evals/judge.py::build_batch_requestnow delegates payload construction toalpha_engine_lib.anthropic_payload.build_batches_request_params(L334 second-consumer chokepoint). Drops the inline{custom_id, params}dict construction; same wire shape, same behavior.Lib pin v0.34.0 → v0.41.0 in lockstep across:
requirements.txtDockerfile(main research Lambda image)Dockerfile.alerts(research-alerts Lambda image)Why
The chokepoint enforces the server-tool ⊥ assistant-prefill invariant on the embedded
paramsdict — so a future RubricEval extension that adds a server-side tool (e.g.web_searchfor citation lookup) can't silently reach Anthropic's HTTP 400 the way morning-signal did in May.ROADMAP L334 part 2/2 — consumer migration. Part 1 = alpha-engine-lib #85 (
build_batches_request_params+ v0.40.1 → v0.41.0).This PR is blocked on:
v0.41.0git tag being cut on alpha-engine-libOnce both land, this PR's CI will go green (currently fails on the unreachable lib pin).
Test plan
tests/test_eval_judge_batch.py(12 tests covering build_batch_request shape, error semantics, schema pin) all pass against the lib-routed implementation🤖 Generated with Claude Code