Problem.
attacks/evaluator/evaluation_step.py (~1.35 kLOC) runs judge calls via ThreadPoolExecutor. With N goals × M judges × K iterations and blocking HTTP, throughput is GIL/thread-bound. httpx, openai, and litellm all support async.
Actions.
- Use an httpx.AsyncClient instance scoped to the run (with the timeout from #2).
- Use asyncio.gather with a bounded asyncio.Semaphore for rate-limit pressure (configurable, default e.g. 10).
- Benchmark (pair with 50 goals × 1 judge) and capture before/after wall clock + tokens/s.
Files:
attacks/evaluator/evaluation_step.py, attacks/evaluator/base.py, possibly router/router.py.
Acceptance:
≥2× speedup on the chosen benchmark; no regression in tests/unit; a new integration test confirms ordering of results is stable.
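
Sketch.
A minimal illustration of the gather-plus-semaphore pattern named under Actions, and of why result ordering can stay stable. It is not the evaluator's actual code: call_judge, evaluate_all, and max_concurrency are hypothetical names, and the judge request is simulated with asyncio.sleep where a real version would presumably await a call on the shared httpx.AsyncClient.

```python
import asyncio


async def call_judge(goal: str, judge: str, delay: float) -> str:
    """Hypothetical stand-in for one judge request (no real HTTP is performed)."""
    await asyncio.sleep(delay)  # simulates network latency
    return f"{judge}:{goal}"


async def evaluate_all(goals, judges, max_concurrency=10):
    """Run every (goal, judge) pair concurrently, at most max_concurrency at a time.

    Returns (results, peak), where results follow submission order and peak is
    the highest number of simultaneously running calls that was observed.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def bounded(index, goal, judge):
        nonlocal in_flight, peak
        async with semaphore:  # blocks while max_concurrency calls are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                # Alternate slow and fast calls so completion order differs
                # from submission order.
                delay = 0.02 if index % 2 == 0 else 0.001
                return await call_judge(goal, judge, delay)
            finally:
                in_flight -= 1

    pairs = [(g, j) for g in goals for j in judges]
    coros = [bounded(i, g, j) for i, (g, j) in enumerate(pairs)]
    # asyncio.gather returns results in the order the awaitables were passed,
    # regardless of which call finishes first.
    results = await asyncio.gather(*coros)
    return results, peak


if __name__ == "__main__":
    goals = [f"goal-{i}" for i in range(20)]
    results, peak = asyncio.run(evaluate_all(goals, ["judge-a"], max_concurrency=5))
    print(results[:3], "peak concurrency:", peak)
```

Because asyncio.gather preserves input order, the integration test from Acceptance could compare the async output against the list produced by the current ThreadPoolExecutor path for the same inputs.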