The official implementation of Agent-as-a-Router: Agentic Model Routing for Coding Tasks.
ACRouter routes coding tasks to different backend models under a performance-cost tradeoff. This repository contains the reproducible implementation, CodeRouterBench data, reference outputs, and small demos for plugging the router into your own workflow.
Offline reproduction does not require API keys or live model calls. The current
public benchmark is OOD176. Older OOD112/SWE-MiniSandbox assets are kept as
legacy supplementary data under data/ood/.
For the full pre-cleanup guide with maintainer notes, extended data details, and publishing commands, see docs/HANDBOOK.md.
cd open-source-acrouter
conda create -n acrouter python=3.11 -y
conda activate acrouter
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
python -m pip install -e .
python -m unittest discover -s tests

# In-distribution ID test replay.
python scripts/run_id.py --output-dir outputs/tmp/id
# ACRouter on OOD176.
python scripts/run_acrouter_ood176.py --output-dir outputs/tmp/acrouter_ood176
# OOD176 baselines and paper-style table.
python scripts/run_baselines_ood176.py --output-dir outputs/tmp/baselines_ood176

Expected headline outputs:
ID n=2919 AvgPerf=50.14 CumReg=202.0 $Total=22.31 Perf/$=2.25 rAcc=0.2395
ACRouter-OOD176 n=176 AvgPerf=73.30 CumReg=15.9 $Total=86.72 Perf/$=0.85
All commands above write to outputs/tmp/ so checked-in reference outputs are
not overwritten.
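In the two headline rows above, the Perf/$ column appears to equal AvgPerf divided by $Total. That reading is inferred from the printed numbers, not stated by the repository, so treat the helper below as a sanity check under that assumption.

```python
# Headline values copied from the expected outputs above.
rows = {
    "ID": {"avg_perf": 50.14, "total_usd": 22.31, "perf_per_usd": 2.25},
    "ACRouter-OOD176": {"avg_perf": 73.30, "total_usd": 86.72, "perf_per_usd": 0.85},
}


def perf_per_dollar(avg_perf: float, total_usd: float) -> float:
    """Assumed definition: average performance divided by total USD cost."""
    return round(avg_perf / total_usd, 2)


for name, row in rows.items():
    computed = perf_per_dollar(row["avg_perf"], row["total_usd"])
    print(f"{name}: computed={computed:.2f} reported={row['perf_per_usd']:.2f}")
```

If a replay prints a Perf/$ value that disagrees with this ratio, the assumption is wrong for that row and the scripts themselves are the place to look.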
Dataset:
Lance1573/CodeRouterBench
Optional trained router adapter:
Lance1573/acrouter-qwen35-08b-router-lora
Download only the files needed for OOD176 replay:
python scripts/download_hf_assets.py --minimal --dataset-dir .hf/CodeRouterBench

Download the benchmark plus the optional trained router adapter:
python scripts/download_hf_assets.py \
    --minimal \
    --with-router-model \
    --dataset-dir .hf/CodeRouterBench \
    --model-dir .hf/router_model

Run directly from a downloaded Hugging Face snapshot:
python scripts/run_acrouter_ood176.py \
    --hf-dataset-dir .hf/CodeRouterBench \
    --output-dir outputs/tmp/acrouter_ood176_hf
python scripts/run_baselines_ood176.py \
    --hf-dataset-dir .hf/CodeRouterBench \
    --output-dir outputs/tmp/baselines_ood176_hf

The replay matrix used by these commands is:
raw_matrices/phase2_ood/unified/matrix_acrouter_ood176.json
demos/api_coding_solver/ routes one programming problem through an
OpenRouter/OpenAI-compatible model list until a verifier passes.
export OPENROUTER_API_KEY="<set-this-in-your-shell>"
python demos/api_coding_solver/solve.py \
    --config demos/api_coding_solver/models.example.json \
    --problem-file demos/api_coding_solver/problems/two_sum.txt \
    --dry-run
python demos/api_coding_solver/solve.py \
    --config demos/api_coding_solver/models.example.json \
    --problem-file demos/api_coding_solver/problems/two_sum.txt

Edit demos/api_coding_solver/models.example.json to add providers, model
names, or a stronger verifier command.
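The "route until a verifier passes" idea can be sketched in a few lines. This is an illustration only, not the code in demos/api_coding_solver/solve.py: the function names, the mock generator, and the check string are all hypothetical, and the real verifier command comes from the demo's JSON config.

```python
import subprocess
import sys
from typing import Callable, Optional


def verifier_passes(candidate_code: str, check_code: str) -> bool:
    """Run candidate plus checks in a fresh interpreter; exit code 0 means pass."""
    completed = subprocess.run(
        [sys.executable, "-c", candidate_code + "\n" + check_code],
        capture_output=True,
        timeout=30,
    )
    return completed.returncode == 0


def solve(models: list[str], generate: Callable[[str], str], check_code: str) -> Optional[str]:
    """Try models in list order and return the first one whose output verifies."""
    for model in models:
        if verifier_passes(generate(model), check_code):
            return model
    return None


def fake_generate(model: str) -> str:
    """Mock generator standing in for real API calls (hypothetical model names)."""
    if model == "cheap-model":
        return "def two_sum(nums, target):\n    return []"  # deliberately wrong
    return (
        "def two_sum(nums, target):\n"
        "    seen = {}\n"
        "    for i, n in enumerate(nums):\n"
        "        if target - n in seen:\n"
        "            return [seen[target - n], i]\n"
        "        seen[n] = i\n"
    )


CHECK = "assert two_sum([2, 7, 11, 15], 9) == [0, 1]"
print(solve(["cheap-model", "strong-model"], fake_generate, CHECK))  # strong-model
```

Running generated code in a subprocess keeps a crash or failed assertion from taking down the router process. It does not isolate untrusted code, so a real setup would want a sandbox.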
demos/commercial_cli_router/ routes a prompt into a local Codex, Claude Code,
or opencode CLI command.
python demos/commercial_cli_router/router_mvp.py \
    --prompt "Patch this repository so pytest passes" \
    --dry-run
python demos/commercial_cli_router/router_mvp.py \
    --tool codex \
    --workdir /path/to/project \
    --prompt "Run the tests and fix the failing parser case"

Command templates live in
demos/commercial_cli_router/tools.example.json. A private local npm wrapper is
available under demos/commercial_cli_router/npm/ for npm link workflows.
Set ACROUTER_CODEX_PREFIX, ACROUTER_CLAUDE_PREFIX, or
ACROUTER_OPENCODE_PREFIX to wrap a backend with ccswitch or another command
switcher.
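One plausible way a prefix variable could be applied is to split it shell-style and prepend it to the backend command. The sketch below assumes that behavior; it is not taken from router_mvp.py, and `build_command`, the `my-switcher --profile work` prefix, and the placeholder base commands are hypothetical.

```python
import os
import shlex


def build_command(tool: str, base_command: list[str]) -> list[str]:
    """Prepend the optional ACROUTER_<TOOL>_PREFIX value to a base command."""
    prefix = os.environ.get(f"ACROUTER_{tool.upper()}_PREFIX", "")
    # shlex.split handles quoted arguments; an empty prefix yields an empty list.
    return shlex.split(prefix) + base_command


# Hypothetical wrapper command, set here only for the demonstration.
os.environ["ACROUTER_CODEX_PREFIX"] = "my-switcher --profile work"
print(build_command("codex", ["codex", "PROMPT"]))
```

Building the command as a list rather than a single string avoids shell quoting problems when the prompt contains spaces or quotes.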
Use the config-driven pipeline when you already have task/model results:
python scripts/run_pipeline.py --config configs/eval_pipeline.example.json

The config can point to either a ready matrix or tasks.jsonl plus
model_results.jsonl. Result rows should include task_id, model, and
either resolved or score; token fields are optional but allow cost
calculation.
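Before pointing the pipeline at your own model_results.jsonl, it can help to check each row against the fields listed above. The validator below is a minimal sketch: the required keys come from the text, while the token field names (`prompt_tokens`, `completion_tokens`) and the sample rows are assumptions for illustration, not the pipeline's confirmed schema.

```python
import json

REQUIRED_KEYS = ("task_id", "model")  # named in the text above
OUTCOME_KEYS = ("resolved", "score")  # at least one must be present


def validate_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in row]
    if not any(key in row for key in OUTCOME_KEYS):
        problems.append("needs either resolved or score")
    return problems


# Hypothetical rows; the token field names are assumed, not confirmed.
sample_jsonl = "\n".join([
    json.dumps({"task_id": "task_001", "model": "cheap-model", "resolved": True}),
    json.dumps({"task_id": "task_001", "model": "strong-model", "score": 0.75,
                "prompt_tokens": 1200, "completion_tokens": 340}),
    json.dumps({"task_id": "task_002", "model": "cheap-model"}),
])

for line_number, line in enumerate(sample_jsonl.splitlines(), start=1):
    issues = validate_row(json.loads(line))
    print(line_number, "ok" if not issues else "; ".join(issues))
```

The third sample row is intentionally incomplete, so the loop reports it as needing either resolved or score.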
For inference-time integration, import ACRouter:
from acrouter_repro.inference import ACRouter

router = ACRouter(
    candidate_models=["cheap-model", "strong-model"],
    cheap_chain=["cheap-model"],
    escalate_to="strong-model",
    k=1,
)
# call_your_backend and run_your_checks are placeholders for your own functions.
decision = router.run_with_verifier(
    task={"task_id": "task_001", "dimension": "bug_fixing", "prompt": "..."},
    call_model=lambda model, task: call_your_backend(model, task),
    verify=lambda response, task, model: run_your_checks(response, task),
)
print(decision.chosen_model, decision.final_response)

See examples/inference_demo.py for a complete mock workflow.
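The snippet above leaves the two callbacks undefined. A minimal mock pair might look like the following. The argument order matches the lambdas in the snippet, but the return types (a string response, a boolean verdict) are assumptions, and the loop at the end only exercises the callbacks; it does not reproduce ACRouter's routing logic.

```python
from typing import Any


def call_your_backend(model: str, task: dict[str, Any]) -> str:
    """Mock backend: returns a canned string instead of calling a real API."""
    return f"[{model}] proposed fix for {task['task_id']}"


def run_your_checks(response: str, task: dict[str, Any]) -> bool:
    """Mock verifier: accepts only responses produced by the strong model."""
    return response.startswith("[strong-model]")


task = {"task_id": "task_001", "dimension": "bug_fixing", "prompt": "..."}

# Same lambda shapes as in the ACRouter snippet.
call_model = lambda model, task: call_your_backend(model, task)
verify = lambda response, task, model: run_your_checks(response, task)

for model in ["cheap-model", "strong-model"]:
    response = call_model(model, task)
    print(model, verify(response, task, model))
```

In a real integration, `call_your_backend` would wrap your API client and `run_your_checks` would run tests or another task-specific check on the response.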
CodeRouterBench is a task-by-model benchmark release:
- data/coderouterbench/id_results_long.csv: 9,999 ID tasks x 8 models.
- data/coderouterbench/ood176_results_long.csv: 176 OOD tasks x 8 models.
- data/coderouterbench/id_tasks.jsonl and ood176_tasks.jsonl: task metadata.
- data/coderouterbench/models.json: canonical backend models and USD pricing.
- outputs/baselines_ood176/: checked-in reference tables and decisions.
All bundled USD costs are computed from
data/matrices/phase1_id/model_pricing.json, the pricing table mirrored from
the internal release source.
The public OOD176 matrix is under
data/matrices/phase2_ood/unified/matrix_acrouter_ood176.json. Raw old112 and
new64 snapshots remain under data/matrices/phase2_ood/raw/.
configs/                Example evaluation pipeline config
demos/                  API and commercial-CLI router demos
examples/               Custom benchmark and inference examples
src/acrouter_repro/     Reproduction and inference package
src/routing/            Baseline router implementations
scripts/                Replay, export, download, and pipeline scripts
tests/                  Unit and bundle-integrity tests
data/coderouterbench/   Public CodeRouterBench tables
data/matrices/          ID/OOD source matrices and pricing
outputs/                Checked-in reference outputs
agentic-artifacts/      Agent-readable research evidence
python -m unittest discover -s tests
python -m py_compile $(find scripts src tests examples demos -name '*.py' -print)
bash scripts/check_no_secrets.sh

Install requirements-sandbox.txt only if you need to run live Docker or
Apptainer verification. The default offline replay uses cached results.
@article{agent2026zhou,
title = {Agent-as-a-Router: Agentic Model Routing for Coding Tasks},
author = {Pengfei Zhou and Zhiwei Tang and Yixing Ma and Jiasheng Tang and Yizeng Han and Zhenglin Wan and Fanqing Meng and Wei Wang and Bohan Zhuang and Wangbo Zhao and Yang You},
journal = {arXiv preprint arXiv:2606.22902},
year = {2026},
archivePrefix = {arXiv},
eprint = {2606.22902},
url = {https://arxiv.org/abs/2606.22902}
}