Skip to content

LanceZPF/agent-as-a-router

Repository files navigation

Agent-as-a-Router

The official implementations of Agent-as-a-Router: Agentic Model Routing for Coding Tasks.

Homepage Paper Benchmark

Homepage | Paper | CodeRouterBench

ACRouter routes coding tasks to different backend models under a performance-cost tradeoff. This repository contains the reproducible implementation, CodeRouterBench data, reference outputs, and small demos for plugging the router into your own workflow.

Offline reproduction does not require API keys or live model calls. The current public benchmark is OOD176. Older OOD112/SWE-MiniSandbox assets are kept as legacy supplementary data under data/ood/.

For the full pre-cleanup guide with maintainer notes, extended data details, and publishing commands, see docs/HANDBOOK.md.

Quick Start

cd open-source-acrouter

conda create -n acrouter python=3.11 -y
conda activate acrouter

python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
python -m pip install -e .

python -m unittest discover -s tests

Reproduce The Main Results

# In-distribution ID test replay.
python scripts/run_id.py --output-dir outputs/tmp/id

# ACRouter on OOD176.
python scripts/run_acrouter_ood176.py --output-dir outputs/tmp/acrouter_ood176

# OOD176 baselines and paper-style table.
python scripts/run_baselines_ood176.py --output-dir outputs/tmp/baselines_ood176

Expected headline outputs:

ID n=2919 AvgPerf=50.14 CumReg=202.0 $Total=22.31 Perf/$=2.25 rAcc=0.2395
ACRouter-OOD176 n=176 AvgPerf=73.30 CumReg=15.9 $Total=86.72 Perf/$=0.85

All commands above write to outputs/tmp/ so checked-in reference outputs are not overwritten.

Hugging Face Assets

Dataset: Lance1573/CodeRouterBench

Optional trained router adapter: Lance1573/acrouter-qwen35-08b-router-lora

Download only the files needed for OOD176 replay:

python scripts/download_hf_assets.py --minimal --dataset-dir .hf/CodeRouterBench

Download the benchmark plus the optional trained router adapter:

python scripts/download_hf_assets.py \
  --minimal \
  --with-router-model \
  --dataset-dir .hf/CodeRouterBench \
  --model-dir .hf/router_model

Run directly from a downloaded Hugging Face snapshot:

python scripts/run_acrouter_ood176.py \
  --hf-dataset-dir .hf/CodeRouterBench \
  --output-dir outputs/tmp/acrouter_ood176_hf

python scripts/run_baselines_ood176.py \
  --hf-dataset-dir .hf/CodeRouterBench \
  --output-dir outputs/tmp/baselines_ood176_hf

The replay matrix used by these commands is:

raw_matrices/phase2_ood/unified/matrix_acrouter_ood176.json

Demo Routers

API Coding Solver

demos/api_coding_solver/ routes one programming problem through an OpenRouter/OpenAI-compatible model list until a verifier passes.

export OPENROUTER_API_KEY="<set-this-in-your-shell>"

python demos/api_coding_solver/solve.py \
  --config demos/api_coding_solver/models.example.json \
  --problem-file demos/api_coding_solver/problems/two_sum.txt \
  --dry-run

python demos/api_coding_solver/solve.py \
  --config demos/api_coding_solver/models.example.json \
  --problem-file demos/api_coding_solver/problems/two_sum.txt

Edit demos/api_coding_solver/models.example.json to add providers, model names, or a stronger verifier command.

Codex / Claude Code / opencode Router

demos/commercial_cli_router/ routes a prompt into a local Codex, Claude Code, or opencode CLI command.

python demos/commercial_cli_router/router_mvp.py \
  --prompt "Patch this repository so pytest passes" \
  --dry-run

python demos/commercial_cli_router/router_mvp.py \
  --tool codex \
  --workdir /path/to/project \
  --prompt "Run the tests and fix the failing parser case"

Command templates live in demos/commercial_cli_router/tools.example.json. A private local npm wrapper is available under demos/commercial_cli_router/npm/ for npm link workflows. Set ACROUTER_CODEX_PREFIX, ACROUTER_CLAUDE_PREFIX, or ACROUTER_OPENCODE_PREFIX to wrap a backend with ccswitch or another command switcher.

Add Your Own Benchmark Or Models

Use the config-driven pipeline when you already have task/model results:

python scripts/run_pipeline.py --config configs/eval_pipeline.example.json

The config can point to either a ready matrix or tasks.jsonl plus model_results.jsonl. Result rows should include task_id, model, and either resolved or score; token fields are optional but allow cost calculation.

For inference-time integration, import ACRouter:

from acrouter_repro.inference import ACRouter

router = ACRouter(
    candidate_models=["cheap-model", "strong-model"],
    cheap_chain=["cheap-model"],
    escalate_to="strong-model",
    k=1,
)

decision = router.run_with_verifier(
    task={"task_id": "task_001", "dimension": "bug_fixing", "prompt": "..."},
    call_model=lambda model, task: call_your_backend(model, task),
    verify=lambda response, task, model: run_your_checks(response, task),
)

print(decision.chosen_model, decision.final_response)

See examples/inference_demo.py for a complete mock workflow.

Data And Pricing

CodeRouterBench is a task-by-model benchmark release:

  • data/coderouterbench/id_results_long.csv: 9,999 ID tasks x 8 models.
  • data/coderouterbench/ood176_results_long.csv: 176 OOD tasks x 8 models.
  • data/coderouterbench/id_tasks.jsonl and ood176_tasks.jsonl: task metadata.
  • data/coderouterbench/models.json: canonical backend models and USD pricing.
  • outputs/baselines_ood176/: checked-in reference tables and decisions.

All bundled USD costs are computed from data/matrices/phase1_id/model_pricing.json, the pricing table mirrored from the internal release source.

The public OOD176 matrix is under data/matrices/phase2_ood/unified/matrix_acrouter_ood176.json. Raw old112 and new64 snapshots remain under data/matrices/phase2_ood/raw/.

Project Layout

configs/                       Example evaluation pipeline config
demos/                         API and commercial-CLI router demos
examples/                      Custom benchmark and inference examples
src/acrouter_repro/            Reproduction and inference package
src/routing/                   Baseline router implementations
scripts/                       Replay, export, download, and pipeline scripts
tests/                         Unit and bundle-integrity tests

data/coderouterbench/          Public CodeRouterBench tables
data/matrices/                 ID/OOD source matrices and pricing
outputs/                       Checked-in reference outputs
agentic-artifacts/             Agent-readable research evidence

Development Checks

python -m unittest discover -s tests
python -m py_compile $(find scripts src tests examples demos -name '*.py' -print)
bash scripts/check_no_secrets.sh

Install requirements-sandbox.txt only if you need to run live Docker or Apptainer verification. The default offline replay uses cached results.

Citation

@article{agent2026zhou,
  title         = {Agent-as-a-Router: Agentic Model Routing for Coding Tasks},
  author        = {Pengfei Zhou, Zhiwei Tang, Yixing Ma, Jiasheng Tang, Yizeng Han, Zhenglin Wan, Fanqing Meng, Wei Wang, Bohan Zhuang, Wangbo Zhao, Yang You},
  journal       = {arXiv preprint arXiv:2606.22902},
  year          = {2026},
  archivePrefix = {arXiv},
  eprint        = {2606.22902},
  url           = {https://arxiv.org/abs/2606.22902}
}

About

The official implementations of Agent-as-a-Router: Agentic Model Routing for Coding Tasks.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors