Release v1.6.0 — Hybrid Coding Arena (rebrand) · RunanywhereAI/hybrid-arena

Hybrid Coding Arena is the new name for this benchmark (formerly hybrid-coding-eval), from RunAnywhere.

What changed in v1.6.0

This is a rebrand release. The benchmark, methodology, and dataset are unchanged.

Python package: hybrid_coding_eval is now hybrid_arena.
Distribution and repository: both are now named hybrid-arena (the old hybrid-coding-eval URL redirects here).
CLI command: bench is now arena (e.g. arena sweep, arena analyze).
Clearer headline chart: pass-rate and cloud usage are now separate, labeled elements.
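For scripts that must run against both pre- and post-rename installs, the package rename above can be handled with a small import shim. This is a minimal sketch, not something shipped with the project: the helper name import_first_available is hypothetical, and only the two module names hybrid_arena and hybrid_coding_eval come from these notes.

```python
import importlib
import importlib.util


def import_first_available(names):
    """Import and return the first top-level module in `names` that is installed.

    Hypothetical helper: it is not part of hybrid_arena itself.
    """
    for name in names:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError(f"none of these modules are installed: {', '.join(names)}")


# Usage (prefers the new name, falls back to the pre-v1.6.0 name):
#   pkg = import_first_available(("hybrid_arena", "hybrid_coding_eval"))
```

Once every environment is on v1.6.0 or later, a plain import of hybrid_arena is simpler and the shim can be dropped.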

git clone https://github.com/RunanywhereAI/hybrid-arena
cd hybrid-arena && python3.12 -m venv .venv
.venv/bin/pip install -e ".[dev,agents]"
# Activate the virtual environment so the arena command is on PATH
source .venv/bin/activate
arena setup
arena sweep --config configs/v1.4-smoke.yaml --strategies always-cloud --seeds 42
arena analyze results/runs/v1.4-smoke

Dataset

results-v1.6.0.tar.gz is byte-identical to the v1.5.0/v1.5.1 dataset (1,704 rows). No new benchmark runs in this release.
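The byte-identical claim can be checked locally by comparing SHA-256 digests of the two archives. The sketch below assumes both tarballs have already been downloaded; the file name results-v1.5.1.tar.gz is an assumption about how the earlier archive is named, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (the second file name is an assumption, not taken from the release):
#   byte_identical("results-v1.6.0.tar.gz", "results-v1.5.1.tar.gz")
```

Matching digests mean the archives have the same contents byte for byte, so analyses run against the v1.5.x dataset carry over unchanged.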

Headline

cline + qwen3.6 + cascade on real-developer refactors: 24/24 = 100% pass rate at 8% cloud usage, about $0.022/task.
Local-only solves 67% of the hard (D6) tasks at $0 cloud cost; cloud-only solves 100%.
1,704 rows, 3 local models, 3 coding agents, 8 routing strategies, 17 tasks, one M4 Max laptop, 95% bootstrap CIs.
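To illustrate what a 95% bootstrap CI on a pass rate involves, here is a minimal percentile-bootstrap sketch. These notes do not describe the benchmark's actual resampling procedure, so the function name, the resample count, and the percentile method are all assumptions.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean of 0/1 pass/fail outcomes.

    Illustrative only: not necessarily the procedure the benchmark uses.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the pass rate of each resample.
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Usage with the 24/24 headline result:
#   bootstrap_ci([1] * 24)
```

One consequence of this method: when every run passes, as in the 24/24 result, each resample also passes every time, so the percentile interval collapses to a single point at 100%. Any uncertainty around a perfect score has to come from another interval method or from more runs.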
