Hybrid Coding Arena is the new name for this benchmark (formerly hybrid-coding-eval), from RunAnywhere.
What changed in v1.6.0
This is a rebrand release. The benchmark, methodology, and dataset are unchanged.
- Python package: hybrid_coding_eval is now hybrid_arena.
- Distribution + repo: hybrid-arena (the old hybrid-coding-eval URL redirects here).
- CLI command: bench is now arena (e.g. arena sweep, arena analyze).
- Clearer headline chart: pass-rate and cloud usage are now separate, labeled elements.
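For scripts that must run against both pre- and post-rename installs, the package rename above can be absorbed with a small import shim. This is a minimal sketch, not something shipped with the release: load_arena_package is a hypothetical helper, and it assumes only the two module names listed above.

```python
import importlib
import importlib.util


def load_arena_package(candidates=("hybrid_arena", "hybrid_coding_eval")):
    """Return the first importable top-level module among candidates, or None.

    The default names come from the rename notes: the new package name is
    tried first, then the old one. The helper itself is hypothetical.
    """
    for name in candidates:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None
```

Calling load_arena_package() returns whichever of the two packages is installed, preferring the new name, and None if neither is present.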
git clone https://github.com/RunanywhereAI/hybrid-arena
cd hybrid-arena && python3.12 -m venv .venv
.venv/bin/pip install -e ".[dev,agents]"
arena setup
arena sweep --config configs/v1.4-smoke.yaml --strategies always-cloud --seeds 42
arena analyze results/runs/v1.4-smoke
Dataset
results-v1.6.0.tar.gz is byte-identical to the v1.5.0/v1.5.1 dataset (1,704 rows). No new benchmark runs in this release.
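The byte-identical claim can be checked locally by comparing SHA-256 digests of the two archives. The sketch below assumes both archives have been downloaded; the helper names are illustrative, and the file name of the earlier archive is an assumption, since only results-v1.6.0.tar.gz is named here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read until fh.read() returns b"" so large archives are not
        # loaded into memory all at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example call (the second file name is an assumption, not from the release):
# same_bytes("results-v1.6.0.tar.gz", "results-v1.5.1.tar.gz")
```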
Headline
- cline + qwen3.6 + cascade on real-developer refactors: 24/24 = 100% at 8% cloud, about $0.022/task.
- Local-only solves 67% of the hard (D6) tasks at $0 cloud; cloud-only holds 100%.
- 1,704 rows, 3 local models, 3 coding agents, 8 routing strategies, 17 tasks, one M4 Max laptop, 95% bootstrap CIs.
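To illustrate what a 95% bootstrap CI on a pass rate involves, here is a minimal percentile-bootstrap sketch over per-task pass/fail outcomes. The release does not state which bootstrap variant, resample count, or resampling unit the benchmark uses, so the function name, defaults, and the mixed-outcome example data are all assumptions.

```python
import random


def bootstrap_ci(outcomes, n_resamples=10_000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean of 0/1 outcomes.

    outcomes: sequence of 1 (pass) / 0 (fail), one entry per run.
    Returns (lower, upper) bounds at confidence level 1 - alpha.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement, record each resample's pass rate, then sort.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# A 24/24 result like the headline: every resample is all passes,
# so the percentile interval collapses to a single point.
print(bootstrap_ci([1] * 24))  # (1.0, 1.0)

# Hypothetical mixed outcomes (not taken from the dataset).
print(bootstrap_ci([1] * 16 + [0] * 8))
```

Note that with a perfect score the percentile bootstrap cannot express any uncertainty, which is worth keeping in mind when reading a 100% result on 24 runs.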