Releases · JohnYCChiang/holon-bench

Holon-Bench v0.1.0-alpha

First public pre-release of Holon-Bench, an open-source benchmark harness for evaluating AI coding agents on maintainer-style workflows.

9 benchmark tracks: Python tool engineering, Rust core, Rust Bevy, Rust porting, Go core, Go game server, Flutter cross-platform, graph memory workflow, repair needed
Phase 1: 35 cases (5 per track) validating runner/scorer/report plumbing
Deterministic runners with verifier-feedback repair loop support
Repair cost metrics: first_pass, repaired_pass, repair_tax_rate
Hidden and mutation verifier architecture (Phase 2+)
JSON schemas for cases, results, scores, and failures
GitHub Actions CI: schema check, py_compile, smoke test
Minimal example case for onboarding new contributors
OSS maintainer use case documentation
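
The repair loop and the three repair cost metrics listed above can be sketched as follows. This is a minimal illustration, not the harness's actual code: `CaseOutcome`, `run_case`, `summarize`, and the `attempt`/`verify` callables are hypothetical names, and the formula for `repair_tax_rate` (the share of passing cases that needed at least one repair round) is an assumption, since these notes do not define it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CaseOutcome:
    case_id: str
    first_pass: bool     # verifier accepted the initial attempt
    repaired_pass: bool  # verifier accepted only after a repair round
    repair_rounds: int   # number of repair rounds actually used


def run_case(
    case_id: str,
    attempt: Callable[[Optional[str]], str],
    verify: Callable[[str], Tuple[bool, Optional[str]]],
    max_repairs: int = 2,
) -> CaseOutcome:
    """Run one case: an initial attempt, then up to max_repairs repair rounds.

    attempt(feedback) returns a candidate patch; feedback is None on the first call.
    verify(patch) returns (ok, feedback_for_next_round).
    """
    ok, feedback = verify(attempt(None))
    if ok:
        return CaseOutcome(case_id, True, False, 0)
    for round_no in range(1, max_repairs + 1):
        # Feed the verifier's message back to the agent and re-check.
        ok, feedback = verify(attempt(feedback))
        if ok:
            return CaseOutcome(case_id, False, True, round_no)
    return CaseOutcome(case_id, False, False, max_repairs)


def summarize(outcomes: list) -> dict:
    """Aggregate per-case outcomes into the three repair cost metrics."""
    first = sum(o.first_pass for o in outcomes)
    repaired = sum(o.repaired_pass for o in outcomes)
    passed = first + repaired
    return {
        "total": len(outcomes),
        "first_pass": first,
        "repaired_pass": repaired,
        # ASSUMED definition: fraction of passing cases that needed repair.
        "repair_tax_rate": repaired / passed if passed else 0.0,
    }
```

Keeping `first_pass` and `repaired_pass` as separate counts, rather than a single pass rate, lets a report distinguish an agent that gets a case right immediately from one that only succeeds after verifier feedback.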

Model                        python_tool first_pass    rust_porting first_pass
qwen36-27b-mtp-q4 (local)    3/5                       2/5
gemma3-27b-q4 (local)        2/5                       1/5