
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development [ICLR 2026]


LiberCoders/FeatureBench




FeatureBench is a test-driven data generation and evaluation pipeline for feature-level coding benchmarks. It provides a unified CLI to run inference, evaluation, and dataset generation.

📰 News

🎁 2026.02.06: We now support one-click inference for mainstream agent frameworks, including OpenHands, Claude Code, Codex, Gemini CLI, and mini-swe-agent; the full list of supported frameworks is in the repository. We have also open-sourced the FeatureBench data pipeline.

🚀 Quickstart

Prerequisites:

  • uv for Python environment management
  • docker for reproducible builds and evaluation

Install:

# pypi
pip install featurebench
# or uv add featurebench

# local
git clone https://github.com/LiberCoders/FeatureBench.git
cd FeatureBench
uv sync

Configure:

cp config_example.toml config.toml

See docs/config.md for a comprehensive reference (harness, infer, data pipeline) with examples.

Optional: pre-pull images to reduce network variance:

fb pull --mode lite                 # lite split image list (13 images)
fb pull --mode full                 # full split image list (24 images)
fb pull --mode /path/to/images.txt  # one image name per line

# full list: featurebench/resources/constants/full_images.txt
# lite list: featurebench/resources/constants/lite_images.txt
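Because a custom list is just a text file with one image name per line, you can carve a subset out of the bundled lists with standard tools. The sketch below assumes a Bash-compatible shell; make_image_subset is a hypothetical helper, not an fb command, and the regex you pass depends on how image names are spelled in the list.

```shell
# Hypothetical helper (not part of the fb CLI): copy the non-blank lines of an
# image list that match an extended regex into a new list, one image per line.
make_image_subset() {
  local src="$1" pattern="$2" out="$3"
  grep -E -- "$pattern" "$src" | grep -v '^[[:space:]]*$' > "$out"
  # Return non-zero when nothing matched, so an empty list is never pulled
  [ -s "$out" ]
}
```

For example, from the repository root, make_image_subset featurebench/resources/constants/lite_images.txt '<pattern>' my_images.txt && fb pull --mode "$PWD/my_images.txt" would pull only the matching lite images (<pattern> is a placeholder).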

Run inference:

fb infer \
    --config-path config.toml \
    --agent mini_swe_agent \
    --model openai/qwen3-coder-480b-a35b-instruct \
    --split lite

Run evaluation:

fb eval \
    -p runs/<timestamp>/output.jsonl \
    --split lite
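Each inference run writes to its own runs/<timestamp>/ directory, so a small helper can resolve the most recent output.jsonl instead of copying the timestamp by hand. This is a sketch assuming a Bash-compatible shell, run directories sitting directly under runs/ with no whitespace in their names, and directory modification time reflecting run order; latest_output is a hypothetical helper, not part of fb.

```shell
# Hypothetical helper (not part of the fb CLI): print the path of the newest
# runs/<timestamp>/output.jsonl. Assumes run directory names contain no
# whitespace and that a newer run has a newer directory mtime.
latest_output() {
  local runs_dir="${1:-runs}" d
  # ls -t sorts the matched directories newest-first by modification time
  for d in $(ls -1td "$runs_dir"/*/ 2>/dev/null); do
    # Skip directories (e.g. an interrupted run) that have no output.jsonl
    if [ -f "${d%/}/output.jsonl" ]; then
      printf '%s\n' "${d%/}/output.jsonl"
      return 0
    fi
  done
  return 1
}
```

Usage: out=$(latest_output runs) && fb eval -p "$out" --split lite. Chaining with && keeps fb eval from running with an empty path when no finished run exists.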

🧭 CLI Overview

fb provides three core commands, one each for inference (fb infer), evaluation (fb eval), and dataset generation.

✍️ Citation

If you find FeatureBench useful, please cite it as:

@misc{zhou2026featurebenchbenchmarkingagenticcoding,
      title={FeatureBench: Benchmarking Agentic Coding for Complex Feature Development}, 
      author={Qixing Zhou and Jiacheng Zhang and Haiyang Wang and Rui Hao and Jiahe Wang and Minghao Han and Yuxue Yang and Shuzhe Wu and Feiyang Pan and Lue Fan and Dandan Tu and Zhaoxiang Zhang},
      year={2026},
      eprint={2602.10975},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2602.10975}, 
}

📧 Contact

If you have any questions, feel free to contact qixingzhou1125@gmail.com or zjcheng2022@gmail.com.
