CodeFuse-CommitEval

CodeFuse-CommitEval is the first benchmark tailored to commit Message-Code Inconsistency (MCI) detection with large language models (LLMs). Building on the ApacheCM dataset for diversity and quality, we synthesize seven types of inconsistent messages via rule-guided mutations of originally consistent commits and apply two-fold validation to verify both positive (inconsistent) and negative (consistent) samples. Using this rich and labeled dataset of message–diff pairs, we then evaluate six state-of-the-art open-source LLMs under a vanilla setting and with three augmentation strategies: few-shot prompting, chain-of-thought (CoT), and extended context.
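
For illustration, each benchmark entry can be pictured as a message–diff pair plus a label. The sketch below is a hypothetical Python structure; the field names and values are illustrative placeholders and do not reflect the actual dataset schema.

# Hypothetical sketch of one labeled message-diff pair (illustrative fields only,
# not the real CodeFuse-CommitEval schema).
sample = {
    "repo": "apache/example-project",          # repository the commit came from (placeholder)
    "commit_message": "Fix null check in request handler",
    "diff": "--- a/Handler.java\n+++ b/Handler.java\n@@ ...",
    "label": "inconsistent",                    # "consistent" or "inconsistent"
    "mutation_type": "rule_1",                  # one of the seven rule-guided mutation types (placeholder name)
}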

Features

  • Multilingual & large-scale dataset
  • Even distribution of samples
  • Rich inconsistent commit types
  • Modular commit mutation rules
  • Effective verification for synthesized samples

Related Project

  • ApacheCM Dataset - Contextual Code Retrieval for Commit Message Generation: A Preliminary Study

Documentation

Environment Setup

Use Python 3.9.6 and install the dependencies:

python3 -m pip install langchain langchain_openai langchain_community

Benchmarking

First, download all the repositories needed for contextual code retrieval:

python3 evaluation/clone_repos.py <dataset_json_path> <repo_collection_path>
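
For example, with placeholder paths (neither the JSON file nor the directory name is shipped with the repository):

python3 evaluation/clone_repos.py data/dataset.json repos/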

Then, deploy the target models yourself or use public APIs. In our paper, we evaluated the following models:

  • DeepSeek-V3.1 (Remote API)
  • gpt-oss-20b (Local deployment)
  • Qwen3-30B-A3B (Local deployment)
  • Llama-3.1-8B (Local deployment)
  • Mistral-Small-3.2-24B (Local deployment)
  • Kimi-K2-Instruct (Remote API)

Run the benchmark:

python3 evaluation/evaluate_main.py \
    -s {pure_llm,fewshot_llm,cot_llm} \
    --ctx <context_code_lines> \
    -d <dataset_json_path> \
    -r <repo_collection_path> \
    --api_key <api_key> \
    --api-base <base_url> \
    --model <model_name> \
    -o <output_json_path> \
    --worker <concurrent_worker_num>
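
For example, a chain-of-thought run against a remote API might look like the following; the model name, endpoint, paths, context size, and worker count are placeholders, not recommended settings:

python3 evaluation/evaluate_main.py \
    -s cot_llm \
    --ctx 10 \
    -d data/dataset.json \
    -r repos/ \
    --api_key $YOUR_API_KEY \
    --api-base https://api.example.com/v1 \
    --model deepseek-chat \
    -o results/cot_deepseek.json \
    --worker 8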

Contribution

We welcome and encourage contributions from the community! If you're interested in contributing to this project, please follow these guidelines:

  1. Identify a Need: Before submitting a pull request (PR), ensure that your contribution addresses a real need or improvement for the project.

  2. Submit a PR: Create a pull request with a clear description of:

    • The problem or feature request you're addressing
    • How your changes solve the problem or implement the feature
    • Any relevant test cases or documentation updates

  3. Review Process: Our team will review your PR based on:

    • Whether the contribution addresses a genuine need for the project
    • The quality and correctness of the implementation
    • Adherence to the project's coding standards and architecture

We appreciate your interest in making CodeFuse-CommitEval better.

Citation

@misc{zhang2025codefusecommitevalbenchmarkingllmspower,
      title={CodeFuse-CommitEval: Towards Benchmarking LLM's Power on Commit Message and Code Change Inconsistency Detection}, 
      author={Qingyu Zhang and Puzhuo Liu and Peng Di and Chenxiong Qian},
      year={2025},
      eprint={2511.19875},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2511.19875}, 
}

License

CodeFuse-CommitEval is licensed under the Apache License 2.0.
