CompMath-MCQ

This repository contains the code and data for the paper "The CompMath-MCQ Dataset: Are LLMs Ready for Higher-Level Math?". The dataset consists of multiple-choice questions (MCQs) designed to evaluate the mathematical capabilities of large language models (LLMs).

Prerequisites

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install required packages
pip install -r requirements.txt

Using the lm_eval library

The dataset is stored in the my_eval_task folder as mcq_lm_eval_data.jsonl. To use it for testing, copy this file into the lm_eval installation inside the virtual environment, under .venv/lib/python3.10/site-packages/lm_eval/tasks/my_custom_task/ (the python3.10 component depends on the Python version of your environment). If a file named mcq_lm_eval_data.jsonl already exists at that path, overwrite it. If the my_custom_task folder does not exist, create it. The folder must also contain the file my_mcq_task.yaml, which defines the task for lm_eval; an example of this file is provided in the my_eval_task folder.
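The copy step described above can be sketched in Python as follows. This is a minimal sketch, not part of the repository: install_task and its parameters are hypothetical names, and the lm_eval tasks directory is passed in explicitly because its location depends on your environment.

```python
import shutil
from pathlib import Path

# File names taken from the instructions above.
TASK_FILES = ("mcq_lm_eval_data.jsonl", "my_mcq_task.yaml")


def install_task(source_dir, tasks_root, task_name="my_custom_task"):
    """Copy the task files from source_dir into tasks_root/task_name.

    Hypothetical helper: it is not provided by this repository or by lm_eval.
    """
    source = Path(source_dir)
    target = Path(tasks_root) / task_name
    # Create the task folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in TASK_FILES:
        # shutil.copy2 overwrites an existing file with the same name.
        copied.append(Path(shutil.copy2(source / name, target / name)))
    return copied
```

Under these assumptions, calling install_task("my_eval_task", ".venv/lib/python3.10/site-packages/lm_eval/tasks") from the repository root would reproduce the layout described above.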

To run the experiments using lm_eval:

Use the provided script test_script.sh. Before running it, edit the model paths in the script so that they point to the models you want to test. The script runs the evaluation and saves the results in a directory named results/{model_name}.
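For orientation, an lm_eval run of this kind could be assembled as in the sketch below. It does not reproduce test_script.sh, which may differ. build_lm_eval_command is a hypothetical helper, the task name my_mcq_task is an assumption (the actual name is whatever my_mcq_task.yaml declares), the hf backend assumes Hugging Face-format models, and deriving {model_name} from the last component of the model path is also an assumption. The flags --model, --model_args, --tasks and --output_path are lm_eval command-line options.

```python
from pathlib import Path


def build_lm_eval_command(model_path, task_name="my_mcq_task", results_root="results"):
    """Return an lm_eval command line as a list of arguments.

    Hypothetical helper: the task name and the results layout are assumptions.
    """
    # Assumed convention: {model_name} is the last component of the model path.
    model_name = Path(model_path).name
    return [
        "lm_eval",
        "--model", "hf",  # assumes a Hugging Face-format model
        "--model_args", f"pretrained={model_path}",
        "--tasks", task_name,
        "--output_path", str(Path(results_root) / model_name),
    ]
```

The returned list can be printed for inspection or passed to subprocess.run once the virtual environment is active and the task files are in place.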

Citation

If you find this dataset useful in your research, please consider citing our paper:

@article{raimondi2026compmath,
  title={The CompMath-MCQ Dataset: Are LLMs Ready for Higher-Level Math?},
  author={Raimondi, Bianca and Pivi, Francesco and Evangelista, Davide and Gabbrielli, Maurizio},
  journal={arXiv preprint arXiv:2603.03334},
  year={2026}
}
