Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework
Yuchen He, Peizhi Ying, Liqi Cheng, Kuilin Peng, Yuan Tian, Dazhen Deng, Yingcai Wu
Published at CHI 2026 | Paper | Website
Chart data extraction, which reverse-engineers data tables from chart images, is essential for reproducibility, analysis, retrieval, and redesign. Existing interactive tools are reliable but tedious, and mixed-initiative systems, while more efficient, lack generalizability. Recent multimodal large language models (MLLMs) offer a unified interface for chart interpretation, yet their ability to extract accurate data tables, especially without visible labels, remains unclear. We build a benchmark featuring diverse real-world charts without data labels to evaluate this capability. Results show that, while current MLLMs reliably reconstruct table structures, they struggle with precise value recovery. To address this, we revisit chart data extraction from a human-centered perspective and argue that extraction should follow a progressive learning process similar to how people read charts. Our training framework substantially improves numerical accuracy, achieving state-of-the-art performance with a 7B-parameter model. A user study further shows that our model effectively supports mixed-initiative workflows for reliable chart data extraction.
- [2026-03] We manually audited the dataset, fixing several annotation errors and image issues and improving the accuracy of the manually extracted values.
- [2026-03] Release of ExChart-Bench code and dataset.
- [2026-02] Paper accepted at CHI 2026!
```bash
conda create -n exchart python=3.10 -y
conda activate exchart
pip install -r requirements.txt
```

Download the dataset from Google Drive and place the `dataset` folder in the root directory of this repository.
`generate.sh` provides an example of prompting Qwen2.5-VL-Instruct to extract data from charts.

```bash
bash generate.sh
```

Results are saved in the `results/` directory as JSON files with the following structure:
```json
{
  "00000.png": {
    "figure_id": "00000.png",
    "response": "```csv\n\"class\",\"Romania\",\"Togo\",\"Zambia\"\n\"1995\",63000,146000,218000\n\"2001\",47000,89000,286000\n\"2004\",5000,70000,141000\n\"2005\",5000,70000,48000\n```"
  },
  ...
}
```

`evaluate.sh` provides an example of evaluating the generated results of Qwen2.5-VL-Instruct on the ExChart-Bench dataset.
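If you want to post-process the results yourself, the tables can be pulled out of the response strings with a few lines of Python. This is a minimal sketch, assuming the results JSON structure shown above; the `extract_tables` helper and the regex are illustrative assumptions, not part of the official tooling.

```python
import csv
import io
import json
import re

def extract_tables(results_path):
    """Return {figure_id: list of rows} for every parsable response.

    Assumes each response wraps its table in a ```csv ... ``` fence,
    as in the results structure shown above.
    """
    with open(results_path, encoding="utf-8") as f:
        results = json.load(f)
    tables = {}
    for figure_id, entry in results.items():
        # Grab the content between the ```csv fence and the closing ```.
        match = re.search(r"```csv\n(.*?)```", entry["response"], re.DOTALL)
        if not match:
            continue  # response did not contain a parsable CSV block
        tables[figure_id] = list(csv.reader(io.StringIO(match.group(1))))
    return tables
```

For example, applied to a results file containing the entry above, `extract_tables("results/qwen2.5-vl.json")` would map `"00000.png"` to a list of rows whose first row is the header (`class`, `Romania`, `Togo`, `Zambia`).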
```bash
bash evaluate.sh
```

If you find our work useful for your research, please cite:
```bibtex
@inproceedings{he2026exchart,
  author    = {He, Yuchen and Ying, Peizhi and Cheng, Liqi and Peng, Kuilin and Tian, Yuan and Deng, Dazhen and Wu, Yingcai},
  title     = {Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework},
  year      = {2026},
  isbn      = {9798400722783},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3772318.3790721},
  doi       = {10.1145/3772318.3790721},
  booktitle = {Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems},
  series    = {CHI '26}
}
```
This project is licensed under GNU GPL v3.