This is the official repository for the paper "StatsChartMWP: A Dataset for Evaluating Multimodal Mathematical Reasoning Abilities on Math Word Problems with Statistical Charts". The paper link is coming soon.
The leaderboard is continuously being updated. If you have any new results to contribute, please feel free to reach out to us.
| # | Model | Method | Date | ALL | Bar | Hist | Line | Line-f | Scatter | D-axis | P-bar | Pie | Table | Comp | Radar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | o3 | LMM | 2025-09-08 | 82.75 | 81.73 | 77.71 | 76.96 | 71.97 | 83.12 | 82.81 | 90.91 | 88.10 | 93.23 | 83.98 | 33.33 |
| 2 | Qwen2.5-VL-72B | LMM | 2025-09-08 | 71.12 | 78.45 | 59.51 | 68.45 | 56.90 | 54.37 | 65.62 | 63.64 | 65.78 | 85.89 | 61.07 | 41.67 |
| 3 | Qwen2-VL-72B | LMM | 2025-02-23 | 59.33 | 69.91 | 39.29 | 60.03 | 46.44 | 43.75 | 62.50 | 59.09 | 65.78 | 77.12 | 50.39 | 62.50 |
| 4 | GPT-4o | LMM | 2025-02-23 | 57.05 | 66.51 | 26.38 | 58.76 | 42.26 | 45.62 | 68.75 | 54.55 | 72.57 | 81.54 | 49.50 | 45.83 |
| 5 | InternVL2_5-78B | LMM | 2025-02-23 | 55.25 | 70.93 | 29.26 | 56.12 | 40.59 | 48.75 | 57.81 | 54.55 | 57.01 | 74.27 | 51.84 | 37.04 |
| 6 | GPT4 (GPT-4o) | LLM | 2025-02-23 | 46.95 | 59.98 | 13.30 | 52.72 | 35.98 | 27.50 | 45.31 | 27.27 | 59.19 | 71.85 | 38.82 | 20.83 |
| 7 | InternVL2-Llama3-76B | LMM | 2025-02-23 | 45.02 | 58.81 | 24.58 | 50.43 | 35.98 | 43.12 | 42.19 | 13.64 | 48.08 | 57.38 | 35.37 | 29.17 |
| 8 | Qwen2-VL-7B | LMM | 2025-02-23 | 37.46 | 45.67 | 20.16 | 39.29 | 30.96 | 31.25 | 65.62 | 36.36 | 44.54 | 51.25 | 25.70 | 62.50 |
| 9 | GPT-4V | LMM | 2025-02-23 | 34.28 | 38.57 | 12.10 | 40.48 | 28.87 | 30.00 | 39.06 | 18.18 | 38.25 | 55.67 | 27.89 | 33.33 |
| 10 | LLaVA-OV-72B | LMM | 2025-02-23 | 32.39 | 38.33 | 15.26 | 39.80 | 30.54 | 35.62 | 42.19 | 31.82 | 34.32 | 45.97 | 22.91 | 16.67 |
| 11 | GPT4 (GPT-4V) | LLM | 2025-02-23 | 31.47 | 38.11 | 8.61 | 39.12 | 22.18 | 20.62 | 35.94 | 4.55 | 34.71 | 52.46 | 24.36 | 20.83 |
| 12 | Qwen-VL-MAX | LLM | 2025-02-23 | 30.24 | 37.40 | 10.19 | 29.51 | 19.25 | 20.00 | 29.69 | 18.18 | 37.86 | 54.74 | 16.91 | 33.33 |
| 13 | IXC-2.5-7B | LMM | 2025-02-23 | 22.55 | 31.10 | 7.36 | 29.25 | 17.99 | 18.75 | 43.75 | 18.18 | 24.88 | 29.72 | 15.02 | 41.67 |
| 14 | Cambrian-34B | LMM | 2025-02-23 | 18.15 | 22.03 | 8.77 | 27.89 | 14.23 | 18.75 | 46.88 | 22.73 | 16.52 | 20.24 | 14.02 | 41.67 |
| 15 | LLaVA-NeXT-34B | LMM | 2025-02-23 | 15.67 | 20.96 | 5.45 | 23.13 | 13.39 | 20.00 | 25.00 | 4.55 | 14.06 | 19.24 | 12.44 | 20.83 |
| 16 | DeepSeek-VL-7B | LMM | 2025-02-23 | 13.20 | 16.06 | 4.63 | 21.43 | 11.72 | 12.50 | 28.12 | 4.55 | 14.16 | 15.47 | 9.78 | 8.33 |
| 17 | HPT-1.0 | LMM | 2025-02-23 | 10.10 | 9.91 | 5.07 | 17.77 | 9.62 | 10.62 | 26.56 | 9.09 | 7.18 | 10.62 | 11.56 | 29.17 |
The StatsChartMWP dataset is a benchmark for developing AI models that can understand the multimodal information present in math word problems (MWPs) with statistical charts. It covers a wide variety of chart forms, spanning a broad visual spectrum and a range of mathematical knowledge competencies, and every item originates from a real-world educational context, including problems written by mathematics educators, genuine student inquiries, and historical examination questions. StatsChartMWP contains 8,514 unique MWPs with statistical charts across 11 chart types: bar, line, line-function, dual-axis, pie, composite, radar, histogram, scatter, percentage-bar, and table. A comparison between our dataset and ChartQA and FigureQA is shown below; R-Steps denotes the average number of reasoning steps per problem.
The StatsChartMWP dataset JSON file and images are provided in [data].
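A minimal loading sketch is shown below. The file and field names are assumptions for illustration only; refer to the released files in [data] for the actual schema.

```python
import json
from PIL import Image

# NOTE: the file path and field names below are hypothetical placeholders,
# not the guaranteed schema of the released JSON.
with open("data/statschartmwp.json", "r", encoding="utf-8") as f:
    problems = json.load(f)

sample = problems[0]
print(sample["question"])       # problem text (assumed field name)
print(sample["chart_type"])     # e.g. "bar", "histogram" (assumed field name)

# Open the associated statistical chart image (assumed field name and layout).
chart = Image.open(f'data/images/{sample["image"]}')
chart.show()
```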
We introduce CoTAR, a data augmentation strategy that uses CoT-augmented reasoning to alleviate the cross-modal alignment gap between representations of artificial figures and of technical language and equations. Specifically, instead of directly using the concise textual solutions of the MWPs, we use a state-of-the-art LLM to convert them into detailed step-by-step explanations in a CoT-like format, improving their logical clarity. Each step consists of a short step summary that explicitly states the purpose of the step, followed by a concrete reasoning response. The step summary serves as a guiding directive for the logical analysis or computation required in the current step, while the concrete reasoning response provides a detailed explanation of the process undertaken in response to the step summary. The architecture of our method is illustrated below:
An illustration of CoTAR. (a) The original MWP with a statistical chart. (b) The corresponding original solution. (c) The CoTAR solution. The bold words are the step summaries, and the sentences that follow are the reasoning responses.
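A minimal sketch of the augmentation step, assuming an OpenAI-compatible LLM endpoint. The instruction text, model name, and function below are placeholders, not the released CoTAR prompt; the actual prompt lives in the prompts directory and the full pipeline is in main.py.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint is configured

# Placeholder instruction illustrating the idea: each step must pair a short
# step summary (its purpose) with a concrete reasoning response.
COTAR_INSTRUCTION = (
    "Rewrite the concise solution as numbered steps. Each step must begin "
    "with a short bold step summary stating its purpose, followed by a "
    "detailed reasoning response that carries out that step."
)

def cotar_augment(problem: str, concise_solution: str, model: str = "gpt-4o") -> str:
    """Convert a concise textual solution into a CoT-style step-by-step solution."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": COTAR_INSTRUCTION},
            {"role": "user",
             "content": f"Problem:\n{problem}\n\nConcise solution:\n{concise_solution}"},
        ],
    )
    return response.choices[0].message.content
```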
We fine-tuned Qwen2-VL-7B on our proprietary training dataset using both problem-original solution pairs and problem-augmented solution pairs, achieving an 8.76% improvement in accuracy.
To fine-tune Qwen2-VL-7B, see the official Qwen2-VL GitHub repository.
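A minimal sketch of assembling the mixed training set (original plus CoTAR-augmented solution pairs) as conversation-style JSONL. The exact format expected by the Qwen2-VL fine-tuning scripts may differ, so treat the field names below as assumptions.

```python
import json

def to_record(image_path: str, question: str, solution: str) -> dict:
    # Conversation-style record; field names are illustrative, not the exact
    # schema required by the official Qwen2-VL fine-tuning code.
    return {
        "image": image_path,
        "conversations": [
            {"from": "user", "value": f"<image>\n{question}"},
            {"from": "assistant", "value": solution},
        ],
    }

def build_training_file(problems, out_path="train_mixed.jsonl"):
    """Write one record per original solution and one per CoTAR solution."""
    with open(out_path, "w", encoding="utf-8") as f:
        for p in problems:  # each p: dict with image / question / solution / cotar_solution
            f.write(json.dumps(to_record(p["image"], p["question"], p["solution"])) + "\n")
            f.write(json.dumps(to_record(p["image"], p["question"], p["cotar_solution"])) + "\n")
```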
The CoTAR prompt is provided in prompts. Run the main script to generate the CoTAR solution data:
`python main.py`

This work is marked with CC0 1.0.
Explore additional related research on vision-language large models, focusing on multimodal LLMs and mathematical reasoning:
- [ChartQA] ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
- [TABMWP] Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
- [MathVista] MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
- [MathVerse] MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
- [MATH-Vision] Measuring Multimodal Mathematical Reasoning with the MATH-Vision Dataset
- [OlympiadBench] OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
- [InternVL] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
- [LLaVA] LLaVA: Large Language and Vision Assistant
