
# StatsChartMWP


This is the official repository for the paper "StatsChartMWP: A Dataset for Evaluating Multimodal Mathematical Reasoning Abilities on Math Word Problems with Statistical Charts". The paper link is coming soon.

## 🏆 Leaderboard

The leaderboard is continuously being updated. If you have any new results to contribute, please feel free to reach out to us.

| # | Model | Method | Date | ALL | Bar | Hist | Line | Line-f | Scatter | D-axis | P-bar | Pie | Table | Comp | Radar |
|---|-------|--------|------|-----|-----|------|------|--------|---------|--------|-------|-----|-------|------|-------|
| 1 | o3 | LMM | 2025-09-08 | 82.75 | 81.73 | 77.71 | 76.96 | 71.97 | 83.12 | 82.81 | 90.91 | 88.10 | 93.23 | 83.98 | 33.33 |
| 2 | Qwen2.5-VL-72B | LMM | 2025-09-08 | 71.12 | 78.45 | 59.51 | 68.45 | 56.90 | 54.37 | 65.62 | 63.64 | 65.78 | 85.89 | 61.07 | 41.67 |
| 3 | Qwen2-VL-72B | LMM | 2025-02-23 | 59.33 | 69.91 | 39.29 | 60.03 | 46.44 | 43.75 | 62.50 | 59.09 | 65.78 | 77.12 | 50.39 | 62.50 |
| 4 | GPT-4o | LMM | 2025-02-23 | 57.05 | 66.51 | 26.38 | 58.76 | 42.26 | 45.62 | 68.75 | 54.55 | 72.57 | 81.54 | 49.50 | 45.83 |
| 5 | InternVL2_5-78B | LMM | 2025-02-23 | 55.25 | 70.93 | 29.26 | 56.12 | 40.59 | 48.75 | 57.81 | 54.55 | 57.01 | 74.27 | 51.84 | 37.04 |
| 6 | GPT4 (GPT-4o) | LLM | 2025-02-23 | 46.95 | 59.98 | 13.30 | 52.72 | 35.98 | 27.50 | 45.31 | 27.27 | 59.19 | 71.85 | 38.82 | 20.83 |
| 7 | InternVL2-Llama3-76B | LMM | 2025-02-23 | 45.02 | 58.81 | 24.58 | 50.43 | 35.98 | 43.12 | 42.19 | 13.64 | 48.08 | 57.38 | 35.37 | 29.17 |
| 8 | Qwen2-VL-7B | LMM | 2025-02-23 | 37.46 | 45.67 | 20.16 | 39.29 | 30.96 | 31.25 | 65.62 | 36.36 | 44.54 | 51.25 | 25.70 | 62.50 |
| 9 | GPT-4V | LMM | 2025-02-23 | 34.28 | 38.57 | 12.10 | 40.48 | 28.87 | 30.00 | 39.06 | 18.18 | 38.25 | 55.67 | 27.89 | 33.33 |
| 10 | LLaVA-OV-72B | LMM | 2025-02-23 | 32.39 | 38.33 | 15.26 | 39.80 | 30.54 | 35.62 | 42.19 | 31.82 | 34.32 | 45.97 | 22.91 | 16.67 |
| 11 | GPT4 (GPT-4V) | LLM | 2025-02-23 | 31.47 | 38.11 | 8.61 | 39.12 | 22.18 | 20.62 | 35.94 | 4.55 | 34.71 | 52.46 | 24.36 | 20.83 |
| 12 | Qwen-VL-MAX | LLM | 2025-02-23 | 30.24 | 37.40 | 10.19 | 29.51 | 19.25 | 20.00 | 29.69 | 18.18 | 37.86 | 54.74 | 16.91 | 33.33 |
| 13 | IXC-2.5-7B | LMM | 2025-02-23 | 22.55 | 31.10 | 7.36 | 29.25 | 17.99 | 18.75 | 43.75 | 18.18 | 24.88 | 29.72 | 15.02 | 41.67 |
| 14 | Cambrian-34B | LMM | 2025-02-23 | 18.15 | 22.03 | 8.77 | 27.89 | 14.23 | 18.75 | 46.88 | 22.73 | 16.52 | 20.24 | 14.02 | 41.67 |
| 15 | LLaVA-NeXT-34B | LMM | 2025-02-23 | 15.67 | 20.96 | 5.45 | 23.13 | 13.39 | 20.00 | 25.00 | 4.55 | 14.06 | 19.24 | 12.44 | 20.83 |
| 16 | DeepSeek-VL-7B | LMM | 2025-02-23 | 13.20 | 16.06 | 4.63 | 21.43 | 11.72 | 12.50 | 28.12 | 4.55 | 14.16 | 15.47 | 9.78 | 8.33 |
| 17 | HPT-1.0 | LMM | 2025-02-23 | 10.10 | 9.91 | 5.07 | 17.77 | 9.62 | 10.62 | 26.56 | 9.09 | 7.18 | 10.62 | 11.56 | 29.17 |

All scores are accuracy (%). Column abbreviations: Hist = histogram, Line-f = line-function, D-axis = dual-axis, P-bar = percentage-bar, Comp = composite.
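
As a hedged sketch of how such a per-chart-type breakdown can be computed from per-item results (the `chart_type` and `is_correct` field names are illustrative, not the official evaluation format):

```python
# Minimal sketch of a per-chart-type accuracy breakdown like the
# leaderboard above. NOT the official evaluation script: the record
# fields `chart_type` and `is_correct` are assumed for illustration.
from collections import defaultdict

def per_type_accuracy(records):
    """records: iterable of dicts like {"chart_type": "bar", "is_correct": True}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["chart_type"]] += 1
        hits[r["chart_type"]] += int(r["is_correct"])
    acc = {t: 100.0 * hits[t] / totals[t] for t in totals}
    acc["ALL"] = 100.0 * sum(hits.values()) / sum(totals.values())
    return acc
```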

## 📐 StatsChartMWP Dataset

The StatsChartMWP dataset is a benchmark for developing AI models that can understand the multimodal information in math word problems (MWPs) with statistical charts. It incorporates a variety of chart forms, spanning a broad visual spectrum and a range of mathematical knowledge competencies, and every item originates from a real-world educational context: problems formulated by mathematics educators, genuine student inquiries, and historical examination questions. The dataset contains 8,514 unique MWPs with statistical charts, covering 11 types of charts: bar, line, line-function, dual-axis, pie, composite, radar, histogram, scatter, percentage-bar, and table. A comparison between our dataset and ChartQA and FigureQA is shown below; R-Steps denotes the average number of reasoning steps per problem.

*(Figure: comparison of StatsChartMWP with ChartQA and FigureQA across domains.)*

The StatsChartMWP dataset JSON file and images are provided in [data].
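
Below is a minimal loading sketch, assuming the release is a single JSON list of problem records; the file name `data/statschartmwp.json` and the `chart_type` field are illustrative guesses, so adjust them to the actual files in [data]:

```python
# Hedged loading sketch: the file name and field names below are
# assumptions, not guaranteed to match the released data layout.
import json
from collections import Counter

with open("data/statschartmwp.json", encoding="utf-8") as f:
    problems = json.load(f)

print(len(problems))                                # expected: 8,514 MWPs
print(Counter(p["chart_type"] for p in problems))   # expected: 11 chart types
```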

## 🌟 CoTAR

### Introduction

We introduce CoTAR, a data augmentation strategy that uses CoT-augmented reasoning to improve cross-modal alignment between the visual representations of synthetic figures and the technical language and equations. Specifically, instead of directly using the concise textual solutions of the MWPs, we use a state-of-the-art LLM to convert them into detailed step-by-step explanations in a CoT-like format, improving their logical clarity. Each step consists of a short step summary that explicitly states the purpose of the step, followed by a concrete reasoning response: the step summary serves as a guiding directive for the logical analysis or computation required in the current step, while the reasoning response provides a detailed explanation of the process undertaken in answer to that summary. The architecture of our method is illustrated below:


An illustration of CoTAR: (a) the original MWP with its statistical chart; (b) the corresponding original solution; (c) the CoTAR solution. The bold words are step summaries and the sentences that follow are reasoning responses.
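
To make the two-part step structure concrete, here is a small sketch of how a CoTAR solution could be represented; the class and field names are illustrative, not taken from this repository:

```python
# Illustrative data model for a CoTAR solution: each step pairs a
# short summary (the guiding directive) with a detailed reasoning
# response. Names here are assumptions, not the repository's schema.
from dataclasses import dataclass

@dataclass
class CoTStep:
    summary: str   # states the purpose of this step
    response: str  # detailed reasoning carried out for that purpose

@dataclass
class CoTARSolution:
    steps: list[CoTStep]
    final_answer: str

    def render(self) -> str:
        # Bold step summaries followed by reasoning responses,
        # mirroring the illustration above.
        body = "\n".join(f"**{s.summary}** {s.response}" for s in self.steps)
        return f"{body}\nAnswer: {self.final_answer}"
```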

We fine-tuned Qwen2-VL-7B on our proprietary training dataset using both problem-original-solution pairs and problem-augmented-solution pairs, achieving an 8.76% improvement in accuracy.
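
As a rough sketch of how the two kinds of pairs can be combined into one fine-tuning set (the field names and chat layout below are assumptions, not the repository's actual format):

```python
# Hedged sketch: build one training set from both the original and
# the CoTAR-augmented solutions. `problem`, `solution`,
# `cotar_solution`, and `image` are illustrative field names.
def build_training_pairs(items):
    pairs = []
    for it in items:
        for answer in (it["solution"], it["cotar_solution"]):
            pairs.append({
                "image": it["image"],
                "conversations": [
                    {"role": "user", "content": it["problem"]},
                    {"role": "assistant", "content": answer},
                ],
            })
    return pairs
```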

### Quick Start

#### Finetune

To fine-tune Qwen2-VL-7B, see the official Qwen2-VL GitHub repository.

#### CoTAR

The CoTAR prompt is provided in prompts. Run the main script to generate the CoTAR solution data:

```bash
python main.py
```
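
For orientation, a rough sketch of what such a generation loop looks like; the prompt file name and the `call_llm` helper are placeholders, so consult `main.py` and the prompts directory for the actual implementation:

```python
# Hedged sketch of a CoTAR generation loop. The prompt file name and
# call_llm() are placeholders; see main.py for the real code.
with open("prompts/cotar_prompt.txt", encoding="utf-8") as f:
    TEMPLATE = f.read()

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a state-of-the-art LLM API."""
    raise NotImplementedError

def augment(problems):
    for p in problems:
        prompt = TEMPLATE.format(problem=p["problem"], solution=p["solution"])
        p["cotar_solution"] = call_llm(prompt)  # step summaries + reasoning responses
    return problems
```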

## License


This work is marked with CC0 1.0.

## Related Work

Explore our additional research on vision-language large models, focusing on multimodal LLMs and mathematical reasoning.
