Zhuoming Liu1, Xiaofeng Gao2, Feiyang Niu2, Qiaozi Gao2, Liu Liu3, Robinson Piramuthu2
1University of Wisconsin-Madison 2Amazon AGI 3MIT
This repo holds the open-source models and resources for a reproduced version of START, following the method proposed in the paper START: Spatial and Textual learning for chART understanding.
- Spatial-Textual Learning:
- (i) Chart-element grounding — to strengthen an MLLM's understanding of chart visual layout
- (ii) Chart-to-code generation — to enhance understanding of data details
- START-Dataset: A novel data-generation pipeline that first leverages an MLLM to translate real chart images into executable chart code, recovering the underlying data representation while preserving the visual distribution of real-world charts. We then evolve the code with a Large Language Model (LLM) to ascertain the positions of chart elements that capture the chart's visual structure, addressing challenges that existing methods cannot handle.
- CS-Bench (Chart Spatial understanding Benchmark): A new benchmark to evaluate a model's ability to understand chart spatial structures, filling a critical gap in comprehensive chart understanding evaluation.
- State-of-the-art Performance: Leveraging spatial and textual learning, START delivers consistent gains across model sizes and benchmarks over the base models and surpasses prior state-of-the-art by a clear margin.
- Install
- START Dataset
- CS-Bench
- Checkpoints
- Reproduce Results
- Training Code
- Contact
- Citation
- Acknowledgement
Please install the environment below to reproduce the model performance on CS-Bench.
conda create -n qwenvl_env python=3.10
conda activate qwenvl_env
pip install vllm==0.11.0
pip install transformers==4.57.0
# Install Qwen-VL utility library (recommended for offline inference)
pip install qwen-vl-utils==0.0.14
pip install omegaconf
pip install matplotlib ipdb
pip install packaging
pip install ninja
pip install flash-attn==2.8.2 --no-build-isolation
pip install wandb
pip install anls

You can download the Python Chart codes here, and all the evolved Python Chart codes here.
Running these Python scripts will produce the rendered images and, when you run the evolved code, a JSON file with each chart element's location:
# run all the python chart code: generate rendered image
python data_preprocessing/run_simulated_code.py \
--code_dir /<YOUR_DATA_DIR>/generated_code \
--image_dir /<YOUR_DATA_DIR>/simulated_images/ \
--info_file /<YOUR_DATA_DIR>/simulated_images/all_info.jsonl \
--save_dir /<YOUR_DATA_DIR>/temp_code/
# run all the evolved python chart code: generate rendered image and json files
python data_preprocessing/run_evolved_code.py \
--code_dir /<YOUR_DATA_DIR>/descriptive_evolved_code/ \
--image_dir /<YOUR_DATA_DIR>/evolved_simulated_images/ \
--info_file /<YOUR_DATA_DIR>/evolved_simulated_images/all_info.jsonl \
--save_dir /<YOUR_DATA_DIR>/temp_evolved_code/
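The generated `all_info.jsonl` files store one JSON object per line and can be loaded with a few lines of stdlib Python. A minimal sketch (the per-record schema depends on the generation scripts, so inspect your own file for the actual field names):

```python
import json

def load_jsonl(path_or_lines):
    """Load one JSON object per non-empty line.
    Accepts a file path or any iterable of line strings."""
    if isinstance(path_or_lines, str):
        with open(path_or_lines, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    return [json.loads(line) for line in path_or_lines if line.strip()]

# records = load_jsonl("/<YOUR_DATA_DIR>/simulated_images/all_info.jsonl")
```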
You can download the prepared annotations and images here.
| Dataset | # Annotations | # Unique Images |
|---|---|---|
| SFT Annotation | 399,897 | 70,652 |
| RL Annotation | 39,211 | 3,979 |
In this open-source version, we also include the point and line training data.
| Question Type | Count | Percentage |
|---|---|---|
| bbox | 6,694 | 17.07% |
| Number-in-Chart | 6,059 | 15.45% |
| Text-in-Chart | 5,842 | 14.90% |
| Number-in-General | 4,969 | 12.67% |
| Text-in-General | 2,955 | 7.54% |
| dot | 1,764 | 4.50% |
| line | 1,498 | 3.82% |
| chart_to_code | 881 | 2.25% |
| Question Type | Count | Percentage |
|---|---|---|
| Number-in-General | 145,676 | 36.43% |
| Chart to code (With QA) | 111,032 | 27.77% |
| Text-in-General | 68,917 | 17.23% |
| Text-in-Chart | 27,262 | 6.82% |
| Number-in-Chart | 23,284 | 5.82% |
| Multiple Round Agentic | 19,789 | 4.95% |
| Other | 3,937 | 0.99% |
In this open-source version, we also include the agentic multi-round conversations in the SFT dataset. Sample:
{'messages': [{'content': '<image>### Role\nYou are a meticulous chart-analyzing agent. \nYou may call **Atomic Operations** to gather extra visual evidence, *or* you may answer directly if confident.\n\n### Chart Image\n(see attached image)\n\n### User Question\nHow many subplots show both lines increasing from All Rp to Rp/Re < 10?\n\n### Available Atomic Operations\n1. **CropRegion** – args: `[x_min, y_min, x_max, y_max]` \n2. **DrawBBox** – args: `[x_min, y_min, x_max, y_max]` \n3. **DrawLine** – args: `[x1, y1, x2, y2]`\n\n### Instructions\nProduce **one** of the two response styles below:\n\n**(A) Plan-and-Act** – if you need operations first\n````\n```json\n[\n {"op": "<operation_name>", "args": [...], "purpose": "<why>"},\n …\n]\n```\n````\n\n**(B) Direct Answer** – if no operations are required \n````\n<think><step-by-step explanation></think>\nThe final answer is: \\boxed{<answer>}\n````\n', 'role': 'user'}, {'content': '```json\n[{"op": "DrawBBox", "args": [62, 429, 468, 761], "purpose": "Highlight the AGN subplot to examine the trend of both lines"}, {"op": "DrawBBox", "args": [62, 37, 468, 368], "purpose": "Highlight the Quiescent subplot for comparison - both lines decrease"}, {"op": "DrawBBox", "args": [554, 37, 961, 368], "purpose": "Highlight the Star Forming subplot for comparison - only control line increases"}, {"op": "DrawBBox", "args": [554, 429, 961, 761], "purpose": "Highlight the Composite subplot for comparison - both lines decrease"}]\n```', 'role': 'assistant'}, {'content': '<image>\nHere are the images base on your operation.\n', 'role': 'user'}, {'content': '<think>By examining each subplot: Quiescent shows both lines decreasing, Star Forming shows only the control line increasing while all pairs decreases, AGN shows both lines clearly increasing from left to right, and Composite shows both lines decreasing. 
Therefore, only 1 subplot (AGN) shows both lines increasing from All Rp to Rp/Re < 10.</think>\nThe final answer is: \\boxed{1}\n', 'role': 'assistant'}], 'images': ['0905.3386_1_origin.png', '0905.3386_1_6_draw.png']}

Please download the CS-Bench here.
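The assistant turns in the agentic sample above embed their atomic-operation calls (CropRegion, DrawBBox, DrawLine) in a fenced JSON block. A minimal stdlib sketch for extracting them; the only assumption is the message format shown in the sample:

```python
import json
import re

def parse_ops(assistant_text):
    """Pull the atomic-operation list out of an assistant turn
    formatted as a ```json fenced block (as in the SFT sample)."""
    m = re.search(r"```json\s*(\[.*?\])\s*```", assistant_text, re.DOTALL)
    return json.loads(m.group(1)) if m else []

msg = '```json\n[{"op": "DrawBBox", "args": [62, 429, 468, 761], "purpose": "demo"}]\n```'
ops = parse_ops(msg)  # one DrawBBox call with its pixel-coordinate args
```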
| Metric | Value |
|---|---|
| Number of Annotations | 692 |
| Number of Unique Images | 600 |
| Question Type | Count | Percentage |
|---|---|---|
| Grounding Question | 350 | 50.58% |
| QA Grounding Question | 342 | 49.42% |
The following model is trained on the reproduced START dataset initialized with Qwen2.5-VL-7B.
| Models | CharXiv (desc) | CharXiv (rea) | ChartQAPro | ChartMimic | CS-Bench (Recall@0.3) | CS-Bench (acc) |
|---|---|---|---|---|---|---|
| Qwen2.5-VL-7B | 73.0 | 43.2 | 41.9 | 47.1 | 23.1 | 50.8 |
| START-RL-7B | 78.5 | 47.6 | 47.2 | 69.9 | 48.6 | 65.5 |
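CS-Bench's Recall@0.3 presumably counts a ground-truth box as recovered when some predicted box overlaps it with IoU ≥ 0.3. A minimal sketch of that metric (the exact matching protocol is an assumption; the CS-Bench eval code below is authoritative):

```python
def iou(a, b):
    """IoU of two boxes given as [x_min, y_min, x_max, y_max]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def recall_at(preds, gts, thresh=0.3):
    """Fraction of ground-truth boxes matched by any prediction at IoU >= thresh."""
    hits = sum(any(iou(p, g) >= thresh for p in preds) for g in gts)
    return hits / len(gts) if gts else 0.0
```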
We provide evaluation code for CS-Bench, ChartQAPro, CharXiv, and ChartMimic in this section.
Please refer to CS-Bench Eval.
Please refer to ChartQAPro Eval.
Please refer to CharXiv Eval.
Please refer to ChartMimic Eval.
Training code is coming soon.
Zhuoming Liu (zliu2346@wisc.edu)
Please kindly cite our paper if you find this project helpful.
@misc{liu2025startspatialtextuallearning,
title={START: Spatial and Textual Learning for Chart Understanding},
author={Zhuoming Liu and Xiaofeng Gao and Feiyang Niu and Qiaozi Gao and Liu Liu and Robinson Piramuthu},
year={2025},
eprint={2512.07186},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.07186},
}

This project was built upon the following repositories, with many thanks to their authors.