Zhuoming Liu1, Xiaofeng Gao2, Feiyang Niu2, Qiaozi Gao2, Liu Liu3, Robinson Piramuthu2
1University of Wisconsin-Madison 2Amazon AGI 3MIT
This repo holds the open-source models and resources for a reproduced version of START, following the method proposed in the paper START: Spatial and Textual learning for chART understanding.
- Spatial-Textual Learning:
- (i) Chart-element grounding — to strengthen an MLLM's understanding of chart visual layout
- (ii) Chart-to-code generation — to enhance understanding of data details
- START-Dataset: A novel data-generation pipeline that first leverages an MLLM to translate real chart images into executable chart code, recovering the underlying data representation while preserving the visual distribution of real-world charts. We then evolve the code with a Large Language Model (LLM) to ascertain the positions of chart elements that capture the chart's visual structure, addressing challenges that existing methods cannot handle.
- CS-Bench (Chart Spatial understanding Benchmark): A new benchmark to evaluate a model's ability to understand chart spatial structures, filling a critical gap in comprehensive chart understanding evaluation.
- State-of-the-art Performance: Leveraging spatial and textual learning, START delivers consistent gains across model sizes and benchmarks over the base models and surpasses prior state-of-the-art by a clear margin.
- Install
- START Dataset
- CS-Bench
- Checkpoints
- Reproduce Results
- Training Code
- Contact
- Citation
- Acknowledgement
Please install the environment below to reproduce the model performance on CS-Bench.
conda create -n qwenvl_env python=3.10
conda activate qwenvl_env
pip install vllm==0.11.0
pip install transformers==4.57.0
# Install Qwen-VL utility library (recommended for offline inference)
pip install qwen-vl-utils==0.0.14
pip install omegaconf
pip install matplotlib ipdb
pip install packaging
pip install ninja
pip install flash-attn==2.8.2 --no-build-isolation
pip install wandb
pip install anls

You can download the Python Chart codes here, and all the evolved Python Chart codes here.
Running these Python scripts will produce the rendered images and, when you run the evolved code, a JSON file with each chart element's location:
# run all the python chart code: generate rendered image
python data_preprocessing/run_simulated_code.py \
--code_dir /<YOUR_DATA_DIR>/generated_code \
--image_dir /<YOUR_DATA_DIR>/simulated_images/ \
--info_file /<YOUR_DATA_DIR>/simulated_images/all_info.jsonl \
--save_dir /<YOUR_DATA_DIR>/temp_code/
# run all the evolved python chart code: generate rendered image and json files
python data_preprocessing/run_evolved_code.py \
--code_dir /<YOUR_DATA_DIR>/descriptive_evolved_code/ \
--image_dir /<YOUR_DATA_DIR>/evolved_simulated_images/ \
--info_file /<YOUR_DATA_DIR>/evolved_simulated_images/all_info.jsonl \
--save_dir /<YOUR_DATA_DIR>/temp_evolved_code/
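The generated `all_info.jsonl` files store one JSON object per line and can be loaded with a few lines of stdlib Python. A minimal sketch (the per-record schema depends on the generation scripts, so inspect your own file for the actual field names):

```python
import json

def load_jsonl(path_or_lines):
    """Load one JSON object per non-empty line.
    Accepts a file path or any iterable of line strings."""
    if isinstance(path_or_lines, str):
        with open(path_or_lines, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    return [json.loads(line) for line in path_or_lines if line.strip()]

# records = load_jsonl("/<YOUR_DATA_DIR>/simulated_images/all_info.jsonl")
```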
You can download the prepared annotations and images here.
| Dataset | # Annotations | # Unique Images |
|---|---|---|
| SFT Annotation | 399,897 | 70,652 |
| RL Annotation | 39,211 | 3,979 |
In this open-source version, we also include the point and line training data.
| Question Type | Count | Percentage |
|---|---|---|
| bbox | 6,694 | 17.07% |
| Number-in-Chart | 6,059 | 15.45% |
| Text-in-Chart | 5,842 | 14.90% |
| Number-in-General | 4,969 | 12.67% |
| Text-in-General | 2,955 | 7.54% |
| dot | 1,764 | 4.50% |
| line | 1,498 | 3.82% |
| chart_to_code | 881 | 2.25% |
| Question Type | Count | Percentage |
|---|---|---|
| Number-in-General | 145,676 | 36.43% |
| Chart to code (With QA) | 111,032 | 27.77% |
| Text-in-General | 68,917 | 17.23% |
| Text-in-Chart | 27,262 | 6.82% |
| Number-in-Chart | 23,284 | 5.82% |
| Multiple Round Agentic | 19,789 | 4.95% |
| Other | 3,937 | 0.99% |
In this open-source version, we also include the agentic multi-round conversations in the SFT dataset. Sample:
{'messages': [{'content': '<image>### Role\nYou are a meticulous chart-analyzing agent. \nYou may call **Atomic Operations** to gather extra visual evidence, *or* you may answer directly if confident.\n\n### Chart Image\n(see attached image)\n\n### User Question\nHow many subplots show both lines increasing from All Rp to Rp/Re < 10?\n\n### Available Atomic Operations\n1. **CropRegion** – args: `[x_min, y_min, x_max, y_max]` \n2. **DrawBBox** – args: `[x_min, y_min, x_max, y_max]` \n3. **DrawLine** – args: `[x1, y1, x2, y2]`\n\n### Instructions\nProduce **one** of the two response styles below:\n\n**(A) Plan-and-Act** – if you need operations first\n````\n```json\n[\n {"op": "<operation_name>", "args": [...], "purpose": "<why>"},\n …\n]\n```\n````\n\n**(B) Direct Answer** – if no operations are required \n````\n<think><step-by-step explanation></think>\nThe final answer is: \\boxed{<answer>}\n````\n', 'role': 'user'}, {'content': '```json\n[{"op": "DrawBBox", "args": [62, 429, 468, 761], "purpose": "Highlight the AGN subplot to examine the trend of both lines"}, {"op": "DrawBBox", "args": [62, 37, 468, 368], "purpose": "Highlight the Quiescent subplot for comparison - both lines decrease"}, {"op": "DrawBBox", "args": [554, 37, 961, 368], "purpose": "Highlight the Star Forming subplot for comparison - only control line increases"}, {"op": "DrawBBox", "args": [554, 429, 961, 761], "purpose": "Highlight the Composite subplot for comparison - both lines decrease"}]\n```', 'role': 'assistant'}, {'content': '<image>\nHere are the images base on your operation.\n', 'role': 'user'}, {'content': '<think>By examining each subplot: Quiescent shows both lines decreasing, Star Forming shows only the control line increasing while all pairs decreases, AGN shows both lines clearly increasing from left to right, and Composite shows both lines decreasing. 
Therefore, only 1 subplot (AGN) shows both lines increasing from All Rp to Rp/Re < 10.</think>\nThe final answer is: \\boxed{1}\n', 'role': 'assistant'}], 'images': ['0905.3386_1_origin.png', '0905.3386_1_6_draw.png']}

Please download the CS-Bench here.
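The assistant turns in the agentic sample above embed their atomic-operation calls (CropRegion, DrawBBox, DrawLine) in a fenced JSON block. A minimal stdlib sketch for extracting them; the only assumption is the message format shown in the sample:

```python
import json
import re

def parse_ops(assistant_text):
    """Pull the atomic-operation list out of an assistant turn
    formatted as a ```json fenced block (as in the SFT sample)."""
    m = re.search(r"```json\s*(\[.*?\])\s*```", assistant_text, re.DOTALL)
    return json.loads(m.group(1)) if m else []

msg = '```json\n[{"op": "DrawBBox", "args": [62, 429, 468, 761], "purpose": "demo"}]\n```'
ops = parse_ops(msg)  # one DrawBBox call with its pixel-coordinate args
```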
| Metric | Value |
|---|---|
| Number of Annotations | 692 |
| Number of Unique Images | 600 |
| Question Type | Count | Percentage |
|---|---|---|
| Grounding Question | 350 | 50.58% |
| QA Grounding Question | 342 | 49.42% |
The following model is trained on the reproduced START dataset initialized with Qwen2.5-VL-7B.
| Models | CharXiv (desc) | CharXiv (rea) | ChartQAPro | ChartMimic | CS-Bench (Recall@0.3) | CS-Bench (acc) |
|---|---|---|---|---|---|---|
| Qwen2.5-VL-7B | 73.0 | 43.2 | 41.9 | 47.1 | 23.1 | 50.8 |
| START-RL-7B | 78.5 | 47.6 | 47.2 | 69.9 | 48.6 | 65.5 |
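CS-Bench's Recall@0.3 presumably counts a ground-truth box as recovered when some predicted box overlaps it with IoU ≥ 0.3. A minimal sketch of that metric (the exact matching protocol is an assumption; the CS-Bench eval code below is authoritative):

```python
def iou(a, b):
    """IoU of two boxes given as [x_min, y_min, x_max, y_max]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def recall_at(preds, gts, thresh=0.3):
    """Fraction of ground-truth boxes matched by any prediction at IoU >= thresh."""
    hits = sum(any(iou(p, g) >= thresh for p in preds) for g in gts)
    return hits / len(gts) if gts else 0.0
```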
We provide evaluation code for CS-Bench, ChartQAPro, CharXiv, and ChartMimic in this section.
Please refer to CS-Bench Eval.
Please refer to ChartQAPro Eval.
Please refer to CharXiv Eval.
Please refer to ChartMimic Eval.
Training code is coming soon.
Zhuoming Liu (zliu2346@wisc.edu)
Please kindly cite our paper if you find this project helpful.
@misc{liu2025startspatialtextuallearning,
title={START: Spatial and Textual Learning for Chart Understanding},
author={Zhuoming Liu and Xiaofeng Gao and Feiyang Niu and Qiaozi Gao and Liu Liu and Robinson Piramuthu},
year={2025},
eprint={2512.07186},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.07186},
}

This project was built upon the following repositories, with many thanks to their authors.