Beichen Zhang · Yuhang Zang† · Xiaoyi Dong · Yuhang Cao · Haodong Duan · Dahua Lin · Jiaqi Wang†
†Corresponding authors.
- 🚀 [2025/11/26] We have released the inference code.
- 🚀 [2025/11/19] We have released the paper "Think Visually, Reason Textually: Vision-Language Synergy in ARC".
We integrate Visual Intelligence into ARC-AGI to leverage the respective advantages of vision and text: vision supports global pattern abstraction and verification, whereas language specializes in precise execution.
We achieve this by introducing two synergistic strategies: (1) Vision-Language Synergy Reasoning (VLSR), which decomposes ARC-AGI into modality-aligned subtasks; and (2) Modality-Switch Self-Correction (MSSC), which leverages vision to verify text-based reasoning for intrinsic error correction.
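The two strategies above can be pictured with a minimal Python sketch. This is not the code in `inference.py`: every name here (`solve_with_synergy`, `summarize_rule_visually`, `apply_rule_textually`, `verify_visually`, `max_rounds`) is hypothetical, and the three callables stand in for model calls that you would supply yourself. It only illustrates the control flow: vision abstracts the rule, text executes it, and vision checks the result.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]


def grid_to_text(grid: Grid) -> str:
    """Serialize a grid as space-separated rows for the text modality."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def solve_with_synergy(
    train_pairs: List[Pair],
    test_input: Grid,
    summarize_rule_visually: Callable[[List[Pair]], str],
    apply_rule_textually: Callable[[str, str, Optional[str]], Grid],
    verify_visually: Callable[[List[Pair], Grid, Grid], Optional[str]],
    max_rounds: int = 3,  # assumed retry budget, not taken from the paper
) -> Grid:
    # VLSR, step 1: use the visual modality to abstract the global pattern.
    rule = summarize_rule_visually(train_pairs)

    feedback: Optional[str] = None
    candidate: Grid = []
    for _ in range(max_rounds):
        # VLSR, step 2: use the textual modality for precise execution.
        candidate = apply_rule_textually(rule, grid_to_text(test_input), feedback)
        # MSSC: switch back to vision to verify the text-based answer.
        # A return value of None means the verifier found no inconsistency.
        feedback = verify_visually(train_pairs, test_input, candidate)
        if feedback is None:
            break
    return candidate
```

Passing the model calls in as plain callables keeps the sketch independent of any particular API client.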
Prepare your environment
git clone https://github.com/InternLM/Arc-VL
cd Arc-VL
conda create -n arcvl python==3.11
conda activate arcvl
pip install -r requirements.txt
Modify setup_api_key.sh and fill in your base_url and API keys. Activate it by running:
source setup_api_key.sh
Prepare the data. The datasets can be downloaded from the following links:
ARC-AGI: https://github.com/fchollet/ARC-AGI
BARC: https://github.com/xu3kev/BARC
Re-ARC: https://github.com/michaelhodel/re-arc
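To check a download before running inference, a small loader like the one below may help. It is a sketch that assumes the ARC-AGI task layout, where each task is a JSON file with `train` and `test` lists of `input`/`output` grids. The helper names `load_arc_task` and `grid_shape` are hypothetical and not part of this repository, and BARC and Re-ARC may store their data differently.

```python
import json
from pathlib import Path
from typing import List, Optional, Tuple

Grid = List[List[int]]


def load_arc_task(path: str) -> Tuple[List[Tuple[Grid, Grid]], List[Tuple[Grid, Optional[Grid]]]]:
    """Load one ARC-AGI task file and return (train_pairs, test_pairs)."""
    task = json.loads(Path(path).read_text())
    train = [(pair["input"], pair["output"]) for pair in task["train"]]
    # Test outputs may be absent in some releases, so read them defensively.
    test = [(pair["input"], pair.get("output")) for pair in task["test"]]
    return train, test


def grid_shape(grid: Grid) -> Tuple[int, int]:
    """Return (rows, columns) of a grid; an empty grid has shape (0, 0)."""
    return len(grid), (len(grid[0]) if grid else 0)
```

For example, calling `load_arc_task` on any single task file under your data path and printing `grid_shape` for each training input is a quick sanity check of the dataset.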
Specify the test dataset, the test model, and the dataset path, then run our vision-language synergy reasoning with the following command:
python inference.py --dataset_name="arc-agi" --model="gpt-4o" --data_path="Your_data_path" \
--result_file="result_arcagi_4o.json" \
--save_root="images/ARC-AGI/"
Finally, score the inference results (set --input_file to the result file written by the inference step):
python score.py --input_file="result.json" --output_file="result_scored.json"
We analyze in depth the outputs of different models (GPT-4o, Gemini-2.5-Pro-thinking-8192, o4-mini) when they use visual thinking versus textual thinking on the ARC-AGI task. Visual thinking shows several distinct advantages, such as the integration of 2D structural information, a global perspective, and long-range perception.
If you find this project useful, please kindly cite:
@article{zhang2025think,
title={Think Visually, Reason Textually: Vision-Language Synergy in ARC},
author={Zhang, Beichen and Zang, Yuhang and Dong, Xiaoyi and Cao, Yuhang and Duan, Haodong and Lin, Dahua and Wang, Jiaqi},
journal={arXiv preprint arXiv:2511.15703},
year={2025}
}
Usage and License Notices: The code is intended and licensed for research use only.




