- [2025.11.18] 🔥 AdaptVision is coming! We release the project page, paper, code and models!
Our environment setup follows veRL.

```bash
git clone https://github.com/AdaptVision/AdaptVision.git
cd AdaptVision
conda create -n adaptvision python=3.11 -y
conda activate adaptvision

# veRL
pip3 install -e .

# flash-attn
pip3 install flash-attn==2.7.3 --no-build-isolation

pip install transformers==4.51.0
pip install math_verify
pip install "ray[default]"
pip install tensordict==0.6.2
pip install qwen_vl_utils
```
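After installing, a quick check that the pinned versions actually landed can save debugging time later. The sketch below only uses the version pins from the commands above; the distribution name for flash-attn may be reported differently depending on how it was built.

```python
# Minimal sanity check for the pinned dependencies (a sketch based only on the
# pip pins above; adjust names if your environment reports them differently).
from importlib.metadata import PackageNotFoundError, version

expected = {
    "transformers": "4.51.0",
    "flash-attn": "2.7.3",   # may be reported as flash_attn
    "tensordict": "0.6.2",
}

for pkg, want in expected.items():
    try:
        got = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed (expected {want})")
        continue
    print(f"{pkg}: {got} " + ("OK" if got == want else f"(expected {want})"))
```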
Download the training and validation data from Hugging Face:

```bash
# train file
huggingface-cli download --repo-type dataset --resume-download Senqiao/VisionThink-Smart-Train --local-dir datasets/VisionThink-Smart-Train

# val file
huggingface-cli download --repo-type dataset --resume-download Senqiao/VisionThink-Smart-Val --local-dir datasets/VisionThink-Smart-Val
```
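To confirm the downloads completed, you can inspect the local directories. The paths below are the `--local-dir` values from the commands above; the snippet makes no assumption about the file format.

```python
# List what was downloaded into the dataset directories used above.
from pathlib import Path

for split_dir in ("datasets/VisionThink-Smart-Train", "datasets/VisionThink-Smart-Val"):
    files = [p for p in Path(split_dir).rglob("*") if p.is_file()]
    total_mb = sum(p.stat().st_size for p in files) / 1e6
    print(f"{split_dir}: {len(files)} files, {total_mb:.1f} MB")
```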
To use GPT as the reward model, first set the following environment variables:
```bash
export AZURE_API_KEY=
export AZURE_ENDPOINT=
export AZURE_API_VERSION=
```
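For reference, these variables correspond to the Azure configuration of the official openai Python client. The sketch below shows one way a GPT-based reward query could be issued with them; the deployment name and the prompt are hypothetical placeholders, not part of this repo.

```python
# Sketch: querying an Azure-hosted GPT model with the variables set above.
# The deployment name ("gpt-4o") and the prompt are hypothetical examples.
import os

from openai import AzureOpenAI  # requires the openai package

client = AzureOpenAI(
    api_key=os.environ["AZURE_API_KEY"],
    azure_endpoint=os.environ["AZURE_ENDPOINT"],
    api_version=os.environ["AZURE_API_VERSION"],
)

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical Azure deployment name
    messages=[{"role": "user", "content": "Rate this answer from 0 to 1: ..."}],
)
print(response.choices[0].message.content)
```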
Run AdaptVision training:

```bash
bash scripts/run_adaptvision.sh
```
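If you prefer to launch training from Python rather than the shell (for example from a job scheduler), the script can be wrapped as below. This is only a convenience sketch and assumes the Azure variables above are already exported.

```python
# Sketch: launching the training script from Python; equivalent to the shell command above.
import subprocess

subprocess.run(["bash", "scripts/run_adaptvision.sh"], check=True)
```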
We use lmms-eval to evaluate our model. Set up the evaluation environment by following the instructions here.
The detailed evaluation code is provided in scripts/vllm_adaptvision.py.
If you find this project useful in your research, please consider citing:
```bibtex
@article{lin2025adaptvision,
  title={AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition},
  author={Lin, Zichuan and Liu, Yicheng and Yang, Yang and Tao, Lvfang and Ye, Deheng},
  journal={arXiv preprint arXiv:2512.03794},
  year={2025}
}
```

We would like to thank the following repos for their great work:
- This work is built upon verl, lmms-eval, and VisionThink.
- This work utilizes models from Qwen and data from VisionThink.
- AdaptVision is licensed under the Apache License 2.0.