- [2025.11.18] 🔥 AdaptVision is coming! We release the project page, paper, code and models!
Our environment setup follows veRL.

```bash
git clone https://github.com/AdaptVision/AdaptVision.git
cd AdaptVision
conda create -n adaptvision python=3.11 -y
conda activate adaptvision

# veRL
pip3 install -e .

# flash-attn
pip3 install flash-attn==2.7.3 --no-build-isolation

pip install transformers==4.51.0
pip install math_verify
pip install "ray[default]"
pip install tensordict==0.6.2
pip install qwen_vl_utils
```
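After installing, a quick check that the pinned versions actually landed can save debugging time later. The sketch below only uses the version pins from the commands above; the distribution name for flash-attn may be reported differently depending on how it was built.

```python
# Minimal sanity check for the pinned dependencies (a sketch based only on the
# pip pins above; adjust names if your environment reports them differently).
from importlib.metadata import PackageNotFoundError, version

expected = {
    "transformers": "4.51.0",
    "flash-attn": "2.7.3",   # may be reported as flash_attn
    "tensordict": "0.6.2",
}

for pkg, want in expected.items():
    try:
        got = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed (expected {want})")
        continue
    print(f"{pkg}: {got} " + ("OK" if got == want else f"(expected {want})"))
```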
Download the training and validation data from Hugging Face:

```bash
# train file
huggingface-cli download --repo-type dataset --resume-download Senqiao/VisionThink-Smart-Train --local-dir datasets/VisionThink-Smart-Train

# val file
huggingface-cli download --repo-type dataset --resume-download Senqiao/VisionThink-Smart-Val --local-dir datasets/VisionThink-Smart-Val
```
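To confirm the downloads completed, you can inspect the local directories. The paths below are the `--local-dir` values from the commands above; the snippet makes no assumption about the file format.

```python
# List what was downloaded into the dataset directories used above.
from pathlib import Path

for split_dir in ("datasets/VisionThink-Smart-Train", "datasets/VisionThink-Smart-Val"):
    files = [p for p in Path(split_dir).rglob("*") if p.is_file()]
    total_mb = sum(p.stat().st_size for p in files) / 1e6
    print(f"{split_dir}: {len(files)} files, {total_mb:.1f} MB")
```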
To use GPT as the reward model, first set the following environment variables:
```bash
export AZURE_API_KEY=
export AZURE_ENDPOINT=
export AZURE_API_VERSION=
```
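For reference, these variables correspond to the Azure configuration of the official openai Python client. The sketch below shows one way a GPT-based reward query could be issued with them; the deployment name and the prompt are hypothetical placeholders, not part of this repo.

```python
# Sketch: querying an Azure-hosted GPT model with the variables set above.
# The deployment name ("gpt-4o") and the prompt are hypothetical examples.
import os

from openai import AzureOpenAI  # requires the openai package

client = AzureOpenAI(
    api_key=os.environ["AZURE_API_KEY"],
    azure_endpoint=os.environ["AZURE_ENDPOINT"],
    api_version=os.environ["AZURE_API_VERSION"],
)

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical Azure deployment name
    messages=[{"role": "user", "content": "Rate this answer from 0 to 1: ..."}],
)
print(response.choices[0].message.content)
```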
Run AdaptVision training:

```bash
bash scripts/run_adaptvision.sh
```
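If you prefer to launch training from Python rather than the shell (for example from a job scheduler), the script can be wrapped as below. This is only a convenience sketch and assumes the Azure variables above are already exported.

```python
# Sketch: launching the training script from Python; equivalent to the shell command above.
import subprocess

subprocess.run(["bash", "scripts/run_adaptvision.sh"], check=True)
```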
We use lmms-eval to evaluate our model. Set up the evaluation environment by following the instructions here.
The detailed evaluation code is provided in scripts/vllm_adaptvision.py.
If you find this project useful in your research, please consider citing:
```bibtex
@article{lin2025adaptvision,
  title={AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition},
  author={Lin, Zichuan and Liu, Yicheng and Yang, Yang and Tao, Lvfang and Ye, Deheng},
  journal={arXiv preprint arXiv:2512.03794},
  year={2025}
}
```

We would like to thank the following repos for their great work:
- This work is built upon verl, lmms-eval, and VisionThink.
- This work utilizes models from Qwen and data from VisionThink.
- AdaptVision is licensed under the Apache License 2.0.