Accio-Lab/SwimBird

SwimBird: Eliciting Switchable Reasoning Modes in Hybrid Autoregressive MLLMs

Jintao Tong1, Shilin Yan2†‡, Hongwei Xue2, Xiaojun Tang2, Kunyu Shi2,
Guannan Zhang2, Ruixuan Li1‡, Yixiong Zou1‡

†Project Leader  ‡Corresponding author

1Huazhong University of Science and Technology, 2Accio Team, Alibaba Group


🔥 News

🌟 Method

We introduce SwimBird, a hybrid autoregressive MLLM that dynamically switches among three reasoning modes conditioned on the input: (1) text-only reasoning, (2) vision-only reasoning (continuous hidden states as visual thoughts), and (3) interleaved vision–text reasoning. By enabling flexible, query-adaptive mode selection, SwimBird preserves strong textual logic while substantially improving performance on vision-dense tasks.
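As an illustration of the three modes, the sketch below shows a toy, rule-based dispatcher. This is not SwimBird's actual mechanism (mode selection in the model is learned end-to-end and conditioned on the input); the names `Mode` and `select_mode` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    """The three reasoning modes described above."""
    TEXT_ONLY = "text-only reasoning"
    VISION_ONLY = "vision-only reasoning (continuous visual thoughts)"
    INTERLEAVED = "interleaved vision-text reasoning"


def select_mode(needs_visual_grounding: bool, needs_symbolic_steps: bool) -> Mode:
    # Toy stand-in for the model's learned, query-adaptive choice:
    # vision-dense queries favor visual thoughts, logic-heavy queries
    # favor text, and mixed queries interleave both.
    if needs_visual_grounding and needs_symbolic_steps:
        return Mode.INTERLEAVED
    if needs_visual_grounding:
        return Mode.VISION_ONLY
    return Mode.TEXT_ONLY
```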


👀 Cases

SwimBird dynamically switches among three reasoning modes conditioned on the input: (1) text-only reasoning, (2) vision-only reasoning, and (3) interleaved vision–text reasoning.


🛠 Preparation

git clone https://github.com/Accio-Lab/SwimBird.git
cd SwimBird

pip install -r requirements.txt
pip install qwen-vl-utils
pip install flash-attn --no-build-isolation

🎯 Training

To train the model, follow these steps:

  1. Replace Qwen3-VL's chat_template.json with ours.
  2. Download the training dataset SwimBird-SFT-92K and prepend the dataset's absolute directory path to all image paths in the JSON files:
python data_process.py absolute_path_to_dataset

Example:

python data_process.py /abs_path/SwimBird-ZebraCoT/
python data_process.py /abs_path/SwimBird-MathCanvas/
python data_process.py /abs_path/SwimBird-ThinkMorph/
python data_process.py /abs_path/SwimBird-OpenMMReasoner/
  3. Run the training script:
bash scripts/train.sh
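The path-prefixing step above presumably rewrites each annotation file in place, turning relative image paths into absolute ones. A minimal sketch of that idea follows; `prefix_image_paths` and the `image` key are assumptions for illustration, not the repository's actual `data_process.py`.

```python
import json
from pathlib import Path


def prefix_image_paths(json_file: str, dataset_root: str) -> None:
    """Prepend dataset_root to every relative image path in an annotation file.

    Hypothetical reimplementation of the described preprocessing step,
    assuming a list of records each carrying an "image" field.
    """
    records = json.loads(Path(json_file).read_text(encoding="utf-8"))
    for rec in records:
        img = rec.get("image")
        if img and not img.startswith("/"):  # leave absolute paths untouched
            rec["image"] = str(Path(dataset_root) / img)
    Path(json_file).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Running it once per sub-dataset mirrors the `python data_process.py /abs_path/...` invocations shown above.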

📖 Evaluation

We adopt VLMEvalKit to conduct the evaluation. You can get started as follows:

1. Setup

cd VLMEvalKit
pip install -e .

2. Inference

bash test.sh

Our model wrapper lives at: VLMEvalKit-main/vlmeval/vlm/swimbird

See [QuickStart | 快速开始] for more details about the arguments.

✉️ Contact

  • If you have any questions about this project, please feel free to contact: tattoo.ysl@gmail.com.
  • We are actively seeking self-motivated researchers and research interns to join our team!

📌 Citation

  • If you find this project useful in your research, please consider citing:
arxiv

👍 Acknowledgment
