Jintao Tong1, Shilin Yan2†‡, Hongwei Xue2, Xiaojun Tang2, Kunyu Shi2,
Guannan Zhang2, Ruixuan Li1‡, Yixiong Zou1‡
†Project Leader ‡Corresponding author
1Huazhong University of Science and Technology, 2Accio Team, Alibaba Group
- **2025.02.06** 🚀 Model and Dataset are released!
- **2025.02.05** 🚀 Training Code is available!
- **2025.02.05** 📝 We release our latest work SwimBird!
We introduce SwimBird, a hybrid autoregressive MLLM that dynamically switches among three reasoning modes conditioned on the input: (1) text-only reasoning, (2) vision-only reasoning (continuous hidden states as visual thoughts), and (3) interleaved vision–text reasoning. By enabling flexible, query-adaptive mode selection, SwimBird preserves strong textual logic while substantially improving performance on vision-dense tasks.
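To make the mode switching concrete, the toy decoding loop below shows how a query-adaptive decoder could interleave discrete text tokens with continuous visual thoughts. This is an illustration only, not SwimBird's actual implementation: the token ids, hidden-state size, and helper functions are all hypothetical stand-ins.

```python
# Conceptual illustration only (not the actual SwimBird code): a decoding loop
# that interleaves discrete text tokens with continuous "visual thoughts".
# All token ids, shapes, and helpers below are hypothetical placeholders.
import torch

VISUAL_THOUGHT_TOKEN = 151665   # hypothetical id marking a visual-thought step
EOS_TOKEN = 151645              # hypothetical end-of-sequence id

def decode_step(embeddings: torch.Tensor) -> tuple[int, torch.Tensor]:
    """Stand-in for one forward pass: returns (next_token_id, last_hidden_state)."""
    hidden = torch.randn(1, 4096)                      # placeholder hidden state
    next_id = int(torch.randint(0, 152000, (1,)).item())
    return next_id, hidden

def embed(token_id: int) -> torch.Tensor:
    """Stand-in for the text-token embedding lookup."""
    return torch.randn(1, 4096)

def generate(prompt_embeddings: torch.Tensor, max_steps: int = 32) -> list:
    """Query-adaptive decoding: at each step the model decides whether the next
    reasoning step is a text token or a continuous visual thought."""
    sequence, inputs = [], prompt_embeddings
    for _ in range(max_steps):
        token_id, hidden = decode_step(inputs)
        if token_id == EOS_TOKEN:
            break
        if token_id == VISUAL_THOUGHT_TOKEN:
            # Vision-mode step: feed the continuous hidden state back in directly
            # instead of detokenizing it into text.
            next_embedding = hidden
            sequence.append(("visual_thought", hidden))
        else:
            # Text-mode step: an ordinary autoregressive text token.
            next_embedding = embed(token_id)
            sequence.append(("text_token", token_id))
        inputs = torch.cat([inputs, next_embedding.unsqueeze(1)], dim=1)
    return sequence

# Example: decode from a dummy 8-token prompt embedding.
print(generate(torch.randn(1, 8, 4096), max_steps=4))
```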
```bash
git clone https://github.com/Accio-Lab/SwimBird.git
cd SwimBird
pip install -r requirements.txt
pip install qwen-vl-utils
pip install flash-attn --no-build-isolation
```
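Once the dependencies are installed, inference should follow the familiar Qwen-VL-style interface, since SwimBird builds on Qwen3-VL. The snippet below is a minimal sketch under that assumption; the repo id `Accio-Lab/SwimBird`, the image path, and the prompt are placeholders rather than confirmed release names.

```python
# Minimal inference sketch. Assumes the checkpoint loads through the standard
# Qwen-VL-style interface in transformers; the repo id below is a placeholder.
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

model = AutoModelForImageTextToText.from_pretrained(
    "Accio-Lab/SwimBird", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Accio-Lab/SwimBird")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/example.jpg"},
            {"type": "text", "text": "Solve the geometry problem in the image."},
        ],
    }
]

# Build the chat prompt and collect the vision inputs referenced in the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens before decoding the generated answer.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```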
To train the model, follow these steps:
- Replace Qwen3-VL's `chat_template.json` with ours.
- Download the training dataset SwimBird-SFT-92K and add the dataset's absolute directory path as a prefix to all image paths in the JSON files (see the sketch after this list for what this step does):

  ```bash
  python data_process.py absolute_path_to_dataset
  ```

  Example:

  ```bash
  python data_process.py /abs_path/SwimBird-ZebraCoT/
  python data_process.py /abs_path/SwimBird-MathCanvas/
  python data_process.py /abs_path/SwimBird-ThinkMorph/
  python data_process.py /abs_path/SwimBird-OpenMMReasoner/
  ```

- Run the training script with the following command:
  ```bash
  bash scripts/train.sh
  ```
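For reference, the path-prefixing step above can be pictured as the sketch below. It assumes each annotation JSON is a list of samples whose image paths live under an `images` key; the key name and file layout are assumptions, so consult `data_process.py` for the actual logic.

```python
# Hypothetical sketch of the path-prefixing step performed by data_process.py.
# The JSON layout (a list of samples with an "images" field) is an assumption.
import json
import os
import sys

def prefix_image_paths(json_file: str, dataset_root: str) -> None:
    with open(json_file, "r", encoding="utf-8") as f:
        samples = json.load(f)
    for sample in samples:
        # Turn each relative image path into an absolute path under dataset_root.
        sample["images"] = [os.path.join(dataset_root, p) for p in sample.get("images", [])]
    with open(json_file, "w", encoding="utf-8") as f:
        json.dump(samples, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    root = sys.argv[1]                      # e.g. /abs_path/SwimBird-ZebraCoT/
    for name in os.listdir(root):
        if name.endswith(".json"):
            prefix_image_paths(os.path.join(root, name), root)
```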
We adopt VLMEvalKit to conduct the evaluation. You can get started as follows:

```bash
cd VLMEvalKit
pip install -e .
bash test.sh
```

The path to our model: `VLMEvalKit-main/vlmeval/vlm/swimbird`
See [QuickStart] for more details about arguments.
- If you have any questions about this project, please feel free to contact: tattoo.ysl@gmail.com.
- We are actively seeking self-motivated researchers and research interns to join our team!
- If you find this project useful in your research, please consider citing:
- We sincerely thank Qwen-VL-Series-Finetune, Skila, and others for their contributions, which have provided valuable insights.

