Video Streaming Thinking (VST) is a new paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption, enabling amortized test-time scaling with real-time responsiveness.
Existing online VideoLLMs focus on efficient streaming perception but lack explicit analytical reasoning. Offline VideoLLMs with Chain-of-Thought (CoT) can reason deeply, but incur high query-answer (QA) latency that violates real-time constraints. VST bridges this gap by shifting the LLM backend from passive waiting to active, intermittent reasoning during video consumption, implementing a thinking-while-watching mechanism inspired by human neural coupling.
Instead of deferring all reasoning until a user query arrives, VST continuously processes incoming video clips and produces intermediate streaming thoughts in real time. This front-loads and amortizes the reasoning cost, so the final response is both deeply grounded and instantly available.
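The thinking-while-watching idea can be illustrated with a minimal sketch. This is not the VST implementation: the `StreamingThinker` class and the `toy_think` and `toy_answer` functions are hypothetical stand-ins for a model backend, and clips are represented as plain strings. The sketch only shows the control flow, in which a thought is produced as each clip arrives, so that only a short answer step remains at query time.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamingThinker:
    """Toy stand-in for a thinking-while-watching loop (hypothetical, not the VST API)."""
    think: Callable[[str, List[str]], str]   # (clip, prior thoughts) -> new thought
    answer: Callable[[str, List[str]], str]  # (query, thoughts) -> response
    thoughts: List[str] = field(default_factory=list)

    def watch(self, clip: str) -> None:
        # Reason as each clip arrives instead of waiting for a user query.
        self.thoughts.append(self.think(clip, self.thoughts))

    def ask(self, query: str) -> str:
        # The reasoning is already stored, so only the answer step runs here.
        return self.answer(query, self.thoughts)


def toy_think(clip: str, prior: List[str]) -> str:
    # Placeholder for the model's intermediate streaming thought.
    return f"note {len(prior) + 1}: saw {clip}"


def toy_answer(query: str, thoughts: List[str]) -> str:
    # Placeholder for the final response conditioned on the stored thoughts.
    return f"{query} -> based on {len(thoughts)} streaming thoughts"


thinker = StreamingThinker(think=toy_think, answer=toy_answer)
for clip in ["clip_a", "clip_b", "clip_c"]:
    thinker.watch(clip)

result = thinker.ask("What happened?")
print(result)
```

In this toy setup the per-clip cost is paid inside `watch`, while `ask` does a constant amount of work regardless of how long the stream has been running, which is the sense in which the reasoning cost is amortized.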
| Model | HuggingFace | OVO-Bench | StreamingBench | VideoMME | LongVideoBench | VideoHolmes |
|---|---|---|---|---|---|---|
| VST-3B | 🤗 Link | 56.2 | 75.5 | 59.5 | 54.1 | 36.1 |
| VST-7B | 🤗 Link | 59.3 | 79.5 | 64.9 | 58.0 | 41.9 |
| VST-32B | 🤗 Link | 63.5 | 80.7 | 67.2 | 60.7 | 45.1 |
We release the full training data used for both SFT and RL stages on HuggingFace and ModelScope:
| Dataset | HuggingFace | ModelScope | Description |
|---|---|---|---|
| vst_sft_data | 🤗 Link | 🤖 Link | SFT data including video-text pairs from multiple sources |
| vst_rl_data | 🤗 Link | 🤖 Link | RL data for reinforcement learning stage |
- Release the paper.
- Release checkpoint and eval code.
- Release training code.
- Release training data.
We thank the following great works and open-source repositories:
@article{guan2026videostreamingthinking,
title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously},
author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
journal={arXiv preprint arXiv:2603.12262},
year={2026},
}