🎬 Video Streaming Thinking

VideoLLMs Can Watch and Think Simultaneously

Video Streaming Thinking (VST) is a paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption. This amortizes test-time scaling over the stream while keeping responses real-time.

🔍 Overview

Existing online VideoLLMs focus on efficient streaming perception but lack explicit analytical reasoning. Offline VideoLLMs with Chain-of-Thought (CoT) can reason deeply, but incur high query-answer (QA) latency that violates real-time constraints. VST bridges this gap by shifting the LLM backend from passive waiting to active, intermittent reasoning during video consumption, implementing a thinking-while-watching mechanism inspired by human neural coupling.
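The latency argument above can be made concrete with a toy calculation. This is only an illustrative sketch: the function names, token counts, and per-token cost are hypothetical and are not measurements from VST. It also assumes that streaming thoughts finish before the query arrives, so they add nothing to the query-time latency.

```python
def offline_cot_latency_ms(reasoning_tokens: int, answer_tokens: int, ms_per_token: int) -> int:
    """Offline CoT: all reasoning is decoded after the query, so the user waits for it."""
    return (reasoning_tokens + answer_tokens) * ms_per_token


def streaming_latency_ms(answer_tokens: int, ms_per_token: int) -> int:
    """Thinking-while-watching: reasoning was already decoded during the stream
    (assumption), so only the answer tokens remain at query time."""
    return answer_tokens * ms_per_token


# Hypothetical numbers, chosen only to show the shape of the trade-off.
REASONING_TOKENS = 500
ANSWER_TOKENS = 20
MS_PER_TOKEN = 20

offline_ms = offline_cot_latency_ms(REASONING_TOKENS, ANSWER_TOKENS, MS_PER_TOKEN)
streaming_ms = streaming_latency_ms(ANSWER_TOKENS, MS_PER_TOKEN)
print(f"offline CoT: {offline_ms} ms, streaming: {streaming_ms} ms")
```

The total compute is the same in both cases. What changes is when the reasoning tokens are produced, and therefore how much of that cost the user sees as waiting time.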

✨ Key Idea

Instead of deferring all reasoning until a user query arrives, VST continuously processes incoming video clips and produces intermediate streaming thoughts in real time. This front-loads and amortizes the reasoning cost, so the final response is both deeply grounded and instantly available.
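The control flow described above can be sketched as a small loop. This is a minimal illustration, not the released implementation: `StreamingThinker`, `toy_think`, and `toy_answer` are hypothetical names, and the two callables stand in for calls to a VideoLLM backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamingThinker:
    # Hypothetical hooks: in a real system both would call the VideoLLM.
    think: Callable[[str, List[str]], str]   # (clip, prior thoughts) -> new thought
    answer: Callable[[str, List[str]], str]  # (query, thoughts so far) -> answer
    thoughts: List[str] = field(default_factory=list)

    def watch(self, clip: str) -> None:
        """Consume one clip and record an intermediate streaming thought."""
        self.thoughts.append(self.think(clip, self.thoughts))

    def ask(self, query: str) -> str:
        """Answer from the thoughts already accumulated; no deferred reasoning pass."""
        return self.answer(query, self.thoughts)


def toy_think(clip: str, prior: List[str]) -> str:
    # Stand-in for model reasoning over the new clip and earlier thoughts.
    return f"thought {len(prior) + 1}: saw {clip}"


def toy_answer(query: str, thoughts: List[str]) -> str:
    # Stand-in for the final response conditioned on the streaming thoughts.
    return f"{query} -> based on {len(thoughts)} streaming thoughts"


thinker = StreamingThinker(think=toy_think, answer=toy_answer)
for clip in ["clip_a", "clip_b", "clip_c"]:
    thinker.watch(clip)  # reasoning happens while the video is still arriving
reply = thinker.ask("What happened?")
print(reply)
```

Keeping `watch` and `ask` separate mirrors the idea that reasoning cost is paid incrementally per clip, so `ask` only has to produce the final response.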

🏗️ Model Zoo

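| Model | HuggingFace | OVO-Bench | StreamingBench | VideoMME | LongVideoBench | VideoHolmes |
| --- | --- | --- | --- | --- | --- | --- |
| VST-3B | 🤗 Link | 56.2 | 75.5 | 59.5 | 54.1 | 36.1 |
| VST-7B | 🤗 Link | 59.3 | 79.5 | 64.9 | 58.0 | 41.9 |
| VST-32B | 🤗 Link | 63.5 | 80.7 | 67.2 | 60.7 | 45.1 |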

📦 Training Data

We release the full training data for both the supervised fine-tuning (SFT) and reinforcement learning (RL) stages on HuggingFace and ModelScope:

| Dataset | HuggingFace | ModelScope | Description |
| --- | --- | --- | --- |
| vst_sft_data | 🤗 Link | 🤖 Link | SFT data including video-text pairs from multiple sources |
| vst_rl_data | 🤗 Link | 🤖 Link | RL data for reinforcement learning stage |

📅 TODO

- Release the paper.
- Release checkpoint and eval code.
- Release training code.
- Release training data.

👍 Acknowledgement

We thank the following great works and open-source repositories:

📖 Citation

@article{guan2026videostreamingthinking,
      title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously}, 
      author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
      journal={arXiv preprint arXiv:2603.12262},
      year={2026},
}
