1ranGuan/VST

🎬 Video Streaming Thinking

VideoLLMs Can Watch and Think Simultaneously

Video Streaming Thinking introduces a new paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption, enabling amortized test-time scaling with real-time responsiveness.


🔍 Overview

Existing online VideoLLMs focus on efficient streaming perception but lack explicit analytical reasoning. Offline VideoLLMs with Chain-of-Thought (CoT) can reason deeply, but incur high query-answer (QA) latency that violates real-time constraints. VST bridges this gap by shifting the LLM backend from passive waiting to active, intermittent reasoning during video consumption, implementing a thinking-while-watching mechanism inspired by human neural coupling.
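To make the latency argument concrete, here is a toy cost model, not taken from the paper. The `Costs` values and both function names are hypothetical, and the model assumes that reasoning about one clip finishes before the next clip arrives, so streaming reasoning never piles up at query time.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    """Hypothetical per-step costs in seconds (illustrative values only)."""
    reason_per_clip: float = 0.5  # time to reason about one video clip
    answer: float = 0.2           # time to decode the final answer


def offline_cot_latency(num_clips: int, costs: Costs) -> float:
    # Offline CoT: all reasoning starts only after the query arrives,
    # so the user waits for every clip's reasoning plus the answer.
    return num_clips * costs.reason_per_clip + costs.answer


def streaming_thinking_latency(num_clips: int, costs: Costs) -> float:
    # Thinking while watching: per-clip reasoning was already done during
    # playback (assumption), so only the answer is decoded at query time.
    return costs.answer


costs = Costs()
offline = offline_cot_latency(10, costs)
streaming = streaming_thinking_latency(10, costs)
```

Under these assumed numbers, the offline latency grows linearly with the number of clips, while the streaming latency stays constant.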

✨ Key Idea

Instead of deferring all reasoning until a user query arrives, VST continuously processes incoming video clips and produces intermediate streaming thoughts in real time. This front-loads and amortizes the reasoning cost, so the final response is both deeply grounded and instantly available.
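The control flow described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the VST implementation: `StreamingThinker`, `toy_think`, and `toy_respond` are hypothetical names, and the two callables are stand-ins for calls to a VideoLLM.

```python
from typing import Callable, List


class StreamingThinker:
    """Hypothetical wrapper: think on each clip as it arrives, answer from stored thoughts."""

    def __init__(
        self,
        think: Callable[[str, List[str]], str],
        respond: Callable[[str, List[str]], str],
    ) -> None:
        self.think = think      # stand-in for the model's per-clip reasoning
        self.respond = respond  # stand-in for the model's final answer step
        self.thoughts: List[str] = []

    def observe(self, clip: str) -> None:
        # Reason about the new clip in the context of earlier thoughts,
        # then store the result so the cost is paid before any query.
        self.thoughts.append(self.think(clip, self.thoughts))

    def answer(self, query: str) -> str:
        # At query time, only the accumulated thoughts are consulted.
        return self.respond(query, self.thoughts)


def toy_think(clip: str, thoughts: List[str]) -> str:
    return f"thought {len(thoughts) + 1}: noted {clip}"


def toy_respond(query: str, thoughts: List[str]) -> str:
    return f"{query} -> based on {len(thoughts)} thoughts"


thinker = StreamingThinker(toy_think, toy_respond)
for clip in ["clip_a", "clip_b", "clip_c"]:
    thinker.observe(clip)
result = thinker.answer("what happened?")
```

Keeping `observe` and `answer` separate mirrors the split between reasoning done during video consumption and the response produced when a query arrives.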

🏗️ Model Zoo

| Model | HuggingFace | OVO-Bench | StreamingBench | VideoMME | LongVideoBench | VideoHolmes |
|---|---|---|---|---|---|---|
| VST-3B | 🤗 Link | 56.2 | 75.5 | 59.5 | 54.1 | 36.1 |
| VST-7B | 🤗 Link | 59.3 | 79.5 | 64.9 | 58.0 | 41.9 |
| VST-32B | 🤗 Link | 63.5 | 80.7 | 67.2 | 60.7 | 45.1 |

📦 Training Data

We release the full training data used for both SFT and RL stages on HuggingFace and ModelScope:

| Dataset | HuggingFace | ModelScope | Description |
|---|---|---|---|
| vst_sft_data | 🤗 Link | 🤖 Link | SFT data including video-text pairs from multiple sources |
| vst_rl_data | 🤗 Link | 🤖 Link | RL data for reinforcement learning stage |

📅 TODO

  • Release the paper.
  • Release checkpoint and eval code.
  • Release training code.
  • Release training data.

👍 Acknowledgement

We thank the following great works and open-source repositories:

📖 Citation

@article{guan2026videostreamingthinking,
      title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously}, 
      author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
      journal={arXiv preprint arXiv:2603.12262},
      year={2026},
}
