# [NeurIPS 2025] StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
🌟 StreamBridge is a simple yet powerful framework that enables offline Video-LLMs to perform effectively in streaming scenarios. It features:
- A memory buffer with round-decayed compression for long-context, multi-turn interactions (sketched below).
- A decoupled and lightweight activation model that enables proactive, timely responses without affecting the base model’s reasoning capabilities.
- A newly built dataset, Stream-IT, tailored for streaming video understanding with interleaved video-text sequences and diverse instructions.
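To give a rough feel for the round-decayed memory buffer, here is a minimal, illustrative sketch: older conversation rounds keep progressively fewer visual tokens as new rounds arrive. The class, decay schedule, and uniform subsampling below are simplifications for illustration, not the implementation in this repo; see the paper for the actual formulation.

```python
from collections import deque


class RoundDecayedMemory:
    """Illustrative only: tokens from older rounds are compressed harder."""

    def __init__(self, decay: float = 0.5, min_tokens: int = 1):
        self.rounds = deque()         # one token list per past round
        self.decay = decay            # fraction of tokens kept per step of aging
        self.min_tokens = min_tokens  # never drop an old round entirely

    def add_round(self, tokens: list):
        # Compress every existing round one step further, then append the new one.
        for i, old in enumerate(self.rounds):
            keep = max(self.min_tokens, int(len(old) * self.decay))
            stride = max(1, len(old) // keep)
            self.rounds[i] = old[::stride][:keep]  # uniform subsampling as a stand-in for merging
        self.rounds.append(list(tokens))

    def context(self) -> list:
        # Flattened context: oldest (most compressed) rounds first.
        return [tok for rnd in self.rounds for tok in rnd]


mem = RoundDecayedMemory()
for r in range(4):
    mem.add_round([f"round{r}_tok{i}" for i in range(8)])
print([len(rnd) for rnd in mem.rounds])  # [1, 2, 4, 8]: older rounds shrink
```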
> [!IMPORTANT]
> For copyright reasons, we can’t release model weights trained on YouTube or other videos that may contain IP-protected content. However, we’re open-sourcing the model implementation and the synthetic data used for training.
## Installation

- Clone this repository and navigate to the folder:

```bash
git clone https://github.com/apple/ml-streambridge
cd ml-streambridge
```

- Install the package:

```bash
conda create -n ml-streambridge python=3.10.14
conda activate ml-streambridge
pip install -e .
pip install flash-attn==2.3.3 --no-build-isolation
```

- Download checkpoints: TBD due to video copyright reasons.
- Organize as:

```
├── /your/path/to/checkpoints
│   ├── llava-onevision-qwen2-0.5b-ov-hf-seperated
│   ├── activation_0.5_ratio_anet_coin_yc2_s2s_fa_mhego_hacs_cha_et_llava-ov_epoch_5.pth
│   ├── LLaVA-OV-7B-du2e2hjxik
│   ├── Oryx-1.5-7B-jfsvkb3hn8
│   └── Qwen2-VL-7B-jh6p673iyp
```
## Run a Demo

- Update `your_weight_path` in `demo.py` to match the weight directory above.
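For reference, the edit looks something like this (the exact variable layout inside `demo.py` may differ; the path below is a placeholder):

```python
# demo.py: point this at the checkpoint directory organized above.
your_weight_path = "/your/path/to/checkpoints"
```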
- Run the demo:

```bash
python demo.py  # the activation threshold controls the response frequency
```

- You should see output like:

```
18 seconds: Pour the cooked noodles.
32 seconds: Cut the lemon.
44 seconds: Cut the olives in half.
55 seconds: Chop the parsley.
68 seconds: Squeeze the lemon juice into the measuring cup.
78 seconds: Pound the chicken.
...
```
## Evaluation

- Download the raw videos for OVO-Bench from [🤗HF] and VideoMME from [🤗HF], then reorganize the folders as follows:

```
├── /your/path/to/ovo_bench
│   ├── videos
│   ├── ovo_bench.json
│   └── ...
├── /your/path/to/videomme
│   ├── videos
│   ├── videomme.json
│   └── ...
```
- We provide OVO-Bench's `ovo_bench.json` and VideoMME's `videomme.json` in `./assets`.
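Optionally, you can sanity-check the layout before running evaluation with a snippet like the one below. The `video` field name is an assumption for illustration; inspect the json files in `./assets` for the actual schema.

```python
import json
import os

VIDEO_ROOT = "/your/path/to/ovo_bench/videos"  # placeholder path

with open("./assets/ovo_bench.json") as f:
    anns = json.load(f)

# "video" is a hypothetical field name; check the actual json schema.
missing = [a for a in anns if not os.path.exists(os.path.join(VIDEO_ROOT, a["video"]))]
print(f"{len(missing)} of {len(anns)} annotated videos are missing")
```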
- Run the evaluation script:
  - Set `ANNO_PATH` and `VIDEO_PATH` in `scripts/eval.sh` to the OVO-Bench and VideoMME paths you prepared above, and then run:

    ```bash
    bash scripts/eval.sh
    ```

  - Evaluate different models by modifying `MODEL` and `CKPT` in the script.
  - By default, 8 A100-80G GPUs are used; you can adjust `NUM_GPUS` and `MAX_IMG_TOKEN` to reduce memory usage.
- Report the results:

```bash
python eval/metric_report.py
```

- You should reproduce the results below (see our paper for more details):
| Model Name | OVO-Bench-Real-Time (OCR/ACR/ATR/STU/FPD/OJR/AVG.) | VideoMME (w/o subs) |
|---|---|---|
| Qwen2-VL-StreamBridge | 85.24/67.89/75.00/52.25/70.30/72.28/70.49 | 63.0 |
| Oryx-1.5-StreamBridge | 81.21/70.64/70.69/49.44/74.26/68.48/69.12 | 64.2 |
| LLaVA-OV-StreamBridge | 74.50/78.90/72.41/52.81/78.22/68.68/70.89 | 61.0 |
## StreamingQA-120K

- The raw 1.28 million videos of StreamingQA-120K are sourced from [🤗WebVid], [🤗InternVid], and [🤗Panda]. You can also download them from their official repos: [WebVid-10M], [InternVid-10M], [Panda-70M].
- We concatenate videos with higher similarities from these three datasets and annotate QA pairs for them. We provide the similarity-ordered json file; you can dynamically control the grouping size via `GROUP_LEN`:
```python
import json


def load_json(path):
    with open(path) as f:
        return json.load(f)


GROUP_LEN = 10  # number of similar videos concatenated per group

anns = load_json("/your/path/to/qa_groups.json")

# The annotations are similarity-ordered, so chunking consecutive indices
# groups similar videos together.
indices = list(range(len(anns)))
groups = [indices[i : i + GROUP_LEN] for i in range(0, len(indices), GROUP_LEN)]

grouped_anns = []
for group in groups:
    if len(group) != GROUP_LEN:
        continue  # drop the trailing group if it has fewer than GROUP_LEN items
    grouped_anns.append(
        {
            "video_ids": [anns[i]["video_id"] for i in group],
            "video_files": [anns[i]["video_file"] for i in group],
            "captions": [anns[i]["caption"] for i in group],
            "questions": [anns[i]["question"] for i in group],
            "answers": [anns[i]["answer"] for i in group],
            "options": [anns[i]["options"] for i in group],
            "types": [anns[i]["type"] for i in group],
        }
    )

print(grouped_anns[0])
```

## License

This software and accompanying data and models have been released under the following licenses:
- Code: Apple Sample Code License (ASCL)
- Data: CC-BY-NC-ND Deed
## Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝.
```bibtex
@article{wang2025streambridge,
  title={StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant},
  author={Wang, Haibo and Feng, Bo and Lai, Zhengfeng and Xu, Mingze and Li, Shiyu and Ge, Weifeng and Dehghan, Afshin and Cao, Meng and Huang, Ping},
  journal={arXiv preprint arXiv:2505.05467},
  year={2025}
}
```
