
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation


Affiliations: ¹ University of Science and Technology of China · ² FrameX.AI · ³ Independent Researcher

Project Page: https://stream-r1.github.io/ · Paper: https://arxiv.org/abs/2605.03849 · Models: Hugging Face (see Pretrained Checkpoints below)

Overview

TL;DR: Existing distribution-matching distillation (DMD) methods for streaming video diffusion treat every rollout, frame, and pixel as equally informative supervision. Stream-R1 instead reweights the DMD objective along two complementary axes, Inter-Reliability across rollouts and Intra-Perplexity across spatiotemporal regions, using a single shared video reward model. The student concentrates updates where the local reward landscape has not yet flattened and therefore converges to the teacher's high-quality mode rather than its full mixture; it surpasses the multi-step Wan2.1 teacher on VBench Total/Semantic at 23.1 FPS, with no architectural change and zero inference overhead.

Qualitative results across 30 s / 60 s / 2 min / 3 min: https://stream-r1.github.io/#duration.

Method

Stream-R1 modulates the standard DMD generator loss as

$$\mathcal{L}_{\text{Stream-R1}} \;=\; \underbrace{\exp(\beta \cdot r_{\text{final}})}_{\mathbf{W}_{\text{inter}}} \;\cdot\; \text{mean}\!\big(\underbrace{\mathbf{W}_{\text{intra}}}_{F\times H\times W} \,\odot\, \mathcal{L}_{\text{DMD}}\big)$$

with three reward-guided components, all derived from one pretrained video reward model:

  1. Inter-Reliability Weighting — the DMD gradient g = f_fake − f_real varies in reliability across rollouts; we exponentially rescale each rollout's loss by exp(β·r_final), so reliable rollouts dominate supervision while low-quality rollouts are attenuated (sketched right after this list).
  2. Intra-Perplexity Weighting — we back-propagate through the reward model to obtain a per-pixel saliency volume S ∈ R^{F×H×W}, factorize it into a temporal profile and per-frame spatial maps, and use their product as W_intra. Optimization pressure then concentrates on the regions and frames where the local reward landscape has not yet flattened, i.e. where further refinement yields the largest expected gain.
  3. Adaptive Reward Balancing — we track per-axis (VQ / MQ / TA) improvement in a sliding window and subtract the standard deviation of the per-axis deltas from the scalar reward, keeping the three quality axes improving at similar rates (sketched after the configuration table below).
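
In code, the modulation is just a scalar rollout weight times a saliency-weighted mean. A minimal PyTorch sketch, assuming a per-pixel DMD loss tensor and a scalar rollout reward are already available; `stream_r1_loss`, `dmd_loss`, and `w_intra` are illustrative names, not the repo's actual API:

```python
import torch

def stream_r1_loss(dmd_loss: torch.Tensor,  # per-pixel DMD loss, shape (F, H, W)
                   w_intra: torch.Tensor,   # fused saliency weights, shape (F, H, W)
                   r_final: torch.Tensor,   # scalar reward for this rollout
                   beta: float = 1.0) -> torch.Tensor:
    # Inter-Reliability: exp(beta * r_final) rescales the whole rollout,
    # so reliable (high-reward) rollouts dominate supervision.
    w_inter = torch.exp(beta * r_final).detach()
    # Intra-Perplexity: weight each spatiotemporal location before
    # averaging, concentrating pressure where the reward landscape
    # has not yet flattened.
    return w_inter * (w_intra.detach() * dmd_loss).mean()
```

Both weights are detached, so the reward model only shapes the generator gradient rather than receiving gradient itself.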

Saliency from the three axes is fused with an adaptive softmax weighting that allocates more attention to the currently weaker axis, so a single reward signal drives both W_inter and W_intra.
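
To illustrate that fusion, here is a sketch of how the per-axis saliency volumes might be combined and factorized. The floor values mirror `spatial_reward_min_weight` and `temporal_saliency_min_weight` from the shipped config below, but the function name, shapes, and normalization details are assumptions, not the released implementation:

```python
import torch
import torch.nn.functional as F

def fuse_saliency(saliency: torch.Tensor,       # per-axis |dr/dx|, shape (3, F, H, W) for VQ/MQ/TA
                  axis_scores: torch.Tensor,    # current per-axis rewards, shape (3,)
                  spatial_floor: float = 0.15,  # sigma_min
                  temporal_floor: float = 0.2   # tau_min
                  ) -> torch.Tensor:
    # Adaptive fusion: softmax over the negated scores gives the
    # currently weaker axis a larger share of the attention.
    axis_w = F.softmax(-axis_scores, dim=0)
    fused = (axis_w[:, None, None, None] * saliency).sum(dim=0)  # (F, H, W)

    # Factorize into a temporal profile and per-frame spatial maps.
    temporal = fused.mean(dim=(1, 2))                            # (F,)
    temporal = temporal / temporal.mean().clamp_min(1e-8)
    spatial = fused / fused.mean(dim=(1, 2), keepdim=True).clamp_min(1e-8)

    # Floors keep every frame and region minimally supervised.
    temporal = temporal.clamp_min(temporal_floor)
    spatial = spatial.clamp_min(spatial_floor)

    return temporal[:, None, None] * spatial     # W_intra, shape (F, H, W)
```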

Shipped configuration (configs/exp_stream_r1.yaml)

| Knob | Value | Role |
| --- | --- | --- |
| `reward_mode` | BalancedOverall | Inter-Reliability + Adaptive Reward Balancing |
| `spatial_reward` / `spatial_reward_pixel_grad` | true / true | Intra-Perplexity spatial (pixel-level gradient saliency) |
| `temporal_saliency_weighting` | true | Intra-Perplexity temporal (per-frame importance) |
| `spatial_reward_combination` | adaptive | adaptive saliency fusion across VQ/MQ/TA |
| `spatial_reward_min_weight` | 0.15 | spatial floor (σ_min) |
| `temporal_saliency_min_weight` | 0.2 | temporal floor (τ_min) |
| `full_training_steps` × `gradient_accumulation_steps` | 1000 × 8 | 8000 raw steps on 8 GPUs |
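
The BalancedOverall mode folds Adaptive Reward Balancing into the scalar reward. A sketch of the sliding-window balancing described above; the class name and window size are hypothetical:

```python
from collections import deque

import torch

class RewardBalancer:
    """Sliding-window reward balancing across VQ / MQ / TA (illustrative)."""

    def __init__(self, window: int = 100):
        self.history = deque(maxlen=window)  # each entry: per-axis scores, shape (3,)

    def __call__(self, axis_scores: torch.Tensor) -> torch.Tensor:
        self.history.append(axis_scores.detach())
        if len(self.history) < 2:
            return axis_scores.mean()
        # Per-axis improvement over the window; a large spread means
        # one axis is racing ahead of the others.
        deltas = self.history[-1] - self.history[0]
        # Subtracting the std of the deltas penalizes lopsided progress,
        # keeping the three quality axes improving at similar rates.
        return axis_scores.mean() - deltas.std()
```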

Requirements

  • NVIDIA GPU: ≥24 GB for inference, ≥80 GB for training (8 GPUs recommended).
  • Linux, ≥64 GB RAM.

Installation

git clone https://github.com/FrameX-AI/Stream-R1.git
cd Stream-R1

conda create -n stream_r1 python=3.10
conda activate stream_r1

pip install -r requirements.txt
pip install flash-attn --no-build-isolation
pip install -e .

Pretrained Checkpoints

Required for training (teacher / reward / init) and inference (Stream-R1):

| Model | Download |
| --- | --- |
| VideoReward | Hugging Face |
| Wan2.1-T2V-1.3B | Hugging Face |
| Wan2.1-T2V-14B | Hugging Face |
| ODE Initialization | Hugging Face |
| Stream-R1 (T2V-1.3B) | Hugging Face |

After downloading:

checkpoints/
├── Videoreward/
├── Wan2.1-T2V-1.3B/
├── Wan2.1-T2V-14B/
├── Stream-R1-T2V-1.3B/
└── ode_init.pt

Or run the helper:

pip install "huggingface_hub[cli]"
bash download_checkpoints.sh

Inference

Place the released Stream-R1 weights at checkpoints/Stream-R1-T2V-1.3B/stream_r1.pt (any filename works — pass it via --checkpoint_path). You can also run inference on a checkpoint produced by your own training run (output/<timestamp>_stream_r1/checkpoint_model_*/generator.pt).

# 5-second video
python inference.py \
    --num_output_frames 21 \
    --config_path configs/stream_r1.yaml \
    --checkpoint_path checkpoints/Stream-R1-T2V-1.3B/stream_r1.pt \
    --output_folder videos/stream_r1-5s \
    --data_path prompts/MovieGenVideoBench_extended.txt \
    --use_ema

# 30-second video
python inference.py \
    --num_output_frames 120 \
    --config_path configs/stream_r1.yaml \
    --checkpoint_path checkpoints/Stream-R1-T2V-1.3B/stream_r1.pt \
    --output_folder videos/stream_r1-30s \
    --data_path prompts/MovieGenVideoBench_extended.txt \
    --use_ema

Training

bash run_stream_r1.sh

The launcher reads configs/exp_stream_r1.yaml, runs train.py via torchrun on 8 GPUs (1000 optimizer steps × grad-accum 8 → 8000 raw steps), then renders 20 evaluation videos from the final checkpoint. Override defaults with environment variables, e.g.:

NUM_GPUS=4 CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_stream_r1.sh

Manual launch:

torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=5235 --rdzv_backend=c10d \
    --rdzv_endpoint=localhost:$MASTER_PORT train.py \
    --config_path configs/exp_stream_r1.yaml \
    --logdir logs/stream_r1 \
    --disable-wandb

For multi-node training, set --nnodes, --node_rank, and --rdzv_endpoint=$MASTER_IP:$MASTER_PORT accordingly.

Results

VBench (5-second, 832×480)

| Model | Params | FPS↑ | Total↑ | Quality↑ | Semantic↑ |
| --- | --- | --- | --- | --- | --- |
| Wan2.1 (multi-step teacher) | 1.3B | 0.78 | 84.26 | **85.30** | 80.09 |
| LTX-Video | 1.9B | 8.98 | 80.00 | 82.30 | 70.79 |
| SkyReels-V2 | 1.3B | 0.49 | 82.67 | 84.70 | 74.53 |
| MAGI-1 | 4.5B | 0.19 | 79.18 | 82.04 | 67.74 |
| NOVA | 0.6B | 0.88 | 80.12 | 80.39 | 79.05 |
| Pyramid Flow | 2B | 6.7 | 81.72 | 84.74 | 69.62 |
| CausVid | 1.3B | 17.0 | 82.88 | 83.93 | 78.69 |
| Self Forcing | 1.3B | 17.0 | 83.80 | 84.59 | 80.64 |
| LongLive | 1.3B | 20.7 | 83.22 | 83.68 | 81.37 |
| Rolling Forcing | 1.3B | 17.5 | 81.22 | 84.08 | 69.78 |
| Reward Forcing | 1.3B | **23.1** | 84.13 | 84.84 | 81.32 |
| Stream-R1 (Ours) | 1.3B | **23.1** | **84.40** | 85.14 | **81.44** |

Stream-R1 surpasses its multi-step Wan2.1 teacher on Total and Semantic while running ~30× faster, demonstrating that reward-guided distillation can push the student beyond the teacher's quality frontier. (Bold = best per column.)

VideoReward (per-axis)

| Model | Visual↑ | Dynamic↑ | Text↑ |
| --- | --- | --- | --- |
| SkyReels-V2 | 3.30 | 3.05 | 2.70 |
| CausVid | 4.66 | 3.16 | 3.32 |
| Self Forcing | 3.89 | 3.44 | 3.11 |
| LongLive | 4.79 | 3.81 | 3.98 |
| Reward Forcing | 4.82 | **4.18** | 4.04 |
| Stream-R1 (Ours) | **4.92** | 4.04 | **4.11** |

Project page and qualitative results: https://stream-r1.github.io/

Citation

A BibTeX entry will be added shortly. In the meantime please cite via the arXiv preprint at https://arxiv.org/abs/2605.03849.

Acknowledgements

Built on CausVid, Self Forcing, Wan2.1, and VideoAlign. Stream-R1 extends the Reward Forcing codebase with the Inter-Reliability / Intra-Perplexity formulation.

License

See LICENSE.

Contact

For questions about the project, please open a GitHub issue or reach out to the corresponding author Mengqi Huang.
