🦕 Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

Flash-GRPO is a single-step training framework that outperforms full-trajectory training in alignment quality under low computational budgets while substantially improving training efficiency.

🗺️ Roadmap for Flash-GRPO

Flash-GRPO addresses two critical challenges. Iso-temporal grouping eliminates timestep-confounded variance by enforcing prompt-wise temporal consistency, which decouples policy performance from timestep difficulty. Temporal gradient rectification neutralizes the time-dependent scaling factor that causes vastly inconsistent gradient magnitudes across timesteps. Experiments on 1.3B to 14B parameter models validate Flash-GRPO's effectiveness, demonstrating substantial training acceleration with consistent stability and state-of-the-art alignment quality.
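
The two ideas in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names, the uniform timestep sampling, and the `scale_fn` placeholder for the time-dependent factor are all hypothetical, and the advantage normalization is the standard group-relative form used in GRPO-style methods.

```python
import random
import statistics
from typing import Callable, Dict, List


def sample_iso_temporal_timesteps(
    prompts: List[str], group_size: int, num_timesteps: int, rng: random.Random
) -> Dict[str, List[int]]:
    """Draw ONE timestep per prompt and share it across that prompt's group.

    Assumption: timesteps are drawn uniformly; the real sampler may differ.
    """
    plan = {}
    for prompt in prompts:
        t = rng.randrange(num_timesteps)
        plan[prompt] = [t] * group_size  # every sample in the group trains at t
    return plan


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt group (zero mean, unit scale).

    Because the whole group shares a timestep, reward differences are not
    mixed with differences in timestep difficulty.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def rectify(value: float, t: int, scale_fn: Callable[[int], float]) -> float:
    """Divide out a time-dependent factor so magnitudes are comparable across t.

    scale_fn is a hypothetical stand-in; the source does not give its form.
    """
    return value / scale_fn(t)


rng = random.Random(0)
plan = sample_iso_temporal_timesteps(["a cat surfing", "a dog running"], 4, 1000, rng)
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

Here `plan` maps each prompt to four identical timesteps, and `advantages` sums to approximately zero within the group.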

Welcome Ideas and Contributions. Stay tuned!

🆕 News

We have presented a single-step training framework, Flash-GRPO.

[2026-05-11] We release the code for our paper. An 8-GPU version of Flash-GRPO, which achieves the same performance and needs only ~40 hours, will follow. 🔥🔥🔥
[2026-05-28] We have released an 8-GPU (~40 hours) version of Flash-GRPO (the reward curve is as follows)!

📕 Training & Evaluation

Preparation

Download the HPSv3 reward model and the Wan2.1-1.3B base model.

Training

Reward server

cd flow_grpo/reward-server
gunicorn "app_hpsv3:create_app()"
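
The quoted `"app_hpsv3:create_app()"` argument uses gunicorn's application-factory form: gunicorn imports the module, calls the factory, and serves the WSGI callable it returns. The sketch below shows that pattern with a bare WSGI app. It is an illustrative stand-in, not the contents of `app_hpsv3.py`; the response body and the absence of any reward-scoring route are assumptions.

```python
def create_app():
    """Factory that gunicorn calls in each worker to obtain the WSGI callable."""

    def app(environ, start_response):
        # Placeholder response; a real reward server would score inputs here.
        body = b"ok"
        headers = [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]

    return app
```

The quotes around the argument in the shell command keep the parentheses from being interpreted by the shell.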

Wan2.1-1.3B

# Flash-GRPO, 96 GPUs
bash scripts/multi_node/train_wan2_1_flash.sh

Wan2.1-1.3B-1node

# Flash-GRPO, 8 GPUs
bash scripts/multi_node/train_wan2_1_flash_1node.sh

📊 Experimental Performance

📺 Visualization

For more details please read our paper.

Acknowledgements

Flow-GRPO: The first method integrating online reinforcement learning (RL) into flow matching models.
