
Shredded-Pork/Flash-GRPO


🦕 Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

arXiv Website

Flash-GRPO is a single-step training framework that outperforms full-trajectory training in alignment quality under low computational budgets while substantially improving training efficiency.


🗺️ Roadmap for Flash-GRPO

Flash-GRPO addresses two critical challenges. Iso-temporal grouping eliminates timestep-confounded variance by enforcing prompt-wise temporal consistency, which decouples policy performance from timestep difficulty. Temporal gradient rectification neutralizes the time-dependent scaling factor that causes vastly inconsistent gradient magnitudes across timesteps. Experiments on 1.3B to 14B parameter models validate Flash-GRPO's effectiveness, demonstrating substantial training acceleration with consistent stability and state-of-the-art alignment quality.
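As a rough illustration of iso-temporal grouping, the sketch below draws one timestep per prompt and shares it across every sample in that prompt's group, then normalizes rewards within the group in the usual GRPO style. This is only one reading of the description above, not the repository's implementation: the function names, the uniform timestep draw, and the `eps` constant are all assumptions made for the example.

```python
import random
from statistics import mean, pstdev


def sample_iso_temporal_timesteps(prompts, group_size, num_timesteps, rng):
    """Hypothetical helper: draw ONE timestep per prompt and reuse it
    for every sample in that prompt's group."""
    assignment = {}
    for prompt in prompts:
        t = rng.randrange(num_timesteps)  # single draw per prompt (assumed uniform)
        assignment[prompt] = [t] * group_size
    return assignment


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward minus group mean, divided by group std.
    Because all rewards in the group share a timestep, differences reflect
    the samples themselves and not timestep difficulty."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    timesteps = sample_iso_temporal_timesteps(["a cat", "a dog"], 4, 1000, rng)
    for prompt, ts in timesteps.items():
        print(prompt, ts)  # each list holds one repeated timestep
    print(group_relative_advantages([0.2, 0.5, 0.9, 0.4]))
```

If timesteps were instead drawn independently per sample, a reward gap inside a group could come from an easier or harder timestep as much as from a better sample, which is the confound the grouping is meant to remove.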

Ideas and contributions are welcome. Stay tuned!

🆕 News

We have presented a single-step training framework, Flash-GRPO.

  • [2026-05-11] We released the code for our paper. An 8-GPU version of Flash-GRPO, which achieves the same performance and needs only ~40 hours, will follow. 🔥🔥🔥
  • [2026-05-28] We released an 8-GPU (~40 hours) version of Flash-GRPO (the reward curve is shown below)!

📕 Training & Evaluation

Preparation

Download the reward model HPSV3 and base model Wan2.1-1.3B.

Training

Reward server

cd flow_grpo/reward-server
gunicorn "app_hpsv3:create_app()" 

Wan2.1-1.3B

# Flash-GRPO 96GPUs
bash scripts/multi_node/train_wan2_1_flash.sh

Wan2.1-1.3B-1node

# Flash-GRPO 8GPUs
bash scripts/multi_node/train_wan2_1_flash_1node.sh

📊 Experimental Performance

Performance

📺 Visualization

Visualization

  • For more details, please read our paper.

Acknowledgements

Flow-GRPO: The first method integrating online reinforcement learning (RL) into flow matching models.

About

[ICML 2026] Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization
