SamsungLabs/VMPO

VMPO: Diffusion Alignment with Variance Minimisation

This repo contains the PyTorch implementation of the paper "Diffusion Alignment Beyond KL: Variance Minimisation as Effective Policy Optimiser" by Zijing Ou, Jacob Si, Junyi Zhu, Ondrej Bohdal, Mete Ozay, Taha Ceritli, and Yingzhen Li.

We propose Variance Minimisation Policy Optimiser (VMPO), a framework that reformulates diffusion alignment through the lens of Sequential Monte Carlo (SMC) sampling. VMPO treats the denoising process as a proposal distribution and reward guidance as importance weights, shifting the optimization goal from standard KL-divergence to the minimization of log-importance weight variance. We theoretically demonstrate that this variance objective is minimized by the optimal reward-tilted target and that its gradient aligns with KL-based methods under on-policy sampling. By varying potential functions and minimization strategies, VMPO provides a unified perspective that recovers several existing alignment methods while opening new design paths for non-KL-based diffusion refinement.
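The claim that the variance objective is minimised by the reward-tilted target can be checked on a toy discrete example. The sketch below is not the repository's training code: it assumes a single-step setting, a tilted target of the form p*(x) ∝ p_ref(x) exp(r(x)/beta), and unnormalised importance weights w(x) = p_ref(x) exp(r(x)/beta) / q(x). The names `tilted_target`, `log_weight_variance`, and `beta` are illustrative only.

```python
import math


def tilted_target(ref, rewards, beta):
    """Normalised reward-tilted target p*(x) proportional to ref(x) * exp(r(x) / beta).

    Assumed form for illustration; the paper's exact definition may differ.
    """
    unnorm = [p * math.exp(r / beta) for p, r in zip(ref, rewards)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def log_weight_variance(proposal, ref, rewards, beta):
    """Exact variance, under the proposal, of the log-importance weight
    log w(x) = log ref(x) + r(x) / beta - log proposal(x)."""
    log_w = [
        math.log(p) + r / beta - math.log(q)
        for q, p, r in zip(proposal, ref, rewards)
    ]
    mean = sum(q * lw for q, lw in zip(proposal, log_w))
    return sum(q * (lw - mean) ** 2 for q, lw in zip(proposal, log_w))


# Toy three-state problem (made-up numbers, not from the paper).
ref = [0.5, 0.3, 0.2]
rewards = [0.0, 1.0, 2.0]
beta = 1.0

target = tilted_target(ref, rewards, beta)

# Proposal equal to the reference: log-weights vary with the reward.
var_at_ref = log_weight_variance(ref, ref, rewards, beta)

# Proposal equal to the tilted target: log-weights are constant (log Z),
# so their variance vanishes up to floating-point error.
var_at_target = log_weight_variance(target, ref, rewards, beta)

print(f"variance at reference proposal: {var_at_ref:.4f}")
print(f"variance at tilted target:      {var_at_target:.2e}")
```

When the proposal matches the tilted target, every log-weight equals the log normalising constant, which is why the variance drops to zero. The real method applies this idea along the denoising trajectory rather than to a single discrete distribution.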

Environment Setup

Our implementation builds on the Flow-GRPO and DiffusionNFT codebases, and the environment setup largely matches theirs.

Clone this repository and install the packages:

# Environment setup
conda create -n DiffusionNFT python=3.10.16
conda activate DiffusionNFT
conda install -c nvidia cuda-toolkit=12.4
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -e . --no-build-isolation

# OCR reward preparation
pip install paddlepaddle-gpu==2.6.2
pip install paddleocr==2.9.1
pip install python-Levenshtein
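After installation, a quick sanity check can confirm that the packages are importable in the active environment. This is a minimal sketch, not part of the repository; the helper name `missing_modules` is hypothetical, and the mapping assumes the usual import names for these pip packages (`paddle` for paddlepaddle-gpu, `Levenshtein` for python-Levenshtein).

```python
import importlib.util

# Import name -> pip requirement from the setup steps above.
# The import names are assumptions based on the packages' usual conventions.
REQUIRED_MODULES = {
    "torch": "torch==2.6.0",
    "torchvision": "torchvision==0.21.0",
    "paddle": "paddlepaddle-gpu==2.6.2",
    "paddleocr": "paddleocr==2.9.1",
    "Levenshtein": "python-Levenshtein",
}


def missing_modules(required=None):
    """Return the pip requirements whose top-level module cannot be found."""
    required = REQUIRED_MODULES if required is None else required
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_modules()
if missing:
    print("Not importable in this environment:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

The check only tests that each module can be located; it does not verify versions or GPU availability.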

Dataset

The repo currently supports the OCR task from Flow-GRPO. Download the training and testing data from Flow-GRPO and place them in the dataset/ocr directory.
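Before launching training, it can help to confirm that the data landed in the expected place. The sketch below only assumes the dataset/ocr path mentioned above; the helper `list_dataset_files` is hypothetical, and the README does not specify the file names, so none are checked.

```python
from pathlib import Path


def list_dataset_files(root="dataset/ocr"):
    """Return the sorted entry names under `root`, or None if it is not a directory."""
    path = Path(root)
    if not path.is_dir():
        return None
    return sorted(entry.name for entry in path.iterdir())


files = list_dataset_files()
if files is None:
    print("dataset/ocr not found; run this from the repository root after downloading the data.")
elif not files:
    print("dataset/ocr exists but is empty.")
else:
    print("Found in dataset/ocr:", files)
```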

Training

The default configuration file config/vmpo.py is set up for 4 GPUs; customize it as needed.

torchrun --nproc_per_node=4 -m scripts.train_vmpo_sd3 --config config/vmpo.py:sd3_ocr

Citation

If you find this repo useful, please consider citing our paper:

@article{ou2026diffusion,
  title={Diffusion Alignment Beyond KL: Variance Minimisation as Effective Policy Optimiser},
  author={Ou, Zijing and Si, Jacob and Zhu, Junyi and Bohdal, Ondrej and Ozay, Mete and Ceritli, Taha and Li, Yingzhen},
  journal={arXiv preprint arXiv:2602.12229},
  year={2026}
}
