Scaling RL post-training of diffusion and flow-matching models.

Prompt: "a red zebra". Left: pretrained SD3.5M. Right: SD3.5M post-trained with RAM on GenEval.
Code for the three text-to-image experiments in the paper: post-training Stable Diffusion 3.5 Medium with RAM (Reinforce Adjoint Matching) on GenEval (compositional generation), OCR (visual text rendering), and PickScore (human preference).
We ran our experiments on a cluster with 4× H100 96GB.
```bash
git clone https://github.com/AndreasBergmeister/ram.git
cd ram
python -m venv .venv && source .venv/bin/activate
pip install -e .
```

All dependency versions are pinned in `pyproject.toml`. Note that `paddlepaddle` and `onedl-mmcv` are CUDA-specific; the pinned versions target a CUDA 12.x environment.
Run all commands below from the repository root. The scripts and the GenEval reward use repo-relative paths (`configs/`, `prompts/`, `models/`).
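If you wrap these commands in your own scripts, a small guard can catch launches from the wrong directory. The `in_repo_root` function below is a hypothetical helper and is not part of the repository. It assumes the repo root can be recognized by the `configs/`, `prompts/`, and `scripts/` directories plus `pyproject.toml`. It deliberately skips `models/`, because that directory may not exist until you place scorer weights in it.

```bash
# Hypothetical guard (not part of the repo): fail early unless the current
# directory looks like the ram repository root, where the repo-relative
# paths used by the scripts and the GenEval reward resolve correctly.
in_repo_root() {
  for entry in configs prompts scripts pyproject.toml; do
    if [ ! -e "$entry" ]; then
      echo "not in the ram repository root: ./$entry not found" >&2
      return 1
    fi
  done
  return 0
}

# Usage at the top of a wrapper script:
#   in_repo_root || exit 1
```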
Training is launched with `accelerate`. Each config in `configs/` reproduces one of the three paper experiments:

```bash
# Compositional generation (GenEval)
accelerate launch scripts/training_sd3.py geneval_sd3

# Visual text rendering (OCR)
accelerate launch scripts/training_sd3.py ocr_sd3

# Human-preference alignment (PickScore)
accelerate launch scripts/training_sd3.py pickscore_sd3
```

Outputs (a copy of the resolved config plus checkpoints) are written to `outputs/<config_name>/`. Override the location with `--output_dir <path>`.
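To train all three configs back to back, a plain loop over the config names is enough. The sketch below is a hypothetical helper: neither `run_all_experiments` nor the `DRY_RUN` switch is part of the repository. It reuses exactly the launch command shown above and stops at the first run that fails.

```bash
# Hypothetical helper (not part of the repo): train the three paper
# configs one after another, stopping at the first failure.
# With DRY_RUN=1 it only prints the commands instead of executing them.
run_all_experiments() {
  for cfg in geneval_sd3 ocr_sd3 pickscore_sd3; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "accelerate launch scripts/training_sd3.py $cfg"
    else
      accelerate launch scripts/training_sd3.py "$cfg" || return 1
    fi
  done
}
```

Run `DRY_RUN=1 run_all_experiments` to preview the three commands before committing GPU time.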
Individual config keys can be overridden on the command line:

```bash
accelerate launch scripts/training_sd3.py geneval_sd3 --lr 1e-4 --reward_multiplier 50
```

The configs specify epoch sizes (`num_prompts_per_epoch`, etc.) as totals across all processes, so the same config reproduces the paper on any number of GPUs. The per-process slice is derived at launch time.
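Here is a worked example of that split. The README does not say how the global total is divided, so the sketch assumes plain integer division by the number of processes. It also uses a hypothetical `num_prompts_per_epoch` of 512. Under that assumption, the config value stays fixed while each process's share shrinks as GPUs are added.

```bash
# Assumption: per-process share = global total / number of processes.
# The repository's exact rule (rounding, remainders) may differ.
per_process_share() {
  total=$1   # global value from the config, e.g. num_prompts_per_epoch
  nproc=$2   # number of processes started by accelerate (one per GPU)
  if [ "$nproc" -le 0 ]; then
    echo "number of processes must be positive" >&2
    return 1
  fi
  if [ $((total % nproc)) -ne 0 ]; then
    echo "warning: $total is not divisible by $nproc" >&2
  fi
  echo $((total / nproc))
}

per_process_share 512 4   # prints 128 (the paper's 4-GPU setup)
per_process_share 512 8   # prints 64
```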
To resume an interrupted run, re-run the same command. If `<output_dir>/latest` exists, training continues from there; otherwise it starts fresh.
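To see in advance which of the two cases a re-run will hit, you can use the hypothetical `resume_mode` helper below. It applies the rule just described by inspecting the filesystem only; it does not reproduce the resume logic inside the training script. Pass it the run's output directory, which is `outputs/<config_name>/` unless you overrode it with `--output_dir`.

```bash
# Hypothetical helper (not part of the repo): report whether re-running the
# training command would resume from <output_dir>/latest or start fresh.
resume_mode() {
  output_dir=${1%/}   # strip one trailing slash so the printed path is clean
  if [ -e "$output_dir/latest" ]; then
    echo "resume from $output_dir/latest"
  else
    echo "start fresh"
  fi
}

# Usage: resume_mode outputs/geneval_sd3
```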
Set wandb_project in the config (or pass --wandb_project <name>) to log metrics and validation samples to Weights & Biases.
Evaluate a trained checkpoint on the held-out task prompts plus DrawBench image-quality metrics:
```bash
accelerate launch scripts/evaluate.py \
  --checkpoint outputs/geneval_sd3/latest \
  --rewards '[Geneval]' \
  --drawbench_metrics ImageReward AestheticScore HPSv2 PickScore DeQAScore
```

The training config that produced the checkpoint is loaded automatically from `<checkpoint>/../config.yaml`.
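Evaluation depends on that config file, so a quick pre-flight check can save a failed launch. The `check_checkpoint` function below is a hypothetical helper and is not part of the repository. It checks that the checkpoint directory exists and that the `<checkpoint>/../config.yaml` path described above exists.

```bash
# Hypothetical helper (not part of the repo): verify that a checkpoint
# directory exists and that its training config sits at <checkpoint>/../config.yaml.
check_checkpoint() {
  ckpt=$1
  if [ ! -d "$ckpt" ]; then
    echo "missing checkpoint directory: $ckpt" >&2
    return 1
  fi
  if [ ! -f "$ckpt/../config.yaml" ]; then
    echo "missing training config: $ckpt/../config.yaml" >&2
    return 1
  fi
  echo "ok: $ckpt"
}
```

For example: `check_checkpoint outputs/geneval_sd3/latest && accelerate launch scripts/evaluate.py --checkpoint outputs/geneval_sd3/latest --rewards '[Geneval]'`.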
To evaluate the pretrained baseline (no LoRA), use --model_name instead of --checkpoint:
```bash
accelerate launch scripts/evaluate.py \
  --model_name stabilityai/stable-diffusion-3.5-medium \
  --prompts geneval \
  --rewards '[Geneval]'
```

Results are printed to stdout as `RESULT <key>=<value>` lines.
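To collect those lines from a longer log, you can use the hypothetical helpers below, which are not part of the repository. They keep only lines that start with `RESULT ` and split each one at the first `=`. They make no assumption about which metric keys the evaluation script emits.

```bash
# Hypothetical helpers (not part of the repo) for evaluate.py output.
# extract_results: turn "RESULT <key>=<value>" lines into "<key><TAB><value>",
# ignoring every other line. The value is everything after the first "=".
extract_results() {
  awk '/^RESULT / {
    line = substr($0, 8)   # drop the 7-character "RESULT " prefix
    i = index(line, "=")
    if (i > 0) print substr(line, 1, i - 1) "\t" substr(line, i + 1)
  }'
}

# result_value KEY: print the value reported for KEY (reads evaluate.py output on stdin).
result_value() {
  extract_results | awk -v key="$1" 'BEGIN { FS = "\t" } $1 == key { print $2 }'
}
```

For example, appending `| tee eval.log | extract_results > results.tsv` to the baseline command above keeps the full log and writes a tab-separated table of the reported metrics.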
Most scorers fetch their weights automatically on first use. Four need to be downloaded by hand and placed under models/:
| File | Used by | Source |
|---|---|---|
| `models/sac+logos+ava1-l14-linearMSE.pth` | AestheticScore | LAION aesthetic predictor |
| `models/HPS_v2.1_compressed.pt` | HPSv2 | HPSv2 release |
| `models/open_clip_pytorch_model.bin` | HPSv2 (OpenCLIP backbone) | OpenCLIP ViT-H-14 |
| `models/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco_20220504_001756-c9d0c4f2.pth` | Geneval | MMDetection |
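These four files are the only ones that are not fetched automatically, so it is worth verifying them before a long run. The `check_models` function below is a hypothetical helper and is not part of the repository. It only checks that each file from the table exists and is non-empty; it does not validate checksums or download anything.

```bash
# Hypothetical helper (not part of the repo): check that the four manually
# downloaded scorer weights exist and are non-empty. Defaults to ./models,
# so run it from the repository root.
check_models() {
  dir=${1:-models}
  missing=0
  for f in \
    sac+logos+ava1-l14-linearMSE.pth \
    HPS_v2.1_compressed.pt \
    open_clip_pytorch_model.bin \
    mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco_20220504_001756-c9d0c4f2.pth
  do
    if [ -s "$dir/$f" ]; then
      echo "found:   $dir/$f"
    else
      echo "MISSING: $dir/$f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]   # exit status 0 only if every file is present
}
```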
```
ram/
├── reward_models/ # training rewards (Geneval, OCR, PickScore) + eval metrics (Aesthetic, HPSv2, ImageReward, DeQA)
├── scripts/
│ ├── training_sd3.py # RAM training entry point
│ └── evaluate.py # task + DrawBench evaluation
├── configs/ # one YAML per paper experiment
└── prompts/ # benchmark prompt sets (geneval / ocr / pickscore / drawbench)
```
The RAM training algorithm itself lives in `scripts/training_sd3.py`. The file is self-contained and is the natural starting point for reading the code.
```bibtex
@misc{bergmeister2026ram,
  title = {Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models},
  author = {Bergmeister, Andreas and Jegelka, Stefanie and N{\"u}sken, Nikolas and Domingo-Enrich, Carles and Pidstrigach, Jakiw},
  year = {2026},
  eprint = {2605.10759},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
```

The GenEval and OCR reward implementations are adapted from Flow-GRPO and DiffusionNFT. The HPSv2, ImageReward, and PickScore wrappers reuse the official scorer packages. The aesthetic predictor comes from the LAION improved-aesthetic-predictor.
