Reinforce Adjoint Matching (RAM)

Scaling RL post-training of diffusion and flow-matching models.

Figure: prompt "a red zebra". Left: pretrained SD3.5M. Right: SD3.5M post-trained with RAM on GenEval.

Code for the three text-to-image experiments in the paper: post-training Stable Diffusion 3.5 Medium with RAM on GenEval (compositional generation), OCR (visual text rendering), and PickScore (human preference).

Installation

We ran our experiments on a cluster with 4× H100 96GB.

git clone https://github.com/AndreasBergmeister/ram.git
cd ram
python -m venv .venv && source .venv/bin/activate
pip install -e .

All dependency versions are pinned in pyproject.toml. Note that paddlepaddle and onedl-mmcv are CUDA-specific; the pinned versions target a CUDA 12.x environment.

Run all commands below from the repository root, because the scripts and the GenEval reward use repo-relative paths (configs/, prompts/, models/).

Training

Training is launched with accelerate. Each config in configs/ reproduces one of the three paper experiments:

# Compositional generation (GenEval)
accelerate launch scripts/training_sd3.py geneval_sd3

# Visual text rendering (OCR)
accelerate launch scripts/training_sd3.py ocr_sd3

# Human-preference alignment (PickScore)
accelerate launch scripts/training_sd3.py pickscore_sd3

Outputs (a copy of the resolved config and the checkpoints) are written to outputs/<config_name>/. Pass --output_dir <path> to write them elsewhere.

Individual config keys can be overridden on the command line:

accelerate launch scripts/training_sd3.py geneval_sd3 --lr 1e-4 --reward_multiplier 50

The configs specify epoch sizes (num_prompts_per_epoch, etc.) as totals across all processes, so the same config reproduces the paper on any number of GPUs. The per-process slice is derived at launch time.
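The relationship between a global total and its per-process slice can be sketched as below. This is an illustration only, not the code in scripts/training_sd3.py: the helper name per_process_count is hypothetical, the value 48 is a placeholder rather than a value from any config, and the sketch assumes the total must divide evenly across processes (how the repository treats remainders is not stated here).

```python
def per_process_count(total: int, num_processes: int) -> int:
    """Split a global per-epoch total evenly across processes.

    Hypothetical helper: assumes an even split is required.
    """
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    if total % num_processes != 0:
        raise ValueError(
            f"total={total} is not divisible by num_processes={num_processes}"
        )
    return total // num_processes


# Placeholder global value; the real one comes from the YAML config.
num_prompts_per_epoch = 48

# The global total stays fixed while the per-process share shrinks
# as more GPUs are used.
for world_size in (1, 2, 4):
    share = per_process_count(num_prompts_per_epoch, world_size)
    print(f"{world_size} process(es): {share} prompts each")
```

Because the config stores the total, the amount of data consumed per epoch is independent of the GPU count; only the share handled by each process changes.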

Resuming

Re-run the same command. If <output_dir>/latest exists, training continues from there; otherwise it starts fresh.
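The resume rule amounts to a single existence check on <output_dir>/latest. A minimal sketch follows, assuming nothing about how the training script implements it; the function name find_resume_checkpoint is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_resume_checkpoint(output_dir: str) -> Optional[Path]:
    """Return <output_dir>/latest if it exists, else None (start fresh).

    Hypothetical helper mirroring the rule described above.
    """
    latest = Path(output_dir) / "latest"
    return latest if latest.exists() else None


checkpoint = find_resume_checkpoint("outputs/geneval_sd3")
if checkpoint is None:
    print("no checkpoint found, a run would start fresh")
else:
    print(f"a run would resume from {checkpoint}")
```

A check like this can be handy in job scripts, for example to confirm that a requeued job will pick up where the previous one stopped.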

W&B logging (optional)

Set wandb_project in the config (or pass --wandb_project <name>) to log metrics and validation samples to Weights & Biases.

Evaluation

Evaluate a trained checkpoint on the held-out task prompts plus DrawBench image-quality metrics:

accelerate launch scripts/evaluate.py \
    --checkpoint outputs/geneval_sd3/latest \
    --rewards '[Geneval]' \
    --drawbench_metrics ImageReward AestheticScore HPSv2 PickScore DeQAScore

The training config that produced the checkpoint is loaded automatically from <checkpoint>/../config.yaml.
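To make the lookup concrete, the sketch below computes where the config is expected for a given checkpoint path. It is a purely lexical illustration with a hypothetical helper name; whether evaluate.py resolves symlinks before stepping up a directory is not covered here, so treat that as an open assumption if latest is a symlink.

```python
from pathlib import Path


def config_path_for(checkpoint: str) -> Path:
    """Map <checkpoint> to <checkpoint>/../config.yaml (lexically).

    Hypothetical helper; it does not follow symlinks.
    """
    return Path(checkpoint).parent / "config.yaml"


print(config_path_for("outputs/geneval_sd3/latest").as_posix())
```

For the checkpoint used in the command above, this points at outputs/geneval_sd3/config.yaml, which is the resolved config copy written during training.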

To evaluate the pretrained baseline (no LoRA), use --model_name instead of --checkpoint:

accelerate launch scripts/evaluate.py \
    --model_name stabilityai/stable-diffusion-3.5-medium \
    --prompts geneval \
    --rewards '[Geneval]'

Results print to stdout as RESULT <key>=<value> lines.
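Since results are emitted as RESULT <key>=<value> lines, they are easy to collect from a captured log. The parser below is a minimal sketch, not part of the repository: the function name parse_results is hypothetical, the sample keys and values are placeholders rather than real scores, and values are kept as strings because the exact value format is not specified here.

```python
import re
from typing import Dict

# Matches lines such as "RESULT some_key=some value".
RESULT_LINE = re.compile(r"^RESULT\s+(?P<key>[^=\s]+)=(?P<value>.*)$")


def parse_results(stdout: str) -> Dict[str, str]:
    """Collect RESULT <key>=<value> lines from captured stdout."""
    results: Dict[str, str] = {}
    for line in stdout.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group("key")] = match.group("value").strip()
    return results


# Placeholder log text; keys and values are made up for illustration.
sample_log = """\
loading model...
RESULT metric_a=0.5
RESULT metric_b=1.25
done
"""
print(parse_results(sample_log))
```

In practice the input would be the captured stdout of the evaluation command, for example a log file written by the job scheduler.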

Reward-scorer weights

Most scorers fetch their weights automatically on first use. Four need to be downloaded by hand and placed under models/:

| File | Used by | Source |
| --- | --- | --- |
| `models/sac+logos+ava1-l14-linearMSE.pth` | `AestheticScore` | LAION aesthetic predictor |
| `models/HPS_v2.1_compressed.pt` | `HPSv2` | HPSv2 release |
| `models/open_clip_pytorch_model.bin` | `HPSv2` (OpenCLIP backbone) | OpenCLIP ViT-H-14 |
| `models/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco_20220504_001756-c9d0c4f2.pth` | `Geneval` | MMDetection |
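Before launching a long run, it can help to confirm that the four manually downloaded files are in place. The check below is a small sketch that is not shipped with the repository; the function name missing_weights is hypothetical, and the file names are copied from the table above. It only tests for presence, not for file integrity.

```python
from pathlib import Path
from typing import List

# File names taken from the table above.
REQUIRED_WEIGHTS = [
    "sac+logos+ava1-l14-linearMSE.pth",
    "HPS_v2.1_compressed.pt",
    "open_clip_pytorch_model.bin",
    "mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco_20220504_001756-c9d0c4f2.pth",
]


def missing_weights(models_dir: str = "models") -> List[str]:
    """Return the required weight files that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_WEIGHTS if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root so that "models" resolves correctly.
    missing = missing_weights()
    if missing:
        print("missing reward-scorer weights:")
        for name in missing:
            print(f"  models/{name}")
    else:
        print("all manually downloaded weights are present")
```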

Repository layout

ram/
├── reward_models/           # training rewards (Geneval, OCR, PickScore) + eval metrics (Aesthetic, HPSv2, ImageReward, DeQA)
├── scripts/
│   ├── training_sd3.py      # RAM training entry point
│   └── evaluate.py          # task + DrawBench evaluation
├── configs/                 # one YAML per paper experiment
└── prompts/                 # benchmark prompt sets (geneval / ocr / pickscore / drawbench)

The RAM training algorithm itself lives in scripts/training_sd3.py. That file is self-contained and is the natural starting point for reading the code.

Citation

@misc{bergmeister2026ram,
  title         = {Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models},
  author        = {Bergmeister, Andreas and Jegelka, Stefanie and N{\"u}sken, Nikolas and Domingo-Enrich, Carles and Pidstrigach, Jakiw},
  year          = {2026},
  eprint        = {2605.10759},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG}
}

Acknowledgements

The GenEval and OCR reward implementations are adapted from Flow-GRPO and DiffusionNFT. The HPSv2, ImageReward, and PickScore wrappers re-use the official scorer packages. The aesthetic predictor comes from the LAION improved-aesthetic-predictor.
