AndreasBergmeister/ram

Reinforce Adjoint Matching (RAM)

Scaling RL post-training of diffusion and flow-matching models.

Prompt: "a red zebra". Left: pretrained SD3.5M. Right: SD3.5M post-trained with RAM on GenEval.

Code for the three text-to-image experiments in the paper: post-training Stable Diffusion 3.5 Medium with RAM on GenEval (compositional generation), OCR (visual text rendering), and PickScore (human preference).

Training-reward curves: RAM vs. Flow-GRPO on GenEval, OCR, and PickScore


Installation

We ran our experiments on a cluster with 4× H100 96GB GPUs.

git clone https://github.com/AndreasBergmeister/ram.git
cd ram
python -m venv .venv && source .venv/bin/activate
pip install -e .

All dependency versions are pinned in pyproject.toml. Note that paddlepaddle and onedl-mmcv are CUDA-specific; the pinned versions target a CUDA 12.x environment.

Run all commands below from the repository root — the scripts and the GenEval reward use repo-relative paths (configs/, prompts/, models/).


Training

Training is launched with accelerate. Each config in configs/ reproduces one of the three paper experiments:

# Compositional generation (GenEval)
accelerate launch scripts/training_sd3.py geneval_sd3

# Visual text rendering (OCR)
accelerate launch scripts/training_sd3.py ocr_sd3

# Human-preference alignment (PickScore)
accelerate launch scripts/training_sd3.py pickscore_sd3

Outputs (a copy of the resolved config + checkpoints) land in outputs/<config_name>/. Override with --output_dir <path>.

Individual config keys can be overridden on the command line:

accelerate launch scripts/training_sd3.py geneval_sd3 --lr 1e-4 --reward_multiplier 50

The configs specify epoch sizes (num_prompts_per_epoch, etc.) as totals across all processes, so the same config reproduces the paper on any number of GPUs. The per-process slice is derived at launch time.
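
As a rough illustration of how a global epoch size could map onto per-process slices, the sketch below splits a total evenly across processes. The helper name `per_process_count` and the strict divisibility check are assumptions made for this example; the derivation in scripts/training_sd3.py may handle this differently.

```python
def per_process_count(total: int, num_processes: int) -> int:
    """Split a global count evenly across processes (hypothetical helper)."""
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    if total % num_processes != 0:
        # Assumption: an uneven split is treated as a configuration error.
        raise ValueError(f"{total} is not divisible by {num_processes} processes")
    return total // num_processes


# The same global total yields a different slice for each GPU count.
for n in (1, 2, 4):
    print(f"{n} process(es): {per_process_count(48, n)} prompts each")
```

The total of 48 is an arbitrary value chosen for the example, not a setting taken from the configs.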

Resuming

Re-run the same command. If <output_dir>/latest exists, training continues from there; otherwise it starts fresh.
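
The resume rule above amounts to a single existence check. A minimal sketch, assuming a hypothetical helper `find_resume_checkpoint` (not a function from this repository):

```python
from pathlib import Path
from typing import Optional


def find_resume_checkpoint(output_dir: str) -> Optional[Path]:
    """Return <output_dir>/latest if it exists, else None (hypothetical helper)."""
    latest = Path(output_dir) / "latest"
    return latest if latest.exists() else None


ckpt = find_resume_checkpoint("outputs/geneval_sd3")
if ckpt is not None:
    print(f"resuming from {ckpt}")
else:
    print("starting fresh")
```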

W&B logging (optional)

Set wandb_project in the config (or pass --wandb_project <name>) to log metrics and validation samples to Weights & Biases.


Evaluation

Evaluate a trained checkpoint on the held-out task prompts plus DrawBench image-quality metrics:

accelerate launch scripts/evaluate.py \
    --checkpoint outputs/geneval_sd3/latest \
    --rewards '[Geneval]' \
    --drawbench_metrics ImageReward AestheticScore HPSv2 PickScore DeQAScore

The training config that produced the checkpoint is loaded automatically from <checkpoint>/../config.yaml.
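
To locate that config yourself, for example to inspect the settings behind a checkpoint, the path can be derived as in the sketch below. `config_path_for` is a hypothetical helper, and it takes the parent directory lexically; if `latest` is a symlink, the operating system would resolve `..` relative to the link target instead.

```python
from pathlib import Path


def config_path_for(checkpoint: str) -> Path:
    """Return the config.yaml that sits next to a checkpoint directory."""
    # Lexical parent: no symlink resolution is performed here (assumption).
    return Path(checkpoint).parent / "config.yaml"


print(config_path_for("outputs/geneval_sd3/latest").as_posix())
```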

To evaluate the pretrained baseline (no LoRA), use --model_name instead of --checkpoint:

accelerate launch scripts/evaluate.py \
    --model_name stabilityai/stable-diffusion-3.5-medium \
    --prompts geneval \
    --rewards '[Geneval]'

Results print to stdout as RESULT <key>=<value> lines.
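
Because the results use a fixed line prefix, they are easy to collect from a captured log. The sketch below is one way to do it; the function name `parse_results` and the metric names in the sample are placeholders, and values are kept as strings because the value format is not specified here.

```python
def parse_results(stdout: str) -> dict[str, str]:
    """Collect `RESULT <key>=<value>` lines from captured output."""
    prefix = "RESULT "
    results = {}
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skip progress bars, warnings, and other log lines
        key, sep, value = line[len(prefix):].partition("=")
        if sep:  # ignore malformed lines without '='
            results[key.strip()] = value.strip()
    return results


# Placeholder log text; real keys depend on the rewards and metrics requested.
sample = "loading model...\nRESULT example_metric=0.5\nRESULT other_metric=1.25\n"
print(parse_results(sample))
```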

Reward-scorer weights

Most scorers fetch their weights automatically on first use. Four need to be downloaded by hand and placed under models/:

| File | Used by | Source |
| --- | --- | --- |
| models/sac+logos+ava1-l14-linearMSE.pth | AestheticScore | LAION aesthetic predictor |
| models/HPS_v2.1_compressed.pt | HPSv2 | HPSv2 release |
| models/open_clip_pytorch_model.bin | HPSv2 (OpenCLIP backbone) | OpenCLIP ViT-H-14 |
| models/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco_20220504_001756-c9d0c4f2.pth | Geneval | MMDetection |
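
Before launching an evaluation, it can help to confirm that the manually downloaded files are in place. The following is a small sketch of such a check, run from the repository root; `missing_weights` is a hypothetical helper rather than part of the repository, and the file names are copied from the list above.

```python
from pathlib import Path

# File names as listed above; all are expected directly under models/.
REQUIRED_WEIGHTS = [
    "sac+logos+ava1-l14-linearMSE.pth",
    "HPS_v2.1_compressed.pt",
    "open_clip_pytorch_model.bin",
    "mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco_20220504_001756-c9d0c4f2.pth",
]


def missing_weights(models_dir: str = "models") -> list[str]:
    """Return the required weight files not found in models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_WEIGHTS if not (root / name).is_file()]


for name in missing_weights():
    print(f"missing: models/{name}")
```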

Repository layout

ram/
├── reward_models/           # training rewards (Geneval, OCR, PickScore) + eval metrics (Aesthetic, HPSv2, ImageReward, DeQA)
├── scripts/
│   ├── training_sd3.py      # RAM training entry point
│   └── evaluate.py          # task + DrawBench evaluation
├── configs/                 # one YAML per paper experiment
└── prompts/                 # benchmark prompt sets (geneval / ocr / pickscore / drawbench)

The RAM training algorithm itself lives in scripts/training_sd3.py — it is self-contained and the natural starting point for reading the code.


Citation

@misc{bergmeister2026ram,
  title         = {Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models},
  author        = {Bergmeister, Andreas and Jegelka, Stefanie and N{\"u}sken, Nikolas and Domingo-Enrich, Carles and Pidstrigach, Jakiw},
  year          = {2026},
  eprint        = {2605.10759},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG}
}

Acknowledgements

The GenEval and OCR reward implementations are adapted from Flow-GRPO and DiffusionNFT. The HPSv2, ImageReward, and PickScore wrappers re-use the official scorer packages. The aesthetic predictor comes from the LAION improved-aesthetic-predictor.
