This is the training code for Diffusion-DPO. The script is adapted from the diffusers library.
The released models are initialized from Stable Diffusion models and trained as described in the paper (replicable with the launchers/ scripts assuming 16 GPUs; scale gradient accumulation accordingly for other setups).
Use the `quick_samples.ipynb` notebook to compare generations against the baseline. It also has a sample of automatic quantitative evaluation using PickScore.
To set up the environment, run `pip install -r requirements.txt`.
Code layout:
- `launchers/` example scripts for running SD1.5 or SDXL training
- `utils/` scoring models for evaluation or AI feedback (PickScore, HPS, Aesthetics, CLIP); a standalone PickScore sketch follows below
- `quick_samples.ipynb` visualizations from a pretrained model vs. the baseline
- `requirements.txt` basic pip requirements
- `train.py` main script; pretty bulky at >1000 lines, the training loop starts around L1000 at this commit (ctrl-F "for epoch")
- `upload_model_to_hub.py` uploads a model checkpoint to HF (simple utility, current values are placeholders)
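The evaluation wrappers live in `utils/`; for reference, here is a minimal standalone sketch following PickScore's published usage (the model and processor ids come from the PickScore release; the exact preprocessing in `utils/` may differ):

```python
import torch
from transformers import AutoModel, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
# Processor/weights published with PickScore (a CLIP-H model fine-tuned on Pick-a-Pic).
processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval().to(device)

def pickscore(prompt, images):
    """Score a list of PIL images against one prompt; higher = predicted more preferred."""
    image_inputs = processor(images=images, return_tensors="pt").to(device)
    text_inputs = processor(text=prompt, padding=True, truncation=True,
                            max_length=77, return_tensors="pt").to(device)
    with torch.no_grad():
        img = model.get_image_features(**image_inputs)
        txt = model.get_text_features(**text_inputs)
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        return (model.logit_scale.exp() * txt @ img.T).squeeze(0)
```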
Example SD1.5 launch
```bash
# from launchers/sd15.sh
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_NAME="yuvalkirstain/pickapic_v2"
# Effective BS will be (N_GPU * train_batch_size * gradient_accumulation_steps)
# Paper used 2048; e.g. with 16 GPUs and train_batch_size=1, that means
# gradient_accumulation_steps=128. Training takes ~24 hours / 2000 steps.
accelerate launch --mixed_precision="fp16" train.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--train_batch_size=1 \
--dataloader_num_workers=16 \
--gradient_accumulation_steps=1 \
--max_train_steps=2000 \
--lr_scheduler="constant_with_warmup" --lr_warmup_steps=500 \
--learning_rate=1e-8 --scale_lr \
--cache_dir="/export/share/datasets/vision_language/pick_a_pic_v2/" \
--checkpointing_steps 500 \
--beta_dpo 5000 \
--output_dir="tmp-sd15"
```
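After training finishes, checkpoints are written under `--output_dir`. A minimal sketch for sampling from a trained UNet with diffusers; the checkpoint path is illustrative and assumes a diffusers-format UNet was saved:

```python
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Swap in the DPO-trained UNet; this checkpoint path is an assumption about
# the run's layout, not something the repo guarantees.
pipe.unet = UNet2DConditionModel.from_pretrained(
    "tmp-sd15/checkpoint-2000/unet", torch_dtype=torch.float16
).to("cuda")
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("sample.png")
```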
Key arguments (the full set is in `train.py`):

General
- `--pretrained_model_name_or_path` which model to train/initialize from
- `--output_dir` where to save/log to
- `--seed` training seed (not set by default)
- `--sdxl` run SDXL training
- `--sft` run SFT instead of DPO

DPO
- `--beta_dpo` the KL-divergence weight beta for DPO (see the loss sketch below)
- `--choice_model` model for AI feedback (Aesthetics, CLIP, PickScore, HPS)
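For intuition on `--beta_dpo`, here is a schematic of the Diffusion-DPO objective from the paper (variable names are illustrative; the `-beta/2` scaling is how the paper's factor is commonly folded into code, not necessarily the repo's exact implementation):

```python
import torch.nn.functional as F

def diffusion_dpo_loss(model_losses_w, model_losses_l,
                       ref_losses_w, ref_losses_l, beta_dpo=5000.0):
    """Per-sample noise-prediction MSEs: *_w for the preferred image, *_l for
    the rejected one; model_* under the trained UNet, ref_* under the frozen
    reference UNet."""
    model_diff = model_losses_w - model_losses_l  # margin under the trained model
    ref_diff = ref_losses_w - ref_losses_l        # same margin under the reference
    inside_term = -0.5 * beta_dpo * (model_diff - ref_diff)
    return -F.logsigmoid(inside_term).mean()
```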
Optimization
- `--max_train_steps` how many train steps to take
- `--gradient_accumulation_steps`
- `--train_batch_size` see the notes in the script above for the actual effective BS
- `--checkpointing_steps` how often to save the model
- `--gradient_checkpointing` turned on automatically for SDXL
- `--learning_rate`
- `--scale_lr` found this to be very helpful, but it isn't the default in the code
- `--lr_scheduler` type of LR warmup/decay; the default is linear warmup to constant
- `--lr_warmup_steps` number of scheduler warmup steps
- `--use_adafactor` Adafactor over Adam (lower memory, default for SDXL; see the sketch below)
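A sketch of what `--use_adafactor` implies, using the `transformers` Adafactor in fixed-LR mode; the hyperparameters are illustrative and may differ from `train.py`:

```python
import torch
from transformers import Adafactor

unet = torch.nn.Linear(8, 8)  # stand-in for the UNet being trained
optimizer = Adafactor(
    unet.parameters(),
    lr=1e-8,                  # matches the launch example above (before --scale_lr)
    scale_parameter=False,    # fixed-LR mode: disable Adafactor's relative step sizing
    relative_step=False,
    warmup_init=False,
)
```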
Data
- `--dataset_name` if you want to switch from Pick-a-Pic (see the data-loading sketch after this list)
- `--cache_dir` where the dataset is cached locally (change this to fit your file system)
- `--resolution` defaults to 512 for non-SDXL, 1024 for SDXL
- `--random_crop` and `--no_hflip` change the data augmentation
- `--dataloader_num_workers` number of total dataloader workers
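For reference, a quick peek at the Pick-a-Pic v2 preference pairs; the field names are assumptions about the dataset schema, and streaming avoids downloading the full (large) dataset:

```python
from io import BytesIO
from datasets import load_dataset
from PIL import Image

# Stream one preference pair from Pick-a-Pic v2 (schema assumed: jpg_0/jpg_1
# hold image bytes, label_0 == 1 marks image 0 as the preferred one).
ds = load_dataset("yuvalkirstain/pickapic_v2", split="train", streaming=True)
example = next(iter(ds))
img_0 = Image.open(BytesIO(example["jpg_0"]))
img_1 = Image.open(BytesIO(example["jpg_1"]))
preferred = img_0 if example["label_0"] == 1 else img_1
print(example["caption"], img_0.size, img_1.size)
```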
Citation:

```bibtex
@misc{wallace2023diffusion,
  title={Diffusion Model Alignment Using Direct Preference Optimization},
  author={Bram Wallace and Meihua Dang and Rafael Rafailov and Linqi Zhou and Aaron Lou and Senthil Purushwalkam and Stefano Ermon and Caiming Xiong and Shafiq Joty and Nikhil Naik},
  year={2023},
  eprint={2311.12908},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```