# Dreambooth-style fine-tuning of Stable Diffusion models

Yet another notebook for training Stable Diffusion models.

Tested with [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [Stable Diffusion v2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base).

Stay up to date on [GitHub](https://github.com/brian6091/Dreambooth), and open an [issue](https://github.com/brian6091/Dreambooth/issues) if you run into problems.

This notebook works with Colab's default Python 3.8, diffusers==0.9.0, and accelerate==0.14.0.

Using accelerate==0.15.0 and diffusers==0.10.0dev results in casting errors.

TODO:
* Using different text encoders
* Test at 768 resolution

# Install dependencies (takes about 1 minute)

In [None]:
%%capture
%cd /content/
!git clone -b drop https://github.com/brian6091/Dreambooth
!pip install -r "Dreambooth/requirements.txt"
!pip install -U --pre triton

In [None]:
#@title xformers
%%capture
!nvidia-smi

# Tested with Tesla T4 and A100 GPUs
!pip install https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl

# Which model to train from?

In [None]:
#@title Name or path to initial model and VAE
#@markdown Required (e.g., runwayml/stable-diffusion-v1-5 or stabilityai/stable-diffusion-2-base)
MODEL_NAME_OR_PATH = "runwayml/stable-diffusion-v1-5" #@param {type:"string"}

#@markdown Optional (e.g., stabilityai/sd-vae-ft-mse). Leave empty to use the VAE packaged with the model.
VAE_NAME_OR_PATH = "stabilityai/sd-vae-ft-mse" #@param {type:"string"}
#if VAE_NAME_OR_PATH=="":
#  VAE_NAME_OR_PATH = None

#@markdown (Not yet implemented.) Leave empty to use the text encoder packaged with the model.
TEXT_ENCODER_NAME_OR_PATH = "" #@param {type:"string"}

In [None]:
#@title Hugging Face 🤗 credentials

#@markdown If initiating training from official stable diffusion checkpoints (e.g., [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)), you must accept the license before using the model. You'll need a [🤗 Hugging Face](https://huggingface.co/) account to do so, after which you can [generate a login token](https://huggingface.co/settings/tokens) and paste it here.
from huggingface_hub import login

HUGGINGFACE_TOKEN = "" #@param {type:"string"}
login(HUGGINGFACE_TOKEN)

In [None]:
#@title Mount Google Drive if initial model stored there (or you want to direct outputs there)
from google.colab import drive
drive.mount('/content/gdrive')

# Set up experiment parameters

In [None]:
#@title Training parameters
import os

#@markdown Unique token for specific subject
INSTANCE_TOKEN= "raretoken" #@param{type: 'string'}

#@markdown Are instance image filenames captioned?
CAPTIONED_INSTANCE_IMAGES = False #@param {type:"boolean"}
CAPTIONED_INSTANCE_IMAGES_FLAG = ""
if CAPTIONED_INSTANCE_IMAGES:
  CAPTIONED_INSTANCE_IMAGES_FLAG='--image_captions_filename'

#@markdown Path to instance images. Filenames are irrelevant, unless images are captioned, in which case INSTANCE_TOKEN should appear in relevant filenames as part of the caption.
INSTANCE_DIR="/content/gdrive/MyDrive/InstanceImages/" #@param{type: 'string'}

RESOLUTION = 512 #@param{type: 'number'}

TRAIN_BATCH_SIZE = 4 #@param{type: 'number'}

GRADIENT_ACCUMULATION_STEPS = 1  #@param{type: 'number'}

GRADIENT_CHECKPOINTING = True #@param {type:"boolean"}
if GRADIENT_CHECKPOINTING:
  GRADIENT_CHECKPOINTING_FLAG='--gradient_checkpointing'
else:
  GRADIENT_CHECKPOINTING_FLAG=""

ENABLE_PRIOR_PRESERVATION = True #@param {type:"boolean"}
if ENABLE_PRIOR_PRESERVATION:
  ENABLE_PRIOR_PRESERVATION_FLAG='--with_prior_preservation'
else:
  ENABLE_PRIOR_PRESERVATION_FLAG=""

#@markdown Prior loss weight. If you set this to 0 while enabling prior preservation and providing a CLASS_DIR, the class loss does not affect training but can still be monitored.
PRIOR_LOSS_WEIGHT = 1.0 #@param {type:"number"} 

#@markdown If using prior preservation, specify a path to class images
CLASS_DIR="/content/gdrive/MyDrive/RegularizationImages/person/" #@param{type: 'string'}
if CLASS_DIR != "" and not os.path.exists(CLASS_DIR):
  # \033[1;31m switches the prompt text to bold red, \033[0m resets it
  CLASS_DIR=input('\033[1;31mThe folder specified does not exist. Use the Colab file explorer to copy the path: \033[0m')

#@markdown Prompt for prior preservation class (e.g., 'person', 'a photo of a man', 'dog')
CLASS_PROMPT="a photo of a person" #@param {type:"string"}
#@markdown Instance prompt; {SKS} is automatically replaced by the INSTANCE_TOKEN defined above
INSTANCE_PROMPT="a photo of {SKS} person" #@param {type:"string"}
INSTANCE_PROMPT=INSTANCE_PROMPT.replace("{SKS}",INSTANCE_TOKEN)

#@markdown Specify the number of class images used if prior preservation is enabled. If there are not enough images in CLASS_DIR (or CLASS_DIR is empty), additional images will be generated.
MIN_NUM_CLASS_IMAGES=1500 #@param{type: 'number'}

#@markdown Batch size for generating class images 
SAMPLE_BATCH_SIZE = 4 #@param{type: 'number'}

#@markdown Number of training steps, e.g., (number of instance images) × 100
STEPS = 399 #@param{type: 'number'}

#@markdown Random number generator seed
SEED = 1275017 #@param{type: 'number'}

#@markdown Enable text encoder training?
TRAIN_TEXT_ENCODER = True #@param{type: 'boolean'}
if TRAIN_TEXT_ENCODER:
  TRAIN_TEXT_ENCODER_FLAG="--train_text_encoder"
else:
  TRAIN_TEXT_ENCODER_FLAG=""

#@markdown ## Adam optimizer settings

#@markdown Use 8-bit Adam
USE_8BIT_ADAM = True #@param {type:"boolean"} 
if USE_8BIT_ADAM:
  USE_8BIT_ADAM_FLAG='--use_8bit_adam'
else:
  USE_8BIT_ADAM_FLAG=""

#@markdown The exponential decay rate for the 1st moment estimates (the beta1 parameter for the Adam optimizer).
ADAM_BETA1 = 0.9 #@param {type:"number"}

#@markdown The exponential decay rate for the 2nd moment estimates (the beta2 parameter for the Adam optimizer).
ADAM_BETA2 = 0.999 #@param {type:"number"}

#@markdown Weight decay magnitude for the Adam optimizer.
ADAM_WEIGHT_DECAY = 1e-2 #@param {type:"number"}

#@markdown Epsilon value for the Adam optimizer.
ADAM_EPSILON = 1e-08 #@param {type:"number"}

#@markdown Mixed precision: "fp16", "bf16" (requires an Ampere or newer GPU, such as the A100), or "no" (full precision, uses the most VRAM)
MIXED_PRECISION = "fp16" #@param ["fp16", "bf16", "no"]

#@markdown ## Learning rate parameters
LR_SCHEDULE = "cosine" #@param ["linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"]
LR = 0.000005 #@param{type: 'number'}
LR_WARMUP_STEPS = 25 #@param{type: 'number'}
#@markdown Applies only to the cosine_with_restarts schedule
LR_COSINE_NUM_CYCLES = 5 #@param{type: 'number'}
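
The LR_SCHEDULE options correspond to the learning-rate schedulers exposed by diffusers (`get_scheduler`), which follow the formulas in Hugging Face transformers' optimization module. The sketch below reproduces, under that assumption, the multiplier applied to LR for `cosine` and `cosine_with_restarts` after a linear warmup, so you can see how LR_WARMUP_STEPS and LR_COSINE_NUM_CYCLES shape a run before launching it. The function names are hypothetical, and it is an assumption that train_dreambooth.py forwards LR_COSINE_NUM_CYCLES as the number of hard restarts.

```python
import math


def cosine_lr_multiplier(step, warmup_steps, total_steps, num_cycles=0.5):
    """Linear warmup, then cosine decay (num_cycles=0.5 is one half-cosine down to 0)."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


def cosine_restarts_lr_multiplier(step, warmup_steps, total_steps, num_cycles=1):
    """Linear warmup, then num_cycles cosine decays, each restarting at the full LR."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


if __name__ == "__main__":
    # Values from the form above: LR=5e-6, 25 warmup steps, 399 total steps, 5 cycles
    base_lr, warmup, total, cycles = 0.000005, 25, 399, 5
    for step in (0, 10, 25, 100, 200, 398):
        lr = base_lr * cosine_restarts_lr_multiplier(step, warmup, total, cycles)
        print(f"step {step:4d}: lr = {lr:.2e}")
```

With the defaults above (5 cycles over 374 post-warmup steps), the learning rate would jump back to 5e-6 roughly every 75 steps.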

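With prior preservation enabled, DreamBooth adds a class-image term, scaled by PRIOR_LOSS_WEIGHT, to the usual denoising loss. The framework-free sketch below shows that combination, assuming this script keeps the structure of the diffusers DreamBooth example: one batch holds the instance examples followed by the class examples, and each half is scored with mean-squared error on the predicted noise. Plain lists stand in for tensors; `mse` and `dreambooth_loss` are hypothetical helpers.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    assert len(pred) == len(target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(pred, target, prior_loss_weight=1.0, with_prior_preservation=True):
    """Return (total_loss, instance_loss, prior_loss).

    With prior preservation, the first half of the batch holds instance
    examples and the second half holds class examples.
    """
    if not with_prior_preservation:
        loss = mse(pred, target)
        return loss, loss, None
    half = len(pred) // 2
    instance_loss = mse(pred[:half], target[:half])
    prior_loss = mse(pred[half:], target[half:])
    # prior_loss is always computed, so it can be logged even when the weight is 0
    total = instance_loss + prior_loss_weight * prior_loss
    return total, instance_loss, prior_loss


if __name__ == "__main__":
    pred = [0.1, 0.2, 0.3, 0.4]    # first two: instance, last two: class
    target = [0.0, 0.2, 0.5, 0.4]
    print(dreambooth_loss(pred, target, prior_loss_weight=1.0))
    print(dreambooth_loss(pred, target, prior_loss_weight=0.0))
```

Setting `prior_loss_weight=0.0` leaves the total equal to the instance loss while the prior loss is still computed, which is why the class loss can still be monitored.
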
In [None]:
#@title # Experimental training parameters

#@markdown ## [Drop text-conditioning to improve classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)

#@markdown Probability that an image (instance or class) is selected for dropout, in which case its INSTANCE_PROMPT/CLASS_PROMPT is replaced with UNCONDITIONAL_PROMPT
CONDITIONING_DROPOUT_PROB = 0.0 #@param{type: 'number'}
#@markdown Defaults to an effectively empty prompt (a single space). It is unclear whether anything else would be useful.
UNCONDITIONAL_PROMPT = " " #@param{type: 'string'}

#@markdown ## Exponential moving average (EMA) of weights (UNet only). Will not run on a Tesla T4 (out of memory).
USE_EMA = False #@param{type: 'boolean'}
if USE_EMA:
  USE_EMA_FLAG="--use_ema"
else:
  USE_EMA_FLAG=""
EMA_INV_GAMMA = 1.0 #@param{type: 'number'}
EMA_POWER = 0.75 #@param{type: 'number'}
EMA_MIN_VALUE = 0 #@param{type: 'number'}
EMA_MAX_VALUE = 0.9999 #@param{type: 'number'}
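
The four EMA_* parameters only make sense together with a decay schedule. A common choice, and the warmup formula used by diffusers' `EMAModel` at the time, ramps the decay as 1 - (1 + step / inv_gamma)^(-power), clamped to [min_value, max_value]. The sketch below assumes this script uses the same formula (not verified against train_dreambooth.py); `ema_decay` and `ema_update` are hypothetical helpers operating on plain lists of floats.

```python
def ema_decay(step, inv_gamma=1.0, power=0.75, min_value=0.0, max_value=0.9999):
    """EMA decay with warmup: small early in training, approaching max_value later."""
    if step <= 0:
        return 0.0
    value = 1.0 - (1.0 + step / inv_gamma) ** -power
    return max(min_value, min(value, max_value))


def ema_update(shadow, params, decay):
    """shadow <- decay * shadow + (1 - decay) * params, element-wise."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


if __name__ == "__main__":
    # EMA_INV_GAMMA=1.0, EMA_POWER=0.75, EMA_MIN_VALUE=0, EMA_MAX_VALUE=0.9999
    for step in (1, 10, 100, 399, 10_000):
        print(step, round(ema_decay(step), 4))
```

Under this assumption, with STEPS=399 the decay only reaches about 0.989 by the end of training, so the EMA weights effectively average over roughly the last 90 steps.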

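Conditioning dropout, as described in the classifier-free guidance paper linked above, trains the model on both conditional and unconditional inputs so that, at sampling time, the two noise predictions can be combined as eps_uncond + s * (eps_cond - eps_uncond). The sketch below assumes dropout is applied per example by swapping the prompt, as the form text describes; `maybe_drop_prompt` and `guided_noise` are hypothetical helpers, not functions from train_dreambooth.py or diffusers.

```python
import random


def maybe_drop_prompt(prompt, dropout_prob, unconditional_prompt=" ", rng=random):
    """Replace the prompt with the unconditional prompt with probability dropout_prob."""
    return unconditional_prompt if rng.random() < dropout_prob else prompt


def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    prompts = [maybe_drop_prompt("a photo of raretoken person", 0.1, rng=rng) for _ in range(1000)]
    print("dropped fraction:", prompts.count(" ") / len(prompts))
    # guidance_scale=1.0 returns the conditional prediction unchanged
    print(guided_noise([0.0, 1.0], [1.0, 3.0], 1.0))
```

A guidance scale of 1.0 reduces to the purely conditional prediction, while larger scales push further from the unconditional one.
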
In [None]:
#@title Where should outputs get saved?

#@markdown Trained models (and intermediates) saved here
OUTPUT_DIR="/content/models/" #@param{type: 'string'}

#@markdown Training logs saved here
LOGGING_DIR="/content/logs/" #@param{type: 'string'}

if not os.path.exists(LOGGING_DIR):
  !mkdir -p "$LOGGING_DIR"

LOG_GPU = False #@param{type: 'boolean'}
if LOG_GPU:
  LOG_GPU_FLAG="--log_gpu"
else:
  LOG_GPU_FLAG=""


In [None]:
#@title Setup saving of intermediate models
#@markdown To save intermediate checkpoints, set START_SAVING_FROM_STEP < STEPS

#@markdown Number of steps between intermediate saves
SAVE_CHECKPOINT_EVERY = 100 #@param{type: 'number'}
if SAVE_CHECKPOINT_EVERY is None:
  SAVE_CHECKPOINT_EVERY = STEPS+1

#@markdown First step at which intermediate checkpoints are saved
START_SAVING_FROM_STEP=100 #@param{type: 'number'}
if START_SAVING_FROM_STEP is None:
  START_SAVING_FROM_STEP=STEPS

#@markdown At each intermediate checkpoint, generate this many samples using SAVE_SAMPLE_PROMPT
N_SAVE_SAMPLES=3 #@param{type: 'number'}

#@markdown {SKS} is automatically replaced by INSTANCE_TOKEN. Give multiple prompts using // as a separator
SAVE_SAMPLE_PROMPT= "a photo of a person // a photo of {SKS} person // a photo of a cat" #@param{type: 'string'}
if SAVE_SAMPLE_PROMPT=="":
  SAVE_SAMPLE_PROMPT=None
else:
  SAVE_SAMPLE_PROMPT=SAVE_SAMPLE_PROMPT.replace("{SKS}",INSTANCE_TOKEN)

#@markdown The negative prompt is not split: a single negative prompt applies to all SAVE_SAMPLE_PROMPTs
SAVE_SAMPLE_NEGATIVE_PROMPT="hands, border" #@param{type: 'string'}
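
SAVE_SAMPLE_PROMPT reaches the training script as a single string, so the script has to split it on `//`. A minimal sketch of the expected expansion, assuming the split also strips surrounding whitespace (check train_dreambooth.py to confirm); `split_sample_prompts` is a hypothetical helper.

```python
def split_sample_prompts(save_sample_prompt, instance_token, separator="//"):
    """Turn the SAVE_SAMPLE_PROMPT form string into a list of prompts."""
    if not save_sample_prompt:
        return []
    prompts = save_sample_prompt.replace("{SKS}", instance_token).split(separator)
    # Drop the spaces around each separator and any empty entries
    return [p.strip() for p in prompts if p.strip()]


if __name__ == "__main__":
    raw = "a photo of a person // a photo of {SKS} person // a photo of a cat"
    for prompt in split_sample_prompts(raw, "raretoken"):
        print(prompt)
```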

# Train!

In [None]:
#@title (optional) Tensorboard visualization of loss and learning rate
#@markdown Once the TensorBoard panel launches (this takes about 10 seconds), click the gear icon in the upper right and check Reload data. Then, after launching training in the next cell, click TIME SERIES in the upper left to see updates.
#%load_ext tensorboard
# Clear logs from previous runs (this path is hardcoded to the default LOGGING_DIR)
!rm -rf /content/logs
%reload_ext tensorboard
%tensorboard --logdir $LOGGING_DIR 

In [None]:
#@title Launch training
!lsb_release -a | grep Description
!pip freeze | grep diffusers
!pip freeze | grep torchvision
!pip freeze | grep transformers
!pip freeze | grep xformers
!accelerate env

!accelerate launch \
    --mixed_precision=$MIXED_PRECISION \
    --num_machines=1 \
    --num_processes=1 \
    /content/Dreambooth/train_dreambooth.py \
    $CAPTIONED_INSTANCE_IMAGES_FLAG \
    $TRAIN_TEXT_ENCODER_FLAG \
    --revision="fp16" \
    --pretrained_model_name_or_path=$MODEL_NAME_OR_PATH \
    --pretrained_vae_name_or_path=$VAE_NAME_OR_PATH \
    --instance_data_dir="$INSTANCE_DIR" \
    --class_data_dir="$CLASS_DIR" \
    --output_dir="$OUTPUT_DIR" \
    --logging_dir="$LOGGING_DIR" \
    $LOG_GPU_FLAG \
    $ENABLE_PRIOR_PRESERVATION_FLAG \
    --prior_loss_weight=$PRIOR_LOSS_WEIGHT \
    --instance_prompt="$INSTANCE_PROMPT" \
    --class_prompt="$CLASS_PROMPT" \
    --conditioning_dropout_prob=$CONDITIONING_DROPOUT_PROB \
    --unconditional_prompt="$UNCONDITIONAL_PROMPT" \
    --seed=$SEED \
    --resolution=$RESOLUTION \
    --train_batch_size=$TRAIN_BATCH_SIZE \
    --gradient_accumulation_steps=$GRADIENT_ACCUMULATION_STEPS \
    $GRADIENT_CHECKPOINTING_FLAG \
    --mixed_precision=$MIXED_PRECISION \
    $USE_8BIT_ADAM_FLAG \
    --adam_beta1=$ADAM_BETA1 \
    --adam_beta2=$ADAM_BETA2 \
    --adam_weight_decay=$ADAM_WEIGHT_DECAY \
    --adam_epsilon=$ADAM_EPSILON \
    --learning_rate=$LR \
    --lr_scheduler=$LR_SCHEDULE \
    --lr_warmup_steps=$LR_WARMUP_STEPS \
    --lr_cosine_num_cycles=$LR_COSINE_NUM_CYCLES \
    $USE_EMA_FLAG \
    --ema_inv_gamma=$EMA_INV_GAMMA \
    --ema_power=$EMA_POWER \
    --ema_min_value=$EMA_MIN_VALUE \
    --ema_max_value=$EMA_MAX_VALUE \
    --max_train_steps=$STEPS \
    --num_class_images=$MIN_NUM_CLASS_IMAGES \
    --sample_batch_size=$SAMPLE_BATCH_SIZE \
    --save_min_steps=$START_SAVING_FROM_STEP \
    --save_interval=$SAVE_CHECKPOINT_EVERY \
    --n_save_sample=$N_SAVE_SAMPLES \
    --save_sample_prompt="$SAVE_SAMPLE_PROMPT" \
    --save_sample_negative_prompt="$SAVE_SAMPLE_NEGATIVE_PROMPT"
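
When choosing STEPS, it helps to know how often each instance image is seen. The sketch below estimates this under two assumptions borrowed from the diffusers DreamBooth example: with prior preservation each training example pairs an instance image with a class image (doubling the images per step), and one epoch covers max(#instance, #class) examples. `count_images` and `training_budget` are hypothetical helpers; the "× 100" heuristic comes from the STEPS description above.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def count_images(directory):
    """Number of image files directly inside directory (0 if it does not exist)."""
    if not directory or not os.path.isdir(directory):
        return 0
    return sum(
        1 for name in os.listdir(directory)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )


def training_budget(num_instance, num_class, steps, batch_size,
                    grad_accum=1, num_processes=1, prior_preservation=True):
    """Rough statistics for a DreamBooth run; steps counts optimizer steps."""
    if num_instance <= 0:
        raise ValueError("need at least one instance image")
    examples_per_step = batch_size * grad_accum * num_processes
    dataset_len = max(num_instance, num_class) if prior_preservation else num_instance
    return {
        "examples_per_optimizer_step": examples_per_step,
        # Each example carries an instance and a class image under prior preservation
        "images_per_optimizer_step": examples_per_step * (2 if prior_preservation else 1),
        "epochs": steps * examples_per_step / dataset_len,
        # How many times each instance image is seen over the whole run
        "instance_views": steps * examples_per_step / num_instance,
        "suggested_steps": num_instance * 100,
    }


if __name__ == "__main__":
    # Hypothetical run: 4 instance images, 1500 class images, settings from the form above
    print(training_budget(num_instance=4, num_class=1500, steps=399, batch_size=4))
```

In the notebook, `count_images(INSTANCE_DIR)` could supply `num_instance`.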



# Do inference with trained model(s)

Cells in this section can be run to generate grids of images using the trained model(s). I find this useful for probing overtraining, concept bleeding, quality, etc.

In [None]:
#@title Some imports and utility functions
import torch
from diffusers import DiffusionPipeline, StableDiffusionPipeline, DPMSolverMultistepScheduler, AutoencoderKL
from PIL import Image
import os
import json
import random
import string

device = "cuda"

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    grid_w, grid_h = grid.size
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid

def get_pipeline(model_name_or_path, 
                 vae_name_or_path=None, 
                 text_encoder_name_or_path=None,
                 feature_extractor_name_or_path=None,
                 revision="fp16"):
    #scheduler = DPMSolverMultistepScheduler.from_pretrained(model_name_or_path, subfolder="scheduler")
    scheduler = DPMSolverMultistepScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear",
        num_train_timesteps=1000,
        trained_betas=None,
        prediction_type="epsilon",
        thresholding=False,
        algorithm_type="dpmsolver++",
        solver_type="midpoint",
        lower_order_final=True,
    )

    pipe = DiffusionPipeline.from_pretrained(
        model_name_or_path,
        custom_pipeline="lpw_stable_diffusion",
        safety_checker=None,
        revision=revision,
        scheduler=scheduler,
        vae=AutoencoderKL.from_pretrained(
            vae_name_or_path or model_name_or_path,
            subfolder=None if vae_name_or_path else "vae",
            revision=None if vae_name_or_path else revision,
            torch_dtype=torch.float16,
        ),
        feature_extractor=feature_extractor_name_or_path,
        torch_dtype=torch.float16
    ).to("cuda")

    #https://github.com/huggingface/diffusers/issues/1552
    #pipe.enable_attention_slicing()
    pipe.enable_xformers_memory_efficient_attention()
    return pipe

def get_config(filename=None,
               save_dir=None,
               prompt=None, negative_prompt=None,
               seeds=None,
               num_samples=4,
               width=512, height=512,
               inference_steps=20,
               guidance_scale=7.5,
               ):
    if filename is None:
        num_prompts = len(prompt)
        if seeds is None:
            generator = torch.Generator("cuda")
            seeds = []
            for i in range(num_samples):
                seed = generator.seed()
                seeds.append(seed)
        else:
            num_samples = len(seeds)

        tag = ''.join(random.choice(string.ascii_letters) for _ in range(8))
        config = {
            "tag": tag,
            "prompt": prompt,
            "negative_prompt": negative_prompt,
            "num_prompts": num_prompts, 
            "num_samples": num_samples, 
            "seeds": seeds,
            "height": height,
            "width": width,
            "inference_steps": inference_steps,
            "guidance_scale": guidance_scale,
        }

        with open(os.path.join(save_dir, "config_"+tag+".json"), "w") as outfile:
            json.dump(config, outfile)
    else:
        with open(filename) as f:
            config = json.load(f)
    
    return config

def get_images(pipe, sample_config, device="cuda"):
    generator = torch.Generator(device)
    with torch.autocast(device):
        num_cfg = len(sample_config['guidance_scale'])
        # Loop over samples so that each image uses its own predefined seed
        all_images = []
        for i in range(sample_config['num_samples']):
            for cfg in sample_config['guidance_scale']:
                # Manually generate latent
                seed = sample_config['seeds'][i]
                generator = generator.manual_seed(seed)
                latent = torch.randn(
                    (1, pipe.unet.in_channels, sample_config['height'] // 8, sample_config['width'] // 8),
                    generator = generator,
                    device = device
                )
                images = pipe(sample_config['prompt'],
                    negative_prompt=sample_config['negative_prompt'],
                    num_inference_steps=int(sample_config['inference_steps']),
                    guidance_scale=cfg,
                    latents=latent.repeat(sample_config['num_prompts'], 1, 1, 1),
                ).images
                all_images.extend(images)

    grid = image_grid(all_images, rows=num_cfg*sample_config['num_samples'], cols=sample_config['num_prompts'])
    return grid
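
`get_images` loops over seeds in the outer loop and guidance scales in the inner loop, generating one image per prompt at each step, and `image_grid` fills the grid row by row with one column per prompt. The small helpers below (hypothetical, not used by the notebook) map between grid positions and the seed, guidance scale, and prompt that produced them, which is handy when reading the saved grids.

```python
def grid_cell(sample_idx, cfg_idx, prompt_idx, num_cfg, num_prompts):
    """Return (row, col) of one image in the grid built by get_images()."""
    # Images are appended one per prompt for each (seed, guidance scale) pair,
    # with seeds in the outer loop and guidance scales in the inner loop
    flat_index = (sample_idx * num_cfg + cfg_idx) * num_prompts + prompt_idx
    # image_grid() fills cells row by row with cols == num_prompts
    return flat_index // num_prompts, flat_index % num_prompts


def describe_row(row, seeds, guidance_scales):
    """Return the (seed, guidance scale) that produced a given grid row."""
    sample_idx, cfg_idx = divmod(row, len(guidance_scales))
    return seeds[sample_idx], guidance_scales[cfg_idx]


if __name__ == "__main__":
    guidance_scales = [1.0, 3.0, 7.0, 15.0]
    seeds = [11, 22, 33]  # hypothetical; real seeds are stored in config["seeds"]
    print(grid_cell(sample_idx=1, cfg_idx=2, prompt_idx=4, num_cfg=4, num_prompts=8))
    print(describe_row(6, seeds, guidance_scales))
```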

In [None]:
#@title Specify which models to do inference with
model_list = [os.path.join(OUTPUT_DIR,'100'),
              os.path.join(OUTPUT_DIR,'200'),
              os.path.join(OUTPUT_DIR,'300'),
              os.path.join(OUTPUT_DIR,'399'),
              ]

print(model_list)

In [None]:
#@title Generate or load a configuration for inference

config_name = None
#config_name = os.path.join(OUTPUT_DIR, "config_ZMasiqkP.json")

if config_name is None:
    num_samples = 8
    prompt = ["photo of a cat",
              "photo of a person",
              "close-up studio portrait photo of Keanu Reeves, film, detail, studio lighting",
              "close-up studio portrait photo of Caterina Murino, film, detail, studio lighting",
              "close-up studio portrait photo of {SKS} person, film, detail, studio lighting",
              "beautiful white (marble:1.1) bust of {SKS} person, highly detailed",
              "oil painting of {SKS} person on the beach",
              "portrait of {SKS} person as an elf princess in Lord of the Rings, natural lighting, ethereal, outdoors, intricate, highly detailed, digital painting, artstation, concept art, DeviantArt, illustration, art by artgerm and greg rutkowski and alphonse mucha",
    ]
    negative_prompt = "hands, nude, nudity, duplicate, frame, border"
    guidance_scale = [1.0, 3.0, 7.0, 15.0]

    config = get_config(save_dir=OUTPUT_DIR,
                        prompt=prompt, negative_prompt=negative_prompt,
                        num_samples=num_samples,
                        width=512, height=512, 
                        inference_steps=20, guidance_scale=guidance_scale
                        )
else:
    config = get_config(filename=config_name)

config['prompt'] = [sub.replace('{SKS}', INSTANCE_TOKEN) for sub in config['prompt']]
print(config)

In [None]:
#@title Infer!
for model in model_list:
    print(model)
    pipe = get_pipeline(model)
    grid = get_images(pipe, config)
    #grid.save(os.path.join(OUTPUT_DIR, "grid_"+os.path.split(model)[1]+"_"+config['tag']+".png"))
    grid.save(os.path.join(OUTPUT_DIR, "grid_"+os.path.split(model)[1]+"_"+config['tag']+".jpg"), quality=90, optimize=True)
    del pipe
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

In [None]:
#@title Generate grids for base model using same config
model_name_or_path = MODEL_NAME_OR_PATH #'runwayml/stable-diffusion-v1-5'
vae_name_or_path = VAE_NAME_OR_PATH #'stabilityai/sd-vae-ft-mse'
pipe = get_pipeline(model_name_or_path, vae_name_or_path=vae_name_or_path)
grid = get_images(pipe, config)
#grid.save(os.path.join(OUTPUT_DIR, "grid_"+os.path.split(model_name_or_path)[1]+"_"+config['tag']+".png"))
grid.save(os.path.join(OUTPUT_DIR, "grid_"+os.path.split(model_name_or_path)[1]+"_"+config['tag']+".jpg"), quality=90, optimize=True)

del pipe
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Convert to checkpoint (ckpt) or safetensors format

In [None]:
!wget -q https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_diffusers_to_original_stable_diffusion.py

# The final model is saved under OUTPUT_DIR/<STEPS> (cf. model_list above)
MODEL_PATH = os.path.join(OUTPUT_DIR, str(STEPS))
CKPT_PATH = os.path.join(OUTPUT_DIR, str(STEPS) + '.ckpt')

!python /content/convert_diffusers_to_original_stable_diffusion.py \
  --model_path $MODEL_PATH \
  --checkpoint_path $CKPT_PATH \
  --half

# Close Colab instance

In [None]:
from google.colab import runtime
runtime.unassign()