## DreamBooth with Stable Diffusion V2

This notebook is [KaliYuga](https://twitter.com/KaliYuga_ai)'s very basic fork of [Shivam Shrirao](https://github.com/ShivamShrirao)'s DreamBooth notebook. In addition to a few minor formatting and QoL additions, I've added Stable Diffusion V2 as the default training option and optimized the training settings to reflect what I've found to be the best general ones. They are only suggestions; feel free to tweak anything and everything if my defaults don't do it for you.

**I also [wrote a guide](https://peakd.com/hive-158694/@kaliyuga/training-a-dreambooth-model-using-stable-diffusion-v2-and-very-little-code)** that should take you through building a dataset and training a model using this notebook. If this is your first time creating a model from scratch, I recommend you check it out!

In [1]:
#@markdown Check type of GPU and VRAM available.
#!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
!nvidia-smi

Thu Mar  2 21:29:29 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P0    26W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
#@title Login to Google Drive
#@markdown Access Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


https://github.com/KaliYuga-ai/diffusers/tree/main/examples/dreambooth

In [3]:
#@title Login to HuggingFace 🤗

#@markdown You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/stabilityai/stable-diffusion-2), read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work.
# https://huggingface.co/settings/tokens
!mkdir -p ~/.huggingface
# Paste your own access token here; never leave a real token in a shared notebook.
HUGGINGFACE_TOKEN = "" #@param {type:"string"}
!echo -n "{HUGGINGFACE_TOKEN}" > ~/.huggingface/token

#if HUGGINGFACE_TOKEN == "":
#    raise ValueError("No Huggingface token found")
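
A token typed into a notebook cell is visible to anyone who can read the file. A minimal sketch of one alternative, assuming the token has been exported in an environment variable; the variable name `HF_TOKEN` and the `resolve_token` helper are illustrative and not part of this notebook:

```python
import os

def resolve_token(env=None, var_name="HF_TOKEN"):
    """Return the access token from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise ValueError(f"No Hugging Face token found in ${var_name}")
    return token

# Demonstrate with an explicit mapping instead of the real environment.
print(resolve_token({"HF_TOKEN": "hf_example"}))
```

The returned value could then be assigned to `HUGGINGFACE_TOKEN` before the `echo` line writes it to `~/.huggingface/token`.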

## Install Requirements

In [4]:
%pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 torchtext==0.14.0 --extra-index-url https://download.pytorch.org/whl/cu116
%cd /content

import os
if os.path.isfile("/content/train_dreambooth.py"):
    os.unlink("/content/train_dreambooth.py")
if os.path.isfile("/content/convert_diffusers_to_original_stable_diffusion.py"):
    os.unlink("/content/convert_diffusers_to_original_stable_diffusion.py")

#!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/examples/dreambooth/train_dreambooth.py
# !wget -q https://github.com/ethan-team/diffusers_colab/raw/main/examples/dreambooth/train_dreambooth.py
!wget -q https://github.com/AlexZheng-UCLA/dreambooth/raw/main/script/train_dreambooth.py
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/scripts/convert_diffusers_to_original_stable_diffusion.py

%pip install git+https://github.com/ShivamShrirao/diffusers
%pip install -U --pre triton
%pip install accelerate==0.12.0 transformers ftfy bitsandbytes 
%pip install gradio natsort

%pip install https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl
%pip install safetensors

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://download.pytorch.org/whl/cu116
Collecting torch==1.13.0+cu116
  Downloading https://download.pytorch.org/whl/cu116/torch-1.13.0%2Bcu116-cp38-cp38-linux_x86_64.whl (1983.0 MB)
     2.0/2.0 GB 897.7 kB/s eta 0:00:00
Collecting torchvision==0.14.0+cu116
  Downloading https://download.pytorch.org/whl/cu116/torchvision-0.14.0%2Bcu116-cp38-cp38-linux_x86_64.whl (24.2 MB)
     24.2/24.2 MB 17.8 MB/s eta 0:00:00
Collecting torchaudio==0.13.0
  Downloading https://download.pytorch.org/whl/cu116/torchaudio-0.13.0%2Bcu116-cp38-cp38-linux_x86_64.whl (4.2 MB)
     4.2/4.2 MB 23.3 MB/s eta 0:00:00
Collecting torchtext==0.14.0
  Downloadi

## Settings and run

In [5]:
#@markdown Save model weights directly to Google Drive (takes around 4-5 GB).
save_to_gdrive = True #@param {type:"boolean"}
if save_to_gdrive:
    from google.colab import drive
    drive.mount('/content/drive')

#@markdown Name/Path of the initial model.
MODEL_NAME = "runwayml/stable-diffusion-v1-5" #@param 
VAE_NAME = "stabilityai/sd-vae-ft-mse" #@param

#@markdown Enter the directory name to save model at.

OUTPUT_DIR = "stable_diffusion_weights/zyc" #@param {type:"string"}
if save_to_gdrive:
    OUTPUT_DIR = "/content/drive/MyDrive/" + OUTPUT_DIR
else:
    OUTPUT_DIR = "/content/" + OUTPUT_DIR

print(f"[*] Weights will be saved at {OUTPUT_DIR}")

!mkdir -p $OUTPUT_DIR

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
[*] Weights will be saved at /content/drive/MyDrive/stable_diffusion_weights/zyc


In [6]:
#@markdown ###Load required models
from diffusers import AutoencoderKL, DDIMScheduler, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
import torch
import ipywidgets as widgets
from io import BytesIO
import requests
import tqdm
from PIL import Image
import matplotlib.pyplot as plt

### Model Samples 

In [7]:
MODEL_SAMPLES = False #@param {type: 'boolean'}
ANOTHER_MODEL_NAME = "" #@param
prompt = "marilyn monroe, illustration style" #@param
if MODEL_SAMPLES:
  # Sample from ANOTHER_MODEL_NAME when it is set, otherwise from MODEL_NAME.
  sample_model = ANOTHER_MODEL_NAME or MODEL_NAME
  pipe = StableDiffusionPipeline.from_pretrained(sample_model, torch_dtype=torch.float16)
  pipe = pipe.to("cuda")

  os.makedirs("samples", exist_ok=True)

  image1 = pipe(prompt).images[0]
  image1.save("samples/1.png")

  image2 = pipe(prompt).images[0]
  image2.save("samples/2.png")
  plt.imshow(image2)

### Define Your Concepts List
You can add multiple concepts here. Try tweaking `--max_train_steps` accordingly.
It's a good idea to test class prompts in Stable Diffusion V2 before committing to them. If the images V2 generates at a CFG of 7 and 50 steps aren't great, consider a different class prompt. 

In [8]:
import json
import os

READ_PROMPT_FROM_TXTS = True
token = "zyc" 

instance_dir_list = ["zyc"]
num_class_images = [120]  

class_prompt_list = ["photo of a man"]
class_dir_list = ["class/sd-man"]


dataset_dir = "dataset/"
instance_dir_list = [dataset_dir + dir for dir in instance_dir_list]
class_dir_list = [dataset_dir + dir for dir in class_dir_list]

concepts_list = [

    {
       "instance_prompt":       token,
       "class_prompt":          class_prompt_list[0],
       "instance_data_dir":     instance_dir_list[0],
       "class_data_dir":        class_dir_list[0],
       "num_class_images":      num_class_images[0]
    },
    
]

# `class_data_dir` contains regularization images
for concept in concepts_list:
    os.makedirs(concept["instance_data_dir"], exist_ok=True)

with open("concepts_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)
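
The cell above registers a single concept. A sketch of how several concepts could be assembled into the same structure, assuming one entry per position in each list; the `build_concepts` helper and the second concept (`"abc"`, `"photo of a woman"`) are hypothetical placeholders:

```python
import json

def build_concepts(tokens, class_prompts, instance_dirs, class_dirs,
                   num_class_images, dataset_dir="dataset/"):
    """Build one concept entry per position across the parallel lists."""
    columns = [tokens, class_prompts, instance_dirs, class_dirs, num_class_images]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("All concept lists must have the same length")
    return [
        {
            "instance_prompt": token,
            "class_prompt": class_prompt,
            "instance_data_dir": dataset_dir + instance_dir,
            "class_data_dir": dataset_dir + class_dir,
            "num_class_images": count,
        }
        for token, class_prompt, instance_dir, class_dir, count in zip(*columns)
    ]

# "abc" and its class prompt are placeholder values for a second concept.
concepts = build_concepts(
    tokens=["zyc", "abc"],
    class_prompts=["photo of a man", "photo of a woman"],
    instance_dirs=["zyc", "abc"],
    class_dirs=["class/sd-man", "class/sd-woman"],
    num_class_images=[120, 120],
)
print(json.dumps(concepts, indent=4))
```

Checking the list lengths up front avoids silently dropping a concept when one list is shorter than the others.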

In [9]:
GENERATE_PROMPT = False #@param {type: "boolean"}

if READ_PROMPT_FROM_TXTS and GENERATE_PROMPT:
  from transformers import BlipProcessor, BlipForConditionalGeneration

  processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
  model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large", torch_dtype=torch.float16).to("cuda")
  
  for instance_dir in instance_dir_list:
    for filename in os.listdir(instance_dir):
      
      extension = filename.split(".")[-1]
      if extension != "txt":

        txtname = filename + ".txt"
        image = Image.open(instance_dir+"/"+filename).convert('RGB')

        # text = "a photo of"
        text = ""
        inputs = processor(image, text, return_tensors="pt").to("cuda", torch.float16)

        out = model.generate(**inputs)
        out = processor.decode(out[0], skip_special_tokens=True)
        prompt = out.replace("boy",token).replace("man", token).replace("male", token).replace("asian", "")
        # prompt = out.replace("girl",token).replace("woman", token).replace("female", token).replace("asian", "")
        print(prompt)
        with open(instance_dir+"/"+txtname, 'w') as f:
              f.write(prompt)
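
Chained `str.replace` calls match substrings, so a caption containing "woman" or "human" would also have its "man" replaced by the token. A minimal sketch of a word-boundary variant using `re`; the `insert_token` helper and its default word lists are assumptions modelled on the cell above, not part of the training script:

```python
import re

def insert_token(caption, token, subject_words=("boy", "man", "male"), drop_words=("asian",)):
    """Swap whole subject words for the token and drop unwanted whole words."""
    subject = re.compile(r"\b(" + "|".join(map(re.escape, subject_words)) + r")\b", re.IGNORECASE)
    caption = subject.sub(token, caption)
    if drop_words:
        drop = re.compile(r"\b(" + "|".join(map(re.escape, drop_words)) + r")\b\s*", re.IGNORECASE)
        caption = drop.sub("", caption)
    # Collapse any doubled spaces left behind by the removals.
    return " ".join(caption.split())

print(insert_token("a woman standing next to a man", "zyc"))
```

With word boundaries, "woman" is left intact and only the standalone "man" becomes the token.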

In [10]:
import ipywidgets as widgets
from io import BytesIO
#@markdown #Captions

#@markdown - Open a tool to manually `create` captions or edit existing captions of the instance images.

paths=""
out=""
widgets_l=""

instance_dir = 0 #@param
DIR = instance_dir_list[instance_dir]
def Caption(path):
  
    if path!="Select an instance image to caption":
      
      name = os.path.basename(path)
      ext=os.path.splitext(os.path.basename(path))[-1][1:]
      # `ext=="jpg" or "JPG"` was always true; compare case-insensitively instead.
      if ext.lower() in ("jpg", "jpeg"):
        ext="JPEG"
      else:
        ext="PNG"

      if os.path.exists(DIR + "/"+ name + '.txt'):
        with open(DIR + "/" + name + '.txt', 'r') as f:
            text = f.read()
      else:
        # Create an empty caption file; there is nothing to read back yet.
        with open(DIR + "/" + name + '.txt', 'w') as f:
            f.write("")
        text = ""

      img=Image.open(os.path.join(DIR,path))
      img = img.convert('RGB').resize((420, 420))
      image_bytes = BytesIO()
      img.save(image_bytes, format=ext, quality=10)
      image_bytes.seek(0)
      image_data = image_bytes.read()
      img= image_data
      image = widgets.Image(
          value=img,
          width=420,
          height=420
      )
      text_area = widgets.Textarea(value=text, description='', disabled=False, layout={'width': '500px', 'height': '120px'})
      

      def update_text(text):
          with open(DIR+"/"+ name + '.txt', 'w') as f:
              f.write(text)

      button = widgets.Button(description='Save', button_style='success')
      button.on_click(lambda b: update_text(text_area.value))
      
      # return widgets.VBox([widgets.HBox([text_area, button])])
      return widgets.HBox([image, widgets.VBox([text_area, button])])


paths = os.listdir(DIR)
widgets_l = widgets.Select(options=["Select an instance image to caption"]+paths, rows=25)


out = widgets.Output()

def click(change):
    with out:
        out.clear_output()
        display(Caption(change.new))

widgets_l.observe(click, names='value')
display(widgets.HBox([widgets_l, out]))

HBox(children=(Select(options=('Select an instance image to caption',), rows=25, value='Select an instance ima…

In [11]:
print("Find results at: ", OUTPUT_DIR)
print("MODEL: ", MODEL_NAME)

Find results at:  /content/drive/MyDrive/stable_diffusion_weights/zyc
MODEL:  runwayml/stable-diffusion-v1-5


### Define Testing Prompt List

In [12]:
man_prompts_list = [f"{token} with wings",
           f"{token} clothed in metal armor"
           ]
               

prompts_list = man_prompts_list

with open("prompts_list.json", "w") as f:
    json.dump(prompts_list, f, indent=4)

### Define Steps setting List

In [18]:
steps_setting = {
    "max_train_steps": 800, 
    "save_interval": 200,
    "lr_warmup_steps": 100,
    "save_min_steps":0,
    "only_save_steps": [600, 800]
  }

with open("steps_setting.json", "w") as f:
    json.dump(steps_setting, f, indent=4)
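
A quick consistency check can catch a mistyped step count before a long training run. The sketch below assumes that `only_save_steps` lists the steps at which checkpoints are written and that warmup should finish before training ends; both readings are assumptions about the training script, and `check_steps_setting` is a hypothetical helper:

```python
def check_steps_setting(setting):
    """Return a list of human-readable problems found in a steps setting dict."""
    problems = []
    max_steps = setting["max_train_steps"]
    if setting.get("lr_warmup_steps", 0) >= max_steps:
        problems.append("lr_warmup_steps should be smaller than max_train_steps")
    for step in setting.get("only_save_steps", []):
        if not 0 < step <= max_steps:
            problems.append(f"only_save_steps entry {step} is outside 1..{max_steps}")
    return problems

steps_setting = {
    "max_train_steps": 800,
    "save_interval": 200,
    "lr_warmup_steps": 100,
    "save_min_steps": 0,
    "only_save_steps": [600, 800],
}
print(check_steps_setting(steps_setting))
```

An empty list means no inconsistencies were found under these assumptions.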

### Training Settings

In [13]:
# --train_text_encoder \
if READ_PROMPT_FROM_TXTS:
  !accelerate launch train_dreambooth.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --pretrained_vae_name_or_path=$VAE_NAME \
    --output_dir=$OUTPUT_DIR \
    --revision="fp16" \
    --with_prior_preservation \
    --prior_loss_weight=1.0 \
    --seed=1337 \
    --resolution=512 \
    --train_batch_size=1 \
    --mixed_precision="fp16" \
    --use_8bit_adam \
    --gradient_accumulation_steps=1 \
    --gradient_checkpointing \
    --learning_rate=1.5e-6 \
    --lr_scheduler="polynomial" \
    --num_class_images=120 \
    --sample_batch_size=4 \
    --train_text_encoder \
    --concepts_list="concepts_list.json" \
    --prompts_list="prompts_list.json" \
    --steps_setting="steps_setting.json" \
    --n_save_sample=2 \
    --read_prompts_from_txts 

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--num_cpu_threads_per_process` was set to `2` to improve out-of-box performance
2023-03-02 21:33:38.817865: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-02 21:33:38.817955: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
usage: train_dreambooth.py
       [-h]
       --pretrained_model_name_or_path
       PRETRAINED_MODEL_NAME_OR_PATH
       [-

In [14]:
# --train_text_encoder \
if not READ_PROMPT_FROM_TXTS:
  !accelerate launch train_dreambooth.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --pretrained_vae_name_or_path=$VAE_NAME \
    --output_dir=$OUTPUT_DIR \
    --revision="fp16" \
    --with_prior_preservation \
    --prior_loss_weight=1.0 \
    --seed=1337 \
    --resolution=512 \
    --train_batch_size=1 \
    --mixed_precision="fp16" \
    --use_8bit_adam \
    --gradient_accumulation_steps=1 \
    --gradient_checkpointing \
    --learning_rate=1.5e-6 \
    --lr_scheduler="polynomial" \
    --num_class_images=120 \
    --sample_batch_size=4 \
    --train_text_encoder \
    --concepts_list="concepts_list.json" \
    --prompts_list="prompts_list.json" \
    --steps_setting="steps_setting.json" \
    --n_save_sample=2

In [15]:
#@markdown Run to generate a grid of preview images from the last saved weights.
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

weights_folder = OUTPUT_DIR
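# Keep only numeric checkpoint folders so files such as grid.png do not break int().
folders = sorted([f for f in os.listdir(weights_folder) if f.isdigit() and f != "0"], key=lambda x: int(x))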

row = len(folders)
col = len(os.listdir(os.path.join(weights_folder, folders[0], "samples")))
scale = 5
fig, axes = plt.subplots(row, col, figsize=(col*scale, row*scale), gridspec_kw={'hspace': 0, 'wspace': 0})

for i, folder in enumerate(folders):
    folder_path = os.path.join(weights_folder, folder)
    image_folder = os.path.join(folder_path, "samples")
    images = [f for f in os.listdir(image_folder)]
    
    for j, image in enumerate(images):
        if row == 1:
            currAxes = axes[j]
        else:
            currAxes = axes[i, j]
        if i == 0:
            currAxes.set_title(prompts_list[j][3:15])
        if j == 0:
            currAxes.text(-0.1, 0.5, folder, rotation=0, va='center', ha='center', transform=currAxes.transAxes)
        image_path = os.path.join(image_folder, image)
        img = mpimg.imread(image_path)
        currAxes.imshow(img, cmap='gray')
        currAxes.axis('off')

plt.tight_layout()
plt.savefig(f'{OUTPUT_DIR}/grid.png', dpi=72)

IndexError: ignored

In [None]:
STOPHERE()  # deliberately undefined: raises NameError so "Run all" stops here; skip this cell to continue with the cells below

## Testing your new model

Once your model has finished training (or has reached a checkpoint you like), run the following cells to test it out.

In [None]:
OUTPUT_DIR = "stable_diffusion_weights/zyc"
OUTPUT_DIR = "/content/drive/MyDrive/" + OUTPUT_DIR

In [None]:
#@markdown Specify the weights directory to use (leave blank for latest)
WEIGHTS_DIR = "/1000" #@param {type:"string"}
WEIGHTS_DIR = OUTPUT_DIR + WEIGHTS_DIR

if WEIGHTS_DIR == OUTPUT_DIR:
    from natsort import natsorted
    from glob import glob
    import os
    WEIGHTS_DIR = natsorted(glob(OUTPUT_DIR + os.sep + "*"))[-1]

print(f"[*] WEIGHTS_DIR={WEIGHTS_DIR}")
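
Globbing everything under the output directory can pick up entries that are not checkpoints, for example a saved `grid.png`. A sketch of selecting the highest-numbered checkpoint folder with the standard library only, assuming checkpoints live in folders named after their step count; `latest_checkpoint` is a hypothetical helper:

```python
import os
import tempfile

def latest_checkpoint(output_dir):
    """Return the path of the numerically largest step folder in output_dir."""
    steps = [
        name for name in os.listdir(output_dir)
        if name.isdigit() and os.path.isdir(os.path.join(output_dir, name))
    ]
    if not steps:
        raise FileNotFoundError(f"No checkpoint folders found in {output_dir}")
    return os.path.join(output_dir, max(steps, key=int))

# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for name in ("200", "1000", "800"):
        os.makedirs(os.path.join(root, name))
    open(os.path.join(root, "grid.png"), "w").close()
    print(os.path.basename(latest_checkpoint(root)))  # prints 1000
```

Comparing with `key=int` orders "1000" after "800", which plain string sorting would not.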

#### Convert weights to ckpt to use in web UIs like AUTOMATIC1111.

In [None]:
CONVERT = False #@param {type:"boolean"}
if CONVERT:
  ckpt_path = WEIGHTS_DIR + "/model.ckpt"

  half_arg = ""
  #@markdown  Whether to convert to fp16, takes half the space (2GB).
  fp16 = True #@param {type: "boolean"}
  if fp16:
      half_arg = "--half"
  !python convert_diffusers_to_original_stable_diffusion.py --model_path $WEIGHTS_DIR  --checkpoint_path $ckpt_path $half_arg
  print(f"[*] Converted ckpt saved at {ckpt_path}")

#### Inference

In [None]:
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, DDIMScheduler
from IPython.display import display

model_path = WEIGHTS_DIR             # To use a previously trained model saved in Google Drive, replace this with the full path to that model

scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
pipe = StableDiffusionPipeline.from_pretrained(model_path, scheduler=scheduler, safety_checker=None, torch_dtype=torch.float16).to("cuda")

g_cuda = None

In [None]:
#@markdown Set a random seed here for reproducibility.
g_cuda = torch.Generator(device='cuda')
seed = 1500 #@param {type:"number"}
g_cuda.manual_seed(seed)

In [None]:
prompt = "lebron james" #@param {type:"string"}
negative_prompt = "" #@param {type:"string"}
num_samples = 2 #@param {type:"number"}
guidance_scale = 7.5 #@param {type:"number"}
num_inference_steps = 50 #@param {type:"number"}
height = 512 #@param {type:"number"}
width = 512 #@param {type:"number"}

with autocast("cuda"), torch.inference_mode():
    images = pipe(
        prompt,
        height=height,
        width=width,
        negative_prompt=negative_prompt,
        num_images_per_prompt=num_samples,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=g_cuda
    ).images

for img in images:
    display(img)

In [None]:
STOP

In [None]:
#@markdown Run Gradio UI for generating images.
import gradio as gr

def inference(prompt, negative_prompt, num_samples, height=512, width=512, num_inference_steps=50, guidance_scale=7.5):
    with torch.autocast("cuda"), torch.inference_mode():
        return pipe(
                prompt, height=int(height), width=int(width),
                negative_prompt=negative_prompt,
                num_images_per_prompt=int(num_samples),
                num_inference_steps=int(num_inference_steps), guidance_scale=guidance_scale,
                generator=g_cuda
            ).images

with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            prompt = gr.Textbox(label="Prompt", value="photo of zwx dog in a bucket")
            negative_prompt = gr.Textbox(label="Negative Prompt", value="")
            run = gr.Button(value="Generate")
            with gr.Row():
                num_samples = gr.Number(label="Number of Samples", value=4)
                guidance_scale = gr.Number(label="Guidance Scale", value=7.5)
            with gr.Row():
                height = gr.Number(label="Height", value=512)
                width = gr.Number(label="Width", value=512)
            num_inference_steps = gr.Slider(label="Steps", value=50)
        with gr.Column():
            gallery = gr.Gallery()

    run.click(inference, inputs=[prompt, negative_prompt, num_samples, height, width, num_inference_steps, guidance_scale], outputs=gallery)

demo.launch(debug=True)

In [None]:
#@title (Optional) Delete diffuser and old weights and only keep the ckpt to free up drive space.

#@markdown [ ! ] Caution: only execute this if you are sure you want to delete the diffusers-format weights and keep only the ckpt.
import shutil
from glob import glob
import os
for f in glob(OUTPUT_DIR+os.sep+"*"):
    if f != WEIGHTS_DIR:
        shutil.rmtree(f)
        print("Deleted", f)
for f in glob(WEIGHTS_DIR+"/*"):
    # `or` made this condition always true; skip anything ending in .ckpt or .json.
    if not f.endswith((".ckpt", ".json")):
        try:
            shutil.rmtree(f)
        except NotADirectoryError:
            continue
        print("Deleted", f)
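
Because the deletion is irreversible, it can help to list what would be removed first. A dry-run sketch that only reports directories and deletes nothing; `plan_cleanup` is a hypothetical helper, and the folder names in the demo are placeholders:

```python
import os
import tempfile

def plan_cleanup(output_dir, weights_dir, keep_suffixes=(".ckpt", ".json")):
    """List the directories a cleanup pass would remove, without deleting anything."""
    keep = os.path.abspath(weights_dir)
    doomed = []
    # Sibling checkpoint directories next to the one being kept.
    for name in sorted(os.listdir(output_dir)):
        path = os.path.join(output_dir, name)
        if os.path.isdir(path) and os.path.abspath(path) != keep:
            doomed.append(path)
    # Sub-directories of the kept checkpoint (plain files are left alone).
    for name in sorted(os.listdir(weights_dir)):
        path = os.path.join(weights_dir, name)
        if os.path.isdir(path) and not name.endswith(keep_suffixes):
            doomed.append(path)
    return doomed

# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for sub in ("600", os.path.join("800", "unet"), os.path.join("800", "vae")):
        os.makedirs(os.path.join(root, sub))
    open(os.path.join(root, "800", "model.ckpt"), "w").close()
    for path in plan_cleanup(root, os.path.join(root, "800")):
        print(os.path.relpath(path, root))
```

Reviewing the printed list before running the real deletion gives a chance to catch a wrong `WEIGHTS_DIR`.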

In [None]:
#@title Free runtime memory
exit()