# StableDiffusion finetuning with Dreambooth

Based on Hugging Face examples and customized scripts from ShivamShrirao on GitHub:

https://huggingface.co/runwayml/stable-diffusion-v1-5

https://github.com/ShivamShrirao

## Check type of GPU and VRAM available

If this returns 'command not found', your notebook is not running on a GPU.

This notebook works best on a Tesla T4 with 16 GB of VRAM.

In [None]:
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
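If you want to check the result from Python (for example, to stop early when there is not enough free VRAM), one option is to parse the CSV line that the command above prints. This is a minimal sketch: `parse_gpu_csv`, the sample line and the 10 GB threshold are illustrative assumptions, not part of the original notebook.

```python
def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader` output."""
    gpus = []
    for line in text.strip().splitlines():
        name, total, free = [field.strip() for field in line.split(",")]
        # Memory fields look like "15360 MiB"; keep only the number
        gpus.append({
            "name": name,
            "total_mib": int(total.split()[0]),
            "free_mib": int(free.split()[0]),
        })
    return gpus


# Illustrative sample line; in the notebook you would pass the real command output instead
sample = "Tesla T4, 15360 MiB, 15101 MiB"
for gpu in parse_gpu_csv(sample):
    # Hypothetical threshold: warn if less than ~10 GB is free
    if gpu["free_mib"] < 10 * 1024:
        print(f"[!] {gpu['name']} has only {gpu['free_mib']} MiB free")
    else:
        print(f"[*] {gpu['name']}: {gpu['free_mib']} / {gpu['total_mib']} MiB free")
```

To use it on live output, capture the command with `subprocess.run([...], capture_output=True, text=True).stdout` and pass that string in.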

## Set HuggingFace token

Uncomment the lines below and replace 'INSERT_TOKEN_HERE' with your Hugging Face token.

This only needs to be run once, because the token is saved to disk.

In [None]:
#ONLY RUN ONCE
#HUGGINGFACE_TOKEN = "INSERT_TOKEN_HERE" 
#!mkdir -p ~/.huggingface
#!echo -n "{HUGGINGFACE_TOKEN}" > ~/.huggingface/token
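If you would rather avoid shell escapes, the same step can be sketched in plain Python. `save_hf_token` is a hypothetical helper; the file location mirrors the shell commands above, and restricting the file to owner-only permissions is an extra precaution we added, not something the original cell does.

```python
from pathlib import Path


def save_hf_token(token, token_path=Path.home() / ".huggingface" / "token"):
    """Write the token to the same location the shell commands above use."""
    token_path = Path(token_path)
    token_path.parent.mkdir(parents=True, exist_ok=True)
    # Equivalent of `echo -n`: no trailing newline
    token_path.write_text(token)
    # Extra precaution (our assumption): only the owner may read the token
    token_path.chmod(0o600)
    return token_path


# Uncomment and replace the placeholder to use:
# save_hf_token("INSERT_TOKEN_HERE")
```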

## Configure a list of concepts to finetune on top of the normal StableDiffusion model

For this example, we use only one new concept, but you can add multiple concepts here and adjust `--max_train_steps` accordingly.

- `instance_prompt` - the prompt, containing the unique identifier (here `cc`), that we will type to generate images of the subject we are fine-tuning on.
- `class_prompt` - the same kind of prompt without the unique identifier. It is used to generate "class images" for prior preservation. In our example it is "photo of a person", as opposed to a photo of one specific person.
- `instance_data_dir` - the folder where our training images for fine-tuning are stored.
- `class_data_dir` - sample images of the general class we are fine-tuning on. If this folder is empty, class images will be generated from `class_prompt`. Alternatively, you can provide ~20 images of the general concept yourself (but not the instance images we fine-tune on).
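As we understand the training script, it counts the entries already in `class_data_dir` and only generates enough new class images to reach `--num_class_images` (50 in the training cell below). A rough sketch of that arithmetic; `class_images_to_generate` is a hypothetical helper, not a function from the script.

```python
import os


def class_images_to_generate(class_data_dir, num_class_images):
    """Estimate how many class images the script would still have to generate."""
    if not os.path.isdir(class_data_dir):
        # Folder missing: every class image has to be generated
        return num_class_images
    existing = len(os.listdir(class_data_dir))
    return max(num_class_images - existing, 0)


# With ~20 hand-picked images and --num_class_images=50, about 30 would be generated
print(class_images_to_generate("./content/data/person", 50))
```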

In [None]:
concepts_list = [ 
    {
         "instance_prompt":      "photo of cc person",
         "class_prompt":         "photo of a person",
         "instance_data_dir":    "./content/data/cc",
         "class_data_dir":       "./content/data/person"
    },  
] 

import json
import os
for c in concepts_list:
    os.makedirs(c["instance_data_dir"], exist_ok=True)

with open("concepts_list_cc.json", "w") as f:
    json.dump(concepts_list, f, indent=4)
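Before launching a long training run, it can help to check that every concept has all four keys and that its `instance_data_dir` actually contains images, since the cell above only creates the folder. A minimal sketch; `check_concepts` and the accepted image extensions are our assumptions, not something the training script exposes.

```python
import os

# Keys that every concept in concepts_list is expected to have
REQUIRED_KEYS = ("instance_prompt", "class_prompt", "instance_data_dir", "class_data_dir")
# Assumed set of image extensions; adjust if your training images use others
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")


def check_concepts(concepts):
    """Return a list of human-readable problems found in a concepts list."""
    problems = []
    for i, concept in enumerate(concepts):
        missing = [key for key in REQUIRED_KEYS if key not in concept]
        if missing:
            problems.append(f"concept {i}: missing keys {missing}")
            continue
        instance_dir = concept["instance_data_dir"]
        images = []
        if os.path.isdir(instance_dir):
            images = [f for f in os.listdir(instance_dir) if f.lower().endswith(IMAGE_EXTENSIONS)]
        if not images:
            problems.append(f"concept {i}: no training images in {instance_dir}")
    return problems


# In the notebook, pass the concepts_list defined above:
# for problem in check_concepts(concepts_list):
#     print("[!]", problem)
```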

## General imports and variable setting

In [None]:
import torch
import random
from datetime import datetime

torch.cuda.empty_cache() 

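# These can be hardcoded to reduce randomness and make it more likely that you see the same generations.
# Note: accelerate_seed is not currently passed to training, which uses --seed=1337 below.
accelerate_seed = random.randint(100, 60000)
cuda_seed = random.randint(100, 60000)  # seeds the inference generator further down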

OUTPUT_DIR = "stable_diffusion_weights/cc" 
OUTPUT_DIR = "./content/" + OUTPUT_DIR
print(f"[*] Weights will be saved at {OUTPUT_DIR}")

!mkdir -p $OUTPUT_DIR
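The comment about hardcoding seeds works because Python's `random` module is deterministic for a given seed. A minimal sketch: `SEED_OF_SEEDS` is a hypothetical value we picked, and seeding a dedicated `random.Random` instance makes both `accelerate_seed` and `cuda_seed` (the name the inference cell reads) repeatable across runs without touching the global `random` state. Since the training cell hardcodes `--seed=1337`, in practice only `cuda_seed` affects results.

```python
import random

SEED_OF_SEEDS = 1234  # hypothetical fixed value; change it to get a different but repeatable pair

rng = random.Random(SEED_OF_SEEDS)
accelerate_seed = rng.randint(100, 60000)
cuda_seed = rng.randint(100, 60000)
print(accelerate_seed, cuda_seed)
```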

## Download the training script and the checkpoint conversion script

We are utilizing the custom files from 
https://github.com/ShivamShrirao/diffusers

In [None]:
#!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/examples/dreambooth/train_dreambooth.py -O train_dreambooth_ShivamShrirao.py
!wget -q https://raw.githubusercontent.com/ShivamShrirao/diffusers/6f3cbefb6a0aa13340613b4dafea5d8bb53e51f3/examples/dreambooth/train_dreambooth.py -O train_dreambooth_ShivamShrirao.py
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/scripts/convert_diffusers_to_original_stable_diffusion.py -O convert_diffusers_to_original_stable_diffusion_ShivamShrirao.py

## Install a few other required libraries with pip

In [None]:
%pip install -qq git+https://github.com/ShivamShrirao/diffusers.git@25045fd
%pip install -q -U --pre triton==2.0.0.dev20230217
%pip install -q accelerate==0.12.0 transformers==4.24.0 ftfy==6.1.1 bitsandbytes==0.35.0 gradio==3.20.1 natsort==8.3.1
#%pip install -q https://github.com/brian6091/xformers-wheels/releases/download/0.0.15.dev0%2B4c06c79/xformers-0.0.15.dev0+4c06c79.d20221205-cp38-cp38-linux_x86_64.whl
%pip install xformers
%pip freeze > requirements.txt

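# `!export` only sets the variable in a throwaway subshell; %env sets it for the kernel and later `!` commands
%env PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512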
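Each `!` line runs in its own short-lived shell, so a variable set there with `export` disappears when that shell exits. Variables set on the kernel's `os.environ` (which is what IPython's `%env` magic does) are inherited by every later subprocess, including `accelerate launch`. A small self-contained demonstration using only the standard library:

```python
import os
import subprocess
import sys

# Set the variable in the current (kernel) process...
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

# ...and confirm that a child process, like the one `!accelerate launch` starts, sees it
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('PYTORCH_CUDA_ALLOC_CONF'))"],
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())  # max_split_size_mb:512
```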

## Start training

Run the Stable Diffusion + DreamBooth training with Hugging Face Accelerate:

https://huggingface.co/docs/accelerate/index 

In [None]:
torch.cuda.empty_cache() 

MODEL_NAME = "runwayml/stable-diffusion-v1-5"
PRECISION = "fp16"
MAX_TRAIN_STEPS = 1200

#!accelerate launch --help
!accelerate launch --mixed_precision="fp16" --num_processes=1 --num_machines=1 --num_cpu_threads_per_process=2 \
train_dreambooth_ShivamShrirao.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir=$OUTPUT_DIR \
  --revision="fp16" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=1337 \
  --resolution=512 \
  --train_batch_size=1 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=50 \
  --sample_batch_size=1 \
  --max_train_steps=$MAX_TRAIN_STEPS \
  --save_interval=100000 \
  --save_sample_prompt="photo of cc person" \
  --concepts_list="concepts_list_cc.json" \
  --gradient_checkpointing
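If the long shell command becomes hard to maintain, one option is to build the argument list in Python and launch it with `subprocess`. This is a sketch, not part of the original notebook: `build_train_command` is a hypothetical helper, and it only reproduces a subset of the flags above; the remaining flags can be added to the dictionary in the same way.

```python
import shlex


def build_train_command(script, flags, switches=()):
    """Build an `accelerate launch` argument list from flag/value pairs and bare switches."""
    cmd = ["accelerate", "launch", "--mixed_precision=fp16", "--num_processes=1", script]
    cmd += [f"--{name}={value}" for name, value in flags.items()]
    cmd += [f"--{name}" for name in switches]
    return cmd


cmd = build_train_command(
    "train_dreambooth_ShivamShrirao.py",
    {
        "pretrained_model_name_or_path": "runwayml/stable-diffusion-v1-5",
        "output_dir": "./content/stable_diffusion_weights/cc",
        "resolution": 512,
        "learning_rate": 5e-6,
        "max_train_steps": 1200,
        "concepts_list": "concepts_list_cc.json",
    },
    switches=("with_prior_preservation", "train_text_encoder", "gradient_checkpointing"),
)
print(shlex.join(cmd))  # inspect the full command before running it
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to actually start training
```

Keeping the flags in a dictionary makes it easy to log exactly which settings produced a given set of weights.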

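### Select the weights directory

Leave `WEIGHTS_DIR` empty to use the most recent checkpoint folder in `OUTPUT_DIR`, or set it to a specific step folder.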

In [None]:
WEIGHTS_DIR = "" 
if WEIGHTS_DIR == "":
    from natsort import natsorted
    from glob import glob
    import os
    WEIGHTS_DIR = natsorted(glob(OUTPUT_DIR + os.sep + "*"))[-1]
print(f"[*] WEIGHTS_DIR={WEIGHTS_DIR}")
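`natsorted` is used because checkpoint folders are named after the training step, and plain string sorting would put `"1000"` before `"200"`. A stdlib-only sketch of the same idea, assuming (as in this notebook) that step folders have purely numeric names; `latest_step_dir` is a hypothetical helper.

```python
import os


def latest_step_dir(output_dir):
    """Return the numerically highest step folder inside output_dir."""
    steps = [
        name for name in os.listdir(output_dir)
        if name.isdigit() and os.path.isdir(os.path.join(output_dir, name))
    ]
    if not steps:
        raise FileNotFoundError(f"no step folders found in {output_dir}")
    return os.path.join(output_dir, max(steps, key=int))


print(sorted(["200", "1000"]))        # ['1000', '200']: string order, wrong for steps
print(max(["200", "1000"], key=int))  # 1000
```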

### Generate a grid of preview images

These images are pure generations of the `instance_prompt` supplied above, without additional directions or embellishments such as 'a picture of cc as a cool wizard'. You just get some pictures of cc.

In [None]:
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

weights_folder = OUTPUT_DIR
# Numbered step folders only (skip "0" and any non-numeric entries), in training order
folders = sorted([f for f in os.listdir(weights_folder) if f.isdigit() and f != "0"], key=int)

row = len(folders)
col = len(os.listdir(os.path.join(weights_folder, folders[0], "samples")))
scale = 4
# squeeze=False keeps `axes` 2-D even with a single row or column
fig, axes = plt.subplots(row, col, figsize=(col*scale, row*scale), gridspec_kw={'hspace': 0, 'wspace': 0}, squeeze=False)

for i, folder in enumerate(folders):
    image_folder = os.path.join(weights_folder, folder, "samples")
    images = sorted(os.listdir(image_folder))
    for j, image in enumerate(images[:col]):
        currAxes = axes[i, j]
        if i == 0:
            currAxes.set_title(f"Image {j}")
        if j == 0:
            # Label each row with its training step
            currAxes.text(-0.1, 0.5, folder, rotation=0, va='center', ha='center', transform=currAxes.transAxes)
        img = mpimg.imread(os.path.join(image_folder, image))
        currAxes.imshow(img)
        currAxes.axis('off')

plt.tight_layout()
plt.savefig('content/grid.png', dpi=72)
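The preview grid relies on each numbered step folder containing a `samples` subfolder with roughly the same number of images. A stdlib sketch for inspecting that layout before plotting; `collect_samples` is a hypothetical helper, and skipping folder `"0"` mirrors the cell above.

```python
import os


def collect_samples(weights_folder):
    """Map each step folder (as int) to its sorted list of sample image paths."""
    grid = {}
    for name in os.listdir(weights_folder):
        sample_dir = os.path.join(weights_folder, name, "samples")
        # Skip "0" (as the grid cell does) and anything that is not a numbered step with samples
        if not name.isdigit() or name == "0" or not os.path.isdir(sample_dir):
            continue
        grid[int(name)] = [os.path.join(sample_dir, f) for f in sorted(os.listdir(sample_dir))]
    return dict(sorted(grid.items()))


# In the notebook:
# samples = collect_samples(OUTPUT_DIR)
# if len({len(paths) for paths in samples.values()}) > 1:
#     print("[!] step folders contain different numbers of samples")
```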

### Convert the weights to a ckpt file for use in web UIs like AUTOMATIC1111

In [None]:
ckpt_path = WEIGHTS_DIR + "/model.ckpt"

half_arg = ""

fp16 = True #@param {type: "boolean"}

if fp16:
    half_arg = "--half"
!python convert_diffusers_to_original_stable_diffusion_ShivamShrirao.py --model_path $WEIGHTS_DIR  --checkpoint_path $ckpt_path $half_arg
print(f"[*] Converted ckpt saved at {ckpt_path}")

## Inference

In [None]:
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, DDIMScheduler
from IPython.display import display

#You can replace this with another directory of weights if you have another pre-trained model that you want to just drop in and use
#for example:
#model_path = './content/stable_diffusion_weights/cc/1200'           

model_path = WEIGHTS_DIR  

# You can use a different scheduler, but DDIM seems to work better on faces (the primary use case for this example)
#https://huggingface.co/blog/dreambooth 
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
pipe = StableDiffusionPipeline.from_pretrained(model_path, scheduler=scheduler, safety_checker=None, torch_dtype=torch.float16).to("cuda")

# Seeded generator, so the same cuda_seed reproduces the same images
g_cuda = torch.Generator(device='cuda')
g_cuda.manual_seed(cuda_seed)

In [None]:
# This is the normal text prompt you are used to when generating images.
# Make sure to include the instance prompt phrase (here 'photo of cc person')
# so the model uses your fine-tuned subject as a starting point.
prompt = "hyper-maximalist overdetailed comic book illustration headshot photo of cc person as hero.  Give him a long, luxurious beard like Dumbledore. Make the image dark and gritty, like Sin City or Underworld movies"

# Negative prompts remove or limit what appears in generated images.
# 'duplicate' is commonly used to avoid getting multiple copies
# of the instance subject in a single output.
negative_prompt = "duplicate"

os.makedirs('./content/ccOutputs', exist_ok=True)  # img.save() below fails if the folder is missing

with autocast("cuda"), torch.inference_mode():
    images = pipe(
        prompt,
        height=512,
        width=512,
        negative_prompt=negative_prompt,
        num_images_per_prompt=1,
        num_inference_steps=100,
        guidance_scale=8.5,
        generator=g_cuda
    ).images

    for img in images:
        # Use the current timestamp as the file name
        ts = datetime.timestamp(datetime.now())
        display(img)
        img.save('./content/ccOutputs/' + str(ts) + ".jpg", "JPEG")
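Saving requires `./content/ccOutputs/` to exist, and float timestamps are hard to read at a glance. A small sketch of an alternative naming scheme; `next_output_path` and its date format are our own choices, not the original's.

```python
import os
from datetime import datetime


def next_output_path(out_dir="./content/ccOutputs", ext="jpg"):
    """Create out_dir if needed and return a timestamped file path inside it."""
    os.makedirs(out_dir, exist_ok=True)
    # Year-month-day-time plus microseconds: sorts chronologically and avoids collisions within a second
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    return os.path.join(out_dir, f"{stamp}.{ext}")


# Inside the loop above: img.save(next_output_path(), "JPEG")
```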

## (Optional) Delete the Diffusers-format weights and old checkpoints, keeping only the ckpt, to free up drive space
[ ! ] Caution: only run this if you are sure you want to delete the Diffusers-format weights and use only the ckpt.

In [None]:
# Remove the following line to run
"""
import shutil
from glob import glob
import os
# Delete every other checkpoint folder in OUTPUT_DIR
for f in glob(OUTPUT_DIR+os.sep+"*"):
    if f != WEIGHTS_DIR:
        shutil.rmtree(f)
        print("Deleted", f)
# Inside WEIGHTS_DIR, delete the Diffusers subfolders but keep .ckpt and .json files
for f in glob(WEIGHTS_DIR+"/*"):
    if not f.endswith(".ckpt") and not f.endswith(".json"):
        try:
            shutil.rmtree(f)
        except NotADirectoryError:
            continue
        print("Deleted", f)
"""
# Remove the preceding line to run
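Because this cell deletes files permanently, it may be worth previewing what it would remove first. A dry-run sketch of the same rules (delete every other entry in `OUTPUT_DIR`, and inside `WEIGHTS_DIR` remove subdirectories while the `try`/`except` leaves plain files alone); `plan_cleanup` is a hypothetical helper that only returns paths and deletes nothing.

```python
import os
from glob import glob


def plan_cleanup(output_dir, weights_dir):
    """List the paths the cleanup cell would delete, without deleting anything."""
    to_delete = []
    # Every other entry in output_dir (older step folders)
    for path in glob(os.path.join(output_dir, "*")):
        if os.path.normpath(path) != os.path.normpath(weights_dir):
            to_delete.append(path)
    # Subdirectories inside weights_dir (e.g. Diffusers component folders such as unet/);
    # plain files such as model.ckpt and *.json are kept
    for path in glob(os.path.join(weights_dir, "*")):
        if os.path.isdir(path):
            to_delete.append(path)
    return sorted(to_delete)


# In the notebook:
# for path in plan_cleanup(OUTPUT_DIR, WEIGHTS_DIR):
#     print("Would delete", path)
```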

## Clean up

The next cell clears the CUDA cache and exits the kernel, freeing up any memory it holds.

In [None]:
torch.cuda.empty_cache() 

exit()