# DreamBooth training
### We recommend the [official Diffusers tutorial](https://huggingface.co/docs/diffusers/training/dreambooth), which helps you get familiar with this type of fine-tuning and explains the environment setup and the parameters used in detail.
### Our fine-tuned models and training data are available on OneDrive.

## Crop and resize selected images to 512x512.

In [None]:
from PIL import Image
import os
from torchvision.transforms import functional as F
from torchvision.transforms import InterpolationMode
import random
import json

In [2]:
folder = "/path/to/training/images"
folder_cropped = "/path/to/training/images_cropped"
os.makedirs(folder_cropped, exist_ok=True)

for filename in os.listdir(folder):
    if filename.endswith(('.jpeg', '.png', '.jpg')):
        image = Image.open(os.path.join(folder, filename)).convert("RGB")
        # Crop size (e.g. 480) depends on the CholecT45 video the frames come from
        image = F.center_crop(image, 480)
        # Upscale the square crop to the 512x512 training resolution
        image = F.resize(image, 512, interpolation=InterpolationMode.BILINEAR)
        image.save(os.path.join(folder_cropped, filename))
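
The crop size passed to `F.center_crop` is usually the shorter side of the frame, so that the largest centered square is kept. The arithmetic can be sketched as below. `center_crop_box` is a hypothetical helper (not part of torchvision or PIL), and the 854x480 frame size is only an assumed example, so check `image.size` on your own frames.

```python
def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) of a centered size x size crop."""
    if size > min(width, height):
        raise ValueError("crop size exceeds the image; padding would be needed")
    left = int(round((width - size) / 2.0))
    top = int(round((height - size) / 2.0))
    return (left, top, left + size, top + size)


# Assumed example frame size; replace with image.size of your own frames.
width, height = 854, 480
crop_size = min(width, height)  # side of the largest centered square
box = center_crop_box(width, height, crop_size)
print(crop_size, box)
```

The resulting box uses the same (left, upper, right, lower) order that `Image.crop` expects in PIL.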

## Create the JSON concepts list

In [3]:
concepts_list = [
    {
        "instance_prompt":      "cholect45",
        "class_prompt":         "",
        "instance_data_dir":    folder_cropped
    },
]

with open("./concepts_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)
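
Before launching a long training run, it may be worth checking that the file written above points at a directory that actually contains images. The following is a minimal sketch: `check_concepts` is a hypothetical helper, and the set of required keys is an assumption based on the keys used in this notebook, not a specification of the training script.

```python
import json
import os

# Keys used in this notebook's concepts list (assumed to be what the script reads).
REQUIRED_KEYS = ("instance_prompt", "class_prompt", "instance_data_dir")
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")


def check_concepts(path):
    """Return a list of human-readable problems found in the concepts file."""
    with open(path) as f:
        concepts = json.load(f)
    problems = []
    for i, concept in enumerate(concepts):
        for key in REQUIRED_KEYS:
            if key not in concept:
                problems.append(f"concept {i}: missing key '{key}'")
        data_dir = concept.get("instance_data_dir", "")
        if not os.path.isdir(data_dir):
            problems.append(f"concept {i}: directory not found: {data_dir}")
        elif not any(name.endswith(IMAGE_EXTENSIONS) for name in os.listdir(data_dir)):
            problems.append(f"concept {i}: no images in {data_dir}")
    return problems


# Only runs the check if the file from the previous cell exists.
if os.path.exists("./concepts_list.json"):
    for problem in check_concepts("./concepts_list.json"):
        print(problem)
```

An empty result means every concept has the expected keys and a non-empty image directory.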

## DREAMBOOTH TRAINING
### Run the following commands in a terminal. Remember to activate the `diffusers_venv` virtual environment first!

In [None]:
'''
cd training
source diffusers_venv/bin/activate
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="/path/to/save/checkpoints"
'''

### Template for the DreamBooth training command

In [None]:
#template
'''
!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir=$OUTPUT_DIR \
  --revision="fp16" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=1337 \
  --resolution=512 \
  --train_batch_size=1 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=50 \
  --sample_batch_size=4 \
  --max_train_steps=800 \
  --save_interval=10000 \
  --save_sample_prompt="photo of zwx dog" \
  --concepts_list="concepts_list.json"
'''

### The actual command (with the parameters we used) for training all styles. Run it in a terminal

In [None]:
'''
CUDA_VISIBLE_DEVICES=1 accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--output_dir=$OUTPUT_DIR \
--concepts_list="concepts_list.json" \
--revision="fp16" \
--train_text_encoder \
--seed=1337 \
--resolution=512 \
--train_batch_size=4 \
--mixed_precision="fp16" \
--gradient_accumulation_steps=1 \
--learning_rate=1e-6 \
--lr_warmup_steps=0 \
--num_class_images=50 \
--save_interval=500 \
--max_train_steps=3000
'''
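
To get a feel for what `--train_batch_size`, `--gradient_accumulation_steps`, `--max_train_steps` and `--save_interval` imply together, the sketch below estimates how many samples the run sees and at which steps checkpoints would appear. It assumes a single GPU, a hypothetical dataset of 200 images, and that the script saves whenever the step is a multiple of `save_interval` and once more at the final step; that saving behaviour is an assumption about the training script, not something verified here.

```python
def training_budget(num_images, batch_size, grad_accum_steps, max_steps, save_interval):
    """Estimate samples seen, passes over the data, and checkpoint steps."""
    samples_per_step = batch_size * grad_accum_steps
    samples_seen = samples_per_step * max_steps
    epochs = samples_seen / num_images
    # Assumed behaviour: save at multiples of save_interval and at the last step.
    checkpoints = list(range(save_interval, max_steps + 1, save_interval))
    if not checkpoints or checkpoints[-1] != max_steps:
        checkpoints.append(max_steps)
    return samples_seen, epochs, checkpoints


# Flag values from the command above; num_images=200 is a made-up example.
samples_seen, epochs, checkpoints = training_budget(
    num_images=200, batch_size=4, grad_accum_steps=1,
    max_steps=3000, save_interval=500,
)
print(f"samples seen: {samples_seen}, passes over the data: {epochs:.1f}")
print(f"expected checkpoint steps: {checkpoints}")
```

With the template's values (`--max_train_steps=800`, `--save_interval=10000`), the same assumption would yield a single checkpoint at the end of training.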

### Quick inference for sanity check using diffusers pipeline

In [None]:
# Quick inference for sanity check
# from diffusers import StableDiffusionPipeline
# import torch

# model_id = "/path/to/save/checkpoints"
# pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# prompt = "cholect45"
# image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

# image

### To load the fine-tuned model in the WebUI, convert it from the Diffusers format to the original SD format and save it in the WebUI models folder

In [4]:
!python convert_diffusers_to_sd.py --model_path ./cholect_vid52_56_v2_ckpts/2000 \
    --checkpoint_path ../stable-diffusion-webui/models/Stable-diffusion/cholect_vid52_56_v2_2000.safetensors --half --use_safetensors


Reshaping encoder.mid.attn_1.q.weight for SD format
Reshaping encoder.mid.attn_1.k.weight for SD format
Reshaping encoder.mid.attn_1.v.weight for SD format
Reshaping encoder.mid.attn_1.proj_out.weight for SD format
Reshaping decoder.mid.attn_1.q.weight for SD format
Reshaping decoder.mid.attn_1.k.weight for SD format
Reshaping decoder.mid.attn_1.v.weight for SD format
Reshaping decoder.mid.attn_1.proj_out.weight for SD format
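
If several saved checkpoints should be compared in the WebUI, the conversion call above can be generated per step. This is a sketch only: `conversion_command` is a hypothetical helper, it reuses the flags and paths from the cell above, and the steps other than 2000 are assumed examples that may not exist in your checkpoint folder.

```python
import os
import shlex


def conversion_command(ckpt_root, step, webui_models_dir, name):
    """Build the argument list for converting one checkpoint step."""
    model_path = os.path.join(ckpt_root, str(step))
    checkpoint_path = os.path.join(webui_models_dir, f"{name}_{step}.safetensors")
    return [
        "python", "convert_diffusers_to_sd.py",
        "--model_path", model_path,
        "--checkpoint_path", checkpoint_path,
        "--half", "--use_safetensors",
    ]


# Steps 1000 and 3000 are assumed examples; only 2000 appears in this notebook.
for step in (1000, 2000, 3000):
    cmd = conversion_command(
        "./cholect_vid52_56_v2_ckpts", step,
        "../stable-diffusion-webui/models/Stable-diffusion",
        "cholect_vid52_56_v2",
    )
    print(shlex.join(cmd))
```

The loop only prints the commands; passing `cmd` to `subprocess.run(cmd, check=True)` would execute each one.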
