# Dreambooth
### Notebook implementation by Joe Penna, David Bielejeski
### Adapted for those who followed the [Corridor Digital Dreambooth Tutorial](https://www.corridordigital.com/video/2551) from Feb 2023 and have found that the repo has changed since then.

### ***Prerequisites***: Your training and regularization image sets are stored in zip files in your Google Drive.

More information on:
https://github.com/JoePenna/Dreambooth-Stable-Diffusion

## 1. Build Environment
This might take a few minutes...

In [None]:
# BUILD ENV
!pip install numpy==1.23.1
!pip install pytorch-lightning==1.7.6
!pip install csv-logger
!pip install torchmetrics==0.11.1
!pip install torch-fidelity==0.3.0
!pip install albumentations==1.1.0
!pip install opencv-python==4.7.0.72
!pip install pudb==2019.2
!pip install omegaconf==2.1.1
!pip install pillow==9.4.0
!pip install einops==0.4.1
!pip install transformers==4.25.1
!pip install kornia==0.6.7
!pip install diffusers[training]==0.3.0
!pip install captionizer==1.0.1
!pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
!pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
!pip install -e .
!pip install huggingface_hub
!pip install gdown
!pip install gitpython


## 2. Download the 1.5 model from Hugging Face
This might take a few minutes...<br/>
You can also provide your own v1.* model for training by uploading it and renaming it to "model.ckpt". It should be in the same directory as `dreambooth_runpod_joepenna.ipynb`.

In [None]:
# Download the 1.5 sd model
from IPython.display import clear_output
from huggingface_hub import hf_hub_download

downloaded_model_path = hf_hub_download(
 repo_id="panopstor/EveryDream",
 filename="sd_v1-5_vae.ckpt"
)

# Move the sd_v1-5_vae.ckpt to the root of this directory as "model.ckpt"
actual_locations_of_model_blob = !readlink -f {downloaded_model_path}
!mv {actual_locations_of_model_blob[-1]} model.ckpt
clear_output()
print("✅ model.ckpt successfully downloaded")

## 3. Create Training Image Folders

In [None]:
import os

# The following folder names reflect what was used in the C.D. tutorial;
# rename them if you want, according to your project's preferences.
training_images_root = "trainingImages" # <-- change this (optional)

subject_token_word = "nikopueringer" # <-- change this!
subject_class_word = "man" # <-- change this!

style_token_word = "vmphntd" # <-- change this!
style_class_word = "aesthetic" # <-- change this!

subject_path = f'{training_images_root}/{subject_token_word}/{subject_class_word}'
style_path = f'{training_images_root}/{style_token_word}/{style_class_word}'

# create folders
if not os.path.exists(subject_path):
  os.makedirs(subject_path)
else:
  print(f'{subject_path} already exists.')

if not os.path.exists(style_path):
  os.makedirs(style_path)
else:
  print(f'{style_path} already exists.')



## 4. Download Training Images
##### If there aren't many, you can probably drag & drop the images into the folders manually via the file manager on the left. Otherwise, download the zip file(s) and extract them to the relevant folders. Make sure share access to each file is set to "anyone with the link". A GDrive share link typically looks like: https://drive.google.com/file/d/1FeHTdwDXcxoW3Nv7486FsbIm679ek5zI/view?usp=sharing
##### You just need the File ID part, in this case _**1FeHTdwDXcxoW3Nv7486FsbIm679ek5zI**_
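If you would rather not pick the File ID out of the link by hand, a small helper can do it. This is only a sketch: `extract_gdrive_file_id` is a hypothetical name, and it assumes the share link has the `/file/d/<ID>/` shape shown above.

```python
import re

def extract_gdrive_file_id(share_link):
    """Return the File ID from a Google Drive share link shaped like .../file/d/<ID>/..."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_link)
    if match is None:
        raise ValueError(f"No File ID found in: {share_link}")
    return match.group(1)

share_link = "https://drive.google.com/file/d/1FeHTdwDXcxoW3Nv7486FsbIm679ek5zI/view?usp=sharing"
print(extract_gdrive_file_id(share_link))
```

The returned string is what goes into the `file_id_...` variables in the cell below.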
##### Also to make life easier, make sure your zip files have their images in their root and not buried under a nest of folders. 
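To check an archive before extracting it, you can list any entries that are not in the zip root. The sketch below uses only the standard `zipfile` module; `entries_outside_root` is a hypothetical helper name, and the demo archive is a throwaway file created just for illustration.

```python
import os
import tempfile
import zipfile

def entries_outside_root(zip_path):
    """List archive entries that sit inside a sub-folder rather than in the zip root."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        # Entry names inside a zip use forward slashes, so any "/" means a nested path.
        return [name for name in zf.namelist() if "/" in name]

# Demo on a throwaway archive with one root-level and one nested entry
demo_zip = os.path.join(tempfile.mkdtemp(), "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("image_01.jpg", b"")
    zf.writestr("nested/image_02.jpg", b"")

print(entries_outside_root(demo_zip))
```

An empty list means every image is in the root of the archive, which is what the extraction step expects.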

In [None]:


# set file_name and file_id for your training image zip files
# Note: the file_name values do not need to match the names of the files on Google Drive;
# they are only the names the downloads are saved under (so the name is known when extracting).

file_name_subject = 'training_images_subject.zip'   # <-- change this (optional)
file_id_subject = '1YGEgEPD-9oKf9830VHZgcbmpolI5AoqP' # <-- change this!

file_name_style = 'training_images_style.zip' # <-- change this (optional)
file_id_style = '1b_DJ8Y-F_et7It1T57QO2Hf6pQ6zFE-6' # <-- change this!

# ====================================================
# download them
!gdown $file_id_subject -O $file_name_subject
!gdown $file_id_style -O $file_name_style

# ====================================================
# now extract them into the correct locations
import zipfile as z

# extract subject training images
zf = z.ZipFile(f'{file_name_subject}','r')
zf.extractall(f'{subject_path}')
zf.close()

# extract style training images
zf = z.ZipFile(f'{file_name_style}','r')
zf.extractall(f'{style_path}')
zf.close()

# optional : delete zip files after
# os.remove(f'{file_name_subject}')
# os.remove(f'{file_name_style}')

# ====================================================
# remove any non-image files & warn if any additional folders exist
import os
import shutil
from glob import glob
folder_path = f'{subject_path}'

# Get a list of all files in the folder
files = glob(folder_path + '/*', recursive=False)

# Iterate over the files and delete the ones that are not JPG or PNG
for file_path in files:
    if not (file_path.endswith('.jpg') or file_path.endswith('.png')):
        if os.path.isfile(file_path):
            os.remove(file_path)
        elif os.path.isdir(file_path):
            print(f'\033[91m folder {file_path} was found in subject training images folder.  Check and remove it.\033[0m')

# force remove hidden .ipynb_checkpoints folder in images folder. 
if os.path.exists(f'{folder_path}/.ipynb_checkpoints'):
    shutil.rmtree(f'{folder_path}/.ipynb_checkpoints')

# =========================================================
# now do the same for the style training images folder
folder_path = f'{style_path}'
files = glob(folder_path + '/*', recursive=False)

# Iterate over the files and delete the ones that are not JPG or PNG
for file_path in files:
    if not (file_path.endswith('.jpg') or file_path.endswith('.png')):
        if os.path.isfile(file_path):
            os.remove(file_path)
        elif os.path.isdir(file_path):
            print(f'\033[91m folder {file_path} was found in style training images folder.  Check and remove it.\033[0m')

# force remove hidden .ipynb_checkpoints folder in images folder. 
if os.path.exists(f'{folder_path}/.ipynb_checkpoints'):
    shutil.rmtree(f'{folder_path}/.ipynb_checkpoints')
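
The same cleanup steps run once per image folder, so they could be wrapped in one function. The following is a sketch rather than part of the original notebook; `clean_image_folder` and its `label` argument are hypothetical names, and the behaviour mirrors the loops above (only `.jpg` and `.png` files are kept).

```python
import os
import shutil
from glob import glob

def clean_image_folder(folder_path, label):
    """Delete non-JPG/PNG files, warn about sub-folders, and remove .ipynb_checkpoints."""
    for file_path in glob(folder_path + '/*'):
        if file_path.endswith(('.jpg', '.png')):
            continue  # keep image files
        if os.path.isfile(file_path):
            os.remove(file_path)
        elif os.path.isdir(file_path):
            print(f'\033[91m folder {file_path} was found in {label} folder.  Check and remove it.\033[0m')

    # glob('/*') skips hidden entries, so the checkpoints folder is handled separately
    checkpoints = os.path.join(folder_path, '.ipynb_checkpoints')
    if os.path.exists(checkpoints):
        shutil.rmtree(checkpoints)

# Example calls, assuming the paths from section 3 are defined:
# clean_image_folder(subject_path, 'subject training images')
# clean_image_folder(style_path, 'style training images')
```

Passing the label in keeps the warning message accurate for whichever folder is being cleaned.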


## 5. Create Regularization Image Folders

In [None]:
import os

# The following folder names reflect what was used in the C.D. tutorial;
# rename them if you want, according to your project's preferences.
reg_images_root = "regImages"

# subject_class_word and style_class_word were already defined when creating the training image folders
# and are reused here.
reg_subject_path = f'{reg_images_root}/{subject_class_word}'
reg_style_path = f'{reg_images_root}/{style_class_word}'

# create folders
if not os.path.exists(reg_subject_path):
  os.makedirs(reg_subject_path)
else:
  print(f'{reg_subject_path} already exists.')

if not os.path.exists(reg_style_path):
  os.makedirs(reg_style_path)
else:
  print(f'{reg_style_path} already exists.')

## 6. Download Regularization Images

In [None]:

# set file_name and file_id for your regularization image zip files
# Note: the file_name values do not need to match the names of the files on Google Drive;
# they are only the names the downloads are saved under (so the name is known when extracting).

file_name_subject_class = 'reg_man.zip'  # <-- change this (optional)
file_id_subject_class = '1b_JT1yCrsw3DrLvnHqrRaywEU53jA-Tj'  # <-- change this!

file_name_style_class = 'reg_aesthetic.zip'  # <-- change this (optional)
file_id_style_class = '1h1EogKPXU8NIue00VZDRscqzqWI1D43M'  # <-- change this!

# ====================================================
# download them
!gdown $file_id_subject_class -O $file_name_subject_class
!gdown $file_id_style_class -O $file_name_style_class

# ====================================================
# now extract them into the correct locations
import zipfile as z

# extract subject reg images
zf = z.ZipFile(f'{file_name_subject_class}','r')
zf.extractall(f'{reg_subject_path}')
zf.close()

# extract style reg images
zf = z.ZipFile(f'{file_name_style_class}','r')
zf.extractall(f'{reg_style_path}')
zf.close()

# optional : delete zip files after
# os.remove(f'{file_name_subject_class}')
# os.remove(f'{file_name_style_class}')

# ====================================================
# delete any non-image files & warn if any additional folders
import os
import shutil  # needed for rmtree below, in case the section 4 cell was skipped
from glob import glob
folder_path = f'{reg_subject_path}'

# Get a list of all files in the folder
files = glob(folder_path + '/*', recursive=False)

# Iterate over the files and delete the ones that are not JPG or PNG
for file_path in files:
    if not (file_path.endswith('.jpg') or file_path.endswith('.png')):
        if os.path.isfile(file_path):
            os.remove(file_path)
        elif os.path.isdir(file_path):
            print(f'\033[91m folder {file_path} was found in subject reg images folder.  Check and remove it.\033[0m')

# force remove hidden .ipynb_checkpoints folder in images folder. 
if os.path.exists(f'{folder_path}/.ipynb_checkpoints'):
    shutil.rmtree(f'{folder_path}/.ipynb_checkpoints')

# ===============================================
# now do the same for the style regularization images folder
folder_path = f'{reg_style_path}'
files = glob(folder_path + '/*', recursive=False)

# Iterate over the files and delete the ones that are not JPG or PNG
for file_path in files:
    if not (file_path.endswith('.jpg') or file_path.endswith('.png')):
        if os.path.isfile(file_path):
            os.remove(file_path)
        elif os.path.isdir(file_path):
            print(f'\033[91m folder {file_path} was found in style reg images folder.  Check and remove it.\033[0m')

# force remove hidden .ipynb_checkpoints folder in images folder. 
if os.path.exists(f'{folder_path}/.ipynb_checkpoints'):
    shutil.rmtree(f'{folder_path}/.ipynb_checkpoints')


## 7. Setting Training Image Repeats

#### In the Corridor video, the training **repeats** value was changed in the `v1-finetune_unfrozen.yaml` file. That setting now lives in `dreambooth_helpers/dreambooth_trainer_configurations.py`.
#### The relevant repeats value is on line 192 of that file. If you prefer this to be done for you, run the next cell; otherwise skip it and move on to section 8.
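If the line number shifts in a later version of the repo, matching on the key instead of the position may be more robust. The sketch below is an alternative, not what the notebook does: `set_repeats` is a hypothetical helper, the sample line is made up for illustration, and it assumes the entry is written as `"repeats": <integer>`. It only touches the first match, so check the file if it contains more than one `"repeats":` entry.

```python
import re

def set_repeats(source_text, repeats_val):
    """Replace the number after the first "repeats": key, leaving indentation and the rest intact."""
    new_text, count = re.subn(
        r'("repeats":\s*)\d+',      # group 1 keeps the key and its spacing
        rf'\g<1>{repeats_val}',     # \g<1> avoids ambiguity with the digits that follow
        source_text,
        count=1,
    )
    if count == 0:
        raise ValueError('no "repeats": entry found')
    return new_text

sample = '                    "repeats": 100,\n'
print(set_repeats(sample, 50))
```

To apply it to the real file, you would read the file into a string, pass it through `set_repeats`, and write the result back.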


In [None]:
repeats_val = 50

filename = "dreambooth_helpers/dreambooth_trainer_configurations.py"
repeats_line_index = 192 - 1  # line 192 of the file, as a zero-based index

with open(filename) as file:
    lines = file.readlines()

repeats_line = lines[repeats_line_index]
print(repeats_line)

# The "repeats" key is expected to be indented by 20 spaces on that line
indent = ' ' * 20
pattern = indent + '"repeats":'
print(pattern)

if not repeats_line.startswith(pattern):
    print(f'Line 192 of {filename} does not hold the expected "repeats" entry. Nothing was changed.')
elif not 1 <= repeats_val <= 999:
    print('repeats_val must be between 1 and 999. Nothing was changed.')
else:
    # replace the line, keeping the original indentation
    lines[repeats_line_index] = f'{indent}"repeats": {repeats_val},\n'

    with open(filename, "w") as file:
        file.writelines(lines)

## 8. Training


In [None]:
import time
from datetime import timedelta


# token_word and class_word are ignored because the folder structure used for the training & regularization images supplies the token and class.
# However, the token parameter is still required when launching main.py.
token_word="xxx"  
class_word="yyy"
max_steps=3000  # <-- change this!
save_every_x_steps=0  # <-- change this (optional)
model_path="model.ckpt"
train_img_path=training_images_root  # plain string; wrapping it in {} would create a Python set
reg_img_path=reg_images_root
proj_name="myProject"  # <-- change this (optional)

start = time.time()

# Start Training
!python "main.py" \
--project_name "{proj_name}" \
--token "{token_word}" \
--max_training_steps {max_steps} \
--save_every_x_steps {save_every_x_steps} \
--regularization_images "{reg_img_path}" \
--training_images "{train_img_path}" \
--training_model "{model_path}" \
--flip_p 0 \
--learning_rate 1.0e-06


# ==================================================
# Show Training Time
# ==================================================
end = time.time()
print(f'Elapsed Time: {timedelta(seconds=end-start)}')


# Big Important Note!

The way to use your token is `<token> <class>`, i.e. `joepenna person` and not just `joepenna`.
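One way to avoid forgetting the class word is to build prompts from the two values. This is a small illustrative sketch; `build_prompt` is a hypothetical helper and not part of the repo.

```python
def build_prompt(token, class_word, description):
    """Prefix a prompt with '<token> <class>' so the trained subject is referenced correctly."""
    return f"{token} {class_word} {description}"

print(build_prompt("joepenna", "person", "as a masterpiece portrait painting"))
```

With the folder names from section 3, the token and class would be `nikopueringer` and `man`.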

## Generate Images With Your Trained Model!

In [None]:
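# file_name is not defined earlier in this notebook: set it to the name of your checkpoint in trained_models/ first.
!python scripts/stable_txt2img.py \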
 --ddim_eta 0.0 \
 --n_samples 1 \
 --n_iter 4 \
 --scale 7.0 \
 --ddim_steps 50 \
 --ckpt "/workspace/Dreambooth-Stable-Diffusion/trained_models/{file_name}" \
 --prompt "joepenna person as a masterpiece portrait painting by John Singer Sargent in the style of Rembrandt"