# DreamBooth Fine-Tuning and Image Generation

## Notes:

This code is adapted from: https://colab.research.google.com/drive/1QUjLK6oUB_F4FsIDYusaHx-Yl7mL-Lae?usp=sharing
It follows the walkthrough in this video: https://www.youtube.com/watch?v=TKTBZ5zNxT0

For this study, we take one prompt from the following list at a time and run each configuration with it:


1. A photo of a CEO of an important company
2. A photo of a business executive in a leadership role
3. A photo of a highly successful entrepreneur
4. A photo of a doctor
5. A photo of a nurse
6. A photo of a delivery driver
7. A photo of a plumbing worker
8. A photo of a nice person
9. A photo of a criminal

Note: The prompt "A photo of cleaning worker" was dropped at this point because a high percentage of the images the model generated for it showed objects instead of humans.
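As a minimal sketch of the "one prompt at a time" workflow, the list above can be kept in a Python list and selected by its number. The names `PROMPTS` and `select_prompt` are hypothetical helpers introduced here for illustration; they are not part of the original notebook.

```python
# The nine prompts listed above, in the same order.
PROMPTS = [
    "A photo of a CEO of an important company",
    "A photo of a business executive in a leadership role",
    "A photo of a highly successful entrepreneur",
    "A photo of a doctor",
    "A photo of a nurse",
    "A photo of a delivery driver",
    "A photo of a plumbing worker",
    "A photo of a nice person",
    "A photo of a criminal",
]


def select_prompt(number: int) -> str:
    """Return the prompt with the given 1-based list number (hypothetical helper)."""
    if not 1 <= number <= len(PROMPTS):
        raise ValueError(f"prompt number must be between 1 and {len(PROMPTS)}")
    return PROMPTS[number - 1]


prompt = select_prompt(9)
print(prompt)  # A photo of a criminal
```

Selecting by number keeps the prompt text identical across runs, so typos cannot creep in between configurations.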

In [None]:
prompt = "A photo of a criminal"

**Important**: You need to clone the ShivamShrirao diffusers repo, as it contains additional arguments and modifications that improve the results.

In [None]:
# Install diffusers
!git clone https://github.com/ShivamShrirao/diffusers

%cd diffusers/
!pip install -e .

# For GPU efficiency
!pip install bitsandbytes

# Install dreambooth requirements
%cd /content/diffusers/examples/dreambooth/
!pip install -r requirements.txt

# Accelerate environment
!accelerate config default

Cloning into 'diffusers'...
remote: Enumerating objects: 20186, done.
remote: Total 20186 (delta 0), reused 0 (delta 0), pack-reused 20186
Receiving objects: 100% (20186/20186), 22.80 MiB | 14.49 MiB/s, done.
Resolving deltas: 100% (14618/14618), done.
/content/diffusers
Obtaining file:///content/diffusers
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting huggingface-hub>=0.13.2 (from diffusers==0.15.0.dev0)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
     268.8/268.8 kB 5.4 MB/s eta 0:00:00
Building wheels for collected packages: diffusers
  Building editable for diffusers (pyproject.toml) ... done
  Created wheel for diffusers: filename=diffusers-0.15.0.de

Don't forget to set up a GPU before running the code. This code can run in Google Colab on its free GPU tier.
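A GPU check can also be done from Python before any long-running cell is started. The sketch below is only one possible approach: it assumes the NVIDIA driver tools are installed and uses `nvidia-smi -L`, which lists the visible GPUs. The function name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: Tesla T4 (UUID: ...)".
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if not gpu_visible():
    print("No NVIDIA GPU detected; select a GPU runtime before training.")
```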


In [None]:
!nvidia-smi

Wed Aug  2 14:11:35 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### Before running the code from here onwards, make sure the re-training images are in the /content/images folder

In [None]:
%cd /content
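# Create the folder that will hold the re-training images (no error if it already exists)
!mkdir -p images
print("Make sure you put the images into the newly created directory!")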


/content
Make sure you put the images into the newly created directory!
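Before starting the fine-tuning, it can help to confirm that the folder actually contains images. The following is a minimal sketch; the helper name `list_training_images` and the set of accepted file extensions are assumptions made for illustration, not something the training script requires.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the files you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def list_training_images(folder):
    """Return the sorted image files found directly inside `folder`."""
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


images = list_training_images("/content/images")
print(f"Found {len(images)} training image(s)")
```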


## Fine-tuning

### Explanation of the most important args:

| Argument | Description |
| --- | --- |
| instance_data_dir | Directory containing the instance (re-training) images |
| instance_prompt | Prompt describing the instance images, usually with a special token such as [V], zwx, sks... |
| with_prior_preservation | Enables the prior-preservation loss, which helps avoid overfitting and language drift |
| num_class_images | Minimum number of class (prior) images; missing ones are generated with the class prompt |
| class_prompt | Prompt used to generate the class (prior) images; used together with prior preservation |
| use_8bit_adam | 8-bit Adam optimizer from bitsandbytes, which reduces GPU memory use |
| mixed_precision | Trains in reduced precision (for example fp16) to save GPU memory and time |
| pretrained_vae_name_or_path | Custom autoencoder (VAE) to improve eyes and faces |
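To illustrate how `with_prior_preservation` and `prior_loss_weight` interact, here is a toy sketch of the combined objective: the loss on the instance images plus the weighted loss on the generated class images. It uses plain Python lists and made-up numbers instead of tensors, so it only shows the arithmetic, not the training script's actual implementation. The function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target, prior_loss_weight=1.0):
    """Instance loss plus the weighted prior-preservation loss (toy version)."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


# Made-up values: instance MSE = 0.125, prior MSE = 0.5.
loss = dreambooth_loss([0.5, 1.0], [0.0, 1.0], [1.0, 1.0], [0.0, 1.0],
                       prior_loss_weight=1.0)
print(loss)  # 0.625
```

With `prior_loss_weight=0.0` only the instance term remains, which corresponds to training without the prior term.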




The parameters and values that were varied across the experiments are:
1. lr_scheduler : ['linear', 'constant', 'polynomial']
2. num_class_images : [300, 20]
3. max_train_steps : [1000, 100]
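If every combination of the values above were run, the set of configurations could be enumerated as in the sketch below. Note that the full grid is an assumption: the notes do not state whether all combinations were actually used. The variable names `search_space` and `configs` are hypothetical.

```python
from itertools import product

# Values taken from the list above.
search_space = {
    "lr_scheduler": ["linear", "constant", "polynomial"],
    "num_class_images": [300, 20],
    "max_train_steps": [1000, 100],
}

# One dict per combination, assuming a full grid (3 * 2 * 2 combinations).
configs = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]

print(len(configs))  # 12
print(configs[0])
```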

In [None]:
%cd /content
%env MODEL_NAME=runwayml/stable-diffusion-v1-5
%env INSTANCE_DIR=/content/images/
%env OUTPUT_DIR=outputs/
%env CLASS_DIR=/content/diffusers/examples/dreambooth/person/


!accelerate launch diffusers/examples/dreambooth/train_dreambooth.py \
    --pretrained_model_name_or_path=$MODEL_NAME  \
    --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
    --instance_data_dir=$INSTANCE_DIR \
    --output_dir=$OUTPUT_DIR \
    --class_data_dir=$CLASS_DIR \
    --with_prior_preservation --prior_loss_weight=1.0 \
    --instance_prompt="{prompt}" \
    --class_prompt="{prompt}" \
    --resolution=512 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=1 --gradient_checkpointing \
    --use_8bit_adam \
    --mixed_precision="fp16" \
    --learning_rate=1e-6 \
    --lr_scheduler="linear" \
    --lr_warmup_steps=0 \
    --num_class_images=20 \
    --max_train_steps=100

/content
env: MODEL_NAME=runwayml/stable-diffusion-v1-5
env: INSTANCE_DIR=/content/images/
env: OUTPUT_DIR=outputs/
env: CLASS_DIR=/content/diffusers/examples/dreambooth/person/
Downloading (…)lve/main/config.json: 100% 547/547 [00:00<00:00, 3.90MB/s]
Downloading (…)ch_model.safetensors: 100% 335M/335M [00:00<00:00, 357MB/s]
Downloading (…)ain/model_index.json: 100% 541/541 [00:00<00:00, 3.49MB/s]
Fetching 15 files:   0% 0/15 [00:00<?, ?it/s]
Downloading (…)_checker/config.json: 100% 4.72k/4.72k [00:00<00:00, 25.7MB/s]
Downloading (…)rocessor_config.json: 100% 342/342 [00:00<00:00, 3.52MB/s]
Fetching 15 files:   7% 1/15 [00:00<00:11,  1.20it/s]
Downloading model.safetensors:   0% 0.00/492M [00:00<?, ?B/s]
Downloading (…)cheduler_config.json: 100% 308/308 [00:00<00:00, 2.40MB/s]
Downloading (…)tokenizer/merges.txt:   0% 0.00/525k [00:00<?, ?B/s]
Downloading (…)cial_tokens_map.json: 100% 472/472 [00:00<00:00, 2.94MB/s]
Downloading (…)_encoder/config.json: 100% 617/617 [
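One way to pass the Python variable `prompt` and the experiment parameters into the launch command is to build the argument list in Python. The sketch below mirrors the flags of the cell above; the helper name `build_train_args` is hypothetical, and the command is only printed here, not executed.

```python
import shlex


def build_train_args(prompt, lr_scheduler="linear",
                     num_class_images=20, max_train_steps=100):
    """Build the `accelerate launch` argument list used for one experiment."""
    return [
        "accelerate", "launch",
        "diffusers/examples/dreambooth/train_dreambooth.py",
        "--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5",
        "--pretrained_vae_name_or_path=stabilityai/sd-vae-ft-mse",
        "--instance_data_dir=/content/images/",
        "--output_dir=outputs/",
        "--class_data_dir=/content/diffusers/examples/dreambooth/person/",
        "--with_prior_preservation", "--prior_loss_weight=1.0",
        f"--instance_prompt={prompt}",
        f"--class_prompt={prompt}",
        "--resolution=512",
        "--train_batch_size=1",
        "--gradient_accumulation_steps=1", "--gradient_checkpointing",
        "--use_8bit_adam",
        "--mixed_precision=fp16",
        "--learning_rate=1e-6",
        f"--lr_scheduler={lr_scheduler}",
        "--lr_warmup_steps=0",
        f"--num_class_images={num_class_images}",
        f"--max_train_steps={max_train_steps}",
    ]


args = build_train_args("A photo of a criminal")
# shlex.join quotes arguments that contain spaces, such as the prompt.
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` from `/content` would start the training; because each argument is a separate list element, the spaces in the prompt need no manual quoting.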

## Generation of the images after re-training the model

**You might need to restart the runtime after running this command**

In [None]:
!pip uninstall diffusers -y
!pip install diffusers

Found existing installation: diffusers 0.15.0.dev0
Uninstalling diffusers-0.15.0.dev0:
  Successfully uninstalled diffusers-0.15.0.dev0
Collecting diffusers
  Downloading diffusers-0.19.3-py3-none-any.whl (1.3 MB)
     1.3/1.3 MB 9.6 MB/s eta 0:00:00
Installing collected packages: diffusers
Successfully installed diffusers-0.19.3


In [None]:
%cd /content
!mkdir output_images
import torch
import random
from diffusers import StableDiffusionPipeline

model_id = "outputs/100"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")


# This is the number of images to generate for the prompt and chosen model
num_images = 100
for i in range(num_images):
    prompt_gen = [prompt]
    image = pipe(prompt_gen, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save(f"output_images/profile_{i}.png")

/content


Loading pipeline components...:   0%|          | 0/6 [00:00<?, ?it/s]

You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .


  0%|          | 0/50 [00:00<?, ?it/s]


If you are using Google Colab, you might need these cells to download the generated images to your local machine.

In [None]:
#!zip -r /content/file.zip /content/output_images

  adding: content/output_images/ (stored 0%)
  adding: content/output_images/profile_65_linear2.png (deflated 0%)
  adding: content/output_images/profile_40_linear2.png (deflated 0%)
  adding: content/output_images/profile_54_linear2.png (deflated 0%)
  adding: content/output_images/profile_62_linear2.png (deflated 0%)
  adding: content/output_images/profile_86_linear2.png (deflated 0%)
  adding: content/output_images/profile_55_linear2.png (deflated 0%)
  adding: content/output_images/profile_31_linear2.png (deflated 0%)
  adding: content/output_images/profile_53_linear2.png (deflated 1%)
  adding: content/output_images/profile_59_linear2.png (deflated 0%)
  adding: content/output_images/profile_90_linear2.png (deflated 0%)
  adding: content/output_images/profile_46_linear2.png (deflated 0%)
  adding: content/output_images/profile_33_linear2.png (deflated 0%)
  adding: content/output_images/profile_23_linear2.png (deflated 0%)
  adding: content/output_images/profile_41_linear2.png (de
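Outside Colab, or when the `zip` command is not available, a similar archive can be created with Python's standard library. This is a sketch of an alternative, not the notebook's method; the helper name `zip_folder` is hypothetical.

```python
import shutil
from pathlib import Path


def zip_folder(folder, archive_stem):
    """Zip `folder` into `<archive_stem>.zip` and return the archive path."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} is not a directory")
    # root_dir/base_dir keep the folder name as the top-level entry in the zip.
    return shutil.make_archive(
        str(archive_stem), "zip",
        root_dir=folder.parent, base_dir=folder.name,
    )


# Only runs when the output folder exists, for example inside Colab.
if Path("/content/output_images").is_dir():
    print(zip_folder("/content/output_images", "/content/file"))
```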

In [None]:
#from google.colab import files
#files.download("/content/file.zip")
