# Fine-Tuning Stable Diffusion XL with DreamBooth

Over the past few years, generative AI models have popped up everywhere: they give realistic answers to complex questions, and they generate images and music that impress art critics around the globe. In this notebook we use the [Stable Diffusion XL (SDXL)](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model, available on Hugging Face, to create images from text prompts. You'll see how to import the SDXL model and use it to generate an image.

From there, you'll see how to fine-tune the model using [DreamBooth](https://huggingface.co/docs/diffusers/training/dreambooth), a method for fine-tuning a text-to-image model with only a handful of images. In this notebook we'll use a small number of photos of [Toy Jensen](https://blogs.nvidia.com/blog/2022/12/22/toy-jensen-jingle-bells/) to fine-tune SDXL. This will allow us to generate new images that include Toy Jensen!

After that, you'll have the chance to fine-tune the model on your own images. Perhaps you want to create an image of you at the bottom of the ocean, or in outer space? By the end of this notebook you will be able to! 

**IMPORTANT:** This project will utilize additional third-party open source software. Review the license terms of these open source projects before use. Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components. You are responsible for confirming compliance with third-party component license terms and requirements.

## Stable Diffusion XL Model

First, we import the classes and libraries we need to run the notebook.

In [None]:
import torch
from diffusers import StableDiffusionXLPipeline, DiffusionPipeline

Next, we use the Hugging Face `diffusers` library to create a `StableDiffusionXLPipeline` from the SDXL base model, loading its half-precision (fp16) weights and moving the pipeline to the GPU.

In [None]:
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")
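Loading with `torch_dtype=torch.float16` and `variant="fp16"` roughly halves the memory needed for the model weights compared with 32-bit floats. The back-of-the-envelope sketch below assumes about 3.5 billion parameters for the SDXL base pipeline (Stability AI's announced figure, treated here as approximate) and ignores activations and other runtime overhead, so real GPU usage will be higher.

```python
# Rough estimate of the memory needed just to hold the SDXL base weights.
# ASSUMPTION: ~3.5 billion parameters in total (approximate published figure).
SDXL_BASE_PARAMS = 3.5e9

# Bytes used to store one parameter in each floating-point format.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gib(SDXL_BASE_PARAMS, dtype):.1f} GiB")
```

Under that assumption, the weights take roughly 13 GiB in float32 versus about 6.5 GiB in float16.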

Let's use the SDXL model to generate an image. 

In [None]:
prompt = "toy jensen in space"
image = pipe(prompt=prompt).images[0]

image

Hmmm, looks like the Hugging Face SDXL model doesn't know about Toy Jensen! Imagine that! 

✅ Try using the SDXL model to generate some other images by editing the text in the first line of the cell above. 


## Fine-Tuning the model with DreamBooth

Fine-tuning continues training an existing machine learning model on new data. In our case, we want to teach the SDXL model about Toy Jensen. This will allow us to create the perfect image of Toy Jensen in space!

[DreamBooth](https://arxiv.org/abs/2208.12242) provides a way to fine-tune a text-to-image model using only a few images. Let's use it to tune our SDXL model so that it knows about Toy Jensen!

We have 8 photos of Toy Jensen in our dataset - let's take a look at one of them.

In [None]:
from IPython.display import Image, display

# Show one of the Toy Jensen training images.
display(Image(filename='../data/toy-jensen/tj1.png'))

Now we can use the Hugging Face DreamBooth example to fine-tune this model. First we write a basic Accelerate configuration; then we launch the DreamBooth LoRA training script for SDXL with flags such as an instance prompt, a resolution, and the number of training steps to run.

In [None]:
from accelerate.utils import write_basic_config

# Create a basic Accelerate config so that `accelerate launch` can run without interactive setup.
write_basic_config()

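If you have multiple GPUs, you can use Distributed Data Parallel (DDP) to speed up fine-tuning: replace `NUMBER_OF_GPUs` in the cell below with the number of GPUs you have. On a single GPU, remove `--num_processes=NUMBER_OF_GPUs --multi_gpu` from the command instead. You can read more about DDP at https://huggingface.co/docs/transformers/perf_train_gpu_many.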

In [None]:
!accelerate launch --num_processes=NUMBER_OF_GPUs --multi_gpu /workspace/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0  \
  --instance_data_dir=/project/data/toy-jensen \
  --output_dir=/project/models/tuned-toy-jensen \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of toy jensen" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=100 \
  --seed="0" 
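The cell above needs `NUMBER_OF_GPUs` replaced by hand. As an alternative, here is a minimal sketch of a hypothetical helper, `build_launch_args`, that assembles the same `accelerate launch` arguments in Python and only adds `--num_processes` and `--multi_gpu` when more than one GPU is requested. The script path and flags are copied from the cell above; the helper itself is illustrative and is not part of `diffusers` or `accelerate`.

```python
from typing import Dict, List

# Script path and training flags copied from the notebook cell above.
SCRIPT = "/workspace/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py"
TRAINING_FLAGS: Dict[str, str] = {
    "pretrained_model_name_or_path": "stabilityai/stable-diffusion-xl-base-1.0",
    "instance_data_dir": "/project/data/toy-jensen",
    "output_dir": "/project/models/tuned-toy-jensen",
    "mixed_precision": "fp16",
    "instance_prompt": "a photo of toy jensen",
    "resolution": "1024",
    "train_batch_size": "1",
    "gradient_accumulation_steps": "4",
    "learning_rate": "1e-4",
    "lr_scheduler": "constant",
    "lr_warmup_steps": "0",
    "max_train_steps": "100",
    "seed": "0",
}


def build_launch_args(num_gpus: int, flags: Dict[str, str]) -> List[str]:
    """Hypothetical helper: build an `accelerate launch` argument list."""
    if num_gpus < 1:
        raise ValueError("training needs at least one GPU")
    args = ["accelerate", "launch"]
    if num_gpus > 1:
        # Distributed Data Parallel: one training process per GPU.
        args += [f"--num_processes={num_gpus}", "--multi_gpu"]
    args.append(SCRIPT)
    # Each flag is one "--name=value" argument, so no shell quoting is needed.
    args += [f"--{name}={value}" for name, value in flags.items()]
    return args
```

In the notebook you could pass `torch.cuda.device_count()` as `num_gpus` and run the resulting list with `subprocess.run(args, check=True)`.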

Now that the model is fine-tuned, let's load the SDXL base model again and apply the LoRA weights saved in the output directory.

In [None]:
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.load_lora_weights("/project/models/tuned-toy-jensen")
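Because `train_dreambooth_lora_sdxl.py` trains LoRA adapters, `load_lora_weights` only adds a small set of extra parameters on top of the frozen base model. LoRA keeps a weight matrix W of shape d × k fixed and learns a low-rank update ΔW = B·A, with B of shape d × r and A of shape r × k. The arithmetic below sketches why that update is so much smaller than W; the 1280 × 1280 layer size and rank 4 are illustrative assumptions, not values read from the training script.

```python
def full_params(d: int, k: int) -> int:
    """Number of parameters in a dense d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, rank: int) -> int:
    """Number of parameters in a LoRA update B @ A, with B: d x rank and A: rank x k."""
    return d * rank + rank * k


# ASSUMPTIONS for illustration only: a 1280 x 1280 projection and LoRA rank 4.
d, k, rank = 1280, 1280, 4
full = full_params(d, k)
lora = lora_params(d, k, rank)
print(f"full matrix: {full:,} params, LoRA update: {lora:,} params")
print(f"the full matrix is {full // lora}x larger")
```

The saving grows with layer size because LoRA scales with d + k rather than d × k, which is why the fine-tuned output directory is small compared with the base model.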

Finally, we can use our fine-tuned model to create an image with Toy Jensen in it. Let's give it a go! 

In [None]:
image = pipe("A picture of toy jensen in space", num_inference_steps=75).images[0]

image

Wow - look at him go! 

### Trying out some more examples


The SDXL model we are using was trained on a large collection of images and captions, and knows about everything from celebrities to famous buildings. However, its training data stops at a fixed point in time, so it isn't up to date with people and things that have become famous more recently.

For example, King Charles III became king of the United Kingdom in September 2022. Let's ask our SDXL model for an image of King Charles in space:

In [None]:
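# The pipeline still has the Toy Jensen LoRA loaded; unload it to query the base SDXL model.
pipe.unload_lora_weights()

prompt = "King Charles in space"
image = pipe(prompt=prompt).images[0]

image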

Did it give you an image of a King Charles spaniel? Or maybe King Charles II? That's not what we were hoping for! 

1. Gather some (10 or so) images of King Charles III from your favourite search engine and copy them into the `data/charles-3/` folder. You can download them to your machine and then move them into this folder.

    **Reminder:** Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components; you are responsible for reviewing and confirming compliance with third-party component license terms and requirements.
2. Run the code below to fine-tune the model on your images of King Charles. 

In [None]:
# Remove the .gitkeep placeholder from the 'charles-3' folder; it is not an image.
# (-f avoids an error if the file has already been removed.)
!rm -f ../data/charles-3/.gitkeep
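The `.gitkeep` placeholder is not an image, and the DreamBooth script reads the contents of `--instance_data_dir` as training images, so a stray non-image file is likely to stop training with an error. Before launching, a quick standard-library check like the sketch below can flag such entries. `unexpected_entries` is a hypothetical helper, and the extension list is an assumption; adjust it to the formats you actually use.

```python
from pathlib import Path
from typing import List

# ASSUMPTION: the image formats you plan to train on.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def unexpected_entries(folder: str) -> List[Path]:
    """Hypothetical helper: list entries in `folder` that are not image files.

    Directories and files with other extensions (including dotfiles such as
    .gitkeep, whose suffix is empty) are reported.
    """
    return sorted(
        path
        for path in Path(folder).iterdir()
        if not (path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS)
    )


# Example using the notebook's folder; only runs if the folder exists.
folder = "../data/charles-3"
if Path(folder).is_dir():
    for path in unexpected_entries(folder):
        print("Not an image, remove before training:", path)
```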

In [None]:
!accelerate launch /workspace/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0  \
  --instance_data_dir=/project/data/charles-3 \
  --output_dir=/project/models/tuned-charles-3 \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of King Charles" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=100 \
  --seed="0" 

Now we load the model and use it to generate an image of King Charles. 

In [None]:
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.load_lora_weights("/project/models/tuned-charles-3")

In [None]:
image = pipe("A picture of King Charles in space", num_inference_steps=75).images[0]

image

How is the model performing? Do you need to train it on a few more images? If so, add some more images to the folder then run the cells above to retrain. 
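Adding images changes how often the model sees each one, because `--max_train_steps` counts optimizer updates rather than images. Each update accumulates `--gradient_accumulation_steps` batches of `--train_batch_size` images on every GPU process, so the rough sketch below (hypothetical helpers that ignore details such as a partial final batch) estimates how many passes each image gets.

```python
def images_seen(max_train_steps: int, train_batch_size: int,
                gradient_accumulation_steps: int, num_gpus: int = 1) -> int:
    """Approximate number of training images processed over the whole run."""
    effective_batch = train_batch_size * gradient_accumulation_steps * num_gpus
    return max_train_steps * effective_batch


def passes_per_image(num_images: int, **run) -> float:
    """Roughly how many times each training image is seen (i.e. epochs)."""
    return images_seen(**run) / num_images


# Flags from the notebook's training cells, on a single GPU.
run = dict(max_train_steps=100, train_batch_size=1, gradient_accumulation_steps=4)
for num_images in (8, 10, 20):
    print(f"{num_images} images -> ~{passes_per_image(num_images, **run):.0f} passes each")
```

With these settings, 8 images are each seen about 50 times while 20 images are seen about 20 times, so if you add many more images you may also want to raise `--max_train_steps`.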

Once you're happy with the results, the model knows what King Charles III looks like and can generate realistic images of him.


## Fine-tuning the Model on your own data

✅ Why not try training the SDXL model on your own set of images? Follow the steps below to set up and train your own model.

**Reminder:** Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components; you are responsible for reviewing and confirming compliance with third-party component license terms and requirements.


1. You'll need to find around 10 different pictures of your chosen item. Why not find some of your pet or your car? 

2. Save those images into the `data/my-data` folder we have created for you, just as you did with the images of King Charles III.

3. Edit the `--instance_prompt` line in the code below so that it describes your item. For example, you could change it to `--instance_prompt="a photo of my cat alice"`.

4. Once you've updated the prompt, run the cells below to train the model on your data. 
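The prompt you generate with should mention your subject the same way as `--instance_prompt`, since DreamBooth teaches the model to associate that phrase with your images. Below is a minimal sketch using hypothetical helpers that catch a leftover `[CHANGE THIS]` placeholder and reuse the subject from the instance prompt; it assumes your instance prompt starts with "a photo of", as in the examples in this notebook.

```python
PLACEHOLDER = "[CHANGE THIS]"
PREFIX = "a photo of "  # ASSUMPTION: instance prompts follow the notebook's pattern


def subject_from_instance_prompt(instance_prompt: str) -> str:
    """Hypothetical helper: extract the subject phrase, e.g. 'my cat alice'."""
    if PLACEHOLDER in instance_prompt:
        raise ValueError("Replace [CHANGE THIS] with your own subject first")
    if not instance_prompt.lower().startswith(PREFIX):
        raise ValueError(f"Expected the prompt to start with {PREFIX!r}")
    return instance_prompt[len(PREFIX):].strip()


def generation_prompt(instance_prompt: str, scene: str) -> str:
    """Hypothetical helper: build an inference prompt reusing the trained subject."""
    return f"A picture of {subject_from_instance_prompt(instance_prompt)} {scene}"


print(generation_prompt("a photo of my cat alice", "in space"))
```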


In [None]:
# Remove the .gitkeep placeholder from the 'my-data' folder; it is not an image.
# (-f avoids an error if the file has already been removed.)
!rm -f ../data/my-data/.gitkeep

In [None]:
!accelerate launch /workspace/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0  \
  --instance_data_dir=/project/data/my-data \
  --output_dir=/project/models/tuned-my-data \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of [CHANGE THIS]" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=100 

Now that your model has been trained we can load it:

In [None]:
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.load_lora_weights("/project/models/tuned-my-data")

And finally, use the code below to generate images. Change the prompt to something that includes your item, for example:

`image = pipe("A picture of my cat alice in space").images[0]`

In [None]:
image = pipe("A picture of [CHANGE THIS] in space", num_inference_steps=75).images[0]

image