# Fine-Tuning the Text-to-Image Model


## Check GPU

**It's recommended that you shut down any other notebook kernels.**

This fine-tuning process uses a lot of video memory. Here, we'll check how much is available.

In [None]:
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
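If you'd rather act on the numbers than just read them, the same query can be parsed in Python. This is a minimal sketch, not part of the original workflow: the helper names `parse_gpu_csv` and `query_gpus` are hypothetical, and it assumes `nvidia-smi` prints one line per GPU in the form `name, <total> MiB, <free> MiB`.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, total, free = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({
            "name": name,
            "total_mib": int(total.split()[0]),  # drop the trailing "MiB" unit
            "free_mib": int(free.split()[0]),
        })
    return gpus


def query_gpus():
    """Run the same query as the cell above and return the parsed result."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.free",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on a machine with an NVIDIA driver returns one dictionary per GPU, which makes it easy to warn or stop early when `free_mib` looks too low for your settings.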

## Install Requirements

In [None]:
!pip install pandas
!pip install torch
!pip install torchvision
!pip install transformers==4.39.3
!pip install accelerate==0.28.0
!pip install flash-attn
!pip install xformers==0.0.25.post1
!pip install bitsandbytes==0.43.0
!pip install ftfy==6.2.0
!pip install git+https://github.com/huggingface/diffusers

In [None]:
!pip list | grep -E "boto|grpcio|pandas|torch|torchvision|diffusers|transformers|accelerate|flash-attn|ftfy|xformers|protobuf"

## Settings

Here we set up all the options for training. Most are read from environment variables, so a pipeline can override the defaults and run this notebook with different settings, such as the base model or the number of training steps.

In [None]:
import os
from datetime import datetime

WORKING_DIR = os.environ.get("working_dir", "/opt/app-root/src/pipelines-pvc/")
MODEL_NAME = os.environ.get("model_name", "runwayml/stable-diffusion-v1-5")
OUTPUT_DIR = os.path.join(os.getcwd(), f"{WORKING_DIR}/stable_diffusion_weights/dreambooth")
DATA_DIR = os.path.join(os.getcwd(), f"{WORKING_DIR}/data")
INSTANCE_DATA_URL = os.environ.get("instance_data_url", "https://rhods-public.s3.amazonaws.com/sample-data/images/redhat-dog.tar.gz")
INSTANCE_DIR = os.path.join(DATA_DIR, "instance_dir")
CLASS_DIR = os.path.join(DATA_DIR, "class_dir")
INSTANCE_PROMPT = os.environ.get("instance_prompt", "photo of a rhteddy dog")
CLASS_PROMPT = os.environ.get("class_prompt", "a photo of dog")

NUM_CLASS_IMAGES = int(os.environ.get("num_class_images", "100"))
MAX_TRAIN_STEPS = int(os.environ.get("max_train_steps", "800"))

ONNX_OUTPUT_DIR = os.path.join(os.getcwd(), f"{WORKING_DIR}/stable_diffusion_weights/onnx-redhat-dog")
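To make the override behavior concrete, here is a small sketch of the same `os.environ.get(name, default)` pattern. The `read_settings` helper is hypothetical and takes the environment as a parameter, so the example can show both cases without touching the real process environment.

```python
import os


def read_settings(env=os.environ):
    """Read a few of the notebook's settings, falling back to the defaults above."""
    return {
        "MODEL_NAME": env.get("model_name", "runwayml/stable-diffusion-v1-5"),
        "NUM_CLASS_IMAGES": int(env.get("num_class_images", "100")),
        "MAX_TRAIN_STEPS": int(env.get("max_train_steps", "800")),
    }


# No variables set: every value falls back to its default.
defaults = read_settings(env={})

# A pipeline that sets max_train_steps=400 overrides only that value.
overridden = read_settings(env={"max_train_steps": "400"})

print(defaults["MAX_TRAIN_STEPS"], overridden["MAX_TRAIN_STEPS"])  # prints "800 400"
```

Environment variables are always strings, which is why the numeric settings are wrapped in `int(...)`.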

## Training

### Set up the Training Job

In [None]:
!rm -rf $OUTPUT_DIR
!rm -rf $CLASS_DIR

In [None]:
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(CLASS_DIR, exist_ok=True)

print(f"Weights will be saved at {OUTPUT_DIR}")
print(f"It will be based on the model {MODEL_NAME}")
print(f"Training data will be downloaded from {INSTANCE_DATA_URL}")
print(f"We're going to train the difference between \"{INSTANCE_PROMPT}\" and \"{CLASS_PROMPT}\"")
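The training script reads its instance images from `INSTANCE_DIR`, so the archive at `INSTANCE_DATA_URL` has to be unpacked there before training starts. The following is one possible sketch of that step, assuming the URL points to a gzipped tarball with the images at its top level. The `download_and_extract` helper is hypothetical and not part of any library.

```python
import os
import tarfile
import tempfile
import urllib.request


def download_and_extract(url, dest_dir):
    """Download a .tar.gz archive from `url` and unpack it into `dest_dir`."""
    os.makedirs(dest_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "instance_data.tar.gz")
        urllib.request.urlretrieve(url, archive_path)
        with tarfile.open(archive_path, "r:gz") as archive:
            # Use the safer "data" extraction filter where this Python supports it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(dest_dir, **kwargs)
    # Return the extracted entries so the caller can inspect the layout.
    return sorted(os.listdir(dest_dir))
```

In this notebook the call would be `download_and_extract(INSTANCE_DATA_URL, INSTANCE_DIR)`. If the returned list shows a single subdirectory rather than image files, point `INSTANCE_DIR` at that subdirectory instead.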

In [None]:
!accelerate config default

### Start Training

Here we kick off the training job with our chosen settings.  This will take about 15 minutes depending on settings and hardware.

In [None]:
!echo "MODEL_NAME=$MODEL_NAME"
!echo "OUTPUT_DIR=$OUTPUT_DIR"
!echo "DATA_DIR=$DATA_DIR"
!echo "INSTANCE_DIR=$INSTANCE_DIR"
!echo "INSTANCE_DATA_URL=$INSTANCE_DATA_URL"
!echo "CLASS_DIR=$CLASS_DIR"
!echo "INSTANCE_PROMPT=$INSTANCE_PROMPT"
!echo "CLASS_PROMPT=$CLASS_PROMPT"
!echo "NUM_CLASS_IMAGES=$NUM_CLASS_IMAGES"
!echo "MAX_TRAIN_STEPS=$MAX_TRAIN_STEPS"

In [None]:
!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation \
  --prior_loss_weight=1.0 \
  --instance_prompt="$INSTANCE_PROMPT" \
  --class_prompt="$CLASS_PROMPT" \
  --gradient_checkpointing \
  --gradient_accumulation_steps=2 \
  --num_class_images=$NUM_CLASS_IMAGES \
  --max_train_steps=$MAX_TRAIN_STEPS \
  --mixed_precision="bf16" \
  --enable_xformers_memory_efficient_attention 

In [None]:
!ls $OUTPUT_DIR

## Export to ONNX

To serve the model as an API, we need to use a format the model server understands. The [Open Neural Network Exchange (ONNX)](https://onnx.ai/) is an open format built to represent machine learning models; it lets AI developers use models with a variety of frameworks, tools, runtimes, and compilers. Next, we'll convert the model to ONNX using the process the Hugging Face diffusers library recommends in its [documentation](https://huggingface.co/docs/diffusers/v0.20.0/en/optimization/onnx).


### Install Dependencies

A new key dependency will be the [optimum](https://github.com/huggingface/optimum) library.

In [None]:
!pip install "optimum[onnxruntime]"

Loading the fine-tuned model into an `ORTStableDiffusionPipeline` with `export=True` and then calling `save_pretrained` produces four separate ONNX models: `text_encoder`, `unet`, `vae_decoder`, and `vae_encoder`.

Let's also see what a generated image of `rhteddy` looks like. Before fine-tuning, the base model had no concept of `rhteddy`; now it should generate a picture of Teddy.

In [None]:
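from optimum.onnxruntime import ORTStableDiffusionPipeline

# Load the fine-tuned weights and convert them to ONNX while loading.
model_id = OUTPUT_DIR
pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, export=True)

# Run the test inference on the CPU.
device = "cpu"
print(device)
pipeline.to(device)

# Generate one image using the instance prompt from training.
prompt = "photo of a rhteddy dog"
image = pipeline(prompt).images[0]

# Write the ONNX models to disk, then display the image as the cell output.
pipeline.save_pretrained(ONNX_OUTPUT_DIR)
image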
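Before handing the export to a model server, it can be worth confirming that all four components were written. This sketch assumes each component is saved as `<component>/model.onnx` under `ONNX_OUTPUT_DIR`; that layout is an assumption about Optimum's output, so adjust the file name if your version differs. The `missing_onnx_components` helper is hypothetical.

```python
import os

# The four models named above; the file name inside each folder is an assumption.
EXPECTED_COMPONENTS = ("text_encoder", "unet", "vae_decoder", "vae_encoder")


def missing_onnx_components(export_dir, filename="model.onnx"):
    """Return the expected components that have no ONNX file under `export_dir`."""
    return [
        component
        for component in EXPECTED_COMPONENTS
        if not os.path.isfile(os.path.join(export_dir, component, filename))
    ]
```

An empty list from `missing_onnx_components(ONNX_OUTPUT_DIR)` means every expected component is present.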