# 5.1. Fine-Tuning
## DreamBooth + LoRA (Campusbier Fine-Tuning)

This notebook is based on: 

https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/SDXL_DreamBooth_LoRA_.ipynb#scrollTo=XUxRrLfLMnBb

All fine-tuned models can be found here:

https://huggingface.co/erikhsos


# Evaluation

To find the best fine-tuning configuration, two factors were evaluated one after the other:

1. **Number of training images**

[1, 3, 5, 7, 10, 15, 20, 30]

2. **Hyperparameters `learning rate` and `training steps`**

Learning rates:

$5 \times 10^{-4}$, 
$1 \times 10^{-4}$, 
$5 \times 10^{-5}$

Training steps:

[500, 1000, 1500, 2000]
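The second evaluation step combines every learning rate with every number of training steps. A minimal sketch of that grid, using only the values listed above (the variable names are illustrative and not taken from the training script):

```python
from itertools import product

# Values taken from the evaluation description above.
image_counts = [1, 3, 5, 7, 10, 15, 20, 30]
learning_rates = [5e-4, 1e-4, 5e-5]
training_steps = [500, 1000, 1500, 2000]

# Step 2: each learning rate is paired with each number of training steps.
hyperparameter_grid = list(product(learning_rates, training_steps))

print(f"Trainings in step 1 (number of images): {len(image_counts)}")
print(f"Trainings in step 2 (learning rate x steps): {len(hyperparameter_grid)}")
for lr, steps in hyperparameter_grid:
    print(f"learning_rate={lr:g}, max_train_steps={steps}")
```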

For each training run, **50 output images** were generated, and each image was rated as “achieved” or “not achieved” in the following categories:

1. **brand coherence**
2. **target design**
3. **visual aesthetics**
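One way to summarise such binary ratings is the share of images rated “achieved” per category. The sketch below assumes the ratings are stored as one dictionary per image; this data layout and the function name are assumptions, and the example ratings are made up for illustration only (they are not results from this evaluation).

```python
from collections import Counter

CATEGORIES = ("brand coherence", "target design", "visual aesthetics")

def achievement_rates(ratings):
    """Return the fraction of images rated 'achieved' for each category.

    ratings: list of dicts mapping category name -> bool, one dict per image.
    """
    counts = Counter()
    for rating in ratings:
        for category in CATEGORIES:
            counts[category] += bool(rating[category])
    return {category: counts[category] / len(ratings) for category in CATEGORIES}

# Made-up ratings for four images, for illustration only.
example_ratings = [
    {"brand coherence": True, "target design": True, "visual aesthetics": False},
    {"brand coherence": True, "target design": False, "visual aesthetics": True},
    {"brand coherence": True, "target design": False, "visual aesthetics": True},
    {"brand coherence": False, "target design": False, "visual aesthetics": False},
]

print(achievement_rates(example_ratings))
```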

---

**Find the evaluated image-grids under:**

anhang/anhang 01_number training images_grids

anhang/anhang 02_learning rate + training steps_grids

## Training 

The **Campusbier training** used an `instance prompt` consisting of the `unique identifier` and the `subject class`:

**`"A [CB] bottle photo"`**

The **Nesquik training** used:

**`"A TOK cereal box photo"`**
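The two instance prompts follow the same pattern: an article, the unique identifier, and the subject class. A small sketch of that composition (the helper `build_instance_prompt` is hypothetical and not part of the training script):

```python
def build_instance_prompt(identifier, subject_class, article="A"):
    """Compose an instance prompt from a unique identifier and a subject class."""
    return f"{article} {identifier} {subject_class}"

campusbier_prompt = build_instance_prompt("[CB]", "bottle photo")
nesquik_prompt = build_instance_prompt("TOK", "cereal box photo")

print(campusbier_prompt)  # A [CB] bottle photo
print(nesquik_prompt)     # A TOK cereal box photo
```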

-----
     

In [None]:
%env HF_HOME=/cluster/user/ehoemmen/.cache
%env HF_DATASETS_CACHE=/cluster/user/ehoemmen/.cache

In [None]:
# Install dependencies
!pip install -U diffusers bitsandbytes transformers accelerate peft compel scipy torch -q

In [None]:
# Download Diffusers Dreambooth x LoRa script
!wget https://raw.githubusercontent.com/huggingface/diffusers/main/examples/dreambooth/train_dreambooth_lora_sdxl.py

In [None]:
#prepare grid

from PIL import Image

def image_grid(imgs, rows, cols, resize=256):

    if resize is not None:
        imgs = [img.resize((resize, resize)) for img in imgs]
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
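The paste position of each image in `image_grid` follows from its index: the column is `i % cols` and the row is `i // cols`. The following sketch reproduces only that arithmetic without loading any images; the helper `grid_offsets` is hypothetical and exists purely for illustration.

```python
def grid_offsets(n_images, cols, w, h):
    """Return the top-left (x, y) paste position for each image index."""
    return [(i % cols * w, i // cols * h) for i in range(n_images)]

# Six tiles of 256 x 256 pixels arranged in three columns (two rows).
offsets = grid_offsets(n_images=6, cols=3, w=256, h=256)
print(offsets)
```

Images are therefore filled row by row, from left to right.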

### Generate custom captions with BLIP

Load BLIP to auto-caption your images.

In [None]:
import requests
from transformers import AutoProcessor, BlipForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load the processor and the captioning model
blip_processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base",cache_dir="/cluster/user/ehoemmen/.cache")
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base",cache_dir="/cluster/user/ehoemmen/.cache",torch_dtype=torch.float16).to(device)

# captioning utility
def caption_images(input_image):
    inputs = blip_processor(images=input_image, return_tensors="pt").to(device, torch.float16)
    pixel_values = inputs.pixel_values

    generated_ids = blip_model.generate(pixel_values=pixel_values, max_length=50)
    generated_caption = blip_processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return generated_caption

In [None]:
import glob
from PIL import Image

# create a list of (path, PIL.Image) pairs
local_dir = ""   # enter path (must end with "/")
imgs_and_paths = [(path, Image.open(path)) for path in glob.glob(f"{local_dir}*.png")]

Now let's add the concept `token identifier`.

I chose this combination of `identifier` and `class`:

"[CB] bottle photo". The abbreviation [CB] is used because "Campusbier" contains the word "Bier" (German for "beer"), and I want to avoid this influencing the training. For the class I use only "bottle": in tests with the class "beer bottle", beer cans were generated frequently.

In [None]:
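import json
import os

# Prefix with identifier and class; the trailing ", " separates it from the BLIP caption
caption_prefix = "TOK cereal box photo, " #@param
with open(f'{local_dir}metadata.jsonl', 'w') as outfile:
  for path, image in imgs_and_paths:
      # keep only the first line of the generated caption
      caption = caption_prefix + caption_images(image).split("\n")[0]
      entry = {"file_name": os.path.basename(path), "prompt": caption}
      json.dump(entry, outfile)
      outfile.write('\n')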
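Before training, it can help to read the metadata file back and confirm that every entry has a file name and a non-empty prompt. The sketch below is one possible check; the helper names are hypothetical, and the demo writes a temporary file with made-up entries instead of touching the real dataset.

```python
import json
import os
import tempfile

def load_metadata(path):
    """Read a JSON Lines file and return one dict per non-empty line."""
    entries = []
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            if line.strip():
                entries.append(json.loads(line))
    return entries

def incomplete_entries(entries, required=("file_name", "prompt")):
    """Return the indices of entries with a missing or empty required field."""
    return [i for i, entry in enumerate(entries)
            if any(not entry.get(key) for key in required)]

# Demo with a temporary file; file names and captions are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "metadata.jsonl")
    with open(demo_path, "w", encoding="utf-8") as outfile:
        outfile.write(json.dumps({"file_name": "img_01.png",
                                  "prompt": "TOK cereal box photo, a box on a table"}) + "\n")
        outfile.write(json.dumps({"file_name": "img_02.png", "prompt": ""}) + "\n")
    entries = load_metadata(demo_path)

print(len(entries), incomplete_entries(entries))
```

For the real dataset, the same check would be called with `load_metadata(f"{local_dir}metadata.jsonl")`.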

In [None]:
#free up some memory

import gc

# delete the BLIP pipelines and free up some memory
del blip_processor, blip_model
gc.collect()
torch.cuda.empty_cache()

# Prepare for Training

Initialize `accelerate`:

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

!accelerate config default

### Log into your Hugging Face account
Pass [your **write** access token](https://huggingface.co/settings/tokens) so that we can push the trained checkpoints to the Hugging Face Hub:

In [None]:
from huggingface_hub import notebook_login
notebook_login()

# Training

Training takes about 1 to 2 hours, depending on the number of `training steps`.

#### Set Hyperparameters
For the **evaluation of the number of training images**, the following training parameter configuration was used:

* `unique identifier`  [CB]
* `subject class`   bottle photo
* `instance prompt`   A [CB] bottle photo
* `learning rate`   0.0004
* `training steps`   500
* `number training images`   [1, 3, 5, 7, 10, 15, 20, 30]



Training is logged to **Weights & Biases**, and the trained weights are pushed to the **Hugging Face Hub**.

In [None]:
# Enter your own paths for --dataset_name and --output_dir.
# Comments cannot follow a trailing backslash, so they are kept above the command.
!accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --dataset_name="" \
  --output_dir="" \
  --caption_column="prompt" \
  --mixed_precision="fp16" \
  --instance_prompt="[CB] bottle photo" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=3 \
  --gradient_checkpointing \
  --learning_rate=1e-4 \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --report_to="wandb" \
  --use_8bit_adam \
  --max_train_steps=1000 \
  --validation_prompt="A pink [CB] bottle photo" \
  --validation_epochs=50 \
  --sample_batch_size=4 \
  --train_text_encoder \
  --seed="0" \
  --push_to_hub
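The command fixes the number of optimizer steps (`--max_train_steps`), while validation is scheduled in epochs (`--validation_epochs`). The sketch below estimates how many epochs a run covers. It assumes that the script counts one step per optimizer update and derives the epoch count by ceiling division, and it assumes a dataset of 15 images, which the command itself does not state; the function name is hypothetical.

```python
import math

def derived_epochs(num_images, train_batch_size, gradient_accumulation_steps, max_train_steps):
    """Estimate the number of epochs needed to reach max_train_steps optimizer updates."""
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    return math.ceil(max_train_steps / updates_per_epoch)

# Values from the command above; num_images=15 is an assumption.
epochs = derived_epochs(num_images=15, train_batch_size=1,
                        gradient_accumulation_steps=3, max_train_steps=1000)
effective_batch_size = 1 * 3  # train_batch_size * gradient_accumulation_steps

print(f"effective batch size: {effective_batch_size}")
print(f"estimated epochs: {epochs}")  # 200
```

Under these assumptions, `--validation_epochs=50` triggers a validation run only a few times per training.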

# Inference

Set up the pipeline and load your own trained LoRA weights.

In [None]:
import torch
from diffusers import DiffusionPipeline, AutoencoderKL, AutoPipelineForImage2Image

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
    cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.load_lora_weights("erikhsos/cbbier_06-15-images_LoRA_lr5-4_1500")

pipe.enable_sequential_cpu_offload()

In [None]:
#unload LoRa weights

pipe.unload_lora_weights()

## Create 50 experiment images

Edit the `lora_checkpoints` list to select the checkpoints to generate images for.

In [None]:
import torch
from diffusers import DiffusionPipeline, AutoencoderKL, AutoPipelineForImage2Image
import os
from PIL import Image
import re

# Load the VAE
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16, 
    cache_dir="/cluster/user/ehoemmen/.cache"
)

# Define the base pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
    cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.enable_sequential_cpu_offload()

# List of LoRA checkpoints (number training images)
lora_checkpoints = [
    "cbbier_01-1-image_LoRA_lr1-4_500",
    # "cbbier_02-3-images_LoRA_lr1-4_500",
    # "cbbier_03-5-images_LoRA_lr1-4_500",
    # "cbbier_04-7-images_LoRA_lr1-4_500",
    # "cbbier_05-10-images_LoRA_lr1-4_500",
    # "cbbier_06-15-images_LoRA_lr1-4_500",
    # "cbbier_07-20-images_LoRA_lr1-4_500",
    # "cbbier_08-30-images_LoRa_lr1-4_500"
]

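# List of LoRA checkpoints (learning rate + training steps)
# Note: this second assignment overrides the list above. Keep only the list
# you need and uncomment the checkpoints you want to generate images for.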
lora_checkpoints = [
    # "erikhsos/cbbier_06-15-images_LoRA_lr1-4_500",
    # "erikhsos/cbbier_06-15-images_LoRA_lr1-4_1000",
    # "erikhsos/cbbier_06-15-images_LoRA_lr1-4_1500",
    # "erikhsos/cbbier_06-15-images_LoRA_lr1-4_2000",
    # "erikhsos/cbbier_06-15-images_LoRA_lr5-4_500",
    # "erikhsos/cbbier_06-15-images_LoRA_lr5-4_1000",
    # "erikhsos/cbbier_06-15-images_LoRA_lr5-4_1500",
    # "erikhsos/cbbier_06-15-images_LoRA_lr5-4_2000",
    # "erikhsos/cbbier_06-15-images_LoRA_lr5-5_500",
    # "erikhsos/cbbier_06-15-images_LoRA_lr5-5_1000",
    # "erikhsos/cbbier_06-15-images_LoRA_lr5-5_1500",
    # "erikhsos/cbbier_06-15-images_LoRA_lr5-5_2000"
]

# Function to save the images one by one with zero-padded indices
def save_images(images, folder, base_name, start_index, num_digits):
    os.makedirs(folder, exist_ok=True)
    for i, img in enumerate(images):
        img_path = os.path.join(folder, f"{base_name}_{str(start_index + i).zfill(num_digits)}.png")
        img.save(img_path)

# Function to get the next image index and the zero-padding width
def get_next_image_index(folder, base_name):
    # The folder does not exist before the first run, so create it up front
    os.makedirs(folder, exist_ok=True)
    # Escape the base name so that its characters are matched literally
    pattern = re.compile(rf"{re.escape(base_name)}_(\d+)\.png")
    indices = [int(m.group(1)) for m in map(pattern.fullmatch, os.listdir(folder)) if m]
    if not indices:
        return 1, 1
    max_index = max(indices)
    num_digits = len(str(max_index))
    return max_index + 1, num_digits

# Generate images for each checkpoint
for checkpoint in lora_checkpoints:
    # Load LoRA weights
    pipe.load_lora_weights(checkpoint)

    # Define the prompt and number of images
    num_images = 50
    
    prompt = ["a [CB] bottle photo with a pink label and the text CAMPUSBIER"] * num_images
    negative_prompt = ""

    # Define the folder and base name
    folder = f"{checkpoint}_experiment_images"
    # Drop a leading "user/" namespace so that file names contain no path separator
    base_name = checkpoint.split("/")[-1]

    # Get the starting index for image numbering and the number of digits
    start_index, num_digits = get_next_image_index(folder, base_name)

    # Generate images
    images = pipe(prompt=prompt,
                  negative_prompt=negative_prompt,
                  num_inference_steps=25, 
                  height=1024, width=1024,
                  guidance_scale=4,
                 ).images
    
    # Save images
    save_images(images, folder, base_name, start_index, num_digits)

    # Unload LoRA weights
    pipe.unload_lora_weights()

print("Image generation and saving completed.")
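The checkpoint names appear to encode the training configuration: number of images, learning rate, and training steps. The sketch below parses them under the assumption that `lr5-4` stands for $5 \times 10^{-4}$ and that the final number is the step count; this naming convention is inferred from the lists above and is not documented, and the function name is hypothetical.

```python
import re

# Assumed pattern: cbbier_<run>-<n>-image(s)_LoRA_lr<mantissa>-<exponent>_<steps>
CHECKPOINT_PATTERN = re.compile(
    r"cbbier_(?P<run>\d+)-(?P<images>\d+)-images?_LoRA"
    r"_lr(?P<mantissa>\d+)-(?P<exponent>\d+)_(?P<steps>\d+)$",
    re.IGNORECASE,  # one name in the list is spelled "LoRa"
)

def parse_checkpoint(name):
    """Extract images, learning rate, and steps from a checkpoint name."""
    match = CHECKPOINT_PATTERN.search(name)
    if match is None:
        raise ValueError(f"Unexpected checkpoint name: {name}")
    return {
        "images": int(match["images"]),
        "learning_rate": float(f"{match['mantissa']}e-{match['exponent']}"),
        "steps": int(match["steps"]),
    }

config = parse_checkpoint("erikhsos/cbbier_06-15-images_LoRA_lr5-4_1500")
print(config)
```

Such a helper can be used to label the image grids with the configuration that produced them.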

## Save Images as Grid (PDF)

In [None]:
import os
from PIL import Image

# Path to your image folder
image_folder = "" #enter path to image folder
output_pdf = "nesquik_rot_dreambooth only_output_grid.pdf"

# Collect all image files in the folder (sorted for a reproducible order)
image_files = sorted(os.path.join(image_folder, file) for file in os.listdir(image_folder) if file.endswith(('png', 'jpg', 'jpeg')))

# Use at most the first 50 images
image_files = image_files[:50]

# Grid layout as (rows, columns), here 10 rows x 5 columns
grid_size = (10, 5)

# Load the first image to determine the tile size
img_sample = Image.open(image_files[0])
img_width, img_height = img_sample.size

# Compute the size of the full grid
grid_width = grid_size[1] * img_width
grid_height = grid_size[0] * img_height

# Create a new image for the grid
grid_image = Image.new('RGB', (grid_width, grid_height))

# Paste the images into the grid, row by row
for idx, image_file in enumerate(image_files):
    img = Image.open(image_file)
    row = idx // grid_size[1]
    col = idx % grid_size[1]
    grid_image.paste(img, (col * img_width, row * img_height))

# Save the grid as a PDF
grid_image.save(output_pdf, "PDF", resolution=300)

print(f"Grid PDF saved as {output_pdf}")

## Results

The best results were achieved with the following hyperparameter configuration:

`number of training images` **15**,

`learning rate` **$1 \times 10^{-4}$**, 

`training steps` **2000**

The results still lack quality, and text is only rarely rendered correctly.