# 5.2. ControlNet
## Dreambooth + ControlNet

Controlling image diffusion models by conditioning the model with an additional input image

----

Building on the **fine-tuning** evaluation, **ControlNet** is now added to control image generation more precisely and to further improve the image quality of the food packaging designs.


All image generations were done with the following **hyperparameters**:

`number of training images` **15**,

`learning rate` **$1 \times 10^{-4}$**, 

`training steps` **2000**

The **dreambooth x controlnet evaluation** was based on 50 output images, each rated as “achieved” or “not achieved” in the following categories:

1. **brand coherence**
2. **target design**
3. **visual aesthetics**
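
The binary rating scheme above can be tallied per category with a few lines of Python. This is only a minimal sketch: the function name `achievement_rates`, the dictionary layout, and the example ratings are assumptions made for illustration and are not taken from the actual evaluation.

```python
from collections import Counter

CATEGORIES = ("brand coherence", "target design", "visual aesthetics")

def achievement_rates(ratings):
    """Return the share of images rated "achieved" per category.

    ratings: list of dicts mapping each category to a bool (True = achieved).
    """
    total = len(ratings)
    if total == 0:
        raise ValueError("at least one rated image is required")
    counts = Counter()
    for rating in ratings:
        for category in CATEGORIES:
            if rating[category]:
                counts[category] += 1
    return {category: counts[category] / total for category in CATEGORIES}

# made-up placeholder ratings for three images, NOT results from the evaluation
example_ratings = [
    {"brand coherence": True, "target design": True, "visual aesthetics": False},
    {"brand coherence": True, "target design": False, "visual aesthetics": False},
    {"brand coherence": False, "target design": True, "visual aesthetics": True},
]
rates = achievement_rates(example_ratings)
```

For the real evaluation, the list would contain one entry per rated output image (50 in total).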

---

**Find the evaluated image-grids under:**

anhang/anhang 03_dreambooth x controlnet

anhang/anhang 05_nesquik_dreambooth x controlnet

## Setup

In [None]:
%env HF_HOME=/cluster/user/ehoemmen/.cache
%env HF_DATASETS_CACHE=/cluster/user/ehoemmen/.cache

In [None]:
!pip install diffusers --upgrade -q
!pip install opencv-python transformers mediapipe matplotlib accelerate -q

In [None]:
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL, UniPCMultistepScheduler
from diffusers.utils import load_image
import numpy as np
import torch

import cv2
from PIL import Image

In [None]:
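# helper: arrange equally sized images in a rows x cols grid
from PIL import Image

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols, "number of images must equal rows * cols"

    # all images are assumed to share the size of the first one
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))

    # fill the grid row by row, from left to right
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid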
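
The paste offsets used by `image_grid` follow from the image index alone: the column is `i % cols` and the row is `i // cols`. The sketch below isolates that arithmetic without any image library. The helper name `grid_positions` and the 1024 x 1024 tile size are assumptions chosen for illustration.

```python
def grid_positions(n_images, cols, w, h):
    """Top-left paste coordinates for n_images tiles of size w x h,
    filled row by row into a grid with `cols` columns."""
    return [(i % cols * w, i // cols * h) for i in range(n_images)]

# six tiles in a grid with three columns (two rows)
positions = grid_positions(n_images=6, cols=3, w=1024, h=1024)
print(positions)
# [(0, 0), (1024, 0), (2048, 0), (0, 1024), (1024, 1024), (2048, 1024)]
```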

## Inference

In [None]:
# initialize the models and pipeline
controlnet_conditioning_scale = 0.5  # recommended for good generalization

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    cache_dir="/cluster/user/ehoemmen/.cache",
)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
    cache_dir="/cluster/user/ehoemmen/.cache",
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
    cache_dir="/cluster/user/ehoemmen/.cache",
)

#load fine-tuned lora-weights
pipe.load_lora_weights("erikhsos/nesquik_15-images_LoRA_lr1-4_2000")

pipe.enable_sequential_cpu_offload()

In [None]:
#unload lora weights

pipe.unload_lora_weights()

In [None]:
num_images = 4

# list of colors
colors = ["light green", "light blue", "olive", "grey"]

# prompt for different colors
prompts = [f"a [CB] bottle photo with a {color} label with the text CAMPUSBIER" for color in colors]
neg_prompt="green label, brown bottle"

# load original image
original_image = load_image(
    '/cluster/user/ehoemmen/development/tests_sonstiges/05_Masterarbeit/03_inpainting/campusbier_input.png'  # enter path
)

# create canny edge image
image = np.array(original_image)
image = cv2.Canny(image, 100, 200)  # lower / upper hysteresis thresholds
image = image[:, :, None]  # (H, W) -> (H, W, 1)
image = np.concatenate([image, image, image], axis=2)  # replicate to 3 channels
canny_image = Image.fromarray(image)

# use specific seed, if necessary
generator = torch.manual_seed(493)

# create images
generated_images = pipe(
    prompts,
    negative_prompt=neg_prompt,
    num_inference_steps=25,
    controlnet_conditioning_scale=controlnet_conditioning_scale, 
    image=canny_image,
    generator=generator
).images

# create image grid
grid = image_grid(generated_images, rows=1, cols=num_images)

grid
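
The edge-map preparation in the inference cell turns the single-channel Canny output of shape `(H, W)` into a three-channel array of shape `(H, W, 3)` before it is converted to a PIL image. The following sketch reproduces only that reshaping step on a tiny synthetic array, so it runs without OpenCV or an input photo. The helper name `to_three_channels` and the 4 x 6 test array are assumptions for illustration.

```python
import numpy as np

def to_three_channels(edges):
    """Replicate a 2D edge map (H, W) into a 3-channel array (H, W, 3)."""
    stacked = edges[:, :, None]  # add a trailing channel axis
    return np.concatenate([stacked, stacked, stacked], axis=2)

# synthetic stand-in for a Canny result: one "edge" pixel set to 255
edges = np.zeros((4, 6), dtype=np.uint8)
edges[1, 2] = 255

rgb = to_three_channels(edges)
print(rgb.shape)  # (4, 6, 3)
```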

## Results

With ControlNet, the results improved in all evaluated categories, so more relevant images are generated overall.

Image quality improves considerably, and it is now possible to generate text.