# 04. Advanced Image Generation

## 01. ControlNET

Specific tests for the ControlNETs can be found directly in the respective sections below.

#### Content

1. [Canny Edge](#cannyedge)
2. [Open Pose](#openpose)
3. [Depth](#depth)
4. [Scribble](#scribble)
5. [M-LSD Line](#mlsdline)
6. [HED Boundary](#hedboundary)
7. [Image Segmentation](#segmentation)
8. [Normal Map](#normalmap)

--- 

9. [Dreambooth x ControlNet](#dreamboothcontrolnet)
10. [Combining Multiple Conditionings](#combining)
11. [Key-Findings](#keyfindings)

## Description + Links

* ControlNet controls image diffusion models by conditioning them on an additional input image.
* Several kinds of conditioning are available, for example Canny edges, user sketches, human poses, and depth maps.
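
To make the idea of conditioning a bit more concrete, here is a minimal NumPy sketch of how a conditioning scale can blend a control signal into the base model's features. The helper `apply_control` and the toy arrays are hypothetical and only illustrate the role of `controlnet_conditioning_scale`; the real pipelines work on U-Net feature maps, not on small arrays like these.

```python
import numpy as np

def apply_control(base_features, control_residual, conditioning_scale=0.5):
    """Add a scaled control residual to the base features (toy illustration)."""
    return base_features + conditioning_scale * control_residual

base = np.zeros((2, 2), dtype=np.float32)      # stand-in for U-Net features
residual = np.ones((2, 2), dtype=np.float32)   # stand-in for the ControlNet output

weak = apply_control(base, residual, conditioning_scale=0.5)
strong = apply_control(base, residual, conditioning_scale=1.0)
```

A larger scale lets the conditioning image dominate; a smaller one leaves more freedom to the text prompt.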

If you want to know more about ControlNET, see the [<u>Definitions Notebook</u>](../1.0_general/02_definitions.ipynb), point 06. ControlNET.

---
**Documentation**

https://huggingface.co/docs/diffusers/api/pipelines/controlnet_sdxl

https://huggingface.co/docs/diffusers/main/en/using-diffusers/controlnet#controlnet

https://huggingface.co/blog/controlnet

**Paper**

[Zhang, L., et al. (2023): Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543)

## Setup

In [None]:
%env HF_HOME=/cluster/user/ehoemmen/.cache
%env HF_DATASETS_CACHE=/cluster/user/ehoemmen/.cache
%env TRANSFORMERS_CACHE=/cluster/user/ehoemmen/.cache

In [None]:
pip install -q -U diffusers controlnet_aux transformers accelerate mediapipe matplotlib opencv-python 

In [None]:
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL, UniPCMultistepScheduler
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector, MLSDdetector
from transformers import DPTFeatureExtractor, DPTForDepthEstimation
import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2
from PIL import Image

<a id="cannyedge"></a>
## 1. Canny Edge

In [None]:
# initialize the models and pipeline
controlnet_conditioning_scale = 0.5  # recommended for good generalization
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache",
)

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache",
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  controlnet=controlnet, vae=vae, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache",
)

pipe.enable_model_cpu_offload()
#pipe.enable_sequential_cpu_offload()

#### Some General Tests
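
The Canny cells in this section turn a single-channel edge map into a three-channel image before passing it to the pipeline. The sketch below isolates that step on a synthetic array so it runs without OpenCV; `fake_edges` is made up for illustration and merely stands in for the output of `cv2.Canny`.

```python
import numpy as np

# Hypothetical stand-in for a cv2.Canny result: uint8 map with values 0 or 255.
fake_edges = np.zeros((4, 4), dtype=np.uint8)
fake_edges[1, :] = 255  # one horizontal "edge"

# Same two steps as in the notebook: add a channel axis, then repeat it 3 times.
edges_3ch = fake_edges[:, :, None]
edges_3ch = np.concatenate([edges_3ch, edges_3ch, edges_3ch], axis=2)

# Equivalent one-liner.
edges_3ch_alt = np.repeat(fake_edges[:, :, None], 3, axis=2)
```

The resulting `(height, width, 3)` array is what `Image.fromarray` needs to build an RGB conditioning image.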

In [None]:
prompt = "happy, red cat in the jungle"
#negative_prompt = "low quality, bad quality, sketches"

# load the reference image from disk
original_image = load_image(
'../5.0_pictures/majestic_lion.png'
)

# get canny image
image = np.array(original_image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

# generate image
generated_image = pipe(
    prompt, controlnet_conditioning_scale=controlnet_conditioning_scale, image=canny_image
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(original_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the Canny image
axes[1].imshow(canny_image)
axes[1].axis('off')
axes[1].set_title('Canny Image')

# Show the generated image
axes[2].imshow(generated_image)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

In [None]:
prompt = "elephant"
negative_prompt = "low quality, bad quality"
n_steps=50
controlnet_conditioning_scale = 0.5  # recommended for good generalization

# reuse original_image and canny_image (the lion image) from the first Canny cell

# generate image
generated_image = pipe(
    prompt,
    negative_prompt = negative_prompt,
    controlnet_conditioning_scale=controlnet_conditioning_scale, 
    num_inference_steps=n_steps,
    image=canny_image,
    #guess_mode=True,
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(original_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the Canny image
axes[1].imshow(canny_image)
axes[1].axis('off')
axes[1].set_title('Canny Image')

# Show the generated image
axes[2].imshow(generated_image)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

In [None]:
# Load the original image
original_image = load_image('../5.0_pictures/majestic_lion.png')

# Generate the image
generated_image = pipe(
    prompt, 
    controlnet_conditioning_scale=controlnet_conditioning_scale, 
    image=canny_image
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(10, 5))

# Show the original image
axes[0].imshow(original_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the Canny image
axes[1].imshow(canny_image)
axes[1].axis('off')
axes[1].set_title('Canny Image')

# Show the generated image
axes[2].imshow(generated_image)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

#### Canny Edge - Kellogg's Test

In [None]:
prompt = "Honey Bites Cornflakes, with Bees"
negative_prompt = "low quality, bad quality"
n_steps=50
controlnet_conditioning_scale = 0.7  # higher than the 0.5 used above, so the edges constrain the result more strongly

# load the reference image from disk
original_image = load_image(
'../5.0_pictures/kellogsfrosties_removebg_preview.jpg'
)

# get canny image
image = np.array(original_image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

# generate image
generated_image = pipe(
    prompt,
    negative_prompt = negative_prompt,
    controlnet_conditioning_scale=controlnet_conditioning_scale, 
    num_inference_steps=n_steps,
    image=canny_image
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(10, 5))

# Show the original image
axes[0].imshow(original_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the Canny image
axes[1].imshow(canny_image)
axes[1].axis('off')
axes[1].set_title('Canny Image')

# Show the generated image
axes[2].imshow(generated_image)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

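#### Canny Edge - Kellogg's Test (Guess Mode)

Without a prompt: "Guess Mode"

With an empty prompt, the model has to infer the image content from the Canny conditioning image alone. Note that the cell below only empties the prompt; it does not set the pipeline's `guess_mode=True` argument.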
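
For reference, the `guess_mode=True` argument of the diffusers ControlNet pipelines is understood to weight the ControlNet residuals progressively, from weak in the shallow blocks to full strength in the deepest one. The sketch below shows such a weighting under that assumption; `guess_mode_scales` is a hypothetical helper written for this notebook, not a diffusers function.

```python
import numpy as np

def guess_mode_scales(num_residuals, conditioning_scale=1.0):
    """Per-block weights rising from 0.1 to 1.0 on a log scale (assumed behaviour)."""
    return np.logspace(-1, 0, num_residuals) * conditioning_scale

# Five residuals and a conditioning scale of 0.7, as used in the cell below.
scales = guess_mode_scales(num_residuals=5, conditioning_scale=0.7)
```

With these weights, the conditioning image influences coarse structure more than fine detail, which is one way to let the model "guess" the content without a prompt.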

In [None]:
prompt = ""
negative_prompt = "low quality, bad quality"
n_steps=50
controlnet_conditioning_scale = 0.7  # higher than the 0.5 used above, so the edges constrain the result more strongly

# load the reference image from disk
original_image = load_image(
'../5.0_pictures/kellogsfrosties-simple.jpg'
)

# get canny image
image = np.array(original_image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

# generate image
generated_image = pipe(
    prompt,
    negative_prompt = negative_prompt,
    controlnet_conditioning_scale=controlnet_conditioning_scale, 
    num_inference_steps=n_steps,
    image=canny_image
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(10, 5))

# Show the original image
axes[0].imshow(original_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the Canny image
axes[1].imshow(canny_image)
axes[1].axis('off')
axes[1].set_title('Canny Image')

# Show the generated image
axes[2].imshow(generated_image)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

<a id="openpose"></a>
## 2. Open Pose

In [None]:
# Compute openpose conditioning image
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet",cache_dir="/cluster/user/ehoemmen/.cache")

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"
)
openpose_image = openpose(image)

# Initialize ControlNet pipeline
controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16,cache_dir="/cluster/user/ehoemmen/.cache",)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache",
)
pipe.enable_model_cpu_offload()

#### General Open Pose Tests

In [None]:
# Image Generation
prompt = "Elon Musk in a desert, high quality"
negative_prompt = "low quality, bad quality"
images = pipe(
    prompt, 
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    image=openpose_image.resize((1024, 1024)),
    generator=torch.manual_seed(97),
).images

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the OpenPose image
axes[1].imshow(openpose_image)
axes[1].axis('off')
axes[1].set_title('Openpose Image')

# Show the generated image
axes[2].imshow(images[0])
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

In [None]:
image = load_image(
'../5.0_pictures/pngwing.com.png'
)
openpose_image = openpose(image)

# Image Generation
prompt = "Happy Cat"
negative_prompt = "low quality, bad quality"
images = pipe(
    prompt, 
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    image=openpose_image.resize((1024, 1024)),
    generator=torch.manual_seed(97),
).images

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the OpenPose image
axes[1].imshow(openpose_image)
axes[1].axis('off')
axes[1].set_title('Openpose Image')

# Show the generated image
axes[2].imshow(images[0])
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

<a id="depth"></a>
## 3. Depth
https://huggingface.co/diffusers/controlnet-depth-sdxl-1.0
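
The `get_depth_map` helper in this section rescales the predicted depth to the range 0 to 1 and repeats it over three channels. The following NumPy sketch shows the same min-max normalization on a toy array, without the depth estimator. `depth_to_rgb` and `toy_depth` are hypothetical names, and the guard against a constant depth map is an addition that the notebook code does not have.

```python
import numpy as np

def depth_to_rgb(depth):
    """Min-max normalize a float depth map and return an (H, W, 3) uint8 image."""
    depth = depth.astype(np.float32)
    d_min, d_max = depth.min(), depth.max()
    # Guard against a constant map, which would otherwise divide by zero.
    span = d_max - d_min if d_max > d_min else 1.0
    normalized = (depth - d_min) / span
    gray = (normalized * 255.0).clip(0, 255).astype(np.uint8)
    return np.stack([gray] * 3, axis=-1)

toy_depth = np.array([[2.0, 4.0], [6.0, 10.0]])
rgb = depth_to_rgb(toy_depth)
```

The smallest depth value maps to black and the largest to white, regardless of the absolute scale of the estimator's output.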

In [None]:
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to("cuda")
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0",
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
    cache_dir="/cluster/user/ehoemmen/.cache"
).to("cuda")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16,cache_dir="/cluster/user/ehoemmen/.cache").to("cuda")
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
    cache_dir="/cluster/user/ehoemmen/.cache"
)
pipe.enable_model_cpu_offload()

In [None]:
def get_depth_map(image):
    image = feature_extractor(images=image, return_tensors="pt").pixel_values.to("cuda")
    with torch.no_grad(), torch.autocast("cuda"):
        depth_map = depth_estimator(image).predicted_depth

    depth_map = torch.nn.functional.interpolate(
        depth_map.unsqueeze(1),
        size=(1024, 1024),
        mode="bicubic",
        align_corners=False,
    )
    depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_map = (depth_map - depth_min) / (depth_max - depth_min)
    image = torch.cat([depth_map] * 3, dim=1)

    image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
    image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
    return image

prompt = "stormtrooper lecture, photorealistic"
image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-depth/resolve/main/images/stormtrooper.png")
controlnet_conditioning_scale = 0.5  # recommended for good generalization

depth_image = get_depth_map(image)

images = pipe(
    prompt, 
    image=depth_image, 
    num_inference_steps=30, 
    controlnet_conditioning_scale=controlnet_conditioning_scale
).images
pipe.enable_sequential_cpu_offload()

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the depth map
axes[1].imshow(depth_image)
axes[1].axis('off')
axes[1].set_title('Depth Image')

# Show the generated image
axes[2].imshow(images[0])
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

#### Depth - Kellogg's Test

In [None]:
# Cereals generation (reuses get_depth_map from the previous cell)

prompt = "organic cereals for kids"
image = load_image('../5.0_pictures/kellogsfrosties_removebg_preview.jpg')
controlnet_conditioning_scale = 0.5  # recommended for good generalization

depth_image = get_depth_map(image)

images = pipe(
    prompt, 
    image=depth_image, 
    num_inference_steps=30, 
    controlnet_conditioning_scale=controlnet_conditioning_scale
).images

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the depth map
axes[1].imshow(depth_image)
axes[1].axis('off')
axes[1].set_title('Depth Image')

# Show the generated image
axes[2].imshow(images[0])
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

<a id="scribble"></a>
## 4. (Fake) Scribble
Fake Scribble: use this if you don't want to draw your own scribbles. It uses the same scribble-based ControlNet model, but synthesizes the scribbles from an input image with a simple algorithm.
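
As a rough illustration of how scribbles can be synthesized from an image, the sketch below binarizes a soft edge map into white strokes on black. This is a deliberately simplified stand-in: the `HEDdetector(..., scribble=True)` call used in the cells does more processing than a plain threshold, and both `soft_edges_to_scribble` and the threshold of 127 are assumptions made for this example.

```python
import numpy as np

def soft_edges_to_scribble(soft_edges, threshold=127):
    """Binarize a soft edge map (0-255) into white strokes on a black background."""
    return np.where(soft_edges > threshold, 255, 0).astype(np.uint8)

# Toy soft edge map; in the notebook this would come from the HED annotator.
soft = np.array([[10, 130], [200, 90]], dtype=np.uint8)
scribble = soft_edges_to_scribble(soft)
```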

In [None]:
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch
from controlnet_aux import HEDdetector
from diffusers.utils import load_image

hed = HEDdetector.from_pretrained('lllyasviel/Annotators', cache_dir="/cluster/user/ehoemmen/.cache")

oimage = load_image("https://huggingface.co/lllyasviel/sd-controlnet-scribble/resolve/main/images/bag.png")

image = hed(oimage, scribble=True)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

prompt = "bag"
images = pipe(prompt, image, num_inference_steps=20).images[0]

# image.save('images/bag_scribble_out.png')

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(oimage)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the scribble image
axes[1].imshow(image)
axes[1].axis('off')
axes[1].set_title('Sketch')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

#### (Fake Scribble) - Kellogg's Test

In [None]:
hed = HEDdetector.from_pretrained('lllyasviel/Annotators', cache_dir="/cluster/user/ehoemmen/.cache")

oimage = load_image('../5.0_pictures/kellogsfrosties_removebg_preview.jpg')

image = hed(oimage, scribble=True)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

prompt = "cornflakes chocolate flavour"
images = pipe(prompt, image, num_inference_steps=20).images[0]

# image.save('images/bag_scribble_out.png')

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(oimage)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the scribble image
axes[1].imshow(image)
axes[1].axis('off')
axes[1].set_title('Sketch')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

<a id="mlsdline"></a>
## 5. M-LSD Line
The conditioning input is a monochrome image composed only of white straight lines on a black background.
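
To show what such a conditioning image looks like as data, here is a small NumPy sketch that rasterizes line segments as white pixels on a black canvas. It only illustrates the image format; the M-LSD detector itself is a learned model that finds the segments. `draw_segments` and the example coordinates are hypothetical.

```python
import numpy as np

def draw_segments(height, width, segments):
    """Rasterize segments ((x0, y0), (x1, y1)) as white pixels on black."""
    canvas = np.zeros((height, width), dtype=np.uint8)
    for (x0, y0), (x1, y1) in segments:
        # Sample enough points along the segment to hit every pixel on it.
        n = max(abs(x1 - x0), abs(y1 - y0)) + 1
        xs = np.rint(np.linspace(x0, x1, n)).astype(int)
        ys = np.rint(np.linspace(y0, y1, n)).astype(int)
        canvas[ys, xs] = 255
    return canvas

# One horizontal line along the top row and one diagonal.
line_map = draw_segments(5, 5, [((0, 0), (4, 0)), ((0, 0), (4, 4))])
```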

In [None]:
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch
from controlnet_aux import MLSDdetector
from diffusers.utils import load_image

mlsd = MLSDdetector.from_pretrained('lllyasviel/ControlNet', cache_dir="/cluster/user/ehoemmen/.cache")

image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-mlsd/resolve/main/images/room.png")

newimage = mlsd(image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-mlsd", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

images = pipe("children's room", 
              newimage, 
              num_inference_steps=20
             ).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the M-LSD line image
axes[1].imshow(newimage)
axes[1].axis('off')
axes[1].set_title('M-LSD Image')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

<a id="hedboundary"></a>
## 6. HED Boundary Version
The conditioning input is a monochrome image with soft white edges on a black background.

In [None]:
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch
from controlnet_aux import HEDdetector
from diffusers.utils import load_image

hed = HEDdetector.from_pretrained('lllyasviel/Annotators', cache_dir="/cluster/user/ehoemmen/.cache")

image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-hed/resolve/main/images/man.png")

newimage = hed(image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

images = pipe("oil painting of handsome old man, masterpiece", newimage, num_inference_steps=20).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the HED image
axes[1].imshow(newimage)
axes[1].axis('off')
axes[1].set_title('HED Image')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

<a id="segmentation"></a>
## 7. Image Segmentation - ADE20K
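
The cells in this section colour a label map by looping over the palette. The sketch below runs the same loop on a toy example and compares it with a vectorized alternative that indexes the palette directly. `toy_palette` holds the first three colours of the palette used in this section; `toy_seg` is a made-up label map.

```python
import numpy as np

toy_palette = np.asarray([[0, 0, 0], [120, 120, 120], [180, 120, 120]], dtype=np.uint8)
toy_seg = np.array([[0, 1], [2, 1]])  # hypothetical label map

# Loop version, as in the notebook cells.
color_loop = np.zeros((*toy_seg.shape, 3), dtype=np.uint8)
for label, color in enumerate(toy_palette):
    color_loop[toy_seg == label, :] = color

# Vectorized alternative: index the palette with the label map.
color_indexed = toy_palette[toy_seg]
```

Both variants give the same `(height, width, 3)` image. For the vectorized form, the label map has to be a NumPy integer array, so a tensor returned by the image processor would need converting first.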

In [None]:
# 1. Create a Color Palette

palette = np.asarray([
    [0, 0, 0],
    [120, 120, 120],
    [180, 120, 120],
    [6, 230, 230],
    [80, 50, 50],
    [4, 200, 3],
    [120, 120, 80],
    [140, 140, 140],
    [204, 5, 255],
    [230, 230, 230],
    [4, 250, 7],
    [224, 5, 255],
    [235, 255, 7],
    [150, 5, 61],
    [120, 120, 70],
    [8, 255, 51],
    [255, 6, 82],
    [143, 255, 140],
    [204, 255, 4],
    [255, 51, 7],
    [204, 70, 3],
    [0, 102, 200],
    [61, 230, 250],
    [255, 6, 51],
    [11, 102, 255],
    [255, 7, 71],
    [255, 9, 224],
    [9, 7, 230],
    [220, 220, 220],
    [255, 9, 92],
    [112, 9, 255],
    [8, 255, 214],
    [7, 255, 224],
    [255, 184, 6],
    [10, 255, 71],
    [255, 41, 10],
    [7, 255, 255],
    [224, 255, 8],
    [102, 8, 255],
    [255, 61, 6],
    [255, 194, 7],
    [255, 122, 8],
    [0, 255, 20],
    [255, 8, 41],
    [255, 5, 153],
    [6, 51, 255],
    [235, 12, 255],
    [160, 150, 20],
    [0, 163, 255],
    [140, 140, 140],
    [250, 10, 15],
    [20, 255, 0],
    [31, 255, 0],
    [255, 31, 0],
    [255, 224, 0],
    [153, 255, 0],
    [0, 0, 255],
    [255, 71, 0],
    [0, 235, 255],
    [0, 173, 255],
    [31, 0, 255],
    [11, 200, 200],
    [255, 82, 0],
    [0, 255, 245],
    [0, 61, 255],
    [0, 255, 112],
    [0, 255, 133],
    [255, 0, 0],
    [255, 163, 0],
    [255, 102, 0],
    [194, 255, 0],
    [0, 143, 255],
    [51, 255, 0],
    [0, 82, 255],
    [0, 255, 41],
    [0, 255, 173],
    [10, 0, 255],
    [173, 255, 0],
    [0, 255, 153],
    [255, 92, 0],
    [255, 0, 255],
    [255, 0, 245],
    [255, 0, 102],
    [255, 173, 0],
    [255, 0, 20],
    [255, 184, 184],
    [0, 31, 255],
    [0, 255, 61],
    [0, 71, 255],
    [255, 0, 204],
    [0, 255, 194],
    [0, 255, 82],
    [0, 10, 255],
    [0, 112, 255],
    [51, 0, 255],
    [0, 194, 255],
    [0, 122, 255],
    [0, 255, 163],
    [255, 153, 0],
    [0, 255, 10],
    [255, 112, 0],
    [143, 255, 0],
    [82, 0, 255],
    [163, 255, 0],
    [255, 235, 0],
    [8, 184, 170],
    [133, 0, 255],
    [0, 255, 92],
    [184, 0, 255],
    [255, 0, 31],
    [0, 184, 255],
    [0, 214, 255],
    [255, 0, 112],
    [92, 255, 0],
    [0, 224, 255],
    [112, 224, 255],
    [70, 184, 160],
    [163, 0, 255],
    [153, 0, 255],
    [71, 255, 0],
    [255, 0, 163],
    [255, 204, 0],
    [255, 0, 143],
    [0, 255, 235],
    [133, 255, 0],
    [255, 0, 235],
    [245, 0, 255],
    [255, 0, 122],
    [255, 245, 0],
    [10, 190, 212],
    [214, 255, 0],
    [0, 204, 255],
    [20, 0, 255],
    [255, 255, 0],
    [0, 153, 255],
    [0, 41, 255],
    [0, 255, 204],
    [41, 0, 255],
    [41, 255, 0],
    [173, 0, 255],
    [0, 245, 255],
    [71, 0, 255],
    [122, 0, 255],
    [0, 255, 184],
    [0, 92, 255],
    [184, 255, 0],
    [0, 133, 255],
    [255, 214, 0],
    [25, 194, 194],
    [102, 255, 0],
    [92, 0, 255],
])

from transformers import AutoImageProcessor, UperNetForSemanticSegmentation
from PIL import Image
import numpy as np
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image

image_processor = AutoImageProcessor.from_pretrained("openmmlab/upernet-convnext-small", cache_dir="/cluster/user/ehoemmen/.cache")
image_segmentor = UperNetForSemanticSegmentation.from_pretrained("openmmlab/upernet-convnext-small", cache_dir="/cluster/user/ehoemmen/.cache")

image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-seg/resolve/main/images/house.png").convert('RGB')

pixel_values = image_processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
  outputs = image_segmentor(pixel_values)

seg = image_processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]

color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8) # height, width, 3

for label, color in enumerate(palette):
    color_seg[seg == label, :] = color

color_seg = color_seg.astype(np.uint8)

newimage = Image.fromarray(color_seg)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

# Condition on the segmentation map (newimage), not on the original photo
images = pipe("ancient roman house", newimage, num_inference_steps=20).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the segmentation map
axes[1].imshow(newimage)
axes[1].axis('off')
axes[1].set_title('Segmentation Image')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

#### Image Segmentation - Test Kellogg's

In [None]:
# Reuses `palette` and the imports from the previous segmentation cell

image_processor = AutoImageProcessor.from_pretrained("openmmlab/upernet-convnext-small", cache_dir="/cluster/user/ehoemmen/.cache")
image_segmentor = UperNetForSemanticSegmentation.from_pretrained("openmmlab/upernet-convnext-small", cache_dir="/cluster/user/ehoemmen/.cache")

image = load_image('../5.0_pictures/kellogsfrosties_removebg_preview.jpg').convert("RGB")

pixel_values = image_processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
  outputs = image_segmentor(pixel_values)

seg = image_processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]

color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8) # height, width, 3

for label, color in enumerate(palette):
    color_seg[seg == label, :] = color

color_seg = color_seg.astype(np.uint8)

newimage = Image.fromarray(color_seg)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

# Condition on the segmentation map (newimage), not on the original photo
images = pipe("honey flavoured cornflakes package", newimage, num_inference_steps=20).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the segmentation map
axes[1].imshow(newimage)
axes[1].axis('off')
axes[1].set_title('Segmentation Image')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

<a id="normalmap"></a>
## 8. Normal Map Version
This checkpoint corresponds to the ControlNet conditioned on Normal Map Estimation.

In [None]:
from PIL import Image
from transformers import pipeline
import numpy as np
import cv2
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch
from diffusers.utils import load_image

image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-normal/resolve/main/images/toy.png").convert("RGB")

depth_estimator = pipeline("depth-estimation", model ="Intel/dpt-hybrid-midas" )

image = depth_estimator(image)['predicted_depth'][0]

image = image.numpy()

image_depth = image.copy()
image_depth -= np.min(image_depth)
image_depth /= np.max(image_depth)

# Pixels whose normalized depth is below this threshold are treated as background
bg_threshold = 0.4

x = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
x[image_depth < bg_threshold] = 0

y = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
y[image_depth < bg_threshold] = 0

z = np.ones_like(x) * np.pi * 2.0

image = np.stack([x, y, z], axis=2)
image /= np.sum(image ** 2.0, axis=2, keepdims=True) ** 0.5
image = (image * 127.5 + 127.5).clip(0, 255).astype(np.uint8)
image = Image.fromarray(image)

controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-normal", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

images = pipe("cute little toy", 
              image, 
              num_inference_steps=20
             ).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(load_image("https://huggingface.co/lllyasviel/sd-controlnet-normal/resolve/main/images/toy.png").convert("RGB"))
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the normal map
axes[1].imshow(image)
axes[1].axis('off')
axes[1].set_title('Normal Map')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()
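
The depth-to-normal-map steps in the cell above can be collected into one reusable function. The sketch below is an approximation under stated assumptions: it uses `np.gradient` (central differences) in place of `cv2.Sobel`, so its output will not match the cell above numerically, and the function name `depth_to_normal_map` is hypothetical. It also guards against a completely flat depth map, where the normalization would otherwise divide by zero.

```python
import numpy as np

def depth_to_normal_map(depth, bg_threshold=0.4):
    """Approximate an (H, W, 3) uint8 normal map from an (H, W) depth array."""
    depth = np.asarray(depth, dtype=np.float32)

    # Normalize depth to [0, 1] to decide which pixels count as background
    depth_range = float(depth.max() - depth.min())
    if depth_range > 0:
        depth_norm = (depth - depth.min()) / depth_range
    else:
        # Flat input: treat every pixel as foreground
        depth_norm = np.ones_like(depth)

    # np.gradient returns the derivative along rows (y) first, then columns (x)
    grad_y, grad_x = np.gradient(depth)
    grad_x[depth_norm < bg_threshold] = 0
    grad_y[depth_norm < bg_threshold] = 0
    z = np.full_like(depth, np.pi * 2.0)

    # Stack into per-pixel vectors, normalize to unit length, map to 0..255
    normals = np.stack([grad_x, grad_y, z], axis=2)
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return (normals * 127.5 + 127.5).clip(0, 255).astype(np.uint8)

# A flat surface has no x/y slope, so every normal points straight at the viewer
flat = depth_to_normal_map(np.ones((4, 4)))
```

A function like this would avoid repeating the same block for every test image; swapping `np.gradient` back to `cv2.Sobel` would restore the behavior of the original cell.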

#### Normal Map - Kellogg's Test

In [None]:
from PIL import Image
from transformers import pipeline
import numpy as np
import cv2
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch
from diffusers.utils import load_image

image = load_image('../5.0_pictures/kellogsfrosties_removebg_preview.jpg').convert("RGB")

depth_estimator = pipeline("depth-estimation", model ="Intel/dpt-hybrid-midas" )

image = depth_estimator(image)['predicted_depth'][0]

image = image.numpy()

image_depth = image.copy()
image_depth -= np.min(image_depth)
image_depth /= np.max(image_depth)

# Pixels whose normalized depth is below this threshold are treated as background
bg_threshold = 0.4

x = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
x[image_depth < bg_threshold] = 0

y = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
y[image_depth < bg_threshold] = 0

z = np.ones_like(x) * np.pi * 2.0

image = np.stack([x, y, z], axis=2)
image /= np.sum(image ** 2.0, axis=2, keepdims=True) ** 0.5
image = (image * 127.5 + 127.5).clip(0, 255).astype(np.uint8)
image = Image.fromarray(image)

controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-normal", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

images = pipe("chocolate cereals", 
              image, 
              num_inference_steps=20
             ).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(load_image('../5.0_pictures/kellogsfrosties_removebg_preview.jpg').convert("RGB"))
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the normal map
axes[1].imshow(image)
axes[1].axis('off')
axes[1].set_title('Normal Map')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title('Generated Image')

plt.tight_layout()
plt.show()

<a id="dreamboothcontrolnet"></a>
## 09. Dreambooth x ControlNet

Documentation:
https://huggingface.co/blog/controlnet

Dreambooth Model:
https://huggingface.co/sd-dreambooth-library/mr-potato-head

In [None]:
from diffusers.utils import load_image

import cv2
from PIL import Image
import numpy as np

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache")

model_id = "sd-dreambooth-library/mr-potato-head"
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    model_id,
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16, 
    cache_dir="/cluster/user/ehoemmen/.cache"
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(10)
prompt = "a photo of sks mr potato head, best quality, extremely detailed"
images = pipe(
    prompt,
    canny_image,
    negative_prompt="monochrome, lowres, bad anatomy, low quality",
    num_inference_steps=20,
    generator=generator,
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(load_image("https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"))
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the Canny edge image
axes[1].imshow(canny_image)
axes[1].axis('off')
axes[1].set_title('Canny Image')

# Show the generated image
axes[2].imshow(images)
axes[2].axis('off')
axes[2].set_title('Dreambooth x ControlNet Generation')

plt.tight_layout()
plt.show()

<a id="combining"></a>

## 10. Combining Multiple Conditionings
Multiple ControlNet conditionings can be combined for a single image generation. Pass a list of ControlNets to the pipeline's constructor and a corresponding list of conditionings to `__call__`.

Here the **canny** conditioning will be combined with the **open pose** conditioning.
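
Because the ControlNets, the conditioning images, and the conditioning scales are passed as separate lists, their order has to match position by position. One way to keep them aligned is to store each conditioning as a single record and derive the lists from it. The sketch below is only an illustration: the helper `split_conditionings` is hypothetical, and the image values are placeholder strings standing in for the PIL images built in this section.

```python
# Each record keeps one ControlNet checkpoint together with its conditioning
# image and its scale. The image values are placeholder strings here; in the
# notebook they would be the PIL images created in this section.
conditionings = [
    {"checkpoint": "lllyasviel/sd-controlnet-openpose", "image": "openpose_image", "scale": 1.0},
    {"checkpoint": "lllyasviel/sd-controlnet-canny", "image": "canny_image", "scale": 0.8},
]

def split_conditionings(records):
    """Return three parallel lists (checkpoints, images, scales) in record order."""
    checkpoints = [r["checkpoint"] for r in records]
    images = [r["image"] for r in records]
    scales = [r["scale"] for r in records]
    return checkpoints, images, scales

checkpoints, images, scales = split_conditionings(conditionings)
```

The three lists would then feed `ControlNetModel.from_pretrained` (one call per checkpoint), the image argument of the pipeline call, and `controlnet_conditioning_scale`, respectively.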

#### Canny Conditioning

In [None]:
from diffusers.utils import load_image
from PIL import Image
import cv2
import numpy as np

In [None]:
#Original Image
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png"
)
image

In [None]:
#Canny Image

canny_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png"
)
canny_image = np.array(canny_image)

low_threshold = 100
high_threshold = 200

canny_image = cv2.Canny(canny_image, low_threshold, high_threshold)

# Zero out the middle columns of the image, where the pose will be overlaid
zero_start = canny_image.shape[1] // 4
zero_end = zero_start + canny_image.shape[1] // 2
canny_image[:, zero_start:zero_end] = 0

canny_image = canny_image[:, :, None]
canny_image = np.concatenate([canny_image, canny_image, canny_image], axis=2)
canny_image = Image.fromarray(canny_image)
canny_image

#### Open Pose Conditioning

In [None]:
from controlnet_aux import OpenposeDetector
from diffusers.utils import load_image

In [None]:
#Original Image
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"
)
image

In [None]:
#Open Pose Image
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet", cache_dir="/cluster/user/ehoemmen/.cache")

openpose_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"
)
openpose_image = openpose(openpose_image)

openpose_image

In [None]:
#Combining Both

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch

controlnet = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

prompt = "a giant standing in a fantasy landscape, best quality"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"

generator = torch.Generator(device="cpu").manual_seed(1)

images = [openpose_image, canny_image]

image = pipe(
    prompt,
    images,
    num_inference_steps=20,
    generator=generator,
    negative_prompt=negative_prompt,
    controlnet_conditioning_scale=[1.0, 0.8],
).images[0]

image

<a id="keyfindings"></a>

## 11. Key Findings

The various ControlNET conditionings make it possible to retain selected aspects and structures of an input image when generating a new one. When a conditioning is matched to the right application, very good results can be achieved. However, the majority of ControlNETs are not suitable for food packaging.

For packaging development applications, **Canny Edge** is the most promising conditioning method. Many design elements are either predefined in the packaging design (e.g. logo, design structure, ...) or should only be modified to a limited extent (e.g. a color change), and **Canny Edge** gives good control over exactly these elements.

It is also interesting to note that a ControlNET can be restricted to certain **sections of the image**. Control is then applied only to the desired areas, while the model generates the remaining areas freely. For example, only the prepared product shown on the packaging could be passed through the ControlNET, so that a chocolate cake is depicted with the same shape and structure as the original apple pie.
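
Restricting the conditioning to one region can be sketched by blanking the conditioning image everywhere else, the inverse of the column zeroing used in section 10. The helper `keep_region` and the box coordinates are hypothetical, and the approach assumes an edge-like conditioning such as Canny, where black pixels mean "no edge here"; for depth or segmentation maps, zero carries a meaning of its own, so this would not leave those areas unconstrained.

```python
import numpy as np

def keep_region(conditioning, box):
    """Zero a conditioning image everywhere except inside box.

    box is (top, left, bottom, right) in pixel coordinates, end-exclusive.
    """
    top, left, bottom, right = box
    masked = np.zeros_like(conditioning)
    masked[top:bottom, left:right] = conditioning[top:bottom, left:right]
    return masked

# Stand-in for a Canny edge map: an 8x8 image that is "edge" everywhere
edges = np.full((8, 8), 255, dtype=np.uint8)

# Keep only the central 4x4 area, e.g. where the product is shown
product_only = keep_region(edges, box=(2, 2, 6, 6))
```

The masked array can be converted to a three-channel PIL image in the same way as `canny_image` in section 10 before it is passed to the pipeline.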

It is also possible to **combine ControlNET with fine-tuning methods** such as Dreambooth. With **ControlNET x Dreambooth**, highly individual images can be created that match your own specifications. I have not yet tested this method with my own Dreambooth training.