# 02. Basic Image Generation + Functions

## 05. Inpainting

### Content:
1. [Initialization + Tiger Example](#init)
2. [Example - Kelloggs](#kelloggs)
3. [Example - RUF Backmischung](#ruf)
4. [Key-Findings](#keyfind1)

## Description + Links

SD-XL Inpainting 0.1 is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.

SD-XL Inpainting 0.1 was initialized with the stable-diffusion-xl-base-1.0 weights. The model was trained for 40k steps at a resolution of 1024x1024, with 5% dropping of the text conditioning to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself), whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, synthetic masks are generated and, in 25% of cases, everything is masked.
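
To make the channel layout described above concrete, here is a minimal sketch that just adds up the UNet input channels and derives the latent resolution. The helper name `inpaint_unet_input_shape` is made up for illustration, and the downsampling factor of 8 is an assumption about the SDXL VAE, not something stated in the model card text above.

```python
# Channel layout of the inpainting UNet input, following the description above.
LATENT_CHANNELS = 4        # noisy image latents, as in the non-inpainting model
MASKED_IMAGE_CHANNELS = 4  # VAE-encoded masked image (additional)
MASK_CHANNELS = 1          # the mask itself (additional)
VAE_SCALE_FACTOR = 8       # assumption: the VAE downsamples each side by 8


def inpaint_unet_input_shape(height, width):
    """Return (channels, latent_height, latent_width) for one input sample."""
    channels = LATENT_CHANNELS + MASKED_IMAGE_CHANNELS + MASK_CHANNELS
    return (channels, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


print(inpaint_unet_input_shape(1024, 1024))  # (9, 128, 128)
```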

**Pipeline Parameters:**
* **Strength** is a measure of how much noise is added to the base image, which determines how similar the output is to the base image.
    * ***high strength*** = more noise is added and the denoising process takes longer, but you get higher-quality images that differ more from the base image
    * ***low strength*** = less noise is added and the denoising process is faster, but the image quality may not be as good and the generated images resemble the base image more closely

* **Guidance Scale** controls how strongly the prompt steers the generation.
    * ***high guidance scale*** = the prompt and the generated image are closely aligned, so the output is a stricter interpretation of the prompt
    * ***low guidance scale*** = the prompt and the generated image are more loosely aligned, so the output may deviate more from the prompt
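
As a rough sketch of why a higher `strength` takes longer: to my understanding, diffusers skips the first part of the noise schedule and only runs about `num_inference_steps * strength` denoising steps. The helper `effective_denoising_steps` is a made-up name for illustration and not part of the library; the exact rounding inside the pipeline may differ.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Approximate number of denoising steps actually run for a given strength.

    Assumption: the pipeline runs min(int(steps * strength), steps) steps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Parameter pairs similar to the ones used in this notebook
for steps, strength in [(50, 0.8), (30, 0.5), (30, 1.0)]:
    print(steps, strength, effective_denoising_steps(steps, strength))
```

With 50 steps and `strength=0.8`, this gives 40 steps; with 30 steps and `strength=0.5`, only 15.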

---

https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl

https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1

https://huggingface.co/docs/diffusers/using-diffusers/inpaint

## Setup

In [None]:
%env HF_HOME=/cluster/user/ehoemmen/.cache
%env HF_DATASETS_CACHE=/cluster/user/ehoemmen/.cache
%env TRANSFORMERS_CACHE=/cluster/user/ehoemmen/.cache

In [None]:
pip install -U diffusers invisible_watermark transformers accelerate safetensors matplotlib

<a id="init"></a>

## 1. Initialization + Tiger Example

In [None]:
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image
import matplotlib.pyplot as plt

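# Note: this loads the SDXL base checkpoint; the dedicated inpainting checkpoint
# described above is "diffusers/stable-diffusion-xl-1.0-inpainting-0.1".
pipe = StableDiffusionXLInpaintPipeline.from_pretrained(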
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
    cache_dir="/cluster/user/ehoemmen/.cache"
)
pipe.enable_sequential_cpu_offload()

from PIL import Image


def image_grid(imgs, rows, cols):
    """Paste equally sized PIL images into a rows x cols grid, row by row."""
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        # column index = i % cols, row index = i // cols
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

In [None]:
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")

prompt = "A majestic tiger sitting on a bench"
image = pipe(
    prompt=prompt, 
    image=init_image, 
    mask_image=mask_image, 
    num_inference_steps=50, 
    strength=0.80
).images[0]


# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(init_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the inpainting mask
axes[1].imshow(mask_image)
axes[1].axis('off')
axes[1].set_title('Inpaint Image')

# Show the generated image
axes[2].imshow(image)
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

<a id="kelloggs"></a>

## 2. Example Kelloggs

The aim here was

1. to replace the tiger on the Kellogg's packaging with a bee and
2. to replace both the tiger and the bowl of cornflakes.

I created the inpainting mask (`mask_image`) manually with a design program and uploaded it.

The tests below use different parameter values.
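
For simple shapes, a mask can also be drawn in code instead of a design program. The following is only a sketch: the helper name `rectangle_mask` and the box coordinates are hypothetical and do not correspond to the masks used in these tests. It relies on the diffusers convention that white pixels are repainted and black pixels are kept.

```python
from PIL import Image, ImageDraw


def rectangle_mask(size, box):
    """Create a black mask of `size` (width, height) with a white rectangle.

    White (255) marks the area to repaint, black (0) the area to keep.
    `box` is (left, top, right, bottom) in pixels.
    """
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    return mask


# Hypothetical region; adjust the coordinates to the element to be replaced
mask = rectangle_mask((1024, 1024), (300, 200, 700, 800))
mask_image = mask.convert("RGB")  # same mode as the masks loaded in this notebook
```

This only covers rectangular regions; irregular shapes such as the tiger still need a hand-drawn mask.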

In [None]:
# replace only the tiger
# strength = 0.7

init_image = load_image('../5.0_pictures/kelloggs_resized.jpg').convert("RGB")
mask_image = load_image('../5.0_pictures/kellogsfrosties_inpaint.jpg').convert("RGB")

prompt = "Honey Flavoured Cornflakes with a Bee"
image = pipe(
    prompt=prompt, 
    image=init_image, 
    mask_image=mask_image, 
    num_inference_steps=50, 
    strength=0.70
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(init_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the inpainting mask
axes[1].imshow(mask_image)
axes[1].axis('off')
axes[1].set_title('Inpaint Image')

# Show the generated image
axes[2].imshow(image)
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

In [None]:
# strength = 0.9, guidance scale = 5

init_image = load_image('../5.0_pictures/kelloggs_resized.jpg').convert("RGB")
mask_image = load_image('../5.0_pictures/kellogsfrosties_inpaint_2.jpg').convert("RGB")

# prompt = "Honey Flavoured Cornflakes with a Bee"

prompt = "honey flavoured conflakes with a simple bee illustration, cereal box label design"
image = pipe(
    prompt=prompt, 
    image=init_image, 
    mask_image=mask_image, 
    num_inference_steps=50, 
    strength=0.90, 
    guidance_scale=5
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(init_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the inpainting mask
axes[1].imshow(mask_image)
axes[1].axis('off')
axes[1].set_title('Inpaint Image')

# Show the generated image
axes[2].imshow(image)
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

In [None]:
init_image = load_image('../5.0_pictures/kellogsfrosties_removebg_preview.jpg').convert("RGB")
mask_image = load_image('../5.0_pictures/kellogsfrosties_inpaint_2.jpg').convert("RGB")

# file location on the cluster: cluster/upload/5.0_pictures/kellogsfrosties_removebg_preview.jpg

prompt = "Honey Flavoured Cornflakes with a Bee"
image = pipe(
    prompt=prompt, 
    image=init_image, 
    mask_image=mask_image, 
    num_inference_steps=50, 
    strength=0.80
).images[0]

image

<a id="ruf"></a>
## 3. Example - RUF Backmischung

The aim was to replace the apple cake with a chocolate cake. I took the pictures from the RUF website. Since the source image and the mask are both **544x800 px**, I generated the target image in the same dimensions.

I tried to **change the apple cake into a chocolate cake** by testing different parameter values.
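
When passing custom `width` and `height` values, it is worth checking that both are divisible by 8, since the pipeline works on latents that are 8 times smaller per side. A small sketch with made-up helper names (`is_valid_size`, `snap_down_to_multiple_of_8`):

```python
def is_valid_size(width, height, multiple=8):
    """Check that both dimensions are positive and divisible by `multiple`."""
    return (width > 0 and height > 0
            and width % multiple == 0 and height % multiple == 0)


def snap_down_to_multiple_of_8(value):
    """Round a pixel dimension down to the nearest multiple of 8 (at least 8)."""
    return max(8, (value // 8) * 8)


print(is_valid_size(544, 800))          # True: 544 = 68 * 8, 800 = 100 * 8
print(snap_down_to_multiple_of_8(550))  # 544
```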

In [None]:
# strength = 0.8

init_image = load_image('../5.0_pictures/RUF_13726_Apfel-Haferflocken-Kuchen-1.png').convert("RGB")
mask_image = load_image('../5.0_pictures/RUF_13726-Apfel-Haferflocken-Kuchen-1_inpaint.jpg').convert("RGB")

prompt = "Glossy Chocolate cake, delicous, highly detailed, dark chocolate"
image = pipe(
    prompt=prompt, 
    image=init_image, 
    mask_image=mask_image, 
    height=800, 
    width=544,
    num_inference_steps=50, 
    strength=0.80
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(init_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the inpainting mask
axes[1].imshow(mask_image)
axes[1].axis('off')
axes[1].set_title('Inpaint Image')

# Show the generated image
axes[2].imshow(image)
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

In [None]:
#cfg=7.0, strength=0.5

init_image = load_image('../5.0_pictures/RUF_13726_Apfel-Haferflocken-Kuchen-1.png').convert("RGB")
mask_image = load_image('../5.0_pictures/RUF_13726-Apfel-Haferflocken-Kuchen-1_inpaint.jpg').convert("RGB")

prompt = "Glossy Chocolate cake on a wooden table, delicous, highly detailed, professional food photography, realistic photo"

generator = torch.Generator().manual_seed(0)

image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    height=800, 
    width=544,
    guidance_scale=7.0,
    num_inference_steps=30,  # steps between 15 and 30 work well for us
    strength=0.5,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(init_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the inpainting mask
axes[1].imshow(mask_image)
axes[1].axis('off')
axes[1].set_title('Inpaint Image')

# Show the generated image
axes[2].imshow(image)
axes[2].axis('off')
axes[2].set_title(prompt)

plt.tight_layout()
plt.show()

Compare the results of high and low `strength` and `guidance_scale` (cfg) values:
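
For context on the cfg values compared here: classifier-free guidance combines an unconditional and a prompt-conditioned noise prediction. The sketch below uses plain lists with made-up numbers instead of real model outputs, and `apply_guidance` is a hypothetical helper name. It shows that a scale of 1.0 simply returns the prompt-conditioned prediction, which is why very low values follow the prompt only loosely.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy predictions (made-up numbers, not real model outputs)
noise_uncond = [0.0, 1.0]
noise_text = [1.0, 3.0]

print(apply_guidance(noise_uncond, noise_text, 1.0))  # [1.0, 3.0]
print(apply_guidance(noise_uncond, noise_text, 8.0))  # [8.0, 17.0]
```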


In [None]:
# image1: cfg=8.0, strength=0.8
# image2: cfg=1.0, strength=0.5

init_image = load_image('../5.0_pictures/RUF_13726_Apfel-Haferflocken-Kuchen-1.png').convert("RGB")
mask_image = load_image('../5.0_pictures/RUF_13726-Apfel-Haferflocken-Kuchen-1_inpaint.jpg').convert("RGB")

prompt = "Glossy Chocolate cake, delicous, highly detailed, dark chocolate"

generator = torch.Generator().manual_seed(0)

image1 = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    height=800, 
    width=544,
    guidance_scale=8.0,
    num_inference_steps=30,  # steps between 15 and 30 work well for us
    strength=0.8,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]

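# Re-seed so both runs start from the same random state and differ only in their parameters
generator = torch.Generator().manual_seed(0)

image2 = pipe(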
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    height=800, 
    width=544,
    guidance_scale=1.0,
    num_inference_steps=30,  # steps between 15 and 30 work well for us
    strength=0.5,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]

# Display the images with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 8))

# Show the original image
axes[0].imshow(init_image)
axes[0].axis('off')
axes[0].set_title('Original Image')

# Show the first result (high cfg, high strength)
axes[1].imshow(image1)
axes[1].axis('off')
axes[1].set_title('cfg=8.0, strength=0.8')

# Show the second result (low cfg, low strength)
axes[2].imshow(image2)
axes[2].axis('off')
axes[2].set_title('cfg=1.0, strength=0.5')

plt.tight_layout()
plt.show()

In [None]:
init_image = load_image('../5.0_pictures/RUF_13726_Apfel-Haferflocken-Kuchen-1.png').convert("RGB")
mask_image = load_image('../5.0_pictures/RUF_13726-Apfel-Haferflocken-Kuchen-1_inpaint.jpg').convert("RGB")

prompt = "Glossy Chocolate cake, brown cake, highly detailed food photography, chocolate chips around the cake"
negative_prompt="white cake"

generator = torch.Generator().manual_seed(0)

image1 = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=init_image,
    mask_image=mask_image,
    height=800, 
    width=544,
    guidance_scale=10.0,
    num_inference_steps=30,  # steps between 15 and 30 work well for us
    strength=0.5,  # make sure to use `strength` below 1.0
    generator=generator,
).images[0]

image1

<a id="keyfind1"></a>

## 4. Key Findings

Inpainting is a **promising method** for selectively modifying images and packaging.

While image-to-image generation revises the entire image, inpainting lets you revise only the **desired sub-areas**. This is helpful because the remaining areas and elements such as the logo stay in the same place and in the same quality.

If the `strength` value is too low, the generated image hardly changes. A value that is too high leads to very abstract results. The same applies to extreme `guidance_scale` values. In these tests, a range of **0.3 - 0.7** for `strength` and **5.0 - 10.0** for `guidance_scale` worked well.

The tests of changing the apple cake into a chocolate cake showed that this works quite well, provided suitable parameter values and a suitable prompt are found. Here, that meant a `strength` value around 0.5, a `prompt` that described details such as "chocolate chips around the cake", and a `negative_prompt` stating that the cake should not be white.
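
To search for suitable values more systematically, the reported ranges could be turned into a small parameter grid. This is only a sketch: the concrete grid points are my own choice within the ranges named in these findings, and nothing here calls the pipeline.

```python
from itertools import product

# Grid points chosen for illustration within the reported ranges
strengths = [0.3, 0.5, 0.7]
guidance_scales = [5.0, 7.5, 10.0]

# One keyword-argument dict per combination
configs = [
    {"strength": s, "guidance_scale": g}
    for s, g in product(strengths, guidance_scales)
]

for cfg in configs:
    print(cfg)
```

Each dict could then be passed to the pipeline call as `pipe(..., **cfg)`, ideally with a freshly seeded generator per run so that the results stay comparable.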

In addition, creating the inpainting mask manually is very inconvenient. An interface for marking the desired area directly would be helpful here. One possibility would be a local web GUI built with **Gradio**.