# 02. Basic Image Generation + Functions

## 04. Image-to-Image Generation

### Content
1. [Img2Img - Base Model](#base)
2. [Img2Img - Comparison Base + Refiner Model](#refiner)
3. [Img2Img - Diffusion Process Grid](#grid1)
4. [Img2Img - Examples Kelloggs (Honey Cornflakes)](#exampleskelloggs)
5. [Img2Img - Examples Campusbier (Beer Can)](#examplescampusbier)
6. [Key Findings](#keyfindings)

## Description + Links

* in addition to a text prompt, you pass an initial image as the starting point for the diffusion process
* the initial image is encoded into the latent space and noise is added to it
* the denoising process **starts from the noisy initial image** (in the latent space), predicts the noise and removes the predicted noise step by step to obtain the new image
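The "noise is added" step can be illustrated for a DDPM-style (variance-preserving) scheduler, where the noisy latent is a weighted blend of the clean latent and Gaussian noise, with the weight given by the cumulative noise-schedule value ᾱ_t. This is a minimal sketch on plain Python lists, not the diffusers implementation: the SDXL default scheduler (Euler) uses a sigma-based form (x0 + σ·ε) instead, and the function name `add_noise` here is our own.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Return sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, element-wise.

    alpha_bar close to 1 -> little noise (early timestep, low strength),
    alpha_bar close to 0 -> almost pure noise (late timestep, high strength).
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must be in [0, 1]")
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


# Toy "latent" of the initial image and Gaussian noise of the same shape.
rng = random.Random(0)
latent = [0.5, -1.0, 2.0, 0.0]
noise = [rng.gauss(0.0, 1.0) for _ in latent]

lightly_noised = add_noise(latent, noise, alpha_bar=0.9)   # initial image still dominates
heavily_noised = add_noise(latent, noise, alpha_bar=0.05)  # noise dominates
```

The denoising loop then starts from such a partially noised latent instead of from pure noise, which is why the initial image shapes the result.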

**Parameters:**
* **Strength (0-1)**: determines how far the generated image may deviate from the initial image
    *  higher strength: gives the model more creativity; at 1.0 the initial image is more or less ignored
    *  lower strength: the generated image stays closer to the initial image
* **Guidance Scale (default 5.0)**: controls how closely the generated image follows the text prompt
    * lower value: the generated image is less closely aligned with the prompt
    * higher value: the generated image is more closely aligned with the prompt
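The guidance scale is the weight used in classifier-free guidance: at each step the model predicts the noise twice, once conditioned on the prompt and once unconditionally (empty prompt, or the negative prompt if one is passed), and the two predictions are combined. A minimal sketch of that combination on plain lists, assuming the standard formula used by the Stable Diffusion pipelines; the function name is our own.

```python
def classifier_free_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions.

    guided = uncond + guidance_scale * (text - uncond)
    guidance_scale == 1 returns the text-conditioned prediction unchanged;
    larger values push further in the direction the prompt points.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0, -0.5]
text = [0.5, 1.0, 0.5]

print(classifier_free_guidance(uncond, text, 1.0))  # -> [0.5, 1.0, 0.5]
print(classifier_free_guidance(uncond, text, 7.0))  # -> [3.5, 1.0, 6.5]
```

Components where both predictions agree (the middle value) are unaffected, while disagreements are amplified, which is why very high values can over-emphasize the prompt.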
---
**Documentation**

https://huggingface.co/docs/diffusers/using-diffusers/img2img

https://huggingface.co/docs/diffusers/using-diffusers/sdxl

## Setup

In [None]:
%env HF_HOME=/cluster/user/ehoemmen/.cache
%env HF_DATASETS_CACHE=/cluster/user/ehoemmen/.cache
%env TRANSFORMERS_CACHE=/cluster/user/ehoemmen/.cache

In [None]:
%pip install -U diffusers invisible_watermark transformers accelerate safetensors

In [None]:
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image, make_image_grid

# SDXL base model, loaded as an image-to-image pipeline
base = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)
# base = base.to("cuda")  # alternative to CPU offloading if enough GPU memory is available
base.enable_model_cpu_offload()

# SDXL refiner model
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, cache_dir="/cluster/user/ehoemmen/.cache"
)
# refiner = refiner.to("cuda")  # alternative to CPU offloading
refiner.enable_model_cpu_offload()

<a id="base"></a>

## 1. Img2Img - Base Model

In [None]:
# load initial image
url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"
init_image = load_image(url).convert("RGB")

prompt = "cowboy riding a horse"

# pass prompt and image to the pipeline
image = base(prompt=prompt, 
             strength=1.0,  # 1.0: the initial image is more or less ignored
             guidance_scale=10,
             num_inference_steps=50,
             image=init_image
            ).images[0]

# make grid
make_image_grid([init_image, image], rows=1, cols=2)

<a id="refiner"></a>

## 2. Img2Img - Comparison Base and Refiner

In [None]:
# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-sdxl-init.png"
init_image = load_image(url)

# parameters (50 default inference steps - strength=0.7 --> 50 x 0.7 = 35 steps)
prompt = "Astronaut in a jungle, trees in the background, comic style, cold color palette, muted colors, detailed"
strength = 0.7
cfg = 7.0

# compare base and refiner pipeline
base_image = base(prompt=prompt,
                  image=init_image,
                  guidance_scale=cfg,
                  strength=strength
                 ).images[0]

refiner_image = refiner(prompt=prompt, 
                        image=init_image,
                        guidance_scale=cfg,
                        strength=strength
                       ).images[0]

# create grid
make_image_grid([init_image, base_image, refiner_image], rows=1, cols=3)
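The step count in the comment above (50 x 0.7 = 35) follows from how the img2img pipelines trim the scheduler's timesteps: only the last `int(num_inference_steps * strength)` steps are run, starting from the correspondingly noised initial image. A minimal sketch of that arithmetic, modelled on our reading of the `get_timesteps` helper in diffusers; the function name below is our own.

```python
def img2img_step_count(num_inference_steps, strength):
    """Number of denoising steps an img2img run actually performs.

    strength selects how far into the noise schedule the initial image is
    pushed: strength=1.0 runs every step (the initial image is essentially
    replaced by noise), strength=0.0 runs none in this sketch.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)  # skipped leading steps
    return num_inference_steps - t_start


for s in (0.4, 0.7, 0.8, 1.0):
    print(s, img2img_step_count(50, s))
```

With the default 50 steps this gives 20, 35, 40 and 50 denoising steps for the strength values used in this notebook.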

<a id="grid1"></a>

## 3. Img2Img - Diffusion Process Grid

The grid stops the denoising process at increasing fractions of the schedule (`denoising_end`). Here you can see that the diffusion process starts from the initial image.

In [None]:
# Grid with 5 Process Steps

n_images = 5
prompt = "Beer Can, delicious organic beer can, sustainable local brand, green label, with the text CAMPUSBIER"
url = '../5.0_pictures/campusbier_2.png'
init_image = load_image(url)

# Create a list of denoising_end values
end = 1.0 / n_images
denoising_ends = [end * (step + 1) for step in range(n_images)]

# set a fixed seed for reproducibility
generator = torch.Generator(device="cuda").manual_seed(2147483647) 

# Initialize images list
images = []

# Generate images for each denoising_end value
for i in range(n_images):
    generator.manual_seed(2147483647)
    image = base(prompt=prompt,
                 generator=generator,
                 image=init_image,
                 num_inference_steps=30,
                 strength=0.7,
                 guidance_scale=6.0,
                 denoising_end=denoising_ends[i]
                ).images[0]
    images.append(image)

make_image_grid(images, rows=1, cols=n_images)
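In the grid above, `denoising_end` is, as far as we can tell from the SDXL pipeline code, a fraction of the scheduler's full training schedule (typically 1000 timesteps) rather than of the 21 steps left after `strength=0.7`, so the early panels may receive few or no denoising steps. A rough sketch of that truncation, assuming "leading" timestep spacing with `steps_offset=1` (our assumption about the SDXL base scheduler config) and the cutoff rule as we read it in the pipeline; all function names are hypothetical.

```python
def leading_timesteps(num_inference_steps, num_train_timesteps=1000, steps_offset=1):
    """Descending timesteps with 'leading' spacing (assumed scheduler setting)."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio + steps_offset for i in reversed(range(num_inference_steps))]


def apply_strength(timesteps, strength):
    """Keep only the last int(len * strength) timesteps, as img2img does."""
    n = len(timesteps)
    init_timestep = min(int(n * strength), n)
    return timesteps[n - init_timestep:]


def apply_denoising_end(timesteps, denoising_end, num_train_timesteps=1000):
    """Drop timesteps below the cutoff implied by denoising_end.

    Values outside the open interval (0, 1), such as 1.0, leave the schedule unchanged.
    """
    if not (isinstance(denoising_end, float) and 0 < denoising_end < 1):
        return timesteps
    cutoff = int(round(num_train_timesteps - denoising_end * num_train_timesteps))
    return [t for t in timesteps if t >= cutoff]


# Same denoising_end values as in the grid cell.
n_images = 5
end = 1.0 / n_images
denoising_ends = [end * (step + 1) for step in range(n_images)]

schedule = apply_strength(leading_timesteps(30), strength=0.7)
steps_per_panel = [len(apply_denoising_end(schedule, d)) for d in denoising_ends]
print(steps_per_panel)
```

Under these assumptions the five panels receive 0, 2, 8, 14 and 21 denoising steps, i.e. the first panel would not be denoised at all.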

<a id="exampleskelloggs"></a>

## 4. Img2Img - Examples Kelloggs (Honey Cornflakes)

**Comparing Base + Refiner** with different `strength` and `cfg` values.

High `strength` --> the initial image is mostly ignored.

In [None]:
# prepare image
url = '../5.0_pictures/kelloggs_resized.jpg'
init_image = load_image(url)

# parameters
prompt = "Honey Flavoured Cornflakes, Yellow Packaging design, cute bees, food photography, mockup"
strength = 0.8
cfg = 7.0

# pass prompt and image to base and refiner pipeline
base_image = base(prompt=prompt, 
                 image=init_image, 
                 strength=strength, 
                 guidance_scale=cfg
                ).images[0]

refiner_image = refiner(prompt=prompt,
                        image=init_image,
                        strength=strength,
                        guidance_scale=cfg
                       ).images[0]

make_image_grid([init_image, base_image, refiner_image], rows=1, cols=3)

Low `strength` --> the initial image has higher influence

Low `cfg` --> more creativity (model is not so focused on the prompt)

In [None]:
# prepare image
url = '../5.0_pictures/kelloggs_resized.jpg'
init_image = load_image(url)

# parameters
prompt = "Honey Flavoured Cornflakes, Yellow Packaging design, cute bees, food photography, mockup"
strength = 0.4
cfg = 4.0

# pass prompt and image to base and refiner pipeline
base_image = base(prompt=prompt, 
                 image=init_image, 
                 strength=strength, 
                 guidance_scale=cfg
                ).images[0]

refiner_image = refiner(prompt=prompt,
                        image=init_image,
                        strength=strength,
                        guidance_scale=cfg
                       ).images[0]

make_image_grid([init_image, base_image, refiner_image], rows=1, cols=3)
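The Kelloggs cells repeat the same call with different `strength`/`cfg` pairs. A small, hypothetical helper can sweep such pairs in one place; `generate` stands for any callable accepting `strength` and `guidance_scale` keywords (for example a lambda wrapping `base` or `refiner`), so the sketch runs here with a dummy generator and no GPU.

```python
from itertools import product


def sweep(generate, strengths, guidance_scales):
    """Call generate(strength=..., guidance_scale=...) for every combination.

    Returns a dict keyed by (strength, guidance_scale) in row-major order
    (one row per strength, one column per guidance scale).
    """
    results = {}
    for strength, cfg in product(strengths, guidance_scales):
        results[(strength, cfg)] = generate(strength=strength, guidance_scale=cfg)
    return results


# Dummy generator for illustration; in the notebook this could be e.g.
# lambda **kw: refiner(prompt=prompt, image=init_image, **kw).images[0]
def fake_generate(strength, guidance_scale):
    return f"img(s={strength}, cfg={guidance_scale})"


grid = sweep(fake_generate, strengths=[0.4, 0.8], guidance_scales=[4.0, 7.0, 10.0])
```

With real images, `make_image_grid(list(grid.values()), rows=2, cols=3)` would lay the results out with one row per strength. Passing the same seeded `generator` to every call, as in the grid cell, keeps the comparison fair.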

High `strength` +

high `cfg`

--> should give a really creative output that still sticks closely to the prompt

In [None]:
# prepare image
url = '../5.0_pictures/kelloggs_resized.jpg'
init_image = load_image(url)

# parameters
prompt = "Honey Flavoured Cornflakes, Yellow Packaging design, cute bees, food photography, professional mockup"
strength = 0.8
cfg = 10.0

# pass prompt and image to the refiner pipeline
refiner_image = refiner(prompt=prompt,
                        image=init_image,
                        strength=strength,
                        guidance_scale=cfg
                       ).images[0]

make_image_grid([init_image, refiner_image], rows=1, cols=2)

Low `strength` +

Low `cfg`

--> the output should stay similar to the initial image, but with some creative freedom since the model is less focused on the prompt

In [None]:
# strength = 0.4, guidance scale = 4.0

prompt = "Honey Flavoured Cornflakes, Yellow Packaging design, cute bees, food photography"
strength = 0.4
cfg = 4.0

# pass prompt and image to pipeline
image = refiner(prompt,
                image=init_image,
                strength=strength,
                guidance_scale=cfg
               ).images[0]

make_image_grid([init_image, image], rows=1, cols=2)

<a id="examplescampusbier"></a>

## 5. Img2Img - Examples Campusbier (Beer Can)

General test with the Campusbier image: how far the model can change the bottle and its label (the building should be replaced with students on the label).

low `strength` +

low `cfg`

so the initial image has high influence on the image generation process.

In [None]:
url = '../5.0_pictures/campusbier_2.png'
init_image = load_image(url)

prompt = "Beer Bottle, students on the label, delicious beer, green colors, food photography, mockup"
negative_prompt = "can"

strength = 0.4
cfg = 6.0

# pass prompt and image to pipeline
image = refiner(prompt,
                negative_prompt=negative_prompt,
                image=init_image,
                strength=strength,
                guidance_scale=cfg
                ).images[0]

make_image_grid([init_image, image], rows=1, cols=2)
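In the diffusers versions we are aware of, the SDXL pipelines' `__call__` also accepts `**kwargs` (used for deprecated callback arguments), so a misspelled keyword such as `negativ_prompt` or `strenght` may be silently ignored instead of raising an error. A small, hypothetical guard using the standard `inspect` module can flag such names before the pipeline is called, e.g. `unknown_kwargs(refiner.__call__, negativ_prompt="can")`.

```python
import inspect


def unknown_kwargs(fn, **kwargs):
    """Return keyword names that do not match a named parameter of fn.

    Names that would only be swallowed by a **kwargs catch-all are reported
    too, which is exactly where typos like 'negativ_prompt' end up.
    """
    params = inspect.signature(fn).parameters
    named = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(k for k in kwargs if k not in named)


# Stand-in with a similar shape to a pipeline __call__ (not the real signature).
def fake_pipeline_call(prompt=None, image=None, strength=0.3, guidance_scale=5.0,
                       negative_prompt=None, **kwargs):
    return None


print(unknown_kwargs(fake_pipeline_call, prompt="beer", negativ_prompt="can", strenght=0.4))
# -> ['negativ_prompt', 'strenght']
```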

Creating a Campusbier **Beer Can**

high `strength` +

higher `cfg`

should give the model the freedom and creativity to transform the bottle into a can

In [None]:
# Campusbier Can
# strength = 0.9, guidance scale = 10.0

url = '../5.0_pictures/campusbier_2.png'
init_image = load_image(url)

prompt = "Beer Can, delicious organic beer can, sustainable local brand, green label, with the text CAMPUSBIER"
negative_prompt = "bottle"

# pass prompt and image to pipeline
image = refiner(prompt=prompt,
                negative_prompt=negative_prompt,
                image=init_image,
                strength=0.9,
                guidance_scale=10.0
               ).images[0]

make_image_grid([init_image, image], rows=1, cols=2)

<a id="keyfindings"></a>

## Key Findings

The two parameters **`Strength`** and **`Guidance Scale`** let you either stay very close to the original image or largely ignore it. The right balance depends on the **application**, so no general recommendation for specific values can be given here.

Depending on the application, either the base or the refiner model is more suitable. If the original image is to be changed substantially, the **Base Model** is the better choice. If the initial image is only to be refined (focus on details such as eyes), the **Refiner Model** may be the better choice.

It is important to use high-quality source images, as they form the basis of the generation. For the packaging development application, it helps to keep both parameters low, so the model does not get too much freedom and stays close to the original.

In general, however, the image-to-image pipeline is **not suitable for packaging development**. At high parameter values, the generated images drift too far from the original image (the Kelloggs folding carton was partially turned into completely different packaging such as cans or cups). At low parameter values, the model is usually unable to change individual image elements sufficiently (the Kelloggs tiger remained a tiger and could not be turned into a bee).

However, changing the **Campusbier** glass bottle into a **beer can** did work (high parameter values for more freedom + negative prompt). This at least gives a first, rough impression of what the product could look like (some elements, such as the color scheme, are carried over).

Possibly suitable as a **creativity method** or in **combination with other image control features**. This was not tested as part of the project.