# 03. Fast Image Generation

## 01. SDXL Turbo
#### Content

1. [SDXL Turbo - General](#sdxlturbogeneral)
2. [SDXL Turbo - Test Cornflakes](#conflakes)
3. [SDXL Turbo - Image-to-Image](#img2img)
4. [Key Findings](#keyfindings)

## Description + Links


* SDXL Turbo is an adversarial time-distilled model capable of running inference in as little as 1 step; increasing the number of steps up to 4 should improve image quality
* generates 512x512 images by default (the *height* and *width* parameters can be set to 1024x1024, but image quality degrades)
* `guidance_scale` should be set to 0.0 to disable guidance, as the model was trained without it

---
**Documentation**

https://huggingface.co/docs/diffusers/v0.24.0/en/using-diffusers/sdxl_turbo

**Paper**

[Sauer, A., et al. (2023): Adversarial Diffusion Distillation](https://arxiv.org/abs/2311.17042)


## Setup

In [None]:
%env HF_HOME=/cluster/user/ehoemmen/.cache
%env HF_DATASETS_CACHE=/cluster/user/ehoemmen/.cache
%env TRANSFORMERS_CACHE=/cluster/user/ehoemmen/.cache

In [None]:
%pip install -U diffusers invisible_watermark transformers accelerate safetensors datasets compel

In [None]:
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
import torch

# compel for prompt weighting
from compel import Compel, ReturnedEmbeddingsType

# image grid
from PIL import Image

<a id="sdxlturbogeneral"></a>
## 01. SDXL Turbo - General


In [None]:
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16", cache_dir="/cluster/user/ehoemmen/.cache")
# pipeline = pipeline.to("cuda")
pipeline.enable_model_cpu_offload()

In [None]:
#Prompt Weighting
compel = Compel(
  tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2] ,
  text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
  returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
  requires_pooled=[False, True]
)

#Image Grid
def image_grid(imgs, rows, cols):
    """Paste equally sized PIL images into a rows x cols grid, filled row by row."""
    assert len(imgs) == rows*cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))

    for i, img in enumerate(imgs):
        # column index is i % cols, row index is i // cols
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid
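
The paste offsets used by `image_grid` can be checked without loading any images. The following is a minimal sketch; `grid_boxes` is a hypothetical helper that is not part of the notebook and only reproduces the index arithmetic of the `box=` argument above.

```python
def grid_boxes(n_images, rows, cols, w, h):
    """Return the top-left paste offset (x, y) for each image index.

    Hypothetical helper for illustration: it mirrors the expression
    box=(i % cols * w, i // cols * h) used in image_grid.
    """
    assert n_images == rows * cols
    return [(i % cols * w, i // cols * h) for i in range(n_images)]


# Six 512x512 images in a 2 x 3 grid: the first row is filled left to right,
# then the second row starts again at x = 0.
for index, box in enumerate(grid_boxes(6, rows=2, cols=3, w=512, h=512)):
    print(index, box)
```

With `rows=1`, as in the cells below, every image lands at `y = 0` and only the x offset grows by one image width per image.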

#### A Single Inference Step

In [None]:
prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipeline(prompt=prompt, 
                 guidance_scale=0.0, 
                 num_inference_steps=1
                ).images[0]
image

In [None]:
n_images = 3
prompt = ["A cinematic shot of a baby racoon wearing an intricate italian priest robe"] * n_images

images = pipeline(prompt=prompt, 
                  guidance_scale=0.0, 
                  num_inference_steps=1
                 ).images

grid = image_grid(images, rows=1, cols=3)
grid

In [None]:
n_images = 3
prompt = ["A cinematic shot of a baby racoon wearing an intricate italian priest robe"] * n_images
conditioning, pooled = compel(prompt)

images = pipeline(prompt_embeds=conditioning, 
                  pooled_prompt_embeds=pooled, 
                  guidance_scale=0.0, 
                  num_inference_steps=1).images

grid = image_grid(images, rows=1, cols=3)
grid

#### Image Quality Comparison 1 - 6 Inference Steps

In [None]:
n_images = 6
# A single prompt is enough here: each pipeline call below keeps only one image
prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe"
conditioning, pooled = compel(prompt)

# Set the seed
generator = torch.Generator(device="cuda").manual_seed(2147483647) 

# Initialize images list
images = []

for i in range(n_images):
    generator.manual_seed(2147483647)
    image = pipeline(prompt_embeds=conditioning, 
                     pooled_prompt_embeds=pooled, 
                     generator=generator, 
                     guidance_scale=0.0,
                     num_inference_steps=i+1,
                    ).images[0]
    images.append(image)

grid = image_grid(images, rows=1, cols=n_images)
grid

<a id="conflakes"></a>
## 02. SDXL Turbo - Test Cornflakes

#### Quality Comparison 1 - 4 Inference Steps

In [None]:
n_images = 4
# A single prompt is enough here: each pipeline call below keeps only one image
prompt = "delicous cornflakes box with sweet honey flavour in an awesome packaging design that reads Kelloggs Honey Bites"
conditioning, pooled = compel(prompt)

# Set the seed
generator = torch.Generator(device="cuda").manual_seed(2147483647) 

# Initialize images list
images = []

for i in range(n_images):
    generator.manual_seed(2147483647)
    image = pipeline(prompt_embeds=conditioning, 
                     pooled_prompt_embeds=pooled, 
                     generator=generator, 
                     guidance_scale=0.0,
                     num_inference_steps=i+1,
                    ).images[0]
    images.append(image)

grid = image_grid(images, rows=1, cols=n_images)
grid

In [None]:
prompt = "delicous honey cornflakes box in an awesome packaging design with a cute little bee drawing on it that reads Honey Cornflakes"

image = pipeline(prompt=prompt, 
                 guidance_scale=0.0, 
                 num_inference_steps=4
                ).images[0]
image

<a id="img2img"></a>
## 03. SDXL Turbo - Image-to-Image

For image-to-image generation, make sure that `num_inference_steps * strength` is greater than or equal to 1. The image-to-image pipeline runs for `int(num_inference_steps * strength)` steps, e.g. `strength=0.5` with `num_inference_steps=2` gives int(0.5 * 2) = 1 step.
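
The step rule above can be checked before calling the pipeline. This is a minimal sketch based only on the formula stated here; `effective_img2img_steps` is a hypothetical helper, not a diffusers function.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps the image-to-image pipeline would run.

    Hypothetical helper: applies int(num_inference_steps * strength) as
    described above and rejects combinations that yield fewer than 1 step.
    """
    steps = int(num_inference_steps * strength)
    if steps < 1:
        raise ValueError(
            "num_inference_steps * strength must be at least 1, got "
            f"{num_inference_steps} * {strength}"
        )
    return steps


# The combinations used in this notebook
print(effective_img2img_steps(2, 0.5))  # the example from the text
print(effective_img2img_steps(4, 0.5))  # cat wizard cell
print(effective_img2img_steps(4, 0.7))  # cornflakes cell
```

Under this rule, `strength=0.5` and `strength=0.7` with 4 steps both run 2 denoising steps, because `int()` truncates 2.8 to 2.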

In [None]:
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

In [None]:
# Note: the diffusers docs suggest AutoPipelineForImage2Image.from_pipe(...) to reuse
# the already loaded text-to-image weights and avoid consuming additional memory.
# Here the checkpoint is loaded separately with from_pretrained instead.
pipeline = AutoPipelineForImage2Image.from_pretrained("stabilityai/sdxl-turbo", cache_dir="/cluster/user/ehoemmen/.cache")

# pipeline = pipeline.to("cuda")
pipeline.enable_sequential_cpu_offload()

In [None]:
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipeline(prompt=prompt,
                 image=init_image, 
                 strength=0.5, 
                 guidance_scale=0.0, 
                 num_inference_steps=4
                ).images[0]
make_image_grid([init_image, image], rows=1, cols=2)

In [None]:
url = '../5.0_pictures/kelloggs_resized.jpg'
init_image = load_image(url)

prompt = "Honey Flavoured Cornflakes, Yellow Packaging design, cute bees, food photography, mockup"

image = pipeline(prompt=prompt,
                 image=init_image, 
                 strength=0.7, 
                 guidance_scale=0.0, 
                 num_inference_steps=4
                ).images[0]

make_image_grid([init_image, image], rows=1, cols=2)

<a id="keyfindings"></a>

## Key Findings

Standard text-to-image generation works very well. In practice (e.g. in an ideation workshop), SDXL Turbo would be preferable to the regular SDXL pipeline, as it is very fast and the images have the same level of quality. It is, however, better suited to generating and exploring ideas than to refining them. For working out **concrete ideas**, **SDXL Turbo is not suitable**: apart from the standard text-to-image pipeline, only the image-to-image pipeline is available, so individual design elements cannot be changed selectively.