# 03. Fast Image Generation

## 04. Stable Cascade (Würstchen V3)
#### Content

1. [Basic Stable Cascade Pipeline](#cascade)
2. [Campusbier Test](#beer)
3. [Key Findings](#keyfindings1)

## Description + Links

* Works in a **much smaller latent space** than Stable Diffusion
    * SD compresses by a factor of 8 (1024x1024 to 128x128); Stable Cascade compresses by a factor of about 42 (1024x1024 to 24x24)
* Consists of three models, Stage A, Stage B and Stage C, which together form a cascade for generating images
* Stages A and B compress images, similar to the role of the VAE in Stable Diffusion
* Stage C operates on the small 24x24 latents and denoises them conditioned on the text prompt
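
The compression figures above can be checked with a few lines of arithmetic. This is only a sketch for illustration: the helper names `compression_factor` and `latent_positions` are made up here and are not part of any library.

```python
def compression_factor(image_side: int, latent_side: int) -> float:
    """Spatial compression along one side of a square image."""
    return image_side / latent_side


def latent_positions(latent_side: int) -> int:
    """Number of spatial positions in a square latent."""
    return latent_side * latent_side


IMAGE_SIDE = 1024  # pixel side length used in the bullet points above

sd_factor = compression_factor(IMAGE_SIDE, 128)      # Stable Diffusion: 128x128 latent
cascade_factor = compression_factor(IMAGE_SIDE, 24)  # Stable Cascade: 24x24 latent

print(f"Stable Diffusion: factor {sd_factor:.2f}, {latent_positions(128)} positions")
print(f"Stable Cascade:   factor {cascade_factor:.2f}, {latent_positions(24)} positions")
print(f"Position ratio:   {latent_positions(128) / latent_positions(24):.1f}x fewer")
```

The "factor of 42" is therefore a rounded value (1024 / 24 is roughly 42.67), and the 24x24 latent has about 28 times fewer spatial positions than the 128x128 one.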
  
---

**Blog**

https://stability.ai/news/introducing-stable-cascade

**Documentation**

https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_cascade

https://huggingface.co/stabilityai/stable-cascade

https://github.com/Stability-AI/StableCascade/tree/master/inference

**Paper**

https://openreview.net/forum?id=gU58d5QeGv

**Diffusers Repository - Würstchen V3 Branch**

https://github.com/kashif/diffusers/tree/wuerstchen-v3/tests/pipelines/stable_cascade

**Colab Notebook** (by mkshing)

https://colab.research.google.com/github/mkshing/notebooks/blob/main/stable_cascade.ipynb#scrollTo=nd1QOOSw1OjD

## Setup

In [None]:
%env HF_HOME=/cluster/user/ehoemmen/.cache
%env HF_DATASETS_CACHE=/cluster/user/ehoemmen/.cache
%env TRANSFORMERS_CACHE=/cluster/user/ehoemmen/.cache

In [None]:
# Run these commands in the terminal.

# Installing from the branch head is currently broken:
# pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3

# Workaround: pin a specific commit (https://github.com/Stability-AI/StableCascade/issues/58)
# pip install git+https://github.com/kashif/diffusers.git@a3dc21385b7386beb3dab3a9845962ede6765887 --force

In [None]:
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

from PIL import Image
from IPython.display import display

<a id="cascade"></a>

## 01. Basic SD-Cascade Pipeline


In [None]:
# Prior (Stage C): turns the text prompt into image embeddings
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior",
                                                   torch_dtype=torch.bfloat16,
                                                   cache_dir="/cluster/user/ehoemmen/.cache",        # my cache path
                                                   ignore_mismatched_sizes=True
                                                  )
# Keep sub-models on the CPU and move them to the GPU only while they are in use
prior.enable_model_cpu_offload()

# Decoder (Stages B and A): turns the image embeddings into the final image
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade",
                                                       torch_dtype=torch.float16,
                                                       cache_dir="/cluster/user/ehoemmen/.cache",     # my cache path
                                                       low_cpu_mem_usage=False,
                                                       ignore_mismatched_sizes=True
                                                      )
decoder.enable_model_cpu_offload()

<a id="beer"></a>

## 02. Campusbier Test

In [None]:
num_images_per_prompt = 2

prompt = "brown beer bottle from the brand 'CAMPUSBIER', green label, realistic photo, professional food mockup, awesome studio lighting"

# Alternative prompts:
#prompt = "kids strawberry cornflakes package with the text 'CORNBERRY' a cute elephant on the label, realistic photo, carton, organic package"
#prompt = "pikachu standing near a cliff in Norway holding a sign with the text 'I LOVE OSNA', closeup, cinematic"
negative_prompt = ""

prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=7.0,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=20
)
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.half(),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",              # return the images as a list of PIL images
    num_inference_steps=10
).images

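# Render the images inline in the notebook (display is imported from IPython.display above)
for image in decoder_output:
    display(image)

# Save an image to your own path
#decoder_output[1].save("/cluster/user/ehoemmen/Pictures/stable_cascade/... .png")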
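
As a sanity check between the two stages, the shape of `prior_output.image_embeddings` can be estimated before running the decoder. The sketch below assumes that the prior pipeline divides height and width by a resolution multiple of 42.67 and rounds up, and that the prior latent has 16 channels. Both values are assumptions about the diffusers implementation rather than something established in this notebook, and `expected_embedding_shape` is a hypothetical helper.

```python
from math import ceil

# Assumptions (not confirmed in this notebook): default resolution multiple
# and latent channel count of the Stable Cascade prior in diffusers.
RESOLUTION_MULTIPLE = 42.67
PRIOR_LATENT_CHANNELS = 16


def expected_embedding_shape(height, width, num_images_per_prompt=1, batch_size=1):
    """Estimate the (batch, channels, height, width) shape of the prior output."""
    return (
        batch_size * num_images_per_prompt,
        PRIOR_LATENT_CHANNELS,
        ceil(height / RESOLUTION_MULTIPLE),
        ceil(width / RESOLUTION_MULTIPLE),
    )


# Settings from the Campusbier test: 1024x1024, two images per prompt
print(expected_embedding_shape(1024, 1024, num_images_per_prompt=2))
```

If the assumptions hold, this prints `(2, 16, 24, 24)`, which can be compared against `prior_output.image_embeddings.shape` after the prior has run.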

<a id="keyfindings1"></a>

## 03. Key Findings

In general, the images look very appealing and aesthetic, and text is rendered better than with SDXL. There were some problems generating the correct packaging type for food packaging (a folding box instead of a plastic stand-up pouch). The pipeline often ran out of memory because the model is very large.

However, I have not tested Stable Cascade extensively.

There are also pipelines for image-to-image, ControlNet and LoRA. I have not tested these yet either.
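
To get a rough feeling for the out-of-memory errors mentioned above, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. The parameter counts below (3.6B for Stage C, 1.5B for Stage B, 20M for Stage A) are taken from my reading of the Stable Cascade model card and should be treated as assumptions. The estimate ignores the text encoder, activations and any framework overhead, so real usage will be higher.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumed parameter counts (large variants), not measured in this notebook
ASSUMED_PARAMS = {
    "Stage C (prior)": 3.6e9,
    "Stage B (decoder)": 1.5e9,
    "Stage A": 20e6,
}

for name, num_params in ASSUMED_PARAMS.items():
    half = weight_memory_gib(num_params, "bfloat16")
    full = weight_memory_gib(num_params, "float32")
    print(f"{name}: {half:.2f} GiB in 16-bit, {full:.2f} GiB in 32-bit")
```

Under these assumptions the 16-bit weights of Stage C alone take several GiB, which fits the observation that loading both pipelines without CPU offloading quickly exhausts GPU memory.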
