# Effect of Guidance Scale

In this notebook, we take a look at how the guidance scale affects the image quality of the model.

The guidance scale is based on the paper [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). A full explanation of how classifier-free guidance (CFG) works is out of scope here, but the following online resources cover it in detail:

- [Guidance: a cheat for diffusion models](https://sander.ai/2022/05/26/guidance.html)
- [Diffusion Models, DDPMs, DDIMs and CFG](https://betterprogramming.pub/diffusion-models-ddpms-ddims-and-classifier-free-guidance-e07b297b2869)
- [Classifier-Free Guidance Scale](https://mccormickml.com/2023/02/20/classifier-free-guidance-scale/)

In short, the guidance scale controls how much "guidance" the prompt provides during the diffusion process: the higher the value, the more closely the generated image follows the prompt. A lower guidance scale gives the model more creative freedom, so its output may deviate somewhat from the exact prompt. Beyond a certain threshold, however, the results start to degrade and become blurry and noisy.
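Concretely, most Stable Diffusion pipelines (Hugging Face diffusers, for example) compute two noise predictions at every denoising step, one conditioned on the prompt and one unconditional (or conditioned on the negative prompt), and blend them with the guidance scale. The sketch below assumes that diffusers-style formula, in which a scale of 1.0 reproduces the conditional prediction; the paper writes the same idea with a weight shifted by one. The vectors are toy stand-ins for UNet outputs, and `cfg_combine` is a hypothetical helper, not a StableFused function.

```python
import numpy as np


def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance, diffusers-style (assumed formulation):
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


# Toy stand-ins for the two UNet noise predictions at a single denoising step.
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])

# 0.0 keeps only the unconditional prediction, 1.0 keeps only the conditional one,
# and values above 1.0 extrapolate past the conditional prediction.
for s in [0.0, 1.0, 7.5]:
    print(s, cfg_combine(eps_uncond, eps_cond, s))
```

Large scales push the prediction far beyond anything the model predicted on its own, which is consistent with the quality breakdown described above.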

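In practice, guidance scale values usually lie in the range 6-15, and many inference implementations default to 7.5. However, manipulating the value can lead to some very interesting results. Conventionally, the guidance scale only makes sense at 1.0 or higher, which is why many implementations use a minimum value of 1.0: in the formulation common to Stable Diffusion pipelines, a scale of 1.0 reproduces the plain prompt-conditioned prediction, and lower values weaken or even reverse the prompt's influence.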

But... what happens when we set guidance scale to 0? Or negative? Let's find out!

When you use a negative guidance scale, the model tries to generate images that are the opposite of what the prompt specifies. For example, if you prompt the model for an image of an astronaut and use a negative guidance scale, it will try to generate anything but an astronaut. This can be a fun way to generate creative and unexpected images (sometimes NSFW or absolutely horrendous ones, since StableFused does not use a safety-checker model).
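Under the same diffusers-style formula (an assumption about how the guidance is applied), a negative scale flips the sign of the guidance term, so each step moves the prediction away from the prompt direction instead of toward it. When a negative prompt is supplied, it usually takes the place of the unconditional branch, so a negative scale also pulls the result toward the negative prompt. The toy sketch below measures this by projecting the guided prediction onto the direction `eps_cond - eps_uncond`; the helper names and vectors are illustrative only.

```python
import numpy as np


def guided(eps_uncond: np.ndarray, eps_cond: np.ndarray, s: float) -> np.ndarray:
    # diffusers-style classifier-free guidance (assumed formulation)
    return eps_uncond + s * (eps_cond - eps_uncond)


def prompt_alignment(eps_uncond: np.ndarray, eps_cond: np.ndarray, s: float) -> float:
    """Signed component of the guided prediction along the prompt direction,
    measured relative to the unconditional (or negative-prompt) prediction."""
    direction = eps_cond - eps_uncond
    offset = guided(eps_uncond, eps_cond, s) - eps_uncond
    return float(np.dot(offset, direction) / np.dot(direction, direction))


eps_uncond = np.array([0.2, -0.1, 0.5])  # stand-in: unconditional / negative-prompt branch
eps_cond = np.array([0.6, 0.3, 0.1])     # stand-in: prompt-conditioned branch

for s in [-1.5, 0.0, 1.0, 7.5]:
    print(f"guidance_scale={s:+.1f} -> alignment {prompt_alignment(eps_uncond, eps_cond, s):+.2f}")
```

The alignment simply tracks the scale: negative values point the update away from the prompt, 0.0 ignores the prompt entirely, and 1.0 follows it without extra emphasis.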

### Install and import required packages

In [None]:
!pip install stablefused ipython

In [None]:
import numpy as np
import torch

from IPython.display import Video, display
from PIL import Image, ImageDraw, ImageFont
from stablefused import TextToImageDiffusion
from typing import List

### Initialize model and parameters

We use RunwayML's Stable Diffusion 1.5 checkpoint.

In [None]:
model_id = "runwayml/stable-diffusion-v1-5"
model = TextToImageDiffusion(model_id=model_id, torch_dtype=torch.float16)

##### Prompt Credits

The prompts used in this notebook have been taken from different sources. The main inspirations are:

- https://levelup.gitconnected.com/20-stable-diffusion-prompts-to-create-stunning-characters-a63017dc4b74
- https://mpost.io/best-100-stable-diffusion-prompts-the-most-beautiful-ai-text-to-image-prompts/

In [None]:
prompt = [
    "Artistic image, very detailed cute cat, cinematic lighting effect, cute, charming, fantasy art, digital painting, photorealistic",
    "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k",
    "A grand city in the year 2100, atmospheric, hyper realistic, 8k, epic composition, cinematic, octane render",
    "Starry Night, painting style of Vincent van Gogh, Oil paint on canvas, Landscape with a starry night sky, dreamy, peaceful",
]
negative_prompt = "cartoon, unrealistic, blur, boring background, deformed, disfigured, low resolution, unattractive"
num_inference_steps = 20
seed = 2023

`image_grid_with_labels` is a helper function that takes a list of images and one label per row, and arranges them into a single labeled grid image (it returns the grid rather than displaying it).
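The helper places images in row-major order: image `i` goes to row `i // cols` and column `i % cols`, shifted right by a fixed-width label column. That is why the generated images have to be flattened so that each guidance scale fills exactly one row. A pure-Python sketch of that offset arithmetic (the name `grid_offset` is hypothetical, chosen for illustration):

```python
from typing import Tuple


def grid_offset(i: int, cols: int, w: int, h: int, label_width: int = 100) -> Tuple[int, int]:
    """Top-left (x, y) pixel at which the i-th image is pasted in a labeled grid."""
    row, col = divmod(i, cols)  # row-major: fill a whole row before moving down
    return (col * w + label_width, row * h)


# Example: 2 guidance scales x 4 prompts, 512x512 images.
for i in range(8):
    print(i, grid_offset(i, cols=4, w=512, h=512))
```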

In [None]:
def image_grid_with_labels(
    images: List[Image.Image], labels: List[str], rows: int, cols: int
) -> Image.Image:
    """Arrange images row by row into a single grid, with one text label per row."""
    if len(images) > rows * cols:
        raise ValueError(
            f"Number of images ({len(images)}) exceeds grid size ({rows}x{cols})."
        )
    if len(labels) != rows:
        raise ValueError(
            f"Number of labels ({len(labels)}) does not match the number of rows ({rows})."
        )

    w, h = images[0].size
    label_width = 100  # width (px) of the label column on the left

    grid = Image.new("RGB", size=(cols * w + label_width, rows * h))
    draw = ImageDraw.Draw(grid)

    font_size = 32
    try:
        # Locate DejaVu Sans through matplotlib instead of hardcoding a
        # Colab-specific site-packages path.
        from matplotlib import font_manager

        font = ImageFont.truetype(font_manager.findfont("DejaVu Sans"), size=font_size)
    except (ImportError, OSError):
        # Fall back to Pillow's built-in bitmap font.
        font = ImageFont.load_default()

    # Draw one label per row, roughly centered vertically on that row.
    for i, label in enumerate(labels):
        x_label = label_width // 4
        y_label = i * h + (h - font_size) // 2
        draw.text((x_label, y_label), label, fill=(255, 255, 255), font=font)

    # Paste images in row-major order, to the right of the label column.
    for i, image in enumerate(images):
        x_img = (i % cols) * w + label_width
        y_img = (i // cols) * h
        grid.paste(image, box=(x_img, y_img))

    return grid

### Inference with different guidance scales

We start with a guidance scale of -1.5 and increase it in steps of 1.5 up to a maximum of 15. The results are very interesting!

The code below demonstrates the effect of the guidance scale on different prompts.
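The sweep visits every multiple of 1.5 from -1.5 to 15, i.e. 12 guidance scales, so the first grid has 12 rows of 4 images. The notebook's loop adds an `epsilon` to the upper bound to guard against floating-point drift; an alternative sketch that avoids accumulating floats altogether derives each value from an integer index (`guidance_schedule` is a hypothetical helper, not part of StableFused):

```python
from typing import List


def guidance_schedule(start: float, stop: float, step: float) -> List[float]:
    """Evenly spaced guidance scales from start to stop (inclusive).

    Assumes (stop - start) is a whole multiple of step. Each value is computed
    from an integer index, so rounding errors do not accumulate across iterations.
    """
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


scales = guidance_schedule(-1.5, 15.0, 1.5)
print(len(scales), scales)
```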

In [None]:
epsilon = 1e-5  # small tolerance so the final value (15.0) is not skipped due to floating-point error
guidance_scale = -1.5
increment = 1.5
max_guidance_scale = 15 + epsilon
num_iterations = 0
results = []  # one list of images (one per prompt) per guidance scale
guidance_scale_labels = []

while guidance_scale <= max_guidance_scale:
    # Re-seed before every call so each guidance scale starts from the same random state.
    torch.manual_seed(seed)
    np.random.seed(seed)

    print(f"Generating images with guidance_scale={guidance_scale:.2f} and seed={seed}")
    results.append(
        model(
            prompt=prompt,
            negative_prompt=[negative_prompt] * len(prompt),  # same negative prompt for every prompt
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
        )
    )

    guidance_scale_labels.append(str(round(guidance_scale, 2)))
    guidance_scale += increment
    num_iterations += 1

In [None]:
flattened_results = [image for result in results for image in result]

In [None]:
grid = image_grid_with_labels(
    flattened_results,
    labels=guidance_scale_labels,
    rows=num_iterations,
    cols=len(prompt),
)
grid.save("effect-of-guidance-scale-on-different-prompts.png")
display(grid)


The code below demonstrates the effect of the guidance scale on a single prompt across different numbers of inference steps.
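This second sweep is considerably more expensive than the first, because every guidance scale is rendered once for each entry in `steps`. A rough count follows; it assumes, as in common Stable Diffusion pipelines, that classifier-free guidance evaluates the UNet on both a conditional and an unconditional (negative-prompt) branch at every step, and that StableFused does so for every scale in the sweep.

```python
steps = [3, 6, 12, 20, 25]
num_guidance_scales = 12  # -1.5, 0.0, ..., 15.0 in increments of 1.5

pipeline_calls = num_guidance_scales * len(steps)   # one image per call
denoising_steps = num_guidance_scales * sum(steps)  # scheduler steps across all calls
# Assumption: CFG runs a conditional and an unconditional branch at every step.
unet_evaluations = 2 * denoising_steps

print(pipeline_calls, denoising_steps, unet_evaluations)
```

For comparison, the first sweep makes 12 calls of 20 steps each, i.e. 240 denoising steps, although each of those steps processes a batch of four prompts.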

In [None]:
steps = [3, 6, 12, 20, 25]
prompt = "Photorealistic illustration of a mystical alien creature, magnificent, strong, atomic, tyrannic, predator, unforgiving, full-body image"
epsilon = 1e-5
guidance_scale = -1.5
increment = 1.5
max_guidance_scale = 15 + epsilon
num_iterations = 0
results = []
guidance_scale_labels = []
seed = 42

while guidance_scale <= max_guidance_scale:
    step_results = []

    for step in steps:
        torch.manual_seed(seed)
        np.random.seed(seed)

        print(
            f"Generating images with guidance_scale={guidance_scale:.2f}, num_inference_steps={step} and seed={seed}"
        )
        step_results.append(
            model(
                prompt=prompt,
                negative_prompt=negative_prompt,
                num_inference_steps=step,
                guidance_scale=guidance_scale,
            )
        )

    guidance_scale_labels.append(str(round(guidance_scale, 2)))
    guidance_scale += increment
    num_iterations += 1

    results.append(step_results)

In [None]:
flattened_results = [
    image for result in results for images in result for image in images
]

In [None]:
grid = image_grid_with_labels(
    flattened_results, labels=guidance_scale_labels, rows=len(results), cols=len(steps)
)
grid.save("effect-of-guidance-scale-vs-steps.png")
display(grid)