<a href="https://colab.research.google.com/github/JoaoLages/diffusers-interpret/blob/main/notebooks/stable_diffusion_example_colab.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Diffusion 🎨 

This notebook shows how to run `diffusers_interpret.StableDiffusionImg2ImgPipelineExplainer` to explain `diffusers.StableDiffusionImg2ImgPipeline`.

Before going through it, we recommend having a look at [🤗 Hugging Face's notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb#scrollTo=-xMJ6LaET6dT).

### 0 - Log in to the Hugging Face Hub

In [1]:
from huggingface_hub import notebook_login
notebook_login()

### 1 - Initialize StableDiffusionImg2ImgPipeline normally

In [2]:
# make sure you're logged in by running the previous cell or `huggingface-cli login`
import torch
from contextlib import nullcontext
import requests
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline


device = 'cuda' if torch.cuda.is_available() else 'cpu'

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    use_auth_token=True,
    
    # FP16 is not supported on 'cpu', so fall back to the default revision and dtype there
    revision='fp16' if device != 'cpu' else None,
    torch_dtype=torch.float16 if device != 'cpu' else None
).to(device)
pipe.enable_attention_slicing() # comment out this line if you wish to deactivate this option

{'trained_betas'} was not found in config. Values will be initialized to default values.


### 2 - Pass StableDiffusionImg2ImgPipeline to StableDiffusionImg2ImgPipelineExplainer

In [3]:
from diffusers_interpret import StableDiffusionImg2ImgPipelineExplainer

explainer = StableDiffusionImg2ImgPipelineExplainer(
    pipe,
    
    # We pass `True` in here to be able to have a higher `n_last_diffusion_steps_to_consider_for_attributions` in the cell below
    gradient_checkpointing=True 
)

### 3 - Generate an image with the StableDiffusionImg2ImgPipelineExplainer object

Note that the `explainer()` method accepts all the arguments that `pipe()` accepts. 

We also pass a `generator` argument so that we get a deterministic output.

In [4]:
prompt = "A fantasy landscape, trending on artstation"

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((384, 256))

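generator = torch.Generator(device).manual_seed(2023)  # fixed seed for a deterministic output
with torch.autocast('cuda') if device == 'cuda' else nullcontext():
    output = explainer(
        prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5,
        generator=generator,
        #n_last_diffusion_steps_to_consider_for_attributions=5
    )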

  0%|          | 0/38 [00:00<?, ?it/s]

Calculating token attributions... Done!
Calculating image pixel attributions... > /home/ubuntu/diffusers-interpret/src/diffusers_interpret/explainer.py(380)_get_attributions()
    379         import ipdb; ipdb.set_trace()
--> 380         pixel_attributions = gradient_x_inputs_attribution(
    381             pred_logits=output.image, input_embeds=init_image,

ipdb> c


RuntimeError: RuntimeError: Attempt to retrieve a tensor saved by autograd multiple times without checkpoint recomputation being triggered in between, this is not currently supported. Please open an issue with details on your use case so that we can prioritize adding this.

At:
  /home/ubuntu/diffusers-interpret/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py(374): unpack


#### 3.1 - Check final generated image

In [None]:
# Final image
output.image

#### 3.2 - Check all the generated images during the diffusion process

In [None]:
output.all_images_during_generation.show(width="100%", height="400px")

You can also check the images individually:

In [None]:
# Image at first generation 
output.all_images_during_generation[0]

In [None]:
# Image at index 33 (the 34th generated image, since indexing starts at 0)
output.all_images_during_generation[33]

#### 3.3 - Check the normalized and unnormalized token attributions 

We can now see how important each token in the input text was for generating the image.

The token with the highest attribution is the most important feature according to our explainability method.

In [None]:
# (token, attribution)
output.token_attributions

In [None]:
# (token, attribution_percentage)
output.normalized_token_attributions
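
As a rough illustration of how the two outputs above relate, the sketch below turns `(token, attribution)` pairs into percentages by dividing each absolute score by the total. This is an assumption about what "normalized" means here, not a description of the library's internals; `normalize_attributions` is a hypothetical helper and the scores are made-up values, not results from this notebook.

```python
def normalize_attributions(pairs):
    """Convert (token, score) pairs into (token, percentage) pairs.

    Assumption: each percentage is the token's share of the summed
    absolute scores. The library may normalize differently.
    """
    total = sum(abs(score) for _, score in pairs)
    if total == 0:
        # Avoid dividing by zero when every score is zero
        return [(token, 0.0) for token, _ in pairs]
    return [(token, round(100 * abs(score) / total, 3)) for token, score in pairs]


# Made-up scores, for illustration only
example = [("a", 1.0), ("fantasy", 5.0), ("landscape", 4.0)]
normalized = normalize_attributions(example)

# The token with the largest share is the most important one under this scheme
most_important = max(normalized, key=lambda pair: pair[1])
print(normalized)
print(most_important)
```

Percentages are easier to compare across prompts than raw scores, which is why the later sections only inspect `output.normalized_token_attributions`.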

### 4 - Get explanations for a specific part of the image

`diffusers-interpret` also computes the token importances for generating a particular part of the output image.

In the current implementation, we only need to re-run the `explainer` and pass it the `explanation_2d_bounding_box` argument with the bounding box we are interested in seeing.

In [None]:
prompt = "A cute corgi with the Eiffel Tower in the background"

generator = torch.Generator(device).manual_seed(2023)
with torch.autocast('cuda') if device == 'cuda' else nullcontext():
    output = explainer(
        prompt, 
        num_inference_steps=50, 
        generator=generator,
        height=448,
        width=448,
        
        # for this model, GPU VRAM usage rises drastically as this argument increases; feel free to experiment with it
        # if you are not interested in checking the token attributions, you can pass 0 here
        n_last_diffusion_steps_to_consider_for_attributions=5,
        
        explanation_2d_bounding_box=((305, 180), (448, 448)), # (upper left corner, bottom right corner)
    )
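
The bounding box in the cell above is written as `((x0, y0), (x1, y1))`, upper left corner first and bottom right corner second. A minimal sketch of checking such a box against the image size before spending GPU time on a run; `validate_bounding_box` is a hypothetical helper that is not part of `diffusers-interpret`, and treating the first coordinate of each corner as the horizontal one is our assumption.

```python
def validate_bounding_box(box, width, height):
    """Check that a ((x0, y0), (x1, y1)) box fits inside the image.

    Returns the box area in pixels. Assumes x runs along the width
    and y along the height (an assumption, not confirmed by the library).
    """
    (x0, y0), (x1, y1) = box
    if not (0 <= x0 < x1 <= width):
        raise ValueError(f"x range {x0}..{x1} does not fit in width {width}")
    if not (0 <= y0 < y1 <= height):
        raise ValueError(f"y range {y0}..{y1} does not fit in height {height}")
    return (x1 - x0) * (y1 - y0)


# Same box and image size as in the cell above
box = ((305, 180), (448, 448))
area = validate_bounding_box(box, width=448, height=448)
fraction = area / (448 * 448)
print(f"box covers {area} pixels ({fraction:.1%} of the image)")
```

Knowing what fraction of the image the box covers helps when reading the attributions: a very small box restricts the explanation to a few pixels.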


#### 4.1 - Check generated image

A red bounding box is now visible in the picture, to indicate the area that `explainer` is looking at when calculating the token attributions.

In [None]:
output.image

#### 4.2 - Check token attributions for bounding box

In [None]:
# (token, attribution_percentage)
output.normalized_token_attributions

### 5 - Same generation, but with a different `explanation_2d_bounding_box`

In [None]:
prompt = "A cute corgi with the Eiffel Tower in the background"

generator = torch.Generator(device).manual_seed(2023)
with torch.autocast('cuda') if device == 'cuda' else nullcontext():
    output = explainer(
        prompt, 
        num_inference_steps=50, 
        generator=generator,
        height=448,
        width=448,
        
        # for this model, GPU VRAM usage rises drastically as this argument increases; feel free to experiment with it
        # if you are not interested in checking the token attributions, you can pass 0 here
        n_last_diffusion_steps_to_consider_for_attributions=5,
        
        explanation_2d_bounding_box=((140, 0), (270, 190)), # (upper left corner, bottom right corner)
    )

In [None]:
output.image

In [None]:
# (token, attribution_percentage)
output.normalized_token_attributions
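
To compare two bounding boxes side by side, the normalized attributions from each run can be joined per token. A minimal sketch, assuming each result is a list of `(token, percentage)` pairs, as the comment above indicates, with the same tokens in the same order; `attribution_shift` is a hypothetical helper and the numbers are placeholders, not outputs of this notebook. Since `output` is overwritten on every run, keep a copy of the first run's attributions before re-running the explainer.

```python
def attribution_shift(first, second):
    """Return (token, second - first) for two lists of (token, percentage) pairs.

    Assumes both lists come from the same prompt, so the tokens match
    one to one and appear in the same order.
    """
    if [token for token, _ in first] != [token for token, _ in second]:
        raise ValueError("token sequences differ, so the runs are not comparable")
    return [
        (token, round(after - before, 3))
        for (token, before), (_, after) in zip(first, second)
    ]


# Placeholder percentages for two different bounding boxes
box_a = [("corgi", 40.0), ("eiffel", 10.0), ("tower", 50.0)]
box_b = [("corgi", 15.0), ("eiffel", 35.0), ("tower", 50.0)]

shift = attribution_shift(box_a, box_b)
for token, delta in shift:
    print(f"{token:>8}: {delta:+.1f} percentage points")
```

A positive value means the token matters more for the second box than for the first.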