### Install Required Packages

In [1]:
!pip install daam==0.0.11
!pip install accelerate  # this is to reduce CPU model load overhead

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting daam==0.0.11
  Downloading daam-0.0.11.tar.gz (21 kB)
Collecting diffusers==0.9.0
  Downloading diffusers-0.9.0-py3-none-any.whl (453 kB)
Collecting gradio
  Downloading gradio-3.14.0-py3-none-any.whl (13.8 MB)
Collecting ftfy
  Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)
Collecting transformers==4.24.0
  Downloading transformers-4.24.0-py3-none-any.whl (5.5 MB)
Collecting huggingface-hub>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_6

We will be running `Stable Diffusion 2`, so enable the `GPU` under `View Resources > Change runtime type` before continuing.

In [3]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-b28d3e60-eccf-9be6-edb5-54466c054ca9)
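The same check can be done from Python so that the notebook stops early when no GPU is attached. The following is a minimal sketch that only wraps the `nvidia-smi -L` command shown above; the helper name `list_gpus` is hypothetical and not part of any library.

```python
import shutil
import subprocess

def list_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none is visible."""
    # nvidia-smi is missing on CPU-only runtimes
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    # Each GPU is listed on a line that starts with "GPU <index>:"
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]

gpus = list_gpus()
print(gpus if gpus else "No GPU visible; change the runtime type before loading the model.")
```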


### Load Necessary Libraries

We load the libraries needed to generate DAAM outputs for the input prompts.

In [11]:
import os

from matplotlib import pyplot as plt
import numpy as np

from diffusers import StableDiffusionPipeline
import daam
import torch

### Load Data

The list below is a placeholder for any list of prompts. We will replace it later with a list of prompts taken from the `MS-COCO` text annotations.

In [5]:
prompts = [
  "A group of people stand in the back of a truck filled with cotton.",
  "A mother and three children collecting garbage from a blue and white garbage can on the street.",
  "A woman is sitting in a chair reading a book with her head resting on her free hand.",
  "A brown and white dog exiting a yellow and blue ramp in a grassy area.",
  "A boy stands on a rocky mountain."
  ]
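As a rough sketch of that later replacement, captions can be read from a COCO-style captions annotation file, which stores them under the `annotations` key with one `caption` string per entry. The helper name `load_coco_captions`, the `limit` parameter, and the file path in the usage comment are assumptions for illustration, not part of this notebook.

```python
import json

def load_coco_captions(annotation_path, limit=None):
    """Read caption strings from a COCO-style captions annotation file."""
    with open(annotation_path, encoding="utf-8") as f:
        data = json.load(f)
    # Each annotation entry carries an image id and one caption string
    captions = [entry["caption"].strip() for entry in data["annotations"]]
    return captions if limit is None else captions[:limit]

# Hypothetical usage; the path depends on where the annotations were downloaded:
# prompts = load_coco_captions("annotations/captions_val2017.json", limit=5)
```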

### Setting up the Pipeline

I will set up the pipeline that generates heatmaps for every image generated from each prompt in the `prompts` list. We will generate $20$ images per prompt.

Below I summarise the storage scheme I adopted (everything lives under the `Data-Generated` folder):
- For the $i^{th}$ prompt in the `prompts` list, create a folder named `i`.
- For the $i^{th}$ prompt, generate $20$ images from a diffusion model (`stabilityai/stable-diffusion-2-base`) and store each image in its own subfolder under `i`. If the generated images are named `a`, `b`, .., `t`, they are stored as `i/0/a.png`, .., `i/19/t.png` (the code numbers both prompts and subfolders from $0$).
- Each of these $20$ subfolders also stores the heatmaps for its image. There is one heatmap per whitespace-separated word in the prompt that generated the image, and each heatmap is named after the index of that word in the prompt.

Note: I save the heatmaps as `.npy` NumPy arrays.
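To make the layout concrete, here is a small sketch that builds the paths the generation loop writes to, using its 0-based prompt and subfolder numbering. The helper names `image_path` and `heatmap_path` are hypothetical and exist only for this illustration.

```python
from pathlib import Path

ROOT = Path("Data-Generated")  # same root folder name as in the generation loop

def image_path(prompt_idx, image_idx):
    """Path of the image_idx-th image (0-based) generated for the prompt_idx-th prompt."""
    letter = chr(ord("a") + image_idx)  # 0 -> 'a', ..., 19 -> 't'
    return ROOT / str(prompt_idx) / str(image_idx) / f"{letter}.png"

def heatmap_path(prompt_idx, image_idx, word_idx):
    """Path of the heatmap for the word_idx-th word of the prompt, for the same image."""
    return ROOT / str(prompt_idx) / str(image_idx) / f"{word_idx}.npy"

prompt = "A boy stands on a rocky mountain."
words = prompt.split()  # heatmap indices follow this whitespace split

print(image_path(4, 0).as_posix())                        # Data-Generated/4/0/a.png
print(heatmap_path(4, 0, 2).as_posix(), "->", words[2])   # Data-Generated/4/0/2.npy -> stands
```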

Now, let's load the `stabilityai/stable-diffusion-2-base` diffusion model.

In [8]:
model = StableDiffusionPipeline.from_pretrained('stabilityai/stable-diffusion-2-base')
model = model.to('cuda')


Let's generate Global Word Attribution HeatMaps.

In [36]:
# The folder that will contain the generated data
# (exist_ok=True lets the cell be re-run after an interruption)
os.makedirs('Data-Generated', exist_ok=True)

# Iterating over the prompts
for i, prompt in enumerate(prompts):

  # Creating a folder for each prompt
  os.makedirs(f'Data-Generated/{i}', exist_ok=True)

  for j in range(20):

    # Creating one subfolder per generated image (20 per prompt)
    os.makedirs(f'Data-Generated/{i}/{j}', exist_ok=True)

    # Generating an image while tracing attention for the DAAM output
    with daam.trace(model) as trc:
      output_image = model(prompt).images[0]
      global_heat_map = trc.compute_global_heat_map()

    # Saving the generated image as a.png, b.png, .., t.png
    output_image.save(f'Data-Generated/{i}/{j}/{chr(97+j)}.png')

    # Generate a Global Word Attribution HeatMap for each whitespace-separated word
    for k, word in enumerate(prompt.split()):
      word_heatmap = global_heat_map.compute_word_heat_map(word).expand_as(output_image).numpy()

      # Saving the heatmap under the index of the word in the prompt
      np.save(f'Data-Generated/{i}/{j}/{k}.npy', word_heatmap)

KeyboardInterrupt: ignored
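
Because the run above was interrupted, some image folders may be missing or only partly written. One possible way to resume is to skip folders that already hold an image and all of their heatmaps. The sketch below assumes the storage scheme described earlier; the helper `is_complete` is hypothetical and is not part of `daam`.

```python
from pathlib import Path

def is_complete(image_dir, n_words):
    """True if the folder holds a .png image and heatmaps 0.npy .. (n_words-1).npy."""
    image_dir = Path(image_dir)
    if not image_dir.is_dir():
        return False
    has_image = any(image_dir.glob("*.png"))
    has_heatmaps = all((image_dir / f"{k}.npy").exists() for k in range(n_words))
    return has_image and has_heatmaps

# Hypothetical usage inside the generation loop, before generating image j of prompt i:
# if is_complete(f'Data-Generated/{i}/{j}', len(prompt.split())):
#   continue
```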