# Image Generation

This notebook contains the code we used to generate and display images with Stable Diffusion v1.5.

## Import Libraries

In [None]:
import torch
from datasets import load_dataset
from diffusers import DiffusionPipeline
import transformers
import os
import random

## Load Diffusion Model and Dataset

In [None]:
pokemon_captions = load_dataset('lambdalabs/pokemon-blip-captions')
dataset = pokemon_captions['train']

Throughout this notebook, we follow the guidance found in the [tutorial](https://huggingface.co/docs/diffusers/using-diffusers/conditional_image_generation) on HuggingFace.

In [None]:
HF_TOKEN = 'REDACTED'
generator = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_auth_token=HF_TOKEN)
generator.to("cuda")  # move the pipeline to the GPU

## Creating Training Dataset

We randomly select 80 entries from the Pokemon captions dataset, run each caption through the diffusion model, and save the generated images to a folder.

We perform this generation in groups of 20, one group for each of our four group members.

We reset the random seed to the same value before every generation so that a given caption always produces the same image.

In [None]:
FOLDER = 'Generated Pokemon/'
os.makedirs(FOLDER, exist_ok=True)  # make sure the output folder exists before saving

In [None]:
def gen_if_not_present(i: int):
    if os.path.isfile(FOLDER + f"GenPokemon{i}.png"):
        return
    text = dataset[i]['text']
    transformers.set_seed(27)
    image = generator(text).images[0]
    image.save(FOLDER + f"GenPokemon{i}.png")
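
The skip-if-present check above makes the generation loop safe to re-run after an interruption. A minimal sketch of the same idea using `pathlib`, where `render` is a hypothetical callable standing in for the diffusion pipeline (the names `save_if_missing` and `render` are assumptions for illustration, not part of this notebook):

```python
from pathlib import Path
from typing import Callable


def save_if_missing(folder: Path, index: int, render: Callable[[int], bytes]) -> bool:
    """Write the output for `index` unless its file already exists.

    Returns True if a new file was written, False if it was skipped.
    `render` is a hypothetical stand-in for the diffusion pipeline call.
    """
    folder.mkdir(parents=True, exist_ok=True)  # create the output folder on first use
    target = folder / f"GenPokemon{index}.png"
    if target.is_file():
        return False  # already generated, so skip the expensive call
    target.write_bytes(render(index))
    return True
```

Returning a boolean lets the caller count how many images were actually generated in a given run.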

In [None]:
# generate first group IDs
group1 = []
while len(group1) < 20:
    i = random.randrange(len(dataset))
    if i not in group1:
        gen_if_not_present(i)
        group1.append(i)
print(group1)

In [None]:
# generate second group IDs
group2 = []
while len(group2) < 20:
    i = random.randrange(len(dataset))
    if i not in group1 and i not in group2:
        gen_if_not_present(i)
        group2.append(i)
print(group2)

In [None]:
# generate third group IDs
group3 = []
while len(group3) < 20:
    i = random.randrange(len(dataset))
    if i not in group1 and i not in group2 and i not in group3:
        gen_if_not_present(i)
        group3.append(i)
print(group3)

In [None]:
# generate fourth group IDs
group4 = []
while len(group4) < 20:
    i = random.randrange(len(dataset))
    if i not in group1 and i not in group2 and i not in group3 and i not in group4:
        gen_if_not_present(i)
        group4.append(i)
print(group4)
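
The four cells above draw indices one at a time and check each against every earlier group. As a sketch of an alternative, assuming the goal is simply 80 distinct indices split into four groups of 20, `random.sample` can draw them all in one call. A dedicated `random.Random` instance is used because `transformers.set_seed` also reseeds Python's global `random` module, so global draws made between generations are not independent of that seed. The function name `split_into_groups`, the seed value, and the dataset size below are illustrative assumptions.

```python
import random


def split_into_groups(n_items: int, n_groups: int = 4, group_size: int = 20, seed: int = 0):
    """Draw n_groups * group_size distinct indices and split them into groups."""
    rng = random.Random(seed)  # private RNG, unaffected by reseeding of the global one
    picked = rng.sample(range(n_items), n_groups * group_size)
    return [picked[k * group_size:(k + 1) * group_size] for k in range(n_groups)]


groups = split_into_groups(n_items=1000)  # 1000 is a placeholder for len(dataset)
for number, group in enumerate(groups, start=1):
    print(f"group{number}: {group}")
```

Because sampling is done without replacement, the groups cannot overlap and no membership checks are needed.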

## Fetch and Display Image

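The following cell reloads the dataset itself so that other group members can run it without running the full notebook, as long as they have the images folder. It still needs the `generator` pipeline loaded earlier, because it regenerates the image to compare against the saved one.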

In [None]:
# import the libraries this cell needs
import torch
import transformers
from datasets import load_dataset
import matplotlib.pyplot as plt
from PIL import Image

pokemon_captions_dataset = load_dataset('lambdalabs/pokemon-blip-captions')
dataset = pokemon_captions_dataset['train']
INDEX = 105

torch.cuda.empty_cache()
transformers.set_seed(27)
gen_image = generator(dataset[INDEX]['text']).images[0]

print(dataset[INDEX]['text'])
f, axarr = plt.subplots(1, 2)
axarr[0].imshow(Image.open(f"Generated Pokemon/GenPokemon{INDEX}.png"))  # saved image on the left
axarr[1].imshow(gen_image)
plt.show()
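
Because the cell above assumes the images folder is already present, it can help to check which files are missing before plotting. A minimal sketch, assuming the `GenPokemon{i}.png` naming used earlier; `missing_images` is a hypothetical helper, not part of this notebook:

```python
from pathlib import Path


def missing_images(folder: str, indices: list[int]) -> list[int]:
    """Return the indices whose generated image is not in `folder`."""
    root = Path(folder)
    return [i for i in indices if not (root / f"GenPokemon{i}.png").is_file()]


# Example: check a few indices before trying to display them.
print(missing_images("Generated Pokemon", [105, 27]))
```

An empty list means every requested image is available locally.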

## Generate with Feedback

The following set of cells is also self-contained.

The feedback should be substituted with the desired LLM/VLM output.

In [None]:
# import the HuggingFace datasets library and the plotting utilities
from datasets import load_dataset
import matplotlib.pyplot as plt
from PIL import Image
import transformers

pokemon_captions_dataset = load_dataset('lambdalabs/pokemon-blip-captions')
dataset = pokemon_captions_dataset['train']

# import PyTorch and the HuggingFace diffusers library
import torch
from diffusers import DiffusionPipeline
HF_TOKEN = 'REDACTED' # make a token on Huggingface
generator = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_auth_token=HF_TOKEN)
generator.to("cuda") # requires a CUDA-capable GPU

In [None]:
torch.cuda.empty_cache()

# change this to the desired number
INDEX = 27

# create image caption
orig_caption = dataset[INDEX]['text']
feedback = 'The Pokemon is a cartoon lady with pink baggy pants and a big hat with a white background.  It has a white background and is smiling against a white background.'
# new_caption = orig_caption + '. ' + feedback
print(f"Feedback: {feedback}")

# generate image for caption
transformers.set_seed(27)
image = generator(feedback).images[0]
image
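
The commented-out `new_caption` line above hints at combining the original caption with the feedback instead of replacing it. A small sketch of a prompt builder covering both options; `build_prompt` and its `keep_original` flag are illustrative assumptions, and whitespace is normalized because the sample feedback above contains a double space:

```python
def build_prompt(orig_caption: str, feedback: str, keep_original: bool = False) -> str:
    """Build the text prompt from the feedback, optionally prefixed by the caption."""
    clean = " ".join(feedback.split())  # collapse repeated whitespace
    if not keep_original:
        return clean
    # drop a trailing period so the join does not produce ".."
    return orig_caption.strip().rstrip(".") + ". " + clean


prompt = build_prompt("a drawing of a green pokemon", "It has  a white background.", keep_original=True)
print(prompt)  # a drawing of a green pokemon. It has a white background.
```

The resulting string could then be passed to the pipeline in place of `feedback`.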