<a href="https://colab.research.google.com/github/bachaudhry/FastAI-22-23/blob/main/course_part_2/01_Introduction_to_Generative_Modeling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A Hands-On Intro to Generative Modeling Using HF Diffusers

_GitHub's renderer tends to break with these output-heavy notebooks, so the versions saved here will have all outputs cleared._

_If I decide to retain outputs, use the [NB Viewer](https://nbviewer.org/github/bachaudhry/FastAI-22-23/blob/main/course_part_2/01_Introduction_to_Generative_Modeling.ipynb) link to view the notebook._

In [None]:
# fastdownload is needed later for the image-to-image and textual inversion cells
!pip install -Uq diffusers transformers fastcore fastdownload

In [None]:
import logging
from pathlib import Path

import matplotlib.pyplot as plt
import torch
from diffusers import StableDiffusionPipeline
from fastcore.all import concat
from huggingface_hub import notebook_login
from PIL import Image

logging.disable(logging.WARNING)

torch.manual_seed(44)
if not (Path.home()/'.cache/huggingface' / 'token').exists(): notebook_login()

## Setting Up the Stable Diffusion Pipeline

In [None]:
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4",
                                               variant="fp16",
                                               torch_dtype=torch.float16).to("cuda")

In [None]:
# Checking location of the model weights
!ls ~/.cache/huggingface/hub

In [None]:
# In case the GPU has insufficient memory
# pipe.enable_attention_slicing()

In [None]:
# Testing first prompt
prompt = "a poster of a samurai on a racing motorbike"

In [None]:
pipe(prompt).images[0]

In [None]:
# Using different seed values
torch.manual_seed(8161)
pipe(prompt).images[0]

Diffusion models generate images by denoising random noise over a series of steps, so we can vary the number of steps (`num_inference_steps`) to see the effect on the model's outputs.
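To make the role of the step count a little more concrete, here is a minimal sketch of how a sampler might choose which noise levels to visit. It assumes 1,000 training timesteps and plain evenly spaced selection. `sketch_timesteps` is a hypothetical helper written for this note, not a diffusers function, and real schedulers differ in details such as offsets.

```python
def sketch_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Return evenly spaced timesteps, ordered from noisiest to cleanest.

    Hypothetical illustration only: real schedulers may space or offset
    their timesteps differently.
    """
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in reversed(range(num_inference_steps))]


for steps in (3, 10):
    print(steps, sketch_timesteps(steps))
```

With fewer steps, each denoising update has to cover a larger jump in noise level, which is one plausible reason very low step counts tend to give rough images.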

In [None]:
# Reusing the manual seed from the last cell
torch.manual_seed(8161)
pipe(prompt, num_inference_steps=3).images[0]

In [None]:
# Increase the number of steps to 10
torch.manual_seed(8161)
pipe(prompt, num_inference_steps=10).images[0]

In [None]:
# Increase the number of steps to 16
torch.manual_seed(8161)
pipe(prompt, num_inference_steps=16).images[0]

In [None]:
# Let's take it up to 40
torch.manual_seed(8161)
pipe(prompt, num_inference_steps=40).images[0]

In [None]:
# Cranking to 100
torch.manual_seed(8161)
pipe(prompt, num_inference_steps=100).images[0]

## Classifier-Free Guidance

Classifier-free guidance increases how closely the outputs follow the conditioning signal, which here is the text prompt.

Larger `guidance_scale` values increase adherence at the expense of diversity. The default is `7.5`.
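As a rough sketch of the idea, the commonly described formulation takes two noise predictions per step, one without the prompt and one with it, and pushes the result away from the unconditional prediction towards the prompt-conditioned one. `apply_guidance` and its argument names are hypothetical, chosen for this illustration; scalars stand in for the noise tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale=7.5):
    """Combine unconditional and prompt-conditioned noise predictions.

    Works with anything that supports arithmetic (floats here, tensors in
    practice). Hypothetical helper for illustration only.
    """
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


# A scale of 1.0 simply returns the prompt-conditioned prediction
print(apply_guidance(0.0, 1.0, guidance_scale=1.0))  # 1.0

# Larger scales extrapolate further in the direction of the prompt
print(apply_guidance(0.0, 1.0))  # 7.5
```

Under this formulation, a scale close to 1 applies almost no extra guidance, which matches the first value (`1.1`) tried in the grid below.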

In [None]:
def image_grid(imgs, rows, cols):
  w, h = imgs[0].size
  grid = Image.new('RGB', size=(cols * w, rows * h))
  for i, img in enumerate(imgs):
    grid.paste(img, box=(i % cols * w, i // cols * h))
  return grid

In [None]:
# Testing guidance parameter settings
num_rows, num_cols = 4, 4
prompts = [prompt] * num_cols

In [None]:
# Each guidance value produces one row of num_cols images for the grid
images = concat(pipe(prompts, guidance_scale=g).images for g in [1.1, 4, 10, 20])

In [None]:
image_grid(images, rows=num_rows, cols=num_cols)

## Negative Prompts

In [None]:
torch.manual_seed(4353)
prompt = "Early morning in the Himalayas"
pipe(prompt).images[0]

In [None]:
torch.manual_seed(4353)
pipe(prompt, negative_prompt="orange").images[0]

In [None]:
torch.manual_seed(4353)
pipe(prompt, negative_prompt="orange", guidance_scale=15).images[0]

## Image to Image

In [None]:
from diffusers import StableDiffusionImg2ImgPipeline
from fastdownload import FastDownload

In [None]:
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")

In [None]:
# Using the lesson example
p = FastDownload().download('https://cdn-uploads.huggingface.co/production/uploads/1664665907257-noauth.png')
init_image = Image.open(p).convert("RGB")
init_image

In [None]:
torch.manual_seed(1776)
prompt = "Dragon roaring at the moon, photorealistic 4K"
images = pipe(prompt=prompt, num_images_per_prompt=3,
              image=init_image, strength=0.8, num_inference_steps=50).images
image_grid(images, rows=1, cols=3)

In [None]:
# Selecting a generated image to seed the next prompt.
# Note: with strength=1 the init image is almost fully noised, so it has little influence.
init_image=images[2]

torch.manual_seed(776)
prompt = "Comic book art of dragon roaring at the moon."
images = pipe(prompt=prompt, num_images_per_prompt=3,
              image=init_image, strength=1, num_inference_steps=100).images
image_grid(images, rows=1, cols=3)
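The `strength` argument used in the two cells above controls how much noise is added to the init image. As far as I understand the diffusers img2img pipeline, it also determines how many of the requested steps are actually run. The sketch below is my assumption about that relationship; `effective_steps` is a hypothetical helper, not part of the library.

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps img2img runs.

    Hypothetical helper: assumes the step count scales linearly with
    strength and is capped at num_inference_steps.
    """
    return min(int(num_inference_steps * strength), num_inference_steps)


print(effective_steps(50, 0.8))  # first img2img cell above
print(effective_steps(100, 1))   # second img2img cell above
```

A lower `strength` keeps more of the init image and runs fewer steps, while `strength=1` starts from almost pure noise.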

## Fine Tuning

**Refer to Blog**

## Textual Inversion

Using this technique, we can "teach" the text model a new word by training an embedding for it.

The tokenizer's vocabulary is extended with the new token, and all model weights are frozen apart from that token's embedding in the text encoder. The new embedding is then trained on a small sample of representative images.
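Here is a toy sketch of the core idea in plain Python: add one token to a vocabulary, then update only that token's embedding vector while every other row stays fixed. Everything in it is an assumption made for illustration. The vocabulary, the `<my-style>` placeholder token, the 2-D vectors, and the squared-error objective (a stand-in for the real denoising loss) are all hypothetical.

```python
# Toy vocabulary and embedding table (2-D vectors for readability)
vocab = {"a": 0, "painting": 1, "of": 2}
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]


def add_token(token, init_vector):
    """Extend the vocabulary with one token and return its new id."""
    vocab[token] = len(embeddings)
    embeddings.append(list(init_vector))
    return vocab[token]


def train_new_token(token_id, target, lr=0.1, steps=200):
    """Gradient descent on squared error, updating only row `token_id`."""
    row = embeddings[token_id]
    for _ in range(steps):
        for d in range(len(row)):
            grad = 2 * (row[d] - target[d])  # d/dx of (x - target)^2
            row[d] -= lr * grad


frozen_before = [list(v) for v in embeddings]  # snapshot of existing rows
new_id = add_token("<my-style>", init_vector=[0.0, 0.0])
train_new_token(new_id, target=[1.0, -1.0])

print(embeddings[:new_id] == frozen_before)  # True: old rows are untouched
print([round(x, 3) for x in embeddings[new_id]])
```

In the real setting the trained vector is what gets saved and shared, so reusing a concept only requires adding the token and copying its embedding into the text encoder.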

In [None]:
# Reload the text-to-image pipeline before adding the learned embeddings (downloaded in the next cell)
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4",
                                               variant="fp16",
                                               torch_dtype=torch.float16)
pipe = pipe.to("cuda")

In [None]:
# Use /resolve/ rather than /blob/ so the raw file is downloaded instead of the HTML page
embeds_url = "https://huggingface.co/sd-concepts-library/tim-sale/resolve/main/learned_embeds.bin"
embeds_path = FastDownload().download(embeds_url)

In [None]:
embeds_dict = torch.load(str(embeds_path), map_location="cpu")