# Stable Diffusion 🎨

Stable Diffusion is a text-to-image latent diffusion model created by researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), and [LAION](https://laion.ai/). It is trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database and uses a frozen CLIP ViT-L/14 text encoder to condition image generation on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB of VRAM.
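
To put the parameter counts above in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy. It assumes only the two components quoted above (the UNet and the text encoder) and ignores every other component, the activations, and framework overhead, so it is a lower bound rather than a measurement. The names `PARAM_COUNTS` and `weight_memory_gib` are illustrative and not part of any library.

```python
# Rough weight-memory estimate from the parameter counts quoted above.
# Assumption: only the UNet and the text encoder are counted; other
# components, activations, and framework overhead are ignored.
PARAM_COUNTS = {"unet": 860_000_000, "text_encoder": 123_000_000}
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(param_counts, dtype):
    """Return the memory needed to hold the weights, in GiB."""
    total_params = sum(param_counts.values())
    return total_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(PARAM_COUNTS, dtype):.2f} GiB")
```

Under these assumptions the weights come to well under 4 GiB in full precision, which suggests that much of the stated 10GB budget goes to activations and other runtime overhead rather than to the weights themselves.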

## Getting Started

In [None]:
%pip install diffusers==0.2.4
%pip install transformers scipy ftfy
%pip install "ipywidgets>=7,<8"
%pip install huggingface_hub

In [None]:
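# Save the Hugging Face access token, read from the HUGGING_FACE_TOKEN
# environment variable, so that `use_auth_token=True` can pick it up later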
from huggingface_hub import HfFolder
from os import environ
HfFolder.save_token(environ["HUGGING_FACE_TOKEN"])

In [None]:
import torch
from diffusers import StableDiffusionPipeline

# Download the Stable Diffusion v1-4 weights from the Hugging Face Hub
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to("cuda")

In [None]:
from torch import autocast

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image

In [None]:
from PIL import Image

num_images = 3
my_prompts = [prompt] * num_images

with autocast("cuda"):
    images = pipe(my_prompts)["sample"]

def image_grid(imgs, rows, cols):
    """Paste equally sized PIL images into a rows x cols grid, in row-major order."""
    assert len(imgs) == rows*cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))

    for i, img in enumerate(imgs):
        # Column index is i % cols, row index is i // cols
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid

grid = image_grid(images, rows=1, cols=num_images)
grid
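
The paste positions computed inside `image_grid` follow a row-major layout: tile `i` goes to column `i % cols` and row `i // cols`. The sketch below isolates that arithmetic in plain Python so it can be checked without a GPU or any images. The helper name `grid_offsets` and its parameters are hypothetical and exist only for illustration.

```python
def grid_offsets(n_images, cols, tile_w, tile_h):
    """Return the top-left (x, y) paste position of each tile, in row-major order."""
    return [(i % cols * tile_w, i // cols * tile_h) for i in range(n_images)]


# One row of three 512x512 tiles, as in the cell above
print(grid_offsets(3, cols=3, tile_w=512, tile_h=512))

# Four tiles arranged as a 2x2 grid
print(grid_offsets(4, cols=2, tile_w=512, tile_h=512))
```

The same expression is what `image_grid` passes to `grid.paste`, so changing `rows` and `cols` there only changes where each tile wraps onto a new row.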