# Text-to-Image library (Stable Diffusion)

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images from any text prompt. It was created by researchers and engineers from CompVis, Stability AI, Runway, and LAION.

For more information about how Stable Diffusion works, see [🤗's Stable Diffusion with 🧨 Diffusers blog](https://huggingface.co/blog/stable_diffusion).


- Check that the GPU is accessible from the Docker JupyterLab container

In [None]:
!nvidia-smi
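
As an optional complement to `!nvidia-smi`, the same check can be scripted so that later cells can react to a missing GPU. This is a minimal sketch, not part of the original notebook: the helper name `list_gpus` is hypothetical, and it assumes `nvidia-smi` is on the `PATH` inside the container.

```python
import shutil
import subprocess


def list_gpus():
    """Return one description line per GPU reported by `nvidia-smi -L`.

    Hypothetical helper: returns an empty list when nvidia-smi is missing
    or fails, which usually means the container cannot see a GPU.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        proc = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if proc.returncode != 0:
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU visible from this container")
```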

### 1. Import required packages

- [🤗 🧨 Diffusers](https://huggingface.co/docs/diffusers/index) provides pretrained vision and audio diffusion models and a great toolset for inference and training

- `StableDiffusionPipeline` is an end-to-end inference pipeline that you can use to generate images from text with just a few lines of code.

In [None]:
import os
import torch
from diffusers import StableDiffusionPipeline
from dotenv import load_dotenv
load_dotenv()

import warnings
warnings.filterwarnings('ignore')

### 2. Set up the StableDiffusion Pipeline

- Several checkpoints / models are available on Hugging Face for text2img inference. This notebook uses ``CompVis/stable-diffusion-v1-4``. The other variations of the model can be found [here](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img).

In [None]:
model_id = "CompVis/stable-diffusion-v1-4"

- To generate a user access token, follow the instructions [here](https://huggingface.co/docs/hub/security-tokens), then provide the token as the ``AUTH_TOKEN`` environment variable while building the Docker container

In [None]:
token = os.environ.get('AUTH_TOKEN')
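
Because `os.environ.get` silently returns `None` when the variable is missing, a small guard can surface the problem before the model download starts. The following is a sketch only: the helper name `require_token` and its error message are hypothetical and not part of the original notebook.

```python
import os


def require_token(name="AUTH_TOKEN", env=None):
    """Return the token stored under `name`, or raise a clear error.

    Hypothetical helper: `env` defaults to os.environ and exists only so
    the function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; pass it to the container as an environment variable"
        )
    return value


# In the notebook this would replace the plain lookup:
# token = require_token()
```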

- Because GPU memory is limited, the ``StableDiffusionPipeline`` is loaded in ``float16`` precision, using the weights from the ``fp16`` branch

In [None]:
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=token
)

- Move the diffusion pipeline to the GPU

In [None]:
pipe = pipe.to("cuda")

### 3. Inference

In [None]:
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image

In [None]:
result = pipe(prompt)
print(result)

- The pipeline returns a different image each time it is called with the same prompt. To get reproducible output, pass a generator seeded with a fixed value

In [None]:
generator = torch.Generator(device="cuda").manual_seed(1)

In [None]:
image = pipe(
    prompt, 
    generator=generator
).images[0]
image
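
One caveat about seeded generators: they are stateful, so every call that draws random numbers advances them. Reusing one generator object across several pipeline calls therefore does not reproduce the first image; a fresh generator with the same seed does. The sketch below illustrates this with the standard library's `random.Random` as a stand-in for `torch.Generator` (an assumption made so the example runs without a GPU). The helper `draw` is hypothetical.

```python
import random


def draw(rng, n=3):
    """Draw n pseudo-random numbers, advancing the generator's state."""
    return [rng.random() for _ in range(n)]


rng = random.Random(1)
first = draw(rng)
second = draw(rng)              # same generator object, state has advanced

fresh = draw(random.Random(1))  # new generator created with the same seed

print(first == second)  # False: the reused generator continued its sequence
print(first == fresh)   # True: same seed, same starting state
```

In the notebook, this corresponds to calling `torch.Generator(device="cuda").manual_seed(1)` again before each run that should start from the same noise.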

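- ``num_inference_steps`` sets the number of denoising steps (the default is 50). Fewer steps run faster but generally produce lower-quality images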

In [None]:
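# Re-seed so this run starts from the same noise as the seeded run above
generator = torch.Generator(device="cuda").manual_seed(1)
image = pipe(
    prompt,
    num_inference_steps=15,
    generator=generator
).images[0]
image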

- ``guidance_scale`` controls how strongly generation adheres to the conditioning signal, which in this case is the text prompt. Higher values generally also improve overall sample quality. The default is 7.5

In [None]:
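# Re-seed so this run starts from the same noise as the seeded run above
generator = torch.Generator(device="cuda").manual_seed(1)
image = pipe(
    prompt,
    guidance_scale=7.5,
    generator=generator
).images[0]
image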
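
To make the effect of the guidance scale concrete, the sketch below shows the classifier-free guidance combination on toy numbers. At each denoising step the model produces an unconditional and a text-conditioned noise prediction, and the scale pushes the result away from the former towards the latter. The function name `apply_guidance` and the input values are illustrative assumptions; the real pipeline applies the same arithmetic to tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine two noise predictions with classifier-free guidance.

    Computes uncond + scale * (text - uncond) element by element.
    Plain lists stand in for the tensors used by the real pipeline.
    """
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


# Toy values, chosen only for illustration
uncond = [0.0, 1.0]
text = [1.0, 3.0]

print(apply_guidance(uncond, text, 1.0))  # [1.0, 3.0]  (text prediction unchanged)
print(apply_guidance(uncond, text, 7.5))  # [7.5, 16.0] (pushed further along text - uncond)
```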

- Create a grid of images

In [None]:
from PIL import Image


def image_grid(imgs, rows, cols):
    """Paste equally sized PIL images into a rows x cols grid, row by row."""
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        # Column index is i % cols, row index is i // cols
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
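
The index arithmetic inside `image_grid` can be checked without generating any images. The sketch below lists the top-left paste position for each image; `paste_offsets` is a hypothetical helper, and the 512x512 tile size is an assumption used only for illustration.

```python
def paste_offsets(n_images, cols, w, h):
    """Return the (x, y) top-left corner for each image, filling row by row."""
    return [((i % cols) * w, (i // cols) * h) for i in range(n_images)]


# Six 512x512 images arranged in 2 rows of 3 columns
print(paste_offsets(6, cols=3, w=512, h=512))
# [(0, 0), (512, 0), (1024, 0), (0, 512), (512, 512), (1024, 512)]
```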

In [None]:
num_images = 3
prompt = ["a photograph of an astronaut riding a horse"] * num_images

images = pipe(prompt).images

grid = image_grid(images, rows=1, cols=num_images)
grid