<a href="https://colab.research.google.com/github/AashiDutt/Stable-Diffusion/blob/main/Stable_Diffusion_XL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Diffusion XL examples

Reference: https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl

**Turn on GPU to run this notebook.**
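A quick way to confirm that the GPU runtime is active is sketched below. The helper name `pick_device` is hypothetical and not part of the original notebook. It falls back to `"cpu"` when PyTorch is missing or no GPU is visible. The later cells call `pipe.to("cuda")`, so they are expected to fail on a CPU-only runtime.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU can be used from Python.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

If this prints `cpu` in Colab, switch the runtime type to a GPU before continuing.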

In [9]:
!pip install transformers --quiet
!pip install accelerate --quiet
!pip install safetensors --quiet

Add an invisible watermark to images generated by Stable Diffusion XL. The watermark helps downstream applications identify whether an image is machine-synthesised.

In [10]:
!pip install "invisible-watermark>=0.2.0"
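To check which version of the package was actually installed, a small standard-library sketch follows. The helper name `installed_version` is hypothetical. It only reads package metadata and does not import the package itself.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("invisible-watermark:", installed_version("invisible-watermark"))
```

A result of `None` means the install step above did not succeed in the current environment.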

In [11]:
# disable the watermarker as follows
# pipe = StableDiffusionXLPipeline.from_pretrained(..., add_watermarker=False)
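One way to keep the watermark choice in a single place is sketched below. The helper `sdxl_load_kwargs` is hypothetical. It only assembles the keyword arguments this notebook already passes to `from_pretrained`, plus `add_watermarker=False` when the watermark is switched off. `torch_dtype` is left to the caller so the sketch runs without PyTorch.

```python
def sdxl_load_kwargs(watermark=True):
    """Build keyword arguments for from_pretrained, optionally disabling the watermarker."""
    kwargs = {"variant": "fp16", "use_safetensors": True}
    if not watermark:
        # Only pass the flag when disabling, so the library default applies otherwise.
        kwargs["add_watermarker"] = False
    return kwargs


print(sdxl_load_kwargs(watermark=False))

# Intended use (not executed here, requires torch and diffusers):
# pipe = StableDiffusionXLPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-1.0",
#     torch_dtype=torch.float16,
#     **sdxl_load_kwargs(watermark=False),
# )
```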

# Text to Image

In [12]:
!pip install diffusers --quiet

In [14]:
from diffusers import StableDiffusionXLPipeline
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

In [6]:
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")


In [8]:
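prompt = "sheldon cooper playing bongos"
# Alternative prompt: "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# Keep the PIL image in `image`; save() returns None, so call it separately.
image = pipe(prompt=prompt).images[0]
image.save("./image.jpg")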


# Image to Image

In [15]:
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe = pipe.to("cuda")


In [16]:
url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"

init_image = load_image(url).convert("RGB")


In [20]:
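from PIL import Image

def load_image_locally(filename):
  """Loads an image from a local file.

  Args:
    filename: The path to the image file.

  Returns:
    A PIL Image object in RGB mode.
  """
  # Open the file passed by the caller instead of a hard-coded path.
  image = Image.open(filename)
  image = image.convert("RGB")
  return image

# Reuse the text-to-image result as the starting image.
# This replaces the init_image loaded from the URL in the previous cell.
init_image = load_image_locally("/content/image.jpg")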

In [22]:
prompt = "Sheldon plays bongos while Leonard sleeps"
# Keep the PIL image in `image`; save() returns None, so call it separately.
image = pipe(prompt, image=init_image).images[0]
image.save("./image2.jpg")


# Inpainting

In [23]:
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")


In [24]:
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")

In [25]:
prompt = "A majestic tiger sitting on a bench"
# Keep the PIL image in `image`; save() returns None, so call it separately.
image = pipe(
    prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=50, strength=0.80
).images[0]
image.save("./image3.jpg")
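The inpainting call above passes `num_inference_steps=50` and `strength=0.80`, yet the pipeline does not run all 50 denoising steps. As far as I understand the diffusers img2img and inpainting pipelines, the step count is scaled by `strength`, roughly as in the sketch below. The helper `effective_steps` is hypothetical and only mirrors that arithmetic. The image-to-image defaults used in the second call (50 steps, strength 0.3) are an assumption about the library defaults.

```python
def effective_steps(num_inference_steps, strength):
    """Approximate how many denoising steps run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Lower strength keeps more of the input image and skips the early, noisiest steps.
    return min(int(num_inference_steps * strength), num_inference_steps)


print(effective_steps(50, 0.80))  # inpainting call above
print(effective_steps(50, 0.3))   # assumed defaults for the image-to-image call
```

Raising `strength` toward 1.0 lets the prompt override more of the original image at the cost of more steps.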
