## Stable Diffusion on PVC (Intel® Data Center GPU Max Series) with IPEX

This demo runs [Stable Diffusion with the Hugging Face API](https://huggingface.co/stabilityai) and uses the [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) (IPEX) to optimize the model pipeline on the Intel® Data Center GPU Max Series.

**Intro to Hardware: Intel® Data Center GPU Max Series**

- Up to 408 MB of L2 cache
- Built-in Ray Tracing Acceleration
- Intel® Xe Matrix Extensions (XMX)




Check that the GPU is visible by listing the available OpenCL platforms and devices:

```
! clinfo -l
```
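As a complementary check from inside Python (with the `ipex_xpu` kernel selected, as described below), the sketch asks PyTorch which XPU devices it can see. It assumes the `torch.xpu` namespace is available, which IPEX provides; the helper name `list_xpu_devices` is hypothetical. It returns an empty list instead of failing when PyTorch, IPEX, or an XPU device is missing.

```python
def list_xpu_devices():
    """Return the names of the XPU devices visible to PyTorch (empty list if none)."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed in this kernel
    try:
        # Importing IPEX registers the xpu backend on PyTorch builds that lack it.
        import intel_extension_for_pytorch  # noqa: F401
    except ImportError:
        pass
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return []
    return [xpu.get_device_name(i) for i in range(xpu.device_count())]


print(list_xpu_devices())
```

On a correctly set-up node, the list should contain one entry per visible Max Series device.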

The demo consists of the following steps:

1. Load the Stable Diffusion (SD) pipeline and its core model components from Hugging Face (HF).
2. Run the standard SD pipeline with the HF API on the CPU to generate an FP32 image.
3. Optimize SD with IPEX and run it on the Intel® Data Center GPU Max 1100 (device `xpu`), first in FP32 and then with BF16 autocast.
4. Compare the inference latency of the runs.


#### Select the proper kernel

Choose the `ipex_xpu` kernel to run this demo.

If you can't find the `ipex_xpu` kernel, go back to the terminal and run `source prepare_env.sh` to prepare the environment for the workshop.
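A quick way to confirm that the active kernel is the right one is to check, from a cell, that the packages this demo imports can be found. This is a sketch: the package list simply mirrors the imports used in this notebook, and `check_kernel` is a hypothetical helper. `sys.executable` shows which Python interpreter backs the current kernel.

```python
import importlib.util
import sys


def check_kernel(required=("torch", "intel_extension_for_pytorch", "diffusers", "transformers")):
    """Return the subset of `required` packages that cannot be found from this kernel."""
    return [name for name in required if importlib.util.find_spec(name) is None]


missing = check_kernel()
print(f"Python interpreter: {sys.executable}")
if missing:
    print("Missing packages (wrong kernel?):", ", ".join(missing))
else:
    print("All required packages found.")
```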

```python
import time

import torch
from diffusers import (
    DDPMScheduler,
    DPMSolverMultistepScheduler,
    EulerDiscreteScheduler,
    PNDMScheduler,
    StableDiffusionPipeline,
)

# Define the model ID for the SD version
model_id = "stabilityai/stable-diffusion-2-1-base"

# Load the full pipeline (UNet, VAE, text encoder, tokenizer, scheduler) in FP32 on the CPU
pipe = StableDiffusionPipeline.from_pretrained(model_id)

# Replace the default scheduler with the DPM-Solver multistep scheduler, built from the
# existing scheduler config. The other imported schedulers are there for experimentation.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
```


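The cell above constructs the SD pipeline with the HF API and replaces its default scheduler with `DPMSolverMultistepScheduler`. To explore different speed/quality trade-offs, experiment with the scheduler (e.g., `EulerDiscreteScheduler` or `PNDMScheduler`) and its parameters.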
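One way to run that experiment systematically is sketched below: it swaps in each scheduler with the same `from_config` pattern used above, generates one image, and records the wall-clock time. `compare_schedulers` is a hypothetical helper; it only assumes that `pipe` behaves like a diffusers pipeline (a `scheduler` attribute with a `config`, and a call returning an object with `.images`).

```python
import time


def compare_schedulers(pipe, scheduler_classes, prompt, num_inference_steps=20):
    """Generate one image per scheduler class and time each run.

    Returns (timings, images), both keyed by scheduler class name.
    The pipeline's original scheduler is restored afterwards.
    """
    original = pipe.scheduler
    timings, images = {}, {}
    try:
        for cls in scheduler_classes:
            # Build each scheduler from the original config, as done for DPMSolverMultistepScheduler
            pipe.scheduler = cls.from_config(original.config)
            start = time.perf_counter()
            result = pipe(prompt, num_inference_steps=num_inference_steps)
            timings[cls.__name__] = time.perf_counter() - start
            images[cls.__name__] = result.images[0]
    finally:
        pipe.scheduler = original
    return timings, images
```

Once `prompt` is defined (next cell), a call such as `compare_schedulers(pipe, [DPMSolverMultistepScheduler, EulerDiscreteScheduler, PNDMScheduler], prompt)` returns the timings together with the images, so speed and visual quality can be judged side by side.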

**Single image inference**

We then call the pipeline with a text prompt, a written description of the desired image, to generate an image.

Setting the generator seed makes the inference process deterministic. The number of inference steps governs image quality: more steps generally give better quality, while fewer steps return results faster.

Please experiment with your own prompts!
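To see the steps/quality trade-off directly, the sketch below generates one image per step count and times each run. `sweep_steps` is a hypothetical helper. It takes a `make_generator` factory rather than a single generator because a `torch.Generator` is stateful: reusing one object across calls advances its state, so each call would start from different noise. A fresh, identically seeded generator per run keeps the starting noise the same, so only the step count changes.

```python
import time


def sweep_steps(pipe, prompt, step_counts, make_generator=None):
    """Generate one image per step count; return {steps: (seconds, image)}.

    make_generator: optional zero-argument callable returning a freshly
    seeded generator, so every run starts from the same initial noise.
    """
    results = {}
    for n in step_counts:
        kwargs = {"num_inference_steps": n}
        if make_generator is not None:
            kwargs["generator"] = make_generator()
        start = time.perf_counter()
        image = pipe(prompt, **kwargs).images[0]
        results[n] = (time.perf_counter() - start, image)
    return results
```

For example: `sweep_steps(pipe, prompt, [10, 20, 40], make_generator=lambda: torch.Generator("cpu").manual_seed(701))`.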

```python
# Define the prompt for the image generation
prompt = "Painting of a frog with hat on a bicycle cycling in New York City at a beautiful dusk with a traffic jam and moody people in the style of Picasso"

# Set the number of denoising steps for the image generation
n_inf_steps = 20

# Uncomment to seed the generator for deterministic output
# seed = 701
# generator = torch.Generator("cpu").manual_seed(seed)

# Simple timing of FP32 inference on the CPU
start = time.time()
# With a seeded generator, use this call instead:
# image = pipe(prompt, num_inference_steps=n_inf_steps, generator=generator).images[0]
image = pipe(prompt, num_inference_steps=n_inf_steps).images[0]
end = time.time()
sd_fp32_t = end - start
print(f"Generating one FP32 image took {round(sd_fp32_t, 2)}s")

image.save("frog_test_FP32.png")
```


Generating one FP32 image took 24.17s


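**Inference on the Intel® Data Center GPU Max with IPEX**

Next, move the pipeline to the `xpu` device and optimize the UNet, which runs once per denoising step, with `ipex.optimize`. This run stays in FP32.

```python
import intel_extension_for_pytorch as ipex

# Move all pipeline components, including the UNet, to the XPU device
pipe = pipe.to("xpu")

# Put the UNet in eval mode and optimize it with IPEX
pipe.unet.eval()
pipe.unet = ipex.optimize(pipe.unet)

# Simple timing of FP32 inference on the XPU
# (no autocast here: torch.cpu.amp.autocast would only affect CPU ops)
start = time.time()
image = pipe(prompt, num_inference_steps=n_inf_steps).images[0]
end = time.time()
sd_xpu_t = end - start
print(f"Generating one FP32 image on xpu took {round(sd_xpu_t, 2)}s")

image.save("frog_test_xpu.png")
```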
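The single timed call above may include one-time costs of the first XPU run (for example, kernel compilation and memory allocation), which can overstate steady-state latency. A minimal sketch of a fairer measurement, assuming a few extra generations are affordable: run untimed warm-up calls first, then report the median of several timed calls. `benchmark` is a hypothetical helper. Because the pipeline returns a PIL image by default, the result is copied back to the host before the call returns, so wall-clock timing covers the device work; if you time code that returns device tensors instead, call `torch.xpu.synchronize()` before stopping the clock.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeats=3):
    """Call fn() `warmup` times untimed, then `repeats` times timed.

    Returns the median wall-clock time of the timed calls, in seconds.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()  # one-time costs such as kernel compilation are absorbed here
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

Usage in this notebook: `sd_xpu_t = benchmark(lambda: pipe(prompt, num_inference_steps=n_inf_steps))`. Each call is a full image generation, so `warmup + repeats` multiplies the run time accordingly.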




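Finally, run the same pipeline with BF16 automatic mixed precision on the XPU.

```python
# Freshly seeded generator so the BF16 run is reproducible
generator = torch.Generator("cpu").manual_seed(701)

# Simple timing of BF16 mixed-precision inference on the XPU
start = time.time()
with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    image = pipe(prompt, num_inference_steps=n_inf_steps, generator=generator).images[0]
end = time.time()
sd_bf16_t = end - start
print(f"Generating one BF16 image on xpu took {round(sd_bf16_t, 2)}s")

image.save("frog_test_BF16.png")
```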
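For step 4, the measured latencies can be summarized as speedups over the CPU FP32 baseline. The sketch below is a hypothetical helper, `report_speedups`, that takes a mapping from run name to seconds and prints each run's time and its speedup (baseline time divided by run time).

```python
def report_speedups(timings, baseline):
    """Print and return each run's speedup relative to timings[baseline] (all in seconds)."""
    base = timings[baseline]
    speedups = {}
    for name, seconds in timings.items():
        speedups[name] = base / seconds
        print(f"{name:<10} {seconds:8.2f}s  {speedups[name]:5.2f}x vs {baseline}")
    return speedups
```

With the variables from the cells above: `report_speedups({"CPU FP32": sd_fp32_t, "XPU FP32": sd_xpu_t, "XPU BF16": sd_bf16_t}, baseline="CPU FP32")`.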