<font size="5"> **Stable Diffusion Inference for Text2Image on Intel Sapphire Rapids** </font> 
<br>
This code sample performs Stable Diffusion inference from a text prompt using the KerasCV implementation together with Intel® Extension for TensorFlow*. The following run cases are executed:<br>
* FP32 (baseline) <br>
* Advanced AMP for BF16 precision <br>

<font size="5">**Environment Setup**</font>  <br>
Ensure the **itex_cpu kernel** is activated before running this notebook.

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
os.environ['ITEX_CPP_MIN_LOG_LEVEL'] = '2'
os.environ['OMP_NUM_THREADS'] = '8'

import time
from keras_cv.models.stable_diffusion import StableDiffusion
from tensorflow import keras
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

In [None]:
import intel_extension_for_tensorflow as itex
itex.get_backend()

<font size ="5">**Helper Functions**</font>

The function below plots the generated images side by side.

In [None]:
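def plot_images(images):
    # The file name is built from the notebook-level settings
    # (precision, batch_size, num_steps) defined in the Model Loading cell.
    png_name = "{}_{}imgs_{}steps.png".format(
        precision, batch_size, num_steps)

    print("Start plotting the generated images to %s" % (png_name))
    plt.figure(figsize=(20, 20))
    # Show every image in a single row, without axes.
    for i in range(len(images)):
        plt.subplot(1, len(images), i + 1)
        plt.imshow(images[i])
        plt.axis("off")
    # Write the figure to disk so the message above matches what happens.
    plt.savefig(png_name)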

<font size ="5">**Model Loading**</font> <br>
First, we construct the model and define a few of the required parameters:

In [None]:
iterations = 1
use_xla = False
precision = 'fp32'
batch_size = 1
num_steps = 50
seed = 12345
benchmark_result = []

model = StableDiffusion(
    img_width=512,
    img_height=512,
    jit_compile=use_xla,
)

<font size ="5">**Running Inference** </font> <br>
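Next, we give the model a prompt, run one untimed warmup pass, and then measure the average inference time over `iterations` runs: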

In [None]:
prompt = "a photo of an astronaut riding a horse on mars"

print("Start Warmup")
model.text_to_image(
    "warming up the model", batch_size=batch_size, num_steps=num_steps
)
# Start inference
print("Start running inference and generating images")
t = 0
for i in range(iterations):
    start_time = time.time()
    images = model.text_to_image(prompt=prompt, batch_size=batch_size, seed=seed, num_steps=num_steps)
    t+=(time.time() - start_time)
print(f"FP32 precision: {(t/iterations):.2f} seconds")
benchmark_result.append(["FP32 precision", t/iterations])
plot_images(images)
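
The warmup-then-measure pattern used in the cell above can be factored into a small helper. The following is a minimal sketch, not part of the original sample: the name `time_inference` and its parameters are hypothetical, and it uses `time.perf_counter()` instead of `time.time()` on the assumption that a monotonic clock is preferable for measuring intervals.

```python
import time


def time_inference(run_once, iterations=1, warmup=1):
    """Return the mean wall-clock seconds per call of run_once().

    Hypothetical helper for illustration; run_once is any zero-argument
    callable that performs a single inference pass.
    """
    # Warmup calls are executed but not timed.
    for _ in range(warmup):
        run_once()

    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        total += time.perf_counter() - start
    return total / iterations
```

In this notebook it could be called as `time_inference(lambda: model.text_to_image(prompt, batch_size=batch_size, seed=seed, num_steps=num_steps), iterations=iterations)`. Note that this sketch discards the generated images, so a separate call would still be needed to obtain images for plotting.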

<font size="4">**Performance computation using AMP BF16 precision** </font>
<br>
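Enable Advanced Auto Mixed Precision (AMP) with the BF16 data type through the Intel® Extension for TensorFlow* configuration.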

In [None]:
import intel_extension_for_tensorflow as itex
print("intel_extension_for_tensorflow {}".format(itex.__version__))

auto_mixed_precision_options = itex.AutoMixedPrecisionOptions()
auto_mixed_precision_options.data_type = itex.BFLOAT16 

graph_options = itex.GraphOptions(auto_mixed_precision_options=auto_mixed_precision_options)
graph_options.auto_mixed_precision = itex.ON

config = itex.ConfigProto(graph_options=graph_options)
itex.set_config(config)

In [None]:
itex.get_config()

In [None]:
precision = 'bf16'  # update the label that plot_images uses in the output file name
model = StableDiffusion(
    img_width=512,
    img_height=512,
    jit_compile=use_xla
)

print("Start Warmup")
model.text_to_image(
    "warming up the model", batch_size=batch_size, num_steps=num_steps
)
# Start inference
print("Start running inference and generating images")
t = 0
for i in range(iterations):
    start_time = time.time()
    images = model.text_to_image(prompt=prompt, batch_size=batch_size, seed=seed, num_steps=num_steps)
    t+=(time.time() - start_time)
    
print(f"AMP BF16 precision: {(t/iterations):.2f} seconds")
benchmark_result.append(["AMP BF16 precision", t/iterations])
plot_images(images)

<font size ="5">**Performance comparison** <br></font>
Let's compare the results with respect to inference latency.

In [None]:
print("{:<20} {:<20}".format("Model", "Runtime (s)"))
for name, runtime in benchmark_result:
    # Runtimes are average seconds per inference run.
    print("{:<20} {:<20.2f}".format(name, runtime))
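
A relative speedup is often easier to read than two raw latencies. The sketch below is an illustrative addition, assuming the `[label, runtime]` layout used by `benchmark_result`; the function name `speedup` is hypothetical, and the numbers in `example` are placeholders, not measured results.

```python
def speedup(results, baseline="FP32 precision"):
    """Map each label to baseline_runtime / runtime.

    Hypothetical helper; results is a list of [label, runtime_seconds]
    pairs in the same layout as benchmark_result.
    """
    runtimes = {name: runtime for name, runtime in results}
    base = runtimes[baseline]
    return {name: base / runtime for name, runtime in runtimes.items()}


# Placeholder numbers for illustration only, not measured results.
example = [["FP32 precision", 10.0], ["AMP BF16 precision", 4.0]]
for name, factor in speedup(example).items():
    print(f"{name:<20} {factor:.2f}x")
```

Passing `benchmark_result` instead of `example` would report the speedup of each run case relative to the FP32 baseline, where a value above 1 means the run was faster than FP32.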

In [None]:
import matplotlib.pyplot as plt

# Create bar chart with inference time results
plt.figure(figsize=(4,3))
plt.title("Stable Diffusion Inference Time")
plt.ylabel("Inference Time (seconds)")
plt.bar(["FP32", "BF16-AMP"], [benchmark_result[0][1], benchmark_result[1][1]])

In [None]:
print('[CODE_SAMPLE_COMPLETED_SUCCESFULLY]')