# Deploy Stable Diffusion XL on AWS Inferentia2   

In this notebook, we deploy a [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model on an Inferentia2 instance using Optimum Neuron on Amazon SageMaker. [Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) is the interface between the Transformers library and AWS purpose-built accelerators: AWS Trainium and AWS Inferentia.

## Install required libraries

In [None]:
!pip install --upgrade --quiet "sagemaker"

Note: you may need to restart the kernel to use updated packages.

## Configure SageMaker resources

In [None]:
import sagemaker
import boto3

sess = sagemaker.Session()
# Create an Amazon SageMaker session bucket for uploading data, models and logs
# Amazon SageMaker will automatically create this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # If a bucket name is not provided, set to default bucket
    sagemaker_session_bucket = sess.default_bucket()
 
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']
 
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
 
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")
assert sess.boto_region_name in ["us-east-2", "us-east-1"], "region must be us-east-2 or us-east-1, due to instance availability"

## Store a `model.tar.gz` in an S3 bucket

We have pre-compiled the model, added an `inference` script, created a tar file, and stored it in an S3 bucket.

For a detailed walkthrough of this process, read [Deploy SDXL on AWS Inferentia2 with Amazon SageMaker](https://www.philschmid.de/inferentia2-stable-diffusion-xl) and the [developer documentation for deploying real-time endpoints on SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deploy-models.html#deploy-models-studio).
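
The packaging step mentioned above can be sketched with the standard library. This is only an illustration, not the exact script used to build the archive referenced in this notebook: the helper name `package_model` and the directory layout (compiled model files at the archive root, the inference script under `code/inference.py`) are assumptions.

```python
import tarfile
from pathlib import Path


def package_model(model_dir, output_path="model.tar.gz"):
    """Pack the contents of model_dir into a gzip tarball and return the member names.

    Assumption: model_dir holds the compiled model files plus code/inference.py,
    and output_path lies outside model_dir.
    """
    root = Path(model_dir)
    with tarfile.open(output_path, "w:gz") as tar:
        # Add each top-level entry under its own name so files sit at the archive root
        for entry in sorted(root.iterdir()):
            tar.add(str(entry), arcname=entry.name)
    # Re-open the archive to report what was actually written
    with tarfile.open(output_path, "r:gz") as tar:
        return tar.getnames()
```

The resulting file could then be uploaded to the session bucket, for example with `sess.upload_data(...)`, and its S3 URI used in place of the pre-built one.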

In [None]:
s3_model_uri = "s3://sagemaker-examples-files-prod-us-east-2/models/neuron-sdxl-reinvent-2024/model.tar.gz"

## Deploy the model

In [None]:
from sagemaker.huggingface.model import HuggingFaceModel
 
# Create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=s3_model_uri,        # path to your model.tar.gz on s3
   role=role,                      # iam role with permissions to create an Endpoint
   transformers_version="4.34.1",  # transformers version used
   pytorch_version="1.13.1",       # pytorch version used
   py_version='py310',             # python version used
   model_server_workers=1,         # number of workers for the model server
)
huggingface_model._is_compiled_model = True

In [None]:
# Deploy the endpoint
predictor = huggingface_model.deploy(
    endpoint_name="Stable-Diffusion-XL",
    initial_instance_count=1,        # number of instances
    instance_type="ml.inf2.xlarge",  # AWS Inferentia2 instance
    volume_size=128,                 # size of the attached storage volume in GB
)


### The above step takes about 10-15 minutes.

While you wait for the model to be deployed, you can read the following resources:
- [AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html)
- [AWS Inferentia2](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/neuron-hardware/inf2-arch.html)
- [Amazon SageMaker Real Time Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html)
- [Amazon SageMaker with HuggingFace Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/guides/sagemaker)
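
The `deploy` call normally blocks until the endpoint is ready, but if you want to check on the endpoint from another notebook or script, a small polling loop is one option. The sketch below is a hypothetical helper, not part of the SageMaker SDK: `wait_for_status` takes any callable that returns the current status string, so it can be exercised without AWS access.

```python
import time


def wait_for_status(get_status, target="InService", failed=("Failed",),
                    poll_seconds=30.0, timeout_seconds=1800.0, sleep=time.sleep):
    """Poll get_status() until it returns target; raise on failure or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"endpoint entered status {status!r}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status!r} after {waited:.0f} seconds")
        # Wait before asking again, and track how long we have waited in total
        sleep(poll_seconds)
        waited += poll_seconds
```

With boto3, the status callable could be `lambda: boto3.client("sagemaker").describe_endpoint(EndpointName="Stable-Diffusion-XL")["EndpointStatus"]`.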

## Invoke the model with a sample prompt

In [None]:
from PIL import Image
from io import BytesIO
from IPython.display import display
import base64
 
# Helper: decode a base64-encoded string into a PIL image
def decode_base64_image(image_string):
    base64_image = base64.b64decode(image_string)
    buffer = BytesIO(base64_image)
    return Image.open(buffer)

# Helper: resize a single PIL image and display it in the notebook
def display_image(image=None, width=500, height=500):
    img = image.resize((width, height))
    display(img)

In [None]:
prompt = "A dog trying to catch a flying pizza at a street corner, comic book, well lit, night time"
 
# Run prediction
response = predictor.predict(
    data={
        "inputs": prompt,
        "parameters": {
            "num_inference_steps": 50,
            "negative_prompt": "disfigured, ugly, deformed",
        },
    }
)
 
# Decode and display image
display_image(decode_base64_image(response["generated_images"][0]))
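
To keep a generated image after the notebook session ends, the base64 string can also be written straight to disk. The sketch below is a hypothetical helper that uses only the standard library. Because the image format returned by the endpoint is not stated here, it guesses the file extension from the leading bytes and falls back to `.bin`.

```python
import base64
from pathlib import Path

# Leading-byte signatures for two common image formats
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
}


def save_base64_image(image_string, stem):
    """Decode a base64 image string and write it to '<stem><ext>'; return the path."""
    raw = base64.b64decode(image_string)
    # Pick an extension from the leading bytes; unknown formats get .bin
    ext = next((e for sig, e in _SIGNATURES.items() if raw.startswith(sig)), ".bin")
    path = Path(str(stem) + ext)
    path.write_bytes(raw)
    return path
```

For example, `save_base64_image(response["generated_images"][0], "sdxl_output")` would write the first generated image next to the notebook.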

If the `predictor.predict` request times out, retry it; it should succeed on a subsequent attempt.
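
Retrying can be automated with a small wrapper. This is a minimal sketch with a hypothetical helper name, `call_with_retry`. It retries on any exception by default because the exact exception type raised on a timeout is not specified here, so narrow `retry_on` once you know it.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception, wait with exponential backoff and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            # Give up and re-raise once the last attempt has failed
            if attempt == attempts:
                raise
            # Delay doubles after each failed attempt: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

In this notebook, the call could look like `response = call_with_retry(lambda: predictor.predict(data={"inputs": prompt}))`.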

<div class="alert alert-block alert-warning"> 

<b>DO NOT DELETE THE ENDPOINT</b>

The endpoint will be used to invoke the model when building our application.
</div>

## Clean up the environment

In [None]:
# Run this cell only once the endpoint is no longer needed (see the warning above)
predictor.delete_model()
predictor.delete_endpoint()