
# <span style="color:DarkSeaGreen">SageMaker Lab 2</span>

This lab does the following:

- Download a Hugging Face model locally so it can be provisioned via SageMaker
- Use a custom container (inference.py provided)
- Create a tar file from the downloaded model and the custom inference code
- Upload the tar to S3
- Create a SageMaker endpoint
- Interact with the model

#### This lab does NOT have any dependencies on Lab 1

# <span style="color:DarkSeaGreen">requirements_lab2.txt</span>
- Most of the requirements just get the latest version
- However, on Nov 19 2025 AWS released the SageMaker 3.0.1 SDK, which is compatible with Python versions up to 3.12
  - At the time of writing it does not yet support Python 3.13+ (the latest version is Python 3.14)
- SageMaker 3.0.1 is pinned in the requirements file; otherwise pip resolves 3.0.0, which fails with dependency errors because some of its sub-packages are not included. AWS fixed this in 3.0.1, which was released immediately after 3.0.0 :)
- Therefore make sure your Python venv is created with Python 3.12 only
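Before creating the venv, you can confirm which Python the notebook kernel is running. A minimal sketch using only the standard library; `check_python_version` is a hypothetical helper name, not part of any SDK.

```python
import sys


def check_python_version(version_info=sys.version_info, required=(3, 12)):
    """Return True if the interpreter's (major, minor) matches the required version."""
    # sys.version_info supports slicing, so compare only (major, minor)
    return tuple(version_info[:2]) == tuple(required)


if check_python_version():
    print(f"OK: Python {sys.version.split()[0]}")
else:
    print(f"Found Python {sys.version.split()[0]}; create the venv with Python 3.12")
```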

# <span style="color:DarkSeaGreen">Prepare Your Environment</span>
### Requirements for this Jupyter Notebook Lab if running in VSCode or equivalent local IDE
##### Note these are macOS specific
- Credentials
  - You need credentials to your AWS account to execute this Jupyter Lab if running locally from your laptop
    - Locally: Credentials, and therefore permissions, associated with the IAM user (with CLI access enabled) are provided by the `aws configure` connection to your AWS account
    - Cloud: Permissions provided via logged in user
- Installers:
  - Pip
    - Python libraries
    - Works inside Python envs
  - homebrew (brew) (mac)
    - System software, tools, and dependencies
    - Works at OS level

- Run the commands of the cell below in a terminal window to create a virtual environment if you need one
  - Check your Python version first; if it is OK, copy the rest and run it in a terminal window
  - If you paste multiple lines and run them as one, zsh reports `zsh: command not found: #` for the comment lines; you can ignore these errors
  - Remember to restart the kernel to pick up the new venv
  - The venv can be deleted via the last cell in this notebook if it is no longer needed
- If you already have a virtual environment, then just activate it as shown in the second cell below
  - Venv (can be created below) used by this notebook is *venv-stable-diffuser-lab2*
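After switching kernels, a quick check that the notebook really runs inside the venv (rather than the system interpreter) can save a confusing `pip install` later. This sketch relies on the standard rule that a venv sets `sys.prefix` to the venv folder while `sys.base_prefix` keeps pointing at the base install; the helper names are hypothetical.

```python
import os
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """True inside a venv: sys.prefix points at the venv, sys.base_prefix at the base install."""
    return prefix != base_prefix


def venv_name(prefix=sys.prefix):
    """Folder name of the active environment, e.g. venv-stable-diffuser-lab2."""
    return os.path.basename(os.path.normpath(prefix))


if in_virtualenv():
    print("Active venv:", venv_name())
else:
    print("Not in a venv; select the Python (venv-stable-diffuser-lab2) kernel")
```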

In [None]:
from sagemaker.core import image_uris, script_uris, model_uris
model_id, model_version = "model-upscaling-stabilityai-stable-diffusion-x4-upscaler-fp16", "*"

# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type="ml.g5.4xlarge",
)
# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="inference")

base_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="inference")

# https://sagemaker.readthedocs.io/en/stable/api/inference/model.html
print("Deploy Image URI (image_uri) - ECR container for the docker image:\n", deploy_image_uri)
print("Deploy Source URI (source_dir) - Location for inference.py:\n", deploy_source_uri)
print("Base Model URI (model_data) - Location of SageMaker model data:\n", base_model_uri)   

# if you want to see the example inference.py code it uses, you can get it, for example in a terminal window:
# aws s3 cp \
#   s3://jumpstart-cache-prod-ap-southeast-2/stabilityai-upscaling/model-upscaling-stabilityai-stable-diffusion-x4-upscaler-fp16/artifacts/inference-prepack/v1.0.0/code/inference.py \
#   ./inference.py
# or see it all
# aws s3 ls \
#   s3://jumpstart-cache-prod-ap-southeast-2/stabilityai-upscaling/model-upscaling-stabilityai-stable-diffusion-x4-upscaler-fp16/ \
#   --recursive

In [None]:
# Check your credentials (AWS identity) to confirm you are using the right account; you can also run this in a terminal window (without the !) if you don't have ipykernel
!aws sts get-caller-identity

In [None]:
### STOP ###
### IF USING THIS NOTEBOOK IN A SAGEMAKER JUPYTER NOTEBOOK INSTANCE, THEN SKIP TO THE NEXT CELL ###
### OTHERWISE, IF USING VSCODE OR EQUIVALENT LOCAL IDE, THEN CONTINUE BELOW ###
### This script is for setting up your environment for SageMaker Lab 2 ###
# do you need to upgrade python first? Your available version of Python is used to create the virtual environment
python3 --version

### STOP ###
### DO YOU NEED TO UPGRADE PYTHON ###
# install Python 3.12 if required (the SageMaker 3.0.1 SDK does not support 3.13+)
brew install python@3.12
# restart VSCode to pick up the new version of Python
python3.12 --version

### STOP ###
### OK IF YOU HAVE THE CORRECT VERSION OF PYTHON, CONTINUE ###
# create a virtual environment
python3.12 -m venv venv-stable-diffuser-lab2
# activate the virtual environment
source venv-stable-diffuser-lab2/bin/activate
### COPY TO HERE ONLY IF RUNNING AS ONE COPY AND PASTE ###

### STOP ###
### MAKE SURE ABOVE VENV GETS ACTIVATED BEFORE RUNNING THE REST ###
# upgrade pip
pip install --upgrade pip
# jupyter kernel support
pip install ipykernel
# add the virtual environment to jupyter
python  -m ipykernel install --user --name=venv-stable-diffuser-lab2 --display-name "Python (venv-stable-diffuser-lab2)"
# install the required packages - may need to specify the path here if not in the correct folder in terminal window
pip install -r requirements_lab2.txt
# pip install -r Documents/github/labs-sagemaker/jumpstart/etc/requirements_lab1.txt
# verify the installation
pip list

### RESTART VSCODE TO PICKUP THE NEW VENV ###

In [None]:
### STOP ###
### This command activates an environment that already exists; use it in a terminal window if you need it ###
source venv-stable-diffuser-lab2/bin/activate
pip list

# use pip freeze instead if you prefer a requirements.txt-friendly format
### ALSO MAKE SURE YOU SELECT IT AS YOUR KERNEL FOR THIS JUPYTER NOTEBOOK ###

# Lab 2 Starts Here!

- Vars, libraries and clients

# <span style="color:DarkSeaGreen">Setup</span>
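The Setup cell below builds `myBucket` from two random integers between 0 and 1000, so rerunning it produces a different name each time and there is a small chance of clashing with an existing bucket. A sketch of an alternative using a UUID suffix, with a simplified check against the S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). `make_bucket_name` is a hypothetical helper and the regex covers only part of the full rules. Keeping `sagemaker` in the prefix may matter: the AmazonSageMakerFullAccess managed policy scopes some of its S3 permissions to bucket names containing `sagemaker`, so check the policy before renaming it.

```python
import re
import uuid

# simplified subset of the S3 general purpose bucket naming rules (not exhaustive)
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def make_bucket_name(prefix="doit-sagemaker-model-bucket"):
    """Append a random 12-character hex suffix so the name is very unlikely to clash."""
    name = f"{prefix}-{uuid.uuid4().hex[:12]}"
    if not BUCKET_NAME_RE.match(name):
        raise ValueError(f"invalid S3 bucket name: {name}")
    return name


print(make_bucket_name())
```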

In [None]:
import random

# region
# for the purpose of this lab, us-east-1, us-west-2 and eu-west-1 have the broadest coverage of models and instance types
# if you provision in other regions, you may not have access to all the models or instance types, and may need to request quota increases for endpoint usage for some instance types
myRegion='ap-southeast-2'

# iam
myRoleSageMakerExecution="stable-diffuser-lab2-execution-role"
myRoleSageMakerExecutionARN='RETRIEVED FROM ROLE BELOW'

# parameter store
myParameterStoreChosenModel='stable-diffuser-lab2-chosen-model'
myParameterStoreEndpointName='stable-diffuser-lab2-endpoint-name'
myParameterStoreIAMARN='stable-diffuser-lab2-iam-arn'

# bucket - MUST BE A UNIQUE NAME
myBucket='doit-sagemaker-model-bucket-' + str(random.randint(0, 1000)) + '-' + str(random.randint(0, 1000))

# async endpoint
myEndpointConfig='stable-diffuser-lab2-endpoint-config'
myEndpointAsync='stable-diffuser-lab2-endpoint-async'
myEndpoint='stable-diffuser-lab2-endpoint'

# model
modelID = "stabilityai/stable-diffusion-x4-upscaler"

print ('Done! Move to the next cell ->')

# <span style="color:Coral">Test Locally - 1!!</span>
- <span style="color:Coral">You DO NOT have to run this cell; it's for testing the model locally if you want to see it in action first!</span>
- This tests the model locally using whatever hardware you have
- It downloads the model from Hugging Face when the cell is run
- You probably don't have an NVIDIA (CUDA) GPU, so the pipeline runs on Apple Silicon (mps) if available, otherwise on the CPU
- This may take a loooong time, and may use too much memory and fail
- We implement tiling to try to compensate, but try smaller images, e.g. 128x128
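The tiling in the cell below can be checked without loading the model. This sketch mirrors the box arithmetic of `tile_image` and `stitch_tiles` (4x factor, boxes as `(left, top, right, bottom)` in PIL's crop convention) using plain integers; `tile_boxes` is a hypothetical helper. Edge tiles are smaller than `TILE_SIZE`, and because each tile is upscaled independently with no overlap, seams may be visible at tile borders.

```python
def tile_boxes(width, height, tile_size, factor=4):
    """Return (input_box, upscaled_box) pairs covering a width x height image."""
    pairs = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            # edge tiles are clipped to the image, so they can be smaller than tile_size
            box = (x, y, min(x + tile_size, width), min(y + tile_size, height))
            # the upscaled tile lands at the same position scaled by the factor
            upscaled = tuple(v * factor for v in box)
            pairs.append((box, upscaled))
    return pairs


for box, upscaled in tile_boxes(300, 200, 128):
    print(box, "->", upscaled)
```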

In [None]:
from PIL import Image
from io import BytesIO
import base64
from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_upscale import StableDiffusionUpscalePipeline
import torch

# when running this locally, ie NOT on an NVIDIA GPU, you must use a different pipeline device
# normally you would use pipeline.to("cuda") on a GPU
# but we are just testing locally here, so use Apple Silicon (mps) if available, otherwise the CPU
# when we build for AWS later, inference.py in the model's container uses pipeline.to("cuda")
device = "mps" if torch.backends.mps.is_available() else "cpu"
# float16 is poorly supported on CPU, so fall back to float32 there
dtype = torch.float16 if device == "mps" else torch.float32

# load model and scheduler
pipeline = StableDiffusionUpscalePipeline.from_pretrained(modelID, torch_dtype=dtype)
pipeline = pipeline.to(device)

# Safe tiling parameters
TILE_SIZE = 256          # process 256x256 patches
NUM_INFERENCE_STEPS = 25 # reduce to save memory
UPSCALE_FACTOR = 4
GUIDANCE_SCALE = 7.5

def tile_image(img, tile_size):
    tiles = []
    positions = []
    sizes = []
    width, height = img.size
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            box = (x, y, min(x + tile_size, width), min(y + tile_size, height))
            tile = img.crop(box)
            tiles.append(tile)
            positions.append((x, y))
            sizes.append(tile.size)
    return tiles, positions, sizes

def stitch_tiles(tiles, positions, sizes, full_size):
    final_img = Image.new("RGB", (full_size[0]*UPSCALE_FACTOR, full_size[1]*UPSCALE_FACTOR))
    for tile, (x, y), (w, h) in zip(tiles, positions, sizes):
        # calculate upscale position and size
        up_x, up_y = x*UPSCALE_FACTOR, y*UPSCALE_FACTOR
        up_w, up_h = w*UPSCALE_FACTOR, h*UPSCALE_FACTOR
        # resize tile to expected upscaled size (in case the model slightly changes it)
        tile = tile.resize((up_w, up_h), Image.LANCZOS)
        final_img.paste(tile, (up_x, up_y))
    return final_img

def upscale_with_tiling(img: Image.Image, prompt: str):
    """Upscale image safely using tiling"""
    tiles, positions, sizes = tile_image(img, TILE_SIZE)
    upscaled_tiles = []

    for i, tile in enumerate(tiles):
        print(f"Processing tile {i+1}/{len(tiles)}")
        # pipeline expects PIL.Image
        tile = tile.convert("RGB")
        result = pipeline(
            prompt=prompt,
            image=tile,
            num_inference_steps=NUM_INFERENCE_STEPS,
            guidance_scale=GUIDANCE_SCALE
        )
        # safely get the first image
        upscaled_tile = result.images[0]
        upscaled_tiles.append(upscaled_tile)

    # Combine tiles
    return stitch_tiles(upscaled_tiles, positions, sizes, img.size)

# use these images
prompt = "Improve the photographic quality of the image"
images = [
    "resources/img2_original_1024.jpeg",
    "resources/img1_original_1024.jpeg",
]

for img in images:
    with open(img, "rb") as f:
        bytes_data = f.read()
    pil_image = Image.open(BytesIO(bytes_data)).convert("RGB")
    upscaled_image = upscale_with_tiling(pil_image, prompt)
    out_path = img.replace("original", "upscaled")
    upscaled_image.save(out_path)
    print("Saved:", out_path)

# <span style="color:DarkSeaGreen">Download Model Locally</span>
- We download the model so we can upload it to S3
- When we create the model on SageMaker, it then loads the model from S3 rather than downloading it from Hugging Face
  - Note we don't clean this up locally after the end of the lab, in case you want to run this multiple times
  - <span style="color:DarkSeaGreen">We also don't put this into git, so you will only need to execute this cell if you don't already have it</span>
    - Look for the model folder, if you don't have it, you need to download the model via the next cell
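To decide whether the download cell needs running, a heuristic check of the `model` folder. It assumes the snapshot contains `model_index.json` at its root (the file a diffusers pipeline repo uses to list its components, matched by the `*.json` pattern) and a `unet/` folder; `model_already_downloaded` is a hypothetical helper.

```python
from pathlib import Path


def model_already_downloaded(model_dir="model"):
    """Heuristic: a diffusers snapshot has model_index.json at its root plus a unet/ folder."""
    root = Path(model_dir)
    return (root / "model_index.json").is_file() and (root / "unet").is_dir()


print("model folder present:", model_already_downloaded())
```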

In [None]:
from huggingface_hub import snapshot_download

# download snapshot - we don't require a token for a publicly available model
snapshot_dir = snapshot_download(
    repo_id=modelID,
    local_dir="model",
    allow_patterns=[
        "unet/*",
        "vae/*",
        "text_encoder/*",
        "scheduler/*",
        "tokenizer/*",
        "feature_extractor/*",
        "*.json",    # configs
        "*.txt"
    ]
)

print ('Done! Move to the next cell ->')

# <span style="color:Coral">Test Locally - 2!!</span>
- <span style="color:Coral">You DO NOT have to run this cell; it's for testing the model locally if you want to see it in action first!</span>
- This tests the model locally using whatever hardware you have
- It uses the model you just downloaded from Hugging Face
- You probably don't have an NVIDIA (CUDA) GPU, so it will use whatever device is available to run the pipeline
- This may take a loooong time, may buffer too much memory and fail
- Try with smaller images, eg 128x128
- This code also exists in testLocal.py in the model_code folder for convenience if you want to test outside of this notebook
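The cell below drives `inference.py` through the four handler functions in the order the serving container calls them: `model_fn` once at load time, then `input_fn`, `predict_fn` and `output_fn` per request. The sketch below is **not** the lab's inference.py; it is a stand-in with a fake model that returns the image bytes unchanged, so the request/response contract can be exercised without a GPU. The JSON shape (a `prompt` plus a base64 `image` in, a base64 `image` out) and the `(body, content_type)` return value of `output_fn` are taken from the test cell below. Run it in a scratch file or a fresh kernel, since it reuses the same function names.

```python
import base64
import json


def model_fn(model_dir):
    # hypothetical stand-in: the real inference.py loads the diffusers pipeline from model_dir
    return lambda prompt, image_bytes: image_bytes


def input_fn(request_body, content_type="application/json"):
    if content_type != "application/json":
        raise ValueError(f"unsupported content type: {content_type}")
    data = json.loads(request_body)
    return {"prompt": data["prompt"], "image": base64.b64decode(data["image"])}


def predict_fn(inputs, model):
    return model(inputs["prompt"], inputs["image"])


def output_fn(prediction, accept="application/json"):
    body = json.dumps({"image": base64.b64encode(prediction).decode("utf-8")})
    return body, accept


# same call order as the local test cell: model_fn -> input_fn -> predict_fn -> output_fn
model = model_fn("./model")
request = json.dumps({"prompt": "test", "image": base64.b64encode(b"fake-bytes").decode("utf-8")})
response, content_type = output_fn(predict_fn(input_fn(request), model))
print(content_type, json.loads(response)["image"])
```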

In [None]:
import json
import base64
from PIL import Image
from io import BytesIO
# use the inference.py from the code folder
from model_code.inference import model_fn, input_fn, predict_fn, output_fn

# 1. Load the model from your local model directory
model_dir = "./model"  # path to the folder containing the pretrained pipeline files
model = model_fn(model_dir)

# 2. Load an image to test
image_path = "resources/img2_original_512.jpeg"
with open(image_path, "rb") as f:
    img_bytes = f.read()
image_b64 = base64.b64encode(img_bytes).decode("utf-8")

# 3. Build a dummy payload
payload = {
    "prompt": "highly detailed, realistic photo",
    "image": image_b64
}
request_body = json.dumps(payload)

# 4. Call input_fn
inputs = input_fn(request_body, content_type="application/json")

# 5. Call predict_fn
prediction = predict_fn(inputs, model)

# 6. Call output_fn
response, content_type = output_fn(prediction, accept="application/json")

# 7. Show result
response_dict = json.loads(response)
upscaled_img_b64 = response_dict["image"]
upscaled_img_bytes = base64.b64decode(upscaled_img_b64)
upscaled_img = Image.open(BytesIO(upscaled_img_bytes))
upscaled_img.show()

# <span style="color:DarkSeaGreen">Tar the Model</span>
- Create the tar file
- The tar file must be of a particular structure, so we need to do the following:
  - Copy the code folder (includes inference.py and requirements.txt) into the model folder
  - Create the tar without a top-level model folder: everything inside model/ (including code/) must sit at the root of the archive
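Once `model.tar.gz` has been built by the cell below, its layout can be checked before the long upload. A sketch assuming the layout described above (model files and `code/` at the archive root, no top-level `model/` folder), and assuming `model_index.json` from the diffusers snapshot should also be at the root; `check_model_tar` is a hypothetical helper. Listing members has to decompress the whole archive, so expect it to take a while on a multi-GB model.

```python
import tarfile
from pathlib import Path


def check_model_tar(tar_path, required=("model_index.json", "code/inference.py", "code/requirements.txt")):
    """Return a list of layout problems; an empty list means the archive looks right."""
    with tarfile.open(tar_path, "r:gz") as tar:
        names = {m.name.removeprefix("./") for m in tar.getmembers()}
    problems = [f"missing {r}" for r in required if r not in names]
    # the top-level model/ folder should have been stripped via arcname
    problems += [f"nested under model/: {n}" for n in sorted(names) if n.startswith("model/")]
    # cache and OS clutter should have been filtered out
    problems += [
        f"unwanted entry: {n}" for n in sorted(names)
        if any(part in (".cache", "__pycache__", ".DS_Store") for part in n.split("/"))
    ]
    return problems


if Path("model.tar.gz").exists():
    print(check_model_tar("model.tar.gz") or "model.tar.gz layout looks OK")
```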

In [None]:
# copy the code folder to the model folder
import shutil
import tarfile
import os

# copy the code file to the downloaded model folder
shutil.copytree(
    "model_code",
    "model/code",
    dirs_exist_ok=True, # overwrite if already exists
    ignore=shutil.ignore_patterns("__pycache__", "__init__.py", "testLocal.py", "jumpstart_example"),
)

# let's make the tar file; its structure is essential
tar_name = "model.tar.gz"
if os.path.exists(tar_name):
    os.remove(tar_name)

with tarfile.open(tar_name, "w:gz") as tar:
    for root, dirs, files in os.walk("model"):
        # Skip unwanted files/folders
        dirs[:] = [d for d in dirs if d not in [".cache", "__pycache__"]]
        files = [f for f in files if f != ".DS_Store"]

        for file in files:
            full_path = os.path.join(root, file)
            # arcname removes the top-level 'model/' folder
            arcname = os.path.relpath(full_path, "model")
            tar.add(full_path, arcname=arcname)

print("Done! Move to the next cell ->")

# <span style="color:DarkSeaGreen">Paths and Clients</span>
- setup some paths and clients we need
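The next cell hardcodes `myLocalPathForResources` to the author's machine. If you run the notebook from the `image_upscale` folder, a sketch like this derives the same value from the current working directory instead. It assumes Jupyter's working directory is the notebook's folder (the usual default in VSCode and JupyterLab), and `resources_root` is a hypothetical helper. The trailing slash matters because later cells build paths with `"{}model.tar.gz".format(...)`.

```python
from pathlib import Path


def resources_root(start=None):
    """Absolute folder path with the trailing slash that the later cells expect."""
    root = Path(start) if start is not None else Path.cwd()
    return str(root.resolve()) + "/"


print(resources_root())
```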

In [None]:
# local client path for resources - change this to where you cloned the repo (keep the trailing slash)
myLocalPathForResources='/Users/simondavies/Documents/github/labs-sagemaker/gen-ai/image_upscale/'
# Jupyter notebook path if the notebook is run in AWS, for example
#myLocalPathForResources='/home/ec2-user/SageMaker/labs-sagemaker/gen-ai/image_upscale/'

print ('Done! Move to the next cell ->')

In [None]:
import json
import boto3
from botocore.exceptions import ClientError
import base64
import io
import time
from datetime import datetime
from certifi import where
from PIL import Image
botoSession = boto3.Session(region_name=myRegion)

# Configure boto3 to use certifi's certificates - helps avoid SSL errors if your system’s certificate store is out of date or missing root certs
sts_client = boto3.client('sts', verify=where())
myAccountNumber = sts_client.get_caller_identity()["Account"]
print(myAccountNumber)
print(sts_client.get_caller_identity()["Arn"])

# create clients we can use later
# iam
iam = boto3.client('iam', region_name=myRegion, verify=where())
# ssm
ssm = boto3.client('ssm', region_name=myRegion, verify=where())
# s3
s3 = boto3.client('s3', region_name=myRegion, verify=where())
# sagemaker
sm = boto3.client("sagemaker", region_name=myRegion, verify=where())
# sagemaker runtime
smr = boto3.client("sagemaker-runtime", region_name=myRegion, verify=where())
print ('Done! Move to the next cell ->')

In [None]:
# define tags added to all services we create
myTags = [
    {"Key": "env", "Value": "non_prod"},
    {"Key": "owner", "Value": "doit-image-upscale"},
    {"Key": "project", "Value": "lab1"},
    {"Key": "author", "Value": "simon"},
]
myTagsDct = {
    "env": "non_prod",
    "owner": "doit-image-upscale",
    "project": "lab1",
    "author": "simon",
}

print ('Done! Move to the next cell ->')

# <span style="color:DarkSeaGreen">IAM</span>
- We need an execution role for SageMaker
- So we create one here; this is only needed if running locally from an IDE, e.g. VSCode
- If you are in a SageMaker Studio notebook, you won't need it
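IAM is eventually consistent, so a role created by `getSageMakerExecutionRole` may not be usable by SageMaker for a few seconds after `create_role` returns, and the first call that uses it can fail. A generic retry-with-backoff sketch; `retry` is a hypothetical helper and the delays are illustrative rather than tuned values. Wrap whichever call fails right after the role is created.

```python
import time


def retry(fn, attempts=5, delay=2.0, backoff=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds, sleeping delay, delay*backoff, ... between failed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as err:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"attempt {attempt} failed ({err}); retrying in {delay:.0f}s")
            sleep(delay)
            delay *= backoff
```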

In [None]:
def getSageMakerExecutionRole():
    """
    Creates a role required for SageMaker to run jobs on your behalf
    Only needed if this is being run in a local IDE, not needed if in SageMaker Studio or SageMaker Notebook Instance

    Args:
        None

    Returns:
        An IAM execution role ARN
    """

    # trust policy for the role
    roleTrust = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "sagemaker.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }

    # check if the role exists
    try:
        role = iam.get_role(RoleName=myRoleSageMakerExecution)
        print("Role already exists. Using the existing role.")
        return role['Role']['Arn']
    except iam.exceptions.NoSuchEntityException:
        print("Role does not exist. Creating a new role.")
        
    # create execution role for sagemaker - allows SageMaker notebook instances, training jobs, and models to access S3, ECR, and CloudWatch on your behalf
    # this role is only created if we are running this notebook in a local ide, if we are in a jupyterlab in sagemaker studio, we dont need it as already created and available
    role = iam.create_role(
        RoleName=myRoleSageMakerExecution,
        AssumeRolePolicyDocument=json.dumps(roleTrust),
        Description="Service execution role for SageMaker AI use, including inside Jupyter notebooks",
        Tags=[
            *myTags,
        ],
    )

    # attach managed policy to the role AmazonSageMakerFullAccess
    iam.attach_role_policy(
        RoleName=myRoleSageMakerExecution,
        PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
    )

    # store the role arn in parameter store for use in other notebooks
    ssm.put_parameter(
        Name=myParameterStoreIAMARN,
        Description='The ARN of the IAM role used by SageMaker for execution of jobs',
        Value=role['Role']['Arn'],
        Type='String',
        Tags=[
            *myTags,
        ],
    )   

    return role['Role']['Arn']

# <span style="color:DarkSeaGreen">Get Execution Role and Session</span>
- SageMaker requires an execution role to assume on your behalf

In [None]:
from sagemaker.core.helper.session_helper import Session, get_execution_role
sagemaker_session = Session()

try:
    # if this is being run in a SageMaker AI JupyterLab Notebook
    myRoleSageMakerExecutionARN = get_execution_role()
except Exception:
    # if this is being run in a local IDE - we need to create our own role
    myRoleSageMakerExecutionARN = getSageMakerExecutionRole()

# make sure we get a session in the correct region (needed as it can use the aws configure region if running this locally)
sageMakerSession = Session(boto_session=botoSession)

print(myRoleSageMakerExecutionARN)
print(sageMakerSession)

print ('Done! Move to the next cell ->')

# <span style="color:DarkSeaGreen">Provision a SageMaker Model</span>
- Provision a model via a custom inference container
  - The container's inference logic is defined in the inference.py file
  - It lets us use a Hugging Face model directly, and customise how it uses the GPU via a diffusers pipeline
  - We create a custom one because JumpStart models have their own containers and do not allow customisation
### Example models to provision
- Stable Diffusion x4 upscaler FP16
  - https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blame/fp16/README.md
  - *model_id, model_version = "model-upscaling-stabilityai-stable-diffusion-x4-upscaler-fp16", "*"*
  - upscaling with Stable Diffusion (x4) is computationally expensive
    - FP16 means it uses half-precision floating point, so you want a GPU with good Tensor Core performance
  - the x4 upscaler model itself is large
    - want ≥ 16 GB VRAM to run comfortably in FP16 for 512×512 → 2048×2048 upscales
    - p4d.24xlarge (enterprise-grade, overkill unless you’re batching lots of requests)
      - **needs an aws quota increase for this instance for endpoint usage**
    - ml.g5.4xlarge
      - good for a poc - widely supported, good memory, reasonably costed
      - anything smaller and you will likely get CUDA out of memory errors
        - you need plenty of GPU memory
      - **needs an aws quota increase for this instance for endpoint usage**
- see https://aws.amazon.com/sagemaker/ai/pricing/ for pricing, **larger instances can be very expensive per hour**
- If you deploy the model and get a quota error, you will need to visit Service Quotas via the console and request an increase
  - go to SageMaker service and search for the instance
  - select the *model* for endpoint usage
  - make sure your quota allows for auto scaling max
- DO NOT LEAVE LARGE INSTANCES RUNNING LONGER THAN YOU NEED TO $$$!


### Instance Size is Important
- We are using a model that upscales
- The larger the original image, the more GPU memory is used when upscaling
- SageMaker typically uses one GPU to do this
  - SageMaker model endpoints don't automatically spread inference across multiple GPUs unless the container is written for it
- Tiling, used with Hugging Face's diffusers pipeline for Stable Diffusion, helps here
  - Breaks the image into smaller patches, processes them sequentially, then stitches them back together
  - Uses much less VRAM at the cost of a bit more time
  - We use this approach in this lab
- ml.p4d.24xlarge has more GPU memory, but may be overkill, is expensive, and won't help if source images are still too large to upscale on one GPU
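The memory pressure above comes from the output size: a 4x upscale multiplies each side by 4, so the model produces 16 times as many pixels as it receives, and doubling the input side quadruples the work again. A small sketch of that arithmetic, plus how many `TILE_SIZE` patches an input splits into; `upscale_cost` is a hypothetical helper, and actual VRAM use depends on the model and pipeline settings rather than on pixel count alone.

```python
import math


def upscale_cost(width, height, factor=4, tile_size=None):
    """Output size, output pixel count and (optionally) the number of input tiles."""
    out_w, out_h = width * factor, height * factor
    tiles = None
    if tile_size:
        # partial tiles at the right/bottom edges still count as a tile
        tiles = math.ceil(width / tile_size) * math.ceil(height / tile_size)
    return {"output": (out_w, out_h), "output_pixels": out_w * out_h, "tiles": tiles}


for w, h in [(128, 128), (512, 512), (1024, 1024)]:
    print((w, h), upscale_cost(w, h, tile_size=256))
```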

In [None]:
# the instance we want to provision - THIS DISPLAYS AN INPUT BOX FOR YOU TO CHOOSE AN INSTANCE FOR THE MODEL INFERENCE PROVIDED
# https://aws.amazon.com/sagemaker/ai/pricing/
options = [
    f"img2img|{modelID}|ml.g5.12xlarge $$$$",
    f"img2img|{modelID}|ml.g4dn.16xlarge $$$$",
    f"img2img|{modelID}|ml.g5.4xlarge $$$",
    f"img2img|{modelID}|ml.g5.2xlarge $$",
]

print("Select an option:")
for i, opt in enumerate(options, 1):
    print(f"{i}. {opt}")

choice = int(input("Enter the number of the spec you want: "))
selected = options[choice - 1]

modelType = selected.split("|")[0]
modelID = selected.split("|")[1]
instanceType = selected.split("|")[2].split(" ")[0]
print(f"You selected: model type {modelType} {modelID} on {instanceType}")

# store the model in a parameter store for use in other labs
ssm.put_parameter(
    Name=myParameterStoreChosenModel,
    Description='the model chosen in lab2',
    Value=selected,
    Type='String',
    Overwrite=True,
)

print("Done! Move to the next cell ->")

# <span style="color:DarkSeaGreen">Upload inference container and model tar to S3</span>
- Create an S3 bucket
- Upload the model tar (you created this earlier in this lab)
- This can take a long time! For example, 67 minutes when uploading from Australia to us-east-1 across the Pacific
- <span style="color:DarkSeaGreen">You can skip this cell if it's already there; don't clean it up at the end of the lab if you want to rerun</span>

In [None]:
from sagemaker.core.s3 import S3Uploader

# create a bucket
if myRegion == "us-east-1":
    s3.create_bucket(Bucket=myBucket)
else:
    s3.create_bucket(
        Bucket=myBucket, CreateBucketConfiguration={"LocationConstraint": myRegion}
    )

# Upload each file to the S3 bucket
files = [
    {
        "s3key": "model/model.tar.gz",
        "localpath": "{}model.tar.gz".format(myLocalPathForResources),
    }
]

# upload model.tar.gz to s3
for file in files:
    print("uploading: {}".format(file["localpath"]))
    S3Uploader.upload(local_path=file["localpath"], desired_s3_uri=f"s3://{myBucket}/model")

print("Done! Move to the next cell ->")

# <span style="color:DarkSeaGreen">Create Model and Endpoint</span>
- Create a model from the container
- Create an endpoint config (`model_builder.deploy` creates it for you; this is a real-time endpoint, not an async one)
- Create an endpoint
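If you come back to the notebook in a fresh kernel and want to know whether the endpoint is ready, you can poll its status. boto3 also ships a built-in waiter for this (`sm.get_waiter("endpoint_in_service")`); the sketch below is a hand-rolled equivalent with the describe call injected, so the logic can be tested without AWS. `wait_for_status` is hypothetical; `EndpointStatus` values such as `Creating`, `InService` and `Failed` come from `DescribeEndpoint`.

```python
import time


def wait_for_status(describe, target="InService", failed=("Failed",), timeout=1800, poll=30, sleep=time.sleep):
    """Poll describe() (returns a status string) until it reaches target; raise on failure or timeout."""
    waited = 0
    while True:
        status = describe()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"endpoint entered status {status}")
        if waited >= timeout:
            raise TimeoutError(f"still {status} after {waited}s")
        sleep(poll)
        waited += poll


# usage with the notebook's clients (requires AWS credentials):
# wait_for_status(lambda: sm.describe_endpoint(EndpointName=myEndpoint)["EndpointStatus"])
```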

In [None]:
# AWS Elastic Container Registry (ECR) account that hosts official AWS SageMaker PyTorch containers
# NOTE
# SageMaker spins up a container from this image on your specified instance
# The container contains PyTorch + CUDA + Python runtime
# Then it loads your model artifact (model.tar.gz) into that container
# All inference happens inside that container on the GPU of your instance
# When you deploy a model via Model.deploy(), SageMaker pulls this container and runs your model inside it
# https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html
# dkr.ecr - Docker Registry (ECR service)
# myRegion (typically us-east-1; if not us-east-1 the image has to be changed) - AWS region where the container is stored
# amazonaws.com - AWS domain
# sagemaker-inference-pytorch - Container name for PyTorch inference
# 2.0-gpu-py3 - Tag specifying version: PyTorch 2.0, GPU support, Python 3
from sagemaker.core import image_uris
try:
    aws_ecr_sagemaker_pytorch_container=image_uris.retrieve(framework='neo-pytorch',region=myRegion,version='2.0',image_scope='inference',instance_type=instanceType)
except Exception as e:
    print("Region not configured for this lab - please use us-east-1 or ap-southeast-2")
    raise Exception("Region not configured for this lab - please use us-east-1 or ap-southeast-2") from e

# If deployment fails with the retrieved image, try hardcoding the container string for your region, e.g. the Hugging Face PyTorch inference image below
# NOTE as written, the line below always overrides the retrieved image, and it is ap-southeast-2 specific:
# if you deploy elsewhere, change the region in the URI (the account ID can differ in some regions) or comment the line out
aws_ecr_sagemaker_pytorch_container="763104351884.dkr.ecr.ap-southeast-2.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04"
print(f"Container to use: {aws_ecr_sagemaker_pytorch_container}")
print("Done! Move to the next cell ->")

In [None]:
# creates a model object in sagemaker that can be deployed to an endpoint
from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.utils.types import ModelServer
from sagemaker.train.configs import Compute

# https://github.com/aws/sagemaker-python-sdk/blob/master/v3-examples/inference-examples/huggingface-example.ipynb
compute = Compute(instance_type=instanceType)
model_builder = ModelBuilder(
    image_uri=aws_ecr_sagemaker_pytorch_container,
    s3_model_data_url=f"s3://{myBucket}/model/model.tar.gz",
    model_server=ModelServer.MMS,
    compute=compute,
    role_arn=myRoleSageMakerExecutionARN,
    sagemaker_session=sageMakerSession,
)
variantName = "AllTraffic"

print("Done! Move to the next cell ->")

In [None]:
# builds it
core_model = model_builder.build(
    model_name=modelID.replace("/", "-"),
    role_arn=myRoleSageMakerExecutionARN,
    sagemaker_session=sageMakerSession,
)

print("Done! Move to the next cell ->")

In [None]:
# this cell creates an endpoint for the model and instance type you selected previously
# this will take a while (a few minutes), as it needs to get the model from S3, create the endpoint config and then the endpoint
# https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler
core_endpoint = model_builder.deploy(
    endpoint_name=myEndpoint,
    role_arn=myRoleSageMakerExecutionARN,
    sagemaker_session=sageMakerSession,
    container_timeout_in_seconds=600,
)

print("Done! Move to the next cell ->")

In [None]:
# store the endpoint name in Parameter Store for use in other notebooks
ssm.put_parameter(
    Name=myParameterStoreEndpointName,
    Description='the name of the sagemaker endpoint created in lab2',
    Value=myEndpoint,
    Type='String',
    Overwrite=True,
)

print ('Done! Move to the next cell ->')

# <span style="color:DarkSeaGreen">Sample Images</span>
- Create a method to display the images
- Upload the sample images to S3 as JSON payloads; async inference (`InvokeEndpointAsync`) reads its input from S3, whereas the real-time invocation below sends the payload directly

In [None]:
# required if an image model is being used
def decode_and_show(description, model_response) -> None:
    from PIL import Image
    import base64
    import io
    
    print (description)
    # Handle PIL Image objects
    if hasattr(model_response, 'save'):  # Check if it's a PIL Image
        display(model_response)
        return
    
    # Handle bytes (raw image data)
    elif isinstance(model_response, bytes):
        image = Image.open(io.BytesIO(model_response))
        display(image)
        image.close()
    
    # Handle base64 string (encoded image)
    elif isinstance(model_response, str):
        image = Image.open(io.BytesIO(base64.b64decode(model_response)))
        display(image)
        image.close()
    
    # Handle list of base64 strings (model response)
    elif isinstance(model_response, list):
        for i, img_data in enumerate(model_response):
            image = Image.open(io.BytesIO(base64.b64decode(img_data)))
            print(f"Image {i + 1}:")
            display(image)
            image.close()
    
    else:
        print(f"Can't handle the image. Unexpected response type: {type(model_response)}")

In [None]:
# Upload each file to the S3 bucket as a payload for async requests
files = [
    {
        "s3key": "originals/img1_original_1024.json",
        "localpath": "{}/resources/img1_original_1024.jpeg".format(myLocalPathForResources),
        "prompt": "Enhance this image to high-res",
    },
    {
        "s3key": "originals/img2_original_1024.json",
        "localpath": "{}/resources/img2_original_1024.jpeg".format(myLocalPathForResources),
        "prompt": "Enhance this image to high-res",
    },
]

for file in files:
    print(f"Preparing payload for: {file['s3key']} from {file['localpath']}")

    # Read and base64 encode the local image
    with open(file["localpath"], "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    
    payload = {
        "prompt": file["prompt"],
        "image": image_b64
    }
    
    # Upload JSON payload to S3
    s3.put_object(
        Bucket=myBucket,
        Key=file["s3key"],
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json"
    )
    print("Uploaded payload to s3://{}/{}".format(myBucket, file["s3key"]))

print("Done! Move to the next cell ->")

# <span style="color:DarkSeaGreen">Invoke the Endpoint</span>
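- Send each sample image to the endpoint as a base64-encoded JSON payload (a synchronous real-time invocation; the S3 payloads above are only needed for async inference)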
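Real-time `InvokeEndpoint` requests are limited to 6 MB (the source of the 413 error noted in the cell below), and base64 encoding grows the image by about a third, so a JPEG of roughly 4.5 MB is already too large once wrapped in JSON. A sketch that builds the same `prompt`/`image` payload and fails early; `build_payload` is hypothetical, and treating 6 MB as `6 * 1024 * 1024` bytes is an assumption, so leave some headroom.

```python
import base64
import json

# assumed real-time payload limit for InvokeEndpoint (6 MB)
MAX_REALTIME_PAYLOAD = 6 * 1024 * 1024


def build_payload(image_bytes, prompt, limit=MAX_REALTIME_PAYLOAD):
    """Build the JSON body the endpoint expects and fail early if it is too large to send."""
    body = json.dumps({
        "prompt": prompt,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
    }).encode("utf-8")
    if len(body) > limit:
        raise ValueError(f"payload is {len(body) / 1e6:.1f} MB; resize the image or use async inference")
    return body
```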

In [None]:
# invoke the endpoint synchronously: each call blocks until the upscaled image is returned
images = [
    "resources/img1_original_1024.jpeg",
    "resources/img2_original_1024.jpeg",
]

for img in images:
    imgOrig = Image.open(img)
    # encode image to base64
    buffered = io.BytesIO()
    imgOrig.save(buffered, format="JPEG")
    image_b64 = base64.b64encode(buffered.getvalue()).decode("utf-8")

    # Resize while preserving aspect ratio
    #max_dim = 512
    #w, h = imgOrig.size
    #if w > h:
    #    new_w = max_dim
    #    new_h = int(h * max_dim / w)
    #else:
    #    new_h = max_dim
    #    new_w = int(w * max_dim / h)

    #imgSmall = imgOrig.resize((new_w, new_h), Image.LANCZOS)
    #print (f'Resized the image from (w{w},h{h}) to (w{new_w},h{new_h})')
        
    # Convert to base64
    #buffered = io.BytesIO()
    #imgSmall.save(buffered, format="PNG")
    #image_b64 = base64.b64encode(buffered.getvalue()).decode("utf-8")

    payload = {
        "prompt": "highly detailed, realistic photo",
        "image": image_b64
    }
    payload_bytes = json.dumps(payload).encode("utf-8")

    # NOTE: if you get an HTTP 413 Content Too Large (Payload Too Large) error, the request is too big;
    # real-time payloads must be < 6 MB, and base64 encoding adds about a third to the image size
    # NOTE: if you get an HTTP 400 InternalServerException reporting CUDA out of memory, the GPU memory is full
    # If so, use a bigger instance with more GPU memory, shrink the image (see the commented resize code above),
    # or split the image into smaller tiles and restitch them once done
    print('Starting the inference')

    # The following commented-out code does not work:
    # SageMaker SDK v3 does not yet appear to support the core_endpoint.invoke method as documented in
    # https://github.com/aws/sagemaker-python-sdk/blob/master/v3-examples/inference-examples/jumpstart-example.ipynb
    # possibly because the payload has a bytes body with no __dict__ attribute, which raises:
    # TypeError: vars() argument must have __dict__ attribute
    # so the boto3 sagemaker-runtime client (smr) is called directly instead
    #response = core_endpoint.invoke(
    #    body = payload_bytes,
    #    content_type = "application/json",
    #    accept = "application/json",
    #)

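    response = smr.invoke_endpoint(
        EndpointName=core_endpoint.endpoint_name,
        Body=payload_bytes,
        ContentType="application/json",
        Accept="application/json",
    )

    # The Body is a StreamingBody; read it once and parse the JSON returned by inference.py
    result = json.loads(response["Body"].read().decode("utf-8"))
    print(f"HTTP status: {response['ResponseMetadata']['HTTPStatusCode']}")
    #decode_and_show(result["generated_image"])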

print ('Done! Move to the next cell ->')
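`invoke_endpoint` returns a dict whose `Body` is a botocore `StreamingBody`, so the model output has to be read and parsed before it can be displayed. Below is a minimal sketch. It assumes inference.py returns JSON and either returns a list of base64 images or puts them under a `generated_image` key, as the commented-out `decode_and_show` line suggests. `parse_invoke_response` and `extract_b64_images` are hypothetical names. An `io.BytesIO` stands in for the streaming body.

```python
import base64
import io
import json


def parse_invoke_response(response: dict):
    """Read and JSON-decode the Body of a boto3 invoke_endpoint response.

    response["Body"] is a botocore StreamingBody, which can only be read once.
    """
    return json.loads(response["Body"].read().decode("utf-8"))


def extract_b64_images(result, key: str = "generated_image") -> list:
    """Return base64 image strings from a parsed result.

    Assumption: inference.py returns either a list of base64 strings or a dict
    holding one string (or a list) under `key`; adjust to your handler's output.
    """
    value = result.get(key, []) if isinstance(result, dict) else result
    if isinstance(value, str):
        return [value]
    return list(value)


# Usage with a fake response; io.BytesIO stands in for the StreamingBody
fake_body = json.dumps({"generated_image": base64.b64encode(b"png-bytes").decode("utf-8")})
fake_response = {"Body": io.BytesIO(fake_body.encode("utf-8"))}
sample_result = parse_invoke_response(fake_response)
sample_images = extract_b64_images(sample_result)
print(len(sample_images))
```

With a real response, the resulting list could be passed to `decode_and_show`. That function appears to iterate over a list of base64 strings.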

# <span style="color:DarkSeaGreen">Invoke the Endpoint Asynchronously</span>
- Submit a JSON payload uploaded to S3 earlier as an asynchronous inference request

In [None]:
# invoke asynchronously; this returns immediately with an inference id, and the result is later written to OutputLocation
# the input key must match one uploaded by the payload cell above
response = core_endpoint.invoke_async(
    input_location=f"s3://{myBucket}/originals/img1_original_1024.json",
    content_type="application/json",
    accept="application/json",
)
print(f"OutputLocation: {response['OutputLocation']}")
print(f"Submitted async job: {response['InferenceId']}")
print(f"Submitted at {datetime.now()}")
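`invoke_async` returns before inference finishes. The result appears later as an object at `OutputLocation`. Below is a minimal polling sketch. It assumes you use the `s3` boto3 client from earlier and that the caller has `s3:ListBucket`; without that permission, S3 reports a missing key as AccessDenied rather than NoSuchKey. `split_s3_uri` and `wait_for_async_output` are hypothetical helpers, and the timeout and poll interval are arbitrary.

```python
import time
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/key/path' into ('bucket', 'key/path')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def wait_for_async_output(s3_client, output_location: str,
                          timeout_s: float = 600, poll_s: float = 10) -> bytes:
    """Poll S3 until the async inference result exists, then return its bytes."""
    bucket, key = split_s3_uri(output_location)
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            obj = s3_client.get_object(Bucket=bucket, Key=key)
            return obj["Body"].read()
        except s3_client.exceptions.NoSuchKey:
            # Not written yet: give up after the deadline, otherwise wait and retry
            if time.monotonic() >= deadline:
                raise TimeoutError(f"No result at {output_location} after {timeout_s}s")
            time.sleep(poll_s)


# Usage in the notebook (needs the real s3 client and the invoke_async response):
# output_bytes = wait_for_async_output(s3, response["OutputLocation"])
# print(f"Received {len(output_bytes)} bytes")
```

boto3's `invoke_endpoint_async` also returns a `FailureLocation` when a failure path is configured. If the v3 response carries one, checking it the same way surfaces errors instead of waiting for the timeout.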

# <span style="color:DarkSeaGreen">Move to Lab 3</span>
# <span style="color:DarkSeaGreen">OR...</span>
# <span style="color:DarkSeaGreen">Clean Up Architecture</span>
### <span style="color:Red">Only do this if you have finished with this lab and any labs that depend on it!</span>
##### It will delete all the architecture created; make sure you no longer need any of it!!!

In [None]:
# before you delete, here are the object names in case you want to review them in the console
print(f"Model name: {modelID}")
print(f"ECR path to container: {aws_ecr_sagemaker_pytorch_container}")
print(f"S3 uri: s3://{myBucket}/model/model.tar.gz")
print(f"IAM role ARN: {myRoleSageMakerExecutionARN}")
# truncated, because the full payload contains the whole base64-encoded image
print(f"Example inference request (first 200 chars): {json.dumps(payload)[:200]}...")

# Container images:
# tested via the console with this image:
#763104351884.dkr.ecr.ap-southeast-2.amazonaws.com/pytorch-inference:2.1.0-gpu-py310
# the code currently uses this image:
#355873309152.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-inference-pytorch:2.0-gpu-py3

In [None]:
# when finished with the endpoint, delete it
# if you get an error, the endpoint may still be updating after scaling in from the lab 2 or lab 3 Locust tests;
# try again, or delete it via the console if the config cannot be found
# first check the endpoint status to make sure it is not still changing due to scaling in
response = ssm.get_parameter(
    Name=myParameterStoreEndpointName
)
endpointName = response['Parameter']['Value']
response = sm.describe_endpoint(EndpointName=endpointName)
print(response["EndpointStatus"])

if response["EndpointStatus"] == "InService":
    print("Endpoint is in service. Proceeding with deletion.")
    sm.delete_endpoint(EndpointName=endpointName)
    print ('Done! Move to the next cell ->')
else:
    print("Endpoint is not in service. Cannot delete. Try again in a couple of minutes.")
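The cell above gives up if the endpoint is not `InService`. The sketch below instead polls `describe_endpoint` until the status leaves the transitional states (`Creating`, `Updating`, `SystemUpdating`, `RollingBack`, `Deleting`). It assumes the `sm` client and `endpointName` from above. `wait_for_stable_endpoint` is a hypothetical helper, and the timings are arbitrary.

```python
import time

# Endpoint statuses that mean SageMaker is still changing the endpoint
TRANSITIONAL_STATUSES = {"Creating", "Updating", "SystemUpdating", "RollingBack", "Deleting"}


def wait_for_stable_endpoint(sm_client, endpoint_name: str,
                             timeout_s: float = 900, poll_s: float = 30) -> str:
    """Poll describe_endpoint until the status is no longer transitional; return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status not in TRANSITIONAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{endpoint_name} still {status} after {timeout_s}s")
        time.sleep(poll_s)


# Usage in the notebook (needs the real sm client):
# status = wait_for_stable_endpoint(sm, endpointName)
# if status == "InService":
#     sm.delete_endpoint(EndpointName=endpointName)
```

boto3 also ships SageMaker waiters such as `sm.get_waiter("endpoint_in_service")` and `sm.get_waiter("endpoint_deleted")`. The in-service waiter stops with an error if the endpoint reaches `Failed`, so a plain loop that simply reports the final status is shown here.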

In [None]:
# delete the model
response = sm.delete_model(ModelName=core_model.model_name)
print ('Done! Move to the next cell ->')

In [None]:
# delete the endpoint config
response = sm.delete_endpoint_config(EndpointConfigName=myEndpoint)
print ('Done! Move to the next cell ->')

In [None]:
# detach the managed policy from the execution role, then delete the role
iam.detach_role_policy(
    RoleName=myRoleSageMakerExecution, PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess'
)
iam.delete_role(RoleName=myRoleSageMakerExecution)
print ('Done! Move to the next cell ->')
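`delete_role` fails with a DeleteConflict error while any managed or inline policy is still attached to the role. The cell above only detaches `AmazonSageMakerFullAccess`. If other policies were attached (the lab does not say whether they were), a sketch like this removes them first. It uses the IAM `list_attached_role_policies`, `list_role_policies`, `detach_role_policy`, `delete_role_policy` and `delete_role` calls, following their `Marker`/`IsTruncated` pagination. `delete_role_completely` and `_list_all` are hypothetical helpers.

```python
def _list_all(list_call, result_key: str, **kwargs) -> list:
    """Collect every item from a Marker/IsTruncated-paginated IAM list call."""
    items = []
    while True:
        page = list_call(**kwargs)
        items.extend(page[result_key])
        if not page.get("IsTruncated"):
            return items
        kwargs["Marker"] = page["Marker"]


def delete_role_completely(iam_client, role_name: str) -> list:
    """Detach all managed policies, delete all inline policies, then delete the role.

    Returns the ARNs of the managed policies that were detached.
    """
    # List everything first so detaching does not disturb the pagination
    attached = [
        p["PolicyArn"]
        for p in _list_all(iam_client.list_attached_role_policies, "AttachedPolicies", RoleName=role_name)
    ]
    for arn in attached:
        iam_client.detach_role_policy(RoleName=role_name, PolicyArn=arn)

    # Inline policies are embedded in the role, so they are deleted rather than detached
    for name in _list_all(iam_client.list_role_policies, "PolicyNames", RoleName=role_name):
        iam_client.delete_role_policy(RoleName=role_name, PolicyName=name)

    iam_client.delete_role(RoleName=role_name)
    return attached


# Usage in the notebook, in place of the detach/delete calls above:
# delete_role_completely(iam, myRoleSageMakerExecution)
```

This sketch does not cover instance profiles. A role that belongs to one must also be removed from it before `delete_role` succeeds.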

In [None]:
# delete the Parameter Store entries
ssm.delete_parameter(Name=myParameterStoreChosenModel)
ssm.delete_parameter(Name=myParameterStoreEndpointName)
ssm.delete_parameter(Name=myParameterStoreIAMARN)
print ('Done! Move to the next cell ->')
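The three `delete_parameter` calls above raise `ParameterNotFound` if the cell is re-run after a partial success. The sketch below uses SSM's batch `delete_parameters` call instead. That call accepts up to 10 names per request and reports names that don't exist in `InvalidParameters` rather than raising. `delete_parameters_batched` is a hypothetical helper; the parameter-name variables come from earlier cells.

```python
def delete_parameters_batched(ssm_client, names, batch_size: int = 10):
    """Delete SSM parameters via DeleteParameters, at most 10 names per request.

    Returns (deleted, invalid); names that do not exist are reported in
    InvalidParameters instead of raising, so re-running the cell is harmless.
    """
    names = list(names)
    deleted, invalid = [], []
    for start in range(0, len(names), batch_size):
        resp = ssm_client.delete_parameters(Names=names[start:start + batch_size])
        deleted.extend(resp.get("DeletedParameters", []))
        invalid.extend(resp.get("InvalidParameters", []))
    return deleted, invalid


# Usage in the notebook:
# deleted, invalid = delete_parameters_batched(
#     ssm, [myParameterStoreChosenModel, myParameterStoreEndpointName, myParameterStoreIAMARN]
# )
# print(f"Deleted: {deleted}; not found: {invalid}")
```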

In [None]:
# delete s3 bucket
# NOTE WARNING - this will delete all objects in the bucket with NO prompt or confirmation
s3r = boto3.resource('s3', region_name=myRegion, verify=where())
bucket = s3r.Bucket(myBucket)
bucket.objects.all().delete()

# delete the bucket
response = s3.delete_bucket(Bucket=myBucket)
print ('Done! Move to the next cell ->')
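If versioning was ever enabled on the bucket (the lab does not say it was), `bucket.objects.all().delete()` only adds delete markers. `delete_bucket` then fails with BucketNotEmpty. The sketch below removes every version and delete marker using `list_object_versions` and `delete_objects`, which accepts at most 1000 keys per request. It assumes you use the `s3` client from earlier. `delete_all_versions` is a hypothetical helper.

```python
def delete_all_versions(s3_client, bucket: str) -> int:
    """Delete every object version and delete marker in a bucket; return how many were removed."""
    removed = 0
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_object_versions(**kwargs)
        targets = [
            {"Key": v["Key"], "VersionId": v["VersionId"]}
            for v in page.get("Versions", []) + page.get("DeleteMarkers", [])
        ]
        # delete_objects accepts at most 1000 keys per request
        for start in range(0, len(targets), 1000):
            chunk = targets[start:start + 1000]
            resp = s3_client.delete_objects(Bucket=bucket, Delete={"Objects": chunk, "Quiet": True})
            # Quiet mode still reports failures under "Errors"
            if resp.get("Errors"):
                raise RuntimeError(f"Some deletions failed: {resp['Errors']}")
            removed += len(chunk)
        if not page.get("IsTruncated"):
            return removed
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["VersionIdMarker"] = page["NextVersionIdMarker"]


# Usage in the notebook, before deleting the bucket:
# delete_all_versions(s3, myBucket)
# s3.delete_bucket(Bucket=myBucket)
```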

# <span style="color:DarkSeaGreen">Clean Up venv</span>
### Clean up if finished with this lab and running in VSCode or equivalent local IDE
#### Note these are macOS specific
- Run the commands of the cell below in a terminal window if you need to clean up a local venv
  - Note: if you copy and paste the entire cell and run it as one, you will get zsh: command not found: # errors because of the comments; you can ignore them
  - Remember to restart the kernel to refresh what's available

In [None]:
# if you have local host in your terminal prompt
unset HOST
# deactivate the venv
deactivate 
# remove it and its contents if not needed
rm -rf venv-stable-diffuser-lab2 