# Task 4: Deploy a model for asynchronous inference

## Task 4.1: Environment setup

Import the required packages and set up the AWS clients and variables used throughout this task.

In [1]:
#install-dependencies
import boto3
import sagemaker
import time
from sagemaker.session import Session
from botocore.exceptions import ClientError

role = sagemaker.get_execution_role()  # IAM role that SageMaker assumes
sess = boto3.Session()
region = sess.region_name
sm = sess.client('sagemaker')  # Client for models, endpoint configurations, and endpoints
prefix = 'sagemaker/mlasms'
bucket = sagemaker.Session().default_bucket()
s3_client = boto3.client("s3")
sagemaker_runtime = boto3.client("sagemaker-runtime", region_name=region)  # Client for invoking endpoints



Save the model from the training and tuning lab in the default Amazon Simple Storage Service (Amazon S3) bucket. Set up a model using **create_model** and configure **ModelDataUrl** to reference the trained model.

In [2]:
#set-up-model
# Upload the model to your Amazon S3 bucket
s3_client.upload_file(Filename="model.tar.gz", Bucket=bucket, Key=f"{prefix}/models/model.tar.gz")

# Set a date to use in the model name
create_date = time.strftime("%Y-%m-%d-%H-%M-%S")
model_name = 'income-model-{}'.format(create_date)

# Retrieve the container image
container = sagemaker.image_uris.retrieve(
    region=region,
    framework='xgboost',
    version='1.5-1'
)

# Set up the model
income_model = sm.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = {
        'Image': container,
        'ModelDataUrl': f's3://{bucket}/{prefix}/models/model.tar.gz',
    }
)
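
The model name above appends a timestamp so that each run creates a uniquely named resource. The following is a minimal sketch of that naming scheme with a validity check. The helper `timestamped_name` is hypothetical, and the 63-character limit and allowed-character pattern are assumptions about SageMaker resource names that you should confirm in the API reference.

```python
import re
import time

# Assumption: names may contain letters, digits, and hyphens, must start and
# end with an alphanumeric character, and may be at most 63 characters long.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$")
MAX_NAME_LENGTH = 63


def timestamped_name(base, when=None):
    """Return '<base>-<YYYY-MM-DD-HH-MM-SS>' after validating it (hypothetical helper)."""
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S", when or time.localtime())
    name = f"{base}-{stamp}"
    if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.match(name):
        raise ValueError(f"Invalid resource name: {name!r}")
    return name


# Use a fixed time so the result is reproducible.
fixed_time = time.strptime("2024-11-06 16:45:10", "%Y-%m-%d %H:%M:%S")
example = timestamped_name("income-model", fixed_time)
print(example)
```

Validating the name before calling **create_model** surfaces naming problems locally instead of as an API error.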

Upload the asynchronous records to the default Amazon S3 bucket.

In [3]:
#upload-dataset
s3_client.upload_file(Filename="asynchronous_records.csv", Bucket=bucket, Key=f"{prefix}/asynchronous_records.csv", ExtraArgs={"ContentType": "text/csv;charset=utf-8"})
input_location = f"s3://{bucket}/{prefix}/asynchronous_records.csv"

## Task 4.2: Create an endpoint from the provided synthesized, retrained model

Amazon SageMaker Asynchronous Inference is a capability in SageMaker that queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes (up to 1 GB), long processing times (up to 15 minutes), and near real-time latency requirements. With Asynchronous Inference, you can reduce costs by autoscaling the instance count to zero when there are no requests to process. Therefore, you only pay when your endpoint is processing requests.
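
Scaling to zero is not configured anywhere in this lab. As an illustration only, the sketch below builds the keyword arguments that could be passed to the Application Auto Scaling `register_scalable_target` and `put_scaling_policy` calls for an existing asynchronous endpoint. The helper name, the policy name, the target value, and the cooldown periods are assumptions chosen for the example, and nothing is sent to AWS.

```python
def scale_to_zero_requests(endpoint_name, variant_name="variant1", max_instances=2):
    """Build request arguments for Application Auto Scaling (sketch; makes no API calls)."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # MinCapacity=0 is what allows the endpoint to scale in to zero instances.
    target = {**common, "MinCapacity": 0, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-backlog-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": 5.0,  # assumed backlog size per instance
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,   # assumed value, in seconds
            "ScaleOutCooldown": 300,  # assumed value, in seconds
        },
    }
    return target, policy


target, policy = scale_to_zero_requests("my-async-endpoint")
print(target["ResourceId"], target["MinCapacity"])
```

With a client created by `boto3.client("application-autoscaling")`, the two dictionaries would be passed as `register_scalable_target(**target)` and `put_scaling_policy(**policy)` after the endpoint is in service.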

Creating an asynchronous endpoint takes three steps, which this notebook performs with the SageMaker client in the AWS SDK for Python (Boto3). These are the same steps used for real-time and serverless endpoints, but with different configurations:
1. Create a model in SageMaker.
2. Create an endpoint configuration for an HTTPS endpoint.
3. Create an HTTPS endpoint.

You have already created a model. You are now ready to create an endpoint configuration and an endpoint. 

First, set up the endpoint configuration name and the instance type that you want to use. Then, call the CreateEndpointConfig API.

To create an endpoint configuration, set the following options:
- **VariantName**: The name of the production variant (one or more models in production).
- **ModelName**: The name of the model that you want to host. This is the name that you specified when you created the model.
- **InstanceType**: The ML compute instance type.
- **InitialInstanceCount**: The number of instances to launch initially.
- **S3OutputPath**: The Amazon S3 location where response outputs are uploaded when the request does not provide a location.
- **MaxConcurrentInvocationsPerInstance**: (Optional) The maximum number of concurrent requests sent by the SageMaker client to the model container.

Optionally, you can also set a NotificationConfig, selecting an Amazon Simple Notification Service (Amazon SNS) topic that posts notifications when an inference request is successful or if it fails. In this lab, you do not need to set up this option.
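
Although this lab skips notifications, the shape of the setting is easy to show. The sketch below assembles an `AsyncInferenceConfig` dictionary that includes an optional `NotificationConfig` with `SuccessTopic` and `ErrorTopic` entries. The helper name and the topic ARNs are hypothetical placeholders, and the function only builds a dictionary.

```python
def build_async_inference_config(output_path, success_topic_arn=None,
                                 error_topic_arn=None, max_concurrent=None):
    """Build an AsyncInferenceConfig dictionary (sketch; makes no API calls)."""
    output_config = {"S3OutputPath": output_path}

    # Add NotificationConfig only when at least one SNS topic is supplied.
    notification = {}
    if success_topic_arn:
        notification["SuccessTopic"] = success_topic_arn
    if error_topic_arn:
        notification["ErrorTopic"] = error_topic_arn
    if notification:
        output_config["NotificationConfig"] = notification

    config = {"OutputConfig": output_config}
    if max_concurrent is not None:
        config["ClientConfig"] = {"MaxConcurrentInvocationsPerInstance": max_concurrent}
    return config


async_config = build_async_inference_config(
    "s3://example-bucket/sagemaker/mlasms/output",  # placeholder bucket
    success_topic_arn="arn:aws:sns:us-west-2:123456789012:async-success",  # placeholder ARN
    error_topic_arn="arn:aws:sns:us-west-2:123456789012:async-error",      # placeholder ARN
    max_concurrent=4,
)
print(async_config)
```

The resulting dictionary would be passed as the `AsyncInferenceConfig` argument of `create_endpoint_config`. Leaving the topic arguments unset produces the same configuration that this lab uses.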

In [4]:
#create-endpoint-configuration
# Create an endpoint config name. Here you create one based on the date so you can search endpoints based on creation time.
endpoint_config_name = 'income-model-asynchronous-endpoint-{}'.format(create_date)
output_location = f"s3://{bucket}/{prefix}/output"

endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "ModelName": model_name,
            "VariantName": "variant1", # The name of the production variant.
            "InstanceType": "ml.m5.xlarge", # Specify the compute instance type.
            "InitialInstanceCount": 1 # Number of instances to launch initially.
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            # Location to upload response outputs when no location is provided in the request.
            "S3OutputPath": output_location
        },
        "ClientConfig": {
            # (Optional) Specify the max number of inflight invocations per instance
            # If no value is provided, Amazon SageMaker chooses an optimal value for you
            "MaxConcurrentInvocationsPerInstance": 4
        }
    }
)

print(f"Created EndpointConfig: {endpoint_config_response['EndpointConfigArn']}")

Created EndpointConfig: arn:aws:sagemaker:us-west-2:440570968020:endpoint-config/income-model-asynchronous-endpoint-2024-11-06-16-45-10


Next, create an endpoint. When you create an asynchronous endpoint, SageMaker launches the machine learning (ML) compute instances and deploys the model as specified in the configuration. Refer to [Asynchronous Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html) for more information about the options available to you with asynchronous endpoints.

When the endpoint is in service, the helper function prints the endpoint Amazon Resource Name (ARN). Endpoint creation can take as long as 7 minutes to run.

In [5]:
#create-endpoint
# The name of the endpoint. The name must be unique within an AWS Region in your AWS account.
endpoint_name = '{}-name'.format(endpoint_config_name)

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name, 
    EndpointConfigName=endpoint_config_name
) 

def wait_for_endpoint_creation_complete(endpoint_name):
    """Poll the named endpoint until it leaves the Creating state, then report the result."""
    response = sm.describe_endpoint(EndpointName=endpoint_name)
    status = response.get("EndpointStatus")
    while status == "Creating":
        print("Waiting for Endpoint Creation")
        time.sleep(15)
        response = sm.describe_endpoint(EndpointName=endpoint_name)
        status = response.get("EndpointStatus")

    if status != "InService":
        print(f"Failed to create endpoint, response: {response}")
        failure_reason = response.get("FailureReason", "")
        raise SystemExit(
            f"Failed to create endpoint {response['EndpointArn']}, status: {status}, reason: {failure_reason}"
        )
    print(f"Endpoint {response['EndpointArn']} successfully created.")

wait_for_endpoint_creation_complete(endpoint_name)


Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Endpoint arn:aws:sagemaker:us-west-2:440570968020:endpoint/income-model-asynchronous-endpoint-2024-11-06-16-45-10-name successfully created.
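
The polling loop in the previous cell runs until the status changes, however long that takes. As a variation, the sketch below adds a timeout and takes the describe call as a parameter so that it can be exercised without AWS. The function name, the 900-second timeout, and the injected `sleep` parameter are assumptions for illustration, not part of the lab.

```python
import time


def wait_for_status(describe, in_progress="Creating", success="InService",
                    timeout_seconds=900, poll_seconds=15, sleep=time.sleep):
    """Call describe() until the status leaves in_progress or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = describe()
        status = response.get("EndpointStatus")
        if status == success:
            return response
        if status != in_progress:
            reason = response.get("FailureReason", "")
            raise RuntimeError(f"Endpoint ended in status {status}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Simulated responses stand in for sm.describe_endpoint in this example.
fake_responses = iter([
    {"EndpointStatus": "Creating"},
    {"EndpointStatus": "Creating"},
    {"EndpointStatus": "InService", "EndpointArn": "arn:example"},
])
final = wait_for_status(lambda: next(fake_responses), sleep=lambda seconds: None)
print(final["EndpointStatus"])
```

In the notebook, the call would look like `wait_for_status(lambda: sm.describe_endpoint(EndpointName=endpoint_name))`. Boto3 also provides an `endpoint_in_service` waiter through `sm.get_waiter("endpoint_in_service")`, which may be simpler when custom progress messages are not needed.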


In SageMaker Studio, you can review the endpoint details under the **Endpoints** tab.

1. Copy the **SagemakerStudioUrl** value to the left of these instructions.

1. Open a new browser tab, and then paste the **SagemakerStudioUrl** value into the address bar.

1. Press **Enter**.

1. The browser displays the SageMaker Studio page.

1. In the SageMaker Studio welcome popup window, choose **Skip Tour for now**.

1. Choose **Deployments**.

1. Choose **Endpoints**.

SageMaker Studio displays the **Endpoints** tab.

1. Select the endpoint which has **income-model-asynchronous-** in the **Name** column.

If the endpoint does not appear, choose the refresh icon until the endpoint appears in the list.

SageMaker Studio displays the **ENDPOINT SUMMARY** tab.

If you opened the endpoint before it finished creating, choose the refresh icon until the **Endpoint status** changes from *Creating* to *InService*.

The **Endpoint type** is listed as **Async**.

## Task 4.3: Invoke an endpoint for an asynchronous inference with asynchronous customer records

After you deploy your model using SageMaker hosting services, you can test your model on that endpoint by sending it test data.

To test an asynchronous endpoint, you must include the Amazon S3 input location in the API call. For this lab, the default SageMaker S3 bucket contains an asynchronous_records.csv file with 100 customer records that you can use to test the endpoint. If the request is accepted, the service sends back an HTTP 202 response.

In [6]:
#send-test-file
response = sagemaker_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name,
    InputLocation=input_location
)

print(response)

# Drop the "s3://<bucket>/" prefix from the output location to get the object key
output_key = response['OutputLocation'].split("/", 3)[3]
print('\nThe output key is: {}'.format(output_key))

{'ResponseMetadata': {'RequestId': '0b42e3d0-0b34-421c-8bde-0f81513adb66', 'HTTPStatusCode': 202, 'HTTPHeaders': {'x-amzn-requestid': '0b42e3d0-0b34-421c-8bde-0f81513adb66', 'x-amzn-sagemaker-outputlocation': 's3://sagemaker-us-west-2-440570968020/sagemaker/mlasms/output/ca92292c-b6cc-450b-a67a-0c024dee9855.out', 'date': 'Wed, 06 Nov 2024 16:49:03 GMT', 'content-type': 'application/json', 'content-length': '54', 'connection': 'keep-alive'}, 'RetryAttempts': 0}, 'OutputLocation': 's3://sagemaker-us-west-2-440570968020/sagemaker/mlasms/output/ca92292c-b6cc-450b-a67a-0c024dee9855.out', 'InferenceId': 'b2803771-4332-4804-b7cf-cfd551010325'}

The output key is: sagemaker/mlasms/output/ca92292c-b6cc-450b-a67a-0c024dee9855.out
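
The cell above extracts the object key by splitting the output location on `/`. A slightly more explicit alternative is sketched below with `urllib.parse`, together with a check for the HTTP 202 status. The helper names and the sample response are hypothetical, and the bucket name is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def is_accepted(response):
    """Return True when the invocation was queued (HTTP 202)."""
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 202


# A trimmed, made-up response with the same fields as the one printed above.
sample_response = {
    "ResponseMetadata": {"HTTPStatusCode": 202},
    "OutputLocation": "s3://example-bucket/sagemaker/mlasms/output/result.out",
}
if is_accepted(sample_response):
    output_bucket, object_key = split_s3_uri(sample_response["OutputLocation"])
    print(output_bucket, object_key)
```

Parsing the URI also returns the bucket name, which helps if the output location is ever configured to use a bucket other than the default one.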


Check the output location to see if the inference has been processed. When it has been processed, print out the prediction scores for all the customers included in the invocation.

In [7]:
#get-output
def get_output():
    while True:
        try:
            return sagemaker.session.Session().read_s3_file(bucket=bucket, key_prefix=output_key)
        except ClientError as e:
            if e.response["Error"]["Code"] == "NoSuchKey":
                print("Waiting for output...")
                time.sleep(2)
                continue
            raise

output = get_output()
print(f"Predictions for the 100 customers: {output}")

Predictions for the 100 customers: 0.9714438319206238
0.001604771357960999
0.04654807969927788
0.4671800434589386
0.0018451622454449534
0.05159733071923256
0.012455019168555737
0.6992623805999756
0.0014537291135638952
0.5825179815292358
0.0006570011610165238
0.25096553564071655
0.9944268465042114
0.013036633841693401
0.00042709000990726054
0.759122908115387
0.9882861971855164
0.012484348379075527
0.006748788990080357
0.6448169350624084
0.994911253452301
0.0037871734239161015
0.09882298111915588
0.007764520589262247
0.03981306776404381
0.008821592666208744
0.39725974202156067
0.12154777348041534
0.592326283454895
0.02497350424528122
0.004290309734642506
0.47463929653167725
0.025301894173026085
0.8622359037399292
0.42451348900794983
0.024893196299672127
0.02361108735203743
0.030183158814907074
0.18910986185073853
0.02147681824862957
0.006713347975164652
0.007243005093187094
0.04896042123436928
0.33510586619377136
0.052108898758888245
0.04989563301205635
0.021678898483514786
0.64621305465
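
The output is plain text with one prediction score per line. As a sketch of one possible next step, the code below converts that text into floats and applies a classification threshold. The helper names and the 0.5 threshold are assumptions for illustration, and the sample contains only the first four scores shown above.

```python
def parse_scores(output_text):
    """Convert newline-separated scores into a list of floats."""
    return [float(line) for line in output_text.splitlines() if line.strip()]


def label_scores(scores, threshold=0.5):
    """Label each score 1 when it meets the threshold, otherwise 0."""
    return [1 if score >= threshold else 0 for score in scores]


sample_output = (
    "0.9714438319206238\n"
    "0.001604771357960999\n"
    "0.04654807969927788\n"
    "0.4671800434589386\n"
)
scores = parse_scores(sample_output)
labels = label_scores(scores)
print(labels)  # [1, 0, 0, 0]
```

In the notebook, `parse_scores(output)` would operate on the string returned by `get_output()`.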

## Task 4.4: Delete the endpoint

Clean up the endpoint in three steps. First, delete the endpoint. Then, delete the endpoint configuration. Finally, if you no longer need the model that you deployed, delete it.

In [8]:
#delete-resources
# Delete endpoint
sm.delete_endpoint(EndpointName=endpoint_name)

# Delete endpoint configuration
sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
                   
# Delete model
sm.delete_model(ModelName=model_name)

{'ResponseMetadata': {'RequestId': '4cb3dc92-887a-4ee2-a3e9-31242ac089af',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '4cb3dc92-887a-4ee2-a3e9-31242ac089af',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Wed, 06 Nov 2024 16:49:13 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}
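
If one of the delete calls in the cell above fails, the calls after it do not run. The sketch below performs the same three deletions in the same order but records the outcome of each step and continues. The function name is hypothetical, the broad exception handling is a simplification for illustration, and a stand-in client replaces the SageMaker client so that the example runs without AWS.

```python
def delete_inference_resources(client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, its configuration, and the model, recording each outcome."""
    steps = [
        ("endpoint", lambda: client.delete_endpoint(EndpointName=endpoint_name)),
        ("endpoint config",
         lambda: client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)),
        ("model", lambda: client.delete_model(ModelName=model_name)),
    ]
    results = {}
    for label, action in steps:
        try:
            action()
            results[label] = "deleted"
        except Exception as error:  # Broad on purpose: this is only a sketch
            results[label] = f"failed: {error}"
    return results


class FakeClient:
    """Stand-in for the SageMaker client that records the order of calls."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append("delete_endpoint")

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append("delete_endpoint_config")

    def delete_model(self, ModelName):
        self.calls.append("delete_model")


fake_client = FakeClient()
cleanup_results = delete_inference_resources(fake_client, "ep", "ep-config", "model")
print(cleanup_results)
```

In the notebook, the call would be `delete_inference_resources(sm, endpoint_name, endpoint_config_name, model_name)`.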

### Conclusion

Congratulations! You have successfully created an asynchronous endpoint in SageMaker with the AWS SDK for Python (Boto3) and invoked it.

The next task of the lab focuses on batch transform.

### Cleanup

You have completed this notebook. To move to the next part of the lab, do the following:

- Close this notebook file.
- Return to the lab session and continue with **Task 5: Use batch transform to get inferences from a large dataset**.