# Task 2: Deploy a model for real-time inference

## Task 2.1: Environment setup

Import the required packages and set up the clients and variables used in this notebook.

In [1]:
#install-dependencies
import boto3
import pandas as pd
import sagemaker
import time

role = sagemaker.get_execution_role()
region = boto3.Session().region_name
sess = boto3.Session()
sm = sess.client('sagemaker')
prefix = 'sagemaker/mlasms'
bucket = sagemaker.Session().default_bucket()
s3_client = boto3.client("s3")

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


Review the processed customer dataset.

In [2]:
#explore-dataset
column_list = ['income','age','workclass','education','education_num','marital_status','occupation','relationship','race','sex','capital_gain','capital_loss','hours_per_week']
lab_test_data = pd.read_csv('adult_data_processed.csv', names=(column_list), header=1)
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 20)
lab_test_data.dtypes
lab_test_data.head()

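| Unnamed: 0 | income | age | workclass | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 50 | 2 | 2 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 13 |
| 1 | 0 | 38 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 40 |
| 2 | 0 | 53 | 0 | 3 | 6 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 40 |
| 3 | 0 | 28 | 0 | 2 | 2 | 0 | 3 | 4 | 1 | 1 | 0 | 0 | 40 |
| 4 | 0 | 37 | 0 | 4 | 3 | 0 | 2 | 4 | 0 | 1 | 0 | 0 | 40 |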


Save the model from the training and tuning lab in the default Amazon Simple Storage Service (Amazon S3) bucket. Set up a model using **create_model** and configure **ModelDataUrl** to reference the trained model.

In [3]:
#set-up-model
# Upload the model to your Amazon S3 bucket
s3_client.upload_file(
    Filename="model.tar.gz", Bucket=bucket, Key=f"{prefix}/models/model.tar.gz"
)

# Set a date to use in the model name
create_date = time.strftime("%Y-%m-%d-%H-%M-%S")
model_name = 'income-model-{}'.format(create_date)

# Retrieve the container image
container = sagemaker.image_uris.retrieve(
    region=boto3.Session().region_name, 
    framework='xgboost', 
    version='1.5-1'
)

# Set up the model
income_model = sm.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = {
        'Image': container,
        'ModelDataUrl': f's3://{bucket}/{prefix}/models/model.tar.gz',
    }
)
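
The upload key and the **ModelDataUrl** value must point to the same Amazon S3 object. As a minimal sketch, a hypothetical helper named `model_artifact_location` (not part of boto3 or the SageMaker SDK) could derive both from one place so that they cannot drift apart. The bucket name in the example is a placeholder.

```python
def model_artifact_location(bucket, prefix, filename="model.tar.gz"):
    """Return the S3 key and the matching s3:// URI for a model artifact.

    Hypothetical helper for illustration; it only formats strings and
    does not call AWS.
    """
    key = f"{prefix.strip('/')}/models/{filename}"
    return key, f"s3://{bucket}/{key}"


# Placeholder bucket name; in the lab, the bucket comes from default_bucket().
key, model_data_url = model_artifact_location("example-bucket", "sagemaker/mlasms")
print(key)
print(model_data_url)
```

The returned key could be passed to `upload_file` and the returned URI to `ModelDataUrl`.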

## Task 2.2: Create an endpoint from the provided synthesized, retrained model

There are three steps to creating an endpoint using the Amazon SageMaker SDK for Python:
1. Create a SageMaker model.
2. Create an endpoint configuration for an HTTPS endpoint.
3. Create an HTTPS endpoint.

Refer to [Create Your Endpoint and Deploy Your Model](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deployment.html) for more information about creating endpoints.

You have already created a model. You are now ready to create an endpoint configuration and an endpoint. 

First, set up the endpoint configuration name and the instance type that you want to use. Then, call the CreateEndpointConfig API.

To create an endpoint configuration, you need to set the following options:
- **VariantName**: The name of the production variant (one or more models in production).
- **ModelName**: The name of the model that you want to host. This is the name that you specified when you created the model.
- **InstanceType**: The compute instance type.
- **InitialInstanceCount**: The number of instances to launch initially.
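
The four options above can be sketched as a single dictionary, which is one entry in the `ProductionVariants` list. The helper name `build_production_variant`, its default values, and the example model name are assumptions made for illustration.

```python
def build_production_variant(model_name, instance_type,
                             initial_instance_count=1,
                             variant_name="variant1"):
    """Build one ProductionVariants entry for an endpoint configuration.

    Hypothetical helper; the keys match the four options listed above.
    """
    if initial_instance_count < 1:
        raise ValueError("InitialInstanceCount must be at least 1")
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": initial_instance_count,
    }


# Example with a placeholder model name.
variant = build_production_variant("income-model-example", "ml.m5.xlarge")
print(variant)
```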

To log the inputs to your endpoint and the inference outputs from SageMaker real-time endpoints to Amazon S3, you can enable a feature called Data Capture. Data Capture is commonly used to record information for training, debugging, and monitoring. When Data Capture is enabled, SageMaker Studio displays more details about the endpoint when you explore it. The endpoint configuration that you create in this lab includes a Data Capture configuration to show you how to enable it.

Refer to [Capture Data](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture.html) for more information about adding Data Capture.
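
For a sense of what captured data can look like, the following sketch parses one captured record. It assumes the JSON Lines layout described in the Capture Data documentation (a `captureData` object with `endpointInput` and `endpointOutput`, plus `eventMetadata`) and only handles CSV-encoded payloads. The sample record is hand-written for illustration and is not real lab output, and `summarize_capture_line` is a hypothetical helper name.

```python
import json


def summarize_capture_line(line):
    """Extract the request payload, response payload, and timestamp
    from one captured JSON Lines record. Missing fields become None."""
    record = json.loads(line)
    capture = record.get("captureData", {})
    return {
        "input": capture.get("endpointInput", {}).get("data"),
        "output": capture.get("endpointOutput", {}).get("data"),
        "inference_time": record.get("eventMetadata", {}).get("inferenceTime"),
    }


# Hand-written sample record for illustration only (not real lab output).
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "56,3,6,6,0,3,1,0,0,1,0,13",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv; charset=utf-8",
            "mode": "OUTPUT",
            "data": "0.47",
            "encoding": "CSV",
        },
    },
    "eventMetadata": {"eventId": "example-id",
                      "inferenceTime": "2024-11-06T16:28:55Z"},
    "eventVersion": "0",
})

summary = summarize_capture_line(sample_line)
print(summary)
```

If only one capture mode is enabled, the missing side is returned as `None` rather than raising an error.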

In [4]:
#create-endpoint-configuration 
# Create an endpoint config name. Here you create one based on the date
# so that you can search endpoints based on creation time.
endpoint_config_name = 'income-model-real-time-endpoint-{}'.format(create_date)                              
instance_type = 'ml.m5.xlarge'   
initial_sampling_percentage = 25 # Choose a value between 0 and 100
capture_modes = [ "Input",  "Output" ] # Specify input, output, or both

endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name, # You will specify this name in a CreateEndpoint request.
    # List of ProductionVariant objects, one for each model that you want to host at this endpoint.
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name, 
            "InstanceType": instance_type, # Specify the compute instance type.
            "InitialInstanceCount": 1 # Number of instances to launch initially.
        }
    ],
    DataCaptureConfig= {
        'EnableCapture': True, # Whether data should be captured or not.
        'InitialSamplingPercentage' : initial_sampling_percentage,
        'DestinationS3Uri': f's3://{bucket}/data-capture',
        'CaptureOptions': [{"CaptureMode" : capture_mode} for capture_mode in capture_modes]
    }
)

print(f"Created EndpointConfig: {endpoint_config_response['EndpointConfigArn']}")

Created EndpointConfig: arn:aws:sagemaker:us-west-2:440570968020:endpoint-config/income-model-real-time-endpoint-2024-11-06-16-22-31


Next, create an endpoint. When you create a real-time endpoint, SageMaker launches the machine learning (ML) compute instances and deploys one or more models as specified in the configuration. In this lab, you deploy only one model for inference, but SageMaker also supports multi-model endpoints. Refer to [Invoke a Multi-Model Endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/invoke-multi-model-endpoint.html) for more information about multi-model endpoints.

When the endpoint is in service, the helper function will print the endpoint Amazon Resource Name (ARN). Endpoint creation will take approximately 3–7 minutes to run.

In [5]:
#create-endpoint
# The name of the endpoint. The name must be unique within an AWS Region in your AWS account.
endpoint_name = '{}-name'.format(endpoint_config_name)

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name, 
    EndpointConfigName=endpoint_config_name
) 

def wait_for_endpoint_creation_complete(endpoint_name, endpoint_arn):
    """Poll the endpoint status until creation finishes, then report the result."""
    response = sm.describe_endpoint(EndpointName=endpoint_name)
    status = response.get("EndpointStatus")
    while status == "Creating":
        print("Waiting for Endpoint Creation")
        time.sleep(15)
        response = sm.describe_endpoint(EndpointName=endpoint_name)
        status = response.get("EndpointStatus")

    if status != "InService":
        print(f"Failed to create endpoint, response: {response}")
        failure_reason = response.get("FailureReason", "")
        raise SystemExit(
            f"Failed to create endpoint {endpoint_arn}, status: {status}, reason: {failure_reason}"
        )
    print(f"Endpoint {endpoint_arn} successfully created.")

# Pass the name and ARN explicitly so the helper does not rely on globals.
wait_for_endpoint_creation_complete(endpoint_name, create_endpoint_response["EndpointArn"])


Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Waiting for Endpoint Creation
Endpoint arn:aws:sagemaker:us-west-2:440570968020:endpoint/income-model-real-time-endpoint-2024-11-06-16-22-31-name successfully created.
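
As an alternative to a hand-written polling loop, boto3 provides an `endpoint_in_service` waiter on the SageMaker client. The following is a minimal sketch, not the lab's method. The function name and the default delay and attempt count (15 seconds and 40 attempts, about 10 minutes in total) are assumptions chosen to cover the expected creation time.

```python
def wait_until_in_service(sm_client, endpoint_name,
                          delay_seconds=15, max_attempts=40):
    """Block until the endpoint is InService, using the boto3 waiter.

    sm_client is a boto3 SageMaker client, such as boto3.client("sagemaker").
    The waiter raises an exception if creation fails or the attempts run out.
    """
    waiter = sm_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={"Delay": delay_seconds, "MaxAttempts": max_attempts},
    )
    # Confirm the final status with one more describe call.
    response = sm_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


# Usage in this notebook (requires AWS credentials, so it is commented out):
# print(wait_until_in_service(sm, endpoint_name))
```

The waiter is quieter than the helper above because it prints no progress messages while it polls.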


In SageMaker Studio, you can review the endpoint details under the **Endpoints** tab.

1. Copy the **SagemakerStudioUrl** value to the left of these instructions.

1. Open a new browser tab, and then paste the **SagemakerStudioUrl** value into the address bar.

1. Press **Enter**.

1. The browser displays the SageMaker Studio page.

1. In the SageMaker Studio welcome popup window, choose **Skip Tour for now**.

1. Choose **Deployments**.

1. Choose **Endpoints**.

SageMaker Studio displays the **Endpoints** tab.

1. Select the endpoint that has **income-model-real-time-** in the **Name** column.

If the endpoint does not appear, choose the refresh icon until the endpoint appears in the list.

SageMaker Studio displays the **ENDPOINT SUMMARY** tab.

If you opened the endpoint before it finished creating, choose the refresh icon until the **Endpoint status** changes from *Creating* to *InService*.

The **Endpoint type** is listed as **Real-time**.

## Task 2.3: Invoke an endpoint for a real-time inference with real-time customer records

After you deploy your model using SageMaker hosting services, you can test your model on that endpoint by sending it test data.

You have several customer records that you know have an income greater than or equal to 50,000 USD (an **income** value of **1**), and several that have an income less than 50,000 USD (an **income** value of **0**). Invoke the endpoint with these records and view the returned scores.
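
The request body for a `text/csv` endpoint is one record per line, with the features in training column order and without the **income** label. A minimal sketch of building that body from lists of feature values follows. The helper name `records_to_csv_payload` is hypothetical.

```python
def records_to_csv_payload(records):
    """Encode feature rows as a UTF-8 CSV payload, one record per line.

    Each record is a sequence of feature values in the column order the
    model was trained on, without the income label.
    """
    lines = [",".join(str(value) for value in record) for record in records]
    return ("\n".join(lines) + "\n").encode("utf-8")


payload = records_to_csv_payload([
    [56, 3, 6, 6, 0, 3, 1, 0, 0, 1, 0, 13],
    [29, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 70],
])
print(payload)
```

The resulting bytes could be passed as the `Body` argument of `invoke_endpoint`.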

To view real-time predictions from the endpoint, you read the returned body text from the response, which contains a list of the prediction scores. The score for each record ranges from **0** to **1**, with numbers closer to **1** indicating that those customers are more likely to have an income greater than or equal to 50,000 USD. For example, a customer with a prediction score of **0.42** is more likely to have an income greater than or equal to 50,000 USD than a customer with a prediction score of **0.14**.

In [6]:
#invoke-endpoint-real-time-records
sagemaker_runtime = boto3.client("sagemaker-runtime", region_name=region)

response = sagemaker_runtime.invoke_endpoint(
    ContentType='text/csv',
    EndpointName=endpoint_name, 
    Body=bytes('56,3,6,6,0,3,1,0,0,1,0,13\n' +
                '29,2,2,2,0,1,0,0,0,0,0,70\n' +
                '79,0,1,1,0,3,5,0,0,0,0,20\n', 'utf-8')
)

print(response)

print('\nTesting with records that have an income value of 1:')
print('The returned scores are: {}'.format(response['Body'].read().decode('utf-8')))

response = sagemaker_runtime.invoke_endpoint(
    ContentType='text/csv',
    EndpointName=endpoint_name, 
    Body=bytes('19,0,1,1,1,3,2,0,0,0,0,32\n' +
                '31,0,1,1,2,1,2,1,1,0,0,40\n' +
                '23,0,1,1,1,0,1,0,0,0,0,40\n', 'utf-8')
)

print('\nTesting with records that have an income value of 0:')
print('The returned scores are: {}'.format(response['Body'].read().decode('utf-8')))

{'ResponseMetadata': {'RequestId': '93bbe68f-9abc-4125-831f-7a47e067a559', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '93bbe68f-9abc-4125-831f-7a47e067a559', 'x-amzn-invoked-production-variant': 'variant1', 'date': 'Wed, 06 Nov 2024 16:28:55 GMT', 'content-type': 'text/csv; charset=utf-8', 'content-length': '59', 'connection': 'keep-alive'}, 'RetryAttempts': 0}, 'ContentType': 'text/csv; charset=utf-8', 'InvokedProductionVariant': 'variant1', 'Body': <botocore.response.StreamingBody object at 0x7f0ec36a76d0>}

Testing with records that have an income value of 1:
The returned scores are: 0.4714840352535248
0.46192866563796997
0.19585643708705902


Testing with records that have an income value of 0:
The returned scores are: 0.0013394501293078065
0.012343892827630043
0.006708303466439247
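
The returned body text can be converted into numbers before further use. The following sketch parses the scores and applies a cutoff. The helper names and the threshold value are illustrative assumptions, not part of the lab, and the parser also accepts comma-separated values in case a container returns that format.

```python
def parse_scores(body_text):
    """Parse prediction scores from the response body text.

    Accepts newline-separated or comma-separated values.
    """
    tokens = body_text.replace(",", "\n").split()
    return [float(token) for token in tokens]


def label_scores(scores, threshold=0.5):
    """Map each score to 1 or 0 using an illustrative cutoff."""
    return [1 if score >= threshold else 0 for score in scores]


# Body text copied from the first response above.
body_text = "0.4714840352535248\n0.46192866563796997\n0.19585643708705902\n"
scores = parse_scores(body_text)
print(scores)
print(label_scores(scores, threshold=0.4))
```

The right threshold depends on the use case. With the scores above, a cutoff of 0.5 would label all three records as 0 even though their known **income** value is 1.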



## Task 2.4: Delete the endpoint

Cleaning up an endpoint can be accomplished in three steps. First, delete the endpoint. Then, delete the endpoint configuration. Finally, if you no longer need the model that you deployed, delete the model.

In [7]:
#delete-resources
# Delete endpoint
sm.delete_endpoint(EndpointName=endpoint_name)

# Delete endpoint configuration
sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
                   
# Delete model
sm.delete_model(ModelName=model_name)

{'ResponseMetadata': {'RequestId': '7d9947cf-8652-4cb5-a108-8ce07b2b5dd9',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '7d9947cf-8652-4cb5-a108-8ce07b2b5dd9',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Wed, 06 Nov 2024 16:28:59 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}
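
The three cleanup steps can be wrapped in one function that keeps the deletion order fixed and makes deleting the model optional. This is a sketch with a hypothetical function name. It takes the boto3 SageMaker client as a parameter and does not handle errors.

```python
def delete_endpoint_resources(sm_client, endpoint_name,
                              endpoint_config_name, model_name=None):
    """Delete the endpoint, then its configuration, then optionally the model.

    sm_client is a boto3 SageMaker client. Pass model_name=None to keep
    the model. Returns the deleted resource names in deletion order.
    """
    deleted = []
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    deleted.append(endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    deleted.append(endpoint_config_name)
    if model_name is not None:
        sm_client.delete_model(ModelName=model_name)
        deleted.append(model_name)
    return deleted


# Usage in this notebook (requires AWS credentials, so it is commented out):
# delete_endpoint_resources(sm, endpoint_name, endpoint_config_name, model_name)
```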

### Conclusion

Congratulations! You have used the SageMaker Python SDK to create a real-time endpoint and invoke it.

The next task of the lab focuses on deploying a model using serverless inference.

### Cleanup

You have completed this notebook. To move to the next part of the lab, do the following:

- Close this notebook file.
- Return to the lab session and continue with **Task 3: Deploy a model for serverless inference**.