# SageMaker Real-Time Inference
## Real-time Bidding (RTB) with XGBoost Example

Amazon SageMaker Real-Time Inference is instance-based hosting, suited to workloads that need low latency and high throughput.

In this notebook we use SageMaker real-time endpoints to deploy a trained XGBoost model for real-time bidding (RTB). The model was trained on data from the RTB Intelligence Kit: https://github.com/aws-samples/aws-rtb-intelligence-kit

## Table of Contents
- Setup
- Deployment
    - Model Creation
    - Endpoint Configuration (Prod Variants + Instance Setup)
    - Real-Time Endpoint Creation
    - Endpoint Invocation
- Cleanup

## Deploy as a SageMaker Real-Time Endpoint in Bring-Your-Own-Model (BYOM) Mode


## Setup

Let's start by installing the SageMaker Python SDK, boto3, botocore, and the AWS CLI.

In [None]:
! pip install sagemaker botocore boto3 awscli --upgrade

### SageMaker Setup
Set up the SageMaker environment by creating the boto3 clients, creating a SageMaker session, and retrieving the execution role and the default S3 bucket.

In [None]:
import boto3
import sagemaker

client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

boto_session = boto3.session.Session()
region = boto_session.region_name
print(region)

sagemaker_session = sagemaker.Session()
base_job_prefix = "xgb-bid-filtering"
role = sagemaker.get_execution_role()
print(role)

default_bucket = sagemaker_session.default_bucket()
s3_prefix = base_job_prefix
print(s3_prefix)
print(default_bucket)

## Deployment

### Model Creation
To create a model, we first upload the model artifact to the S3 bucket. In this example we use a pre-trained XGBoost model, packaged locally as `model/model.tar.gz`.

In [None]:
model_s3_key = f"{s3_prefix}/model.tar.gz"
print(model_s3_key)

model_url = f"s3://{default_bucket}/{model_s3_key}"
print(f"Uploading Model to {model_url}")

with open("model/model.tar.gz", "rb") as model_file:
    boto_session.resource("s3").Bucket(default_bucket).Object(model_s3_key).upload_fileobj(model_file)
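
The upload above assumes that `model/model.tar.gz` already exists. As a minimal sketch of how such an archive could be produced and inspected with the standard library, the helper below packs one file at the archive root. The helper `package_model` and the file name `xgboost-model` are hypothetical, and the placeholder bytes stand in for a real trained model.

```python
import tarfile
import tempfile
from pathlib import Path


def package_model(model_file: Path, archive_path: Path) -> list[str]:
    """Write model_file into a gzip-compressed tar archive and return its member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps the file at the archive root instead of embedding its full path
        tar.add(model_file, arcname=model_file.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demonstration with a placeholder file; a real run would use the trained model file.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = Path(tmp) / "xgboost-model"  # hypothetical file name
    placeholder.write_bytes(b"placeholder bytes, not a real model")
    members = package_model(placeholder, Path(tmp) / "model.tar.gz")
    print(members)
```

Listing the members before uploading is a cheap way to confirm that the archive contains what the container is expected to load.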

Then, we use the `create_model` API to create a model object ready for deployment.

Here we need to specify:
- Model artifact (`.tar.gz` file) - the trained model file we uploaded to S3 in the previous step.
- Container image URI - the URI of the Docker container image used to serve the model. In our example we use a helper function, `image_uris.retrieve`, to retrieve a pre-built XGBoost container provided by SageMaker.
- Instance type - the compute instance you plan to use to host the model, in our example `ml.m5.xlarge`. It is not a `create_model` parameter; here it is only passed to `image_uris.retrieve` to select a compatible image, and it is set on the endpoint configuration later.
- Environment variables - we set `"SAGEMAKER_CONTAINER_LOG_LEVEL": "20"`, which corresponds to the `INFO` level in Python's `logging` module, so that the container logs are detailed enough to troubleshoot deployment issues.
- Model name - we append a timestamp to the model name to make it unique.
- SageMaker IAM role - the role that gives SageMaker permission to access the model artifact in S3 and other required resources. In this case it is the execution role retrieved earlier in the notebook with `get_execution_role()`.

In [None]:
from time import gmtime, strftime

model_name = "xgb-bid-filtering-realtime" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Model name: " + model_name)

# environment variables
byo_container_env_vars = {"SAGEMAKER_CONTAINER_LOG_LEVEL": "20"}

# inference instance type
inference_instance_type = "ml.m5.xlarge"

# retrieve xgboost image
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=inference_instance_type,
)

create_model_response = client.create_model(
    ModelName=model_name,
    Containers=[
        {
            "Image": image_uri,
            "Mode": "SingleModel",
            "ModelDataUrl": model_url,
            "Environment": byo_container_env_vars,
        }
    ],
    ExecutionRoleArn=role,
)

print("Model Arn: " + create_model_response["ModelArn"])
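
The model name above is built by concatenating a prefix and a timestamp directly. A minimal sketch of a reusable variant follows, assuming that SageMaker resource names are limited to 63 characters of alphanumerics and hyphens; treat that constraint as an assumption and check the API reference. The helper `unique_name` is hypothetical and, unlike the cell above, inserts a hyphen before the timestamp.

```python
import re
from time import gmtime, strftime

# Assumed constraint: at most 63 characters, alphanumerics separated by hyphens.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
MAX_NAME_LENGTH = 63


def unique_name(prefix: str, now=None) -> str:
    """Return prefix plus a UTC timestamp; raise ValueError if the result is not a valid name."""
    stamp = strftime("%Y-%m-%d-%H-%M-%S", now if now is not None else gmtime())
    name = f"{prefix}-{stamp}"
    if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return name


print(unique_name("xgb-bid-filtering-realtime"))
```

Validating the name locally surfaces a bad prefix before any API call is made.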


### Endpoint Configuration Creation

To create an endpoint configuration, we call the `create_endpoint_config` API, specifying:

- InstanceType – compute instance you plan to use to host the model. In our example – ml.m5.xlarge. 
- InitialInstanceCount - the number of compute instances you need to create. In our example - 1. 
- ModelName – the model name from the previous step 

In [None]:
xgboost_epc_name = "xgb-bid-filtering-real-time-epc" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

endpoint_config_response = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name,
    ProductionVariants=[
        {
            "VariantName": "byoVariant",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": inference_instance_type
        },
    ],
)
print("Endpoint Configuration Arn: " + endpoint_config_response["EndpointConfigArn"])
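
`ProductionVariants` is a list, so an endpoint configuration can describe more than one variant. As a hedged sketch, the helper below computes how traffic would be split between variants from their `InitialVariantWeight` values, with each variant receiving its weight divided by the sum of all weights. The helper `traffic_shares`, the second variant, and both model names are hypothetical, and the default weight of 1.0 is an assumption.

```python
def traffic_shares(variants: list[dict]) -> dict[str, float]:
    """Map each variant name to its fraction of traffic: weight / sum of all weights."""
    total = sum(v.get("InitialVariantWeight", 1.0) for v in variants)
    return {v["VariantName"]: v.get("InitialVariantWeight", 1.0) / total for v in variants}


# Hypothetical two-variant configuration; only "byoVariant" exists in this notebook.
variants = [
    {
        "VariantName": "byoVariant",
        "ModelName": "model-a",  # placeholder model name
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.xlarge",
        "InitialVariantWeight": 3.0,
    },
    {
        "VariantName": "candidateVariant",  # hypothetical second variant
        "ModelName": "model-b",  # placeholder model name
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.xlarge",
        "InitialVariantWeight": 1.0,
    },
]
shares = traffic_shares(variants)
print(shares)
```

With a single variant, as in this notebook, that variant receives all traffic regardless of its weight.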

### Real-Time Endpoint Creation
To create a SageMaker real-time endpoint, we call the `create_endpoint` API, passing in the name of our endpoint configuration and a name for the new endpoint.

In [None]:
endpoint_name = "xgb-bid-filtering-realtime-ep" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=xgboost_epc_name,
)

print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

We call `describe_endpoint` to confirm that the endpoint was created successfully and to view its properties.

In [None]:
# Poll describe_endpoint until the endpoint leaves the "Creating" state
# (it ends up either "InService" or "Failed")
import time

describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)

while describe_endpoint_response["EndpointStatus"] == "Creating":
    # wait before asking again so we do not call the API in a tight loop
    time.sleep(15)
    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
    print(describe_endpoint_response["EndpointStatus"])

describe_endpoint_response
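
The polling loop above runs until the status changes, however long that takes, and does not distinguish success from failure. A minimal sketch of a variant with a timeout and an explicit failure check is shown below. The helper `wait_for_endpoint` and its parameters are hypothetical, and `fake_describe` is a stand-in for `client.describe_endpoint` so that the sketch runs without AWS access.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=15, timeout_seconds=1800, sleep=time.sleep):
    """Poll describe(EndpointName=...) until the status leaves "Creating" or the timeout expires."""
    waited = 0
    while True:
        response = describe(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status != "Creating":
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"{endpoint_name} still Creating after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
    if status != "InService":
        reason = response.get("FailureReason", "no reason given")
        raise RuntimeError(f"{endpoint_name} ended in status {status}: {reason}")
    return response


# Stand-in for client.describe_endpoint: reports "Creating" twice, then "InService".
_statuses = iter(["Creating", "Creating", "InService"])


def fake_describe(EndpointName):
    return {"EndpointName": EndpointName, "EndpointStatus": next(_statuses)}


result = wait_for_endpoint(fake_describe, "demo-endpoint", sleep=lambda seconds: None)
print(result["EndpointStatus"])
```

In the notebook this would be called as `wait_for_endpoint(client.describe_endpoint, endpoint_name)`. boto3 also provides a built-in waiter, `client.get_waiter("endpoint_in_service")`, which may be preferable when custom handling is not needed.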

### Endpoint Invocation
Our model is now hosted and ready to serve real-time predictions.

In this code snippet we invoke the endpoint by sending it a single record in CSV format and printing the prediction it returns.

In [None]:
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body="2,0,0.0,7.0,3.0,20.0,2",
    ContentType="text/csv",
)

print(response["Body"].read().decode('utf-8'))
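
The request body above is a hand-written CSV string. As a small sketch, the helpers below build such a line from a list of feature values and parse a numeric response body back into floats. Both `to_csv_row` and `parse_scores` are hypothetical helpers, and the response text `"0.25\n"` is a made-up value for illustration, not an actual model output.

```python
def to_csv_row(features) -> str:
    """Join feature values into one comma-separated line, the shape of the request body above."""
    return ",".join(str(value) for value in features)


def parse_scores(body: str) -> list[float]:
    """Parse a response body of comma- or newline-separated numbers into floats."""
    tokens = body.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]


payload = to_csv_row([2, 0, 0.0, 7.0, 3.0, 20.0, 2])
print(payload)
print(parse_scores("0.25\n"))  # made-up response body for illustration
```

The resulting `payload` can be passed as `Body` to `invoke_endpoint` with `ContentType="text/csv"`, exactly as in the cell above.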

## Cleanup

Delete the endpoint when you are done, since it is billed for as long as it is running.

In [None]:
response = client.delete_endpoint(
    EndpointName=endpoint_name
)
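
The cell above removes only the endpoint, so the endpoint configuration created earlier remains. A hedged sketch of deleting both is shown below. The helper `delete_hosting_resources` is hypothetical, and `RecordingClient` is a stand-in for the boto3 SageMaker client so that the sketch runs without AWS access. The model is deliberately left in place because its name is stored for the next notebook.

```python
def delete_hosting_resources(sm_client, endpoint_name, endpoint_config_name):
    """Delete the endpoint, then its endpoint configuration; return the names deleted."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    return [endpoint_name, endpoint_config_name]


class RecordingClient:
    """Stand-in for the boto3 SageMaker client that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))


fake = RecordingClient()
deleted = delete_hosting_resources(fake, "demo-endpoint", "demo-endpoint-config")
print(fake.calls)
```

In the notebook this would be called as `delete_hosting_resources(client, endpoint_name, xgboost_epc_name)`.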

## Store Variables for the Next Notebook

In the following code block, we store the variables that subsequent notebooks need. The final bare `%store` lists everything that has been stored.

In [None]:
%store image_uri
%store model_url
%store model_name

%store