# SageMaker Serverless Inference
## XGBoost Regression Example

Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy to deploy and scale ML models. It is well suited to workloads that have idle periods between traffic spurts and can tolerate cold starts. Serverless endpoints automatically launch compute resources and scale them in and out with traffic, so you do not need to choose instance types or manage scaling policies.

In this notebook we train a model with the SageMaker XGBoost algorithm and then deploy it to a serverless endpoint. The example uses the public Abalone regression dataset hosted on S3.

<b>Notebook Setting</b>
- <b>SageMaker Classic Notebook Instance</b>: ml.m5.xlarge Notebook Instance & conda_python3 Kernel
- <b>SageMaker Studio</b>: Python 3 (Data Science)
- <b>Regions Available</b>: SageMaker Serverless Inference is currently available in the following regions: US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), Asia Pacific (Tokyo) and Asia Pacific (Sydney)
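
A notebook can guard against running in a region that is not on this list. The sketch below maps the region names above to their region codes; `SERVERLESS_REGIONS` and `is_serverless_region` are illustrative names, and the set only mirrors the regions named in this notebook, so it may be out of date.

```python
# Region codes for the regions named above (illustrative; availability may have changed).
SERVERLESS_REGIONS = {
    "us-east-1",       # US East (Northern Virginia)
    "us-east-2",       # US East (Ohio)
    "us-west-2",       # US West (Oregon)
    "eu-west-1",       # EU (Ireland)
    "ap-northeast-1",  # Asia Pacific (Tokyo)
    "ap-southeast-2",  # Asia Pacific (Sydney)
}


def is_serverless_region(region_name):
    """Return True if region_name is one of the regions listed in this notebook."""
    return region_name in SERVERLESS_REGIONS


print(is_serverless_region("us-east-1"))
```

The region name can come from `boto3.session.Session().region_name`.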

## Table of Contents
- Setup
- Model Training
- Deployment
    - Model Creation
    - Endpoint Configuration (Adjust for Serverless)
    - Serverless Endpoint Creation
    - Endpoint Invocation
- Cleanup

## Setup

To run this notebook, the notebook's IAM role must have <b>SageMaker Full Access</b>.

Let's start by upgrading the SageMaker Python SDK, botocore, Boto3, and the AWS CLI.

In [16]:
# Upgrade from PyPI (fallback in case preview wheels are unavailable)
! pip install sagemaker botocore boto3 awscli --upgrade

Collecting botocore
  Downloading botocore-1.23.40-py3-none-any.whl (8.5 MB)
     |████████████████████████████████| 8.5 MB 30.9 MB/s            
Collecting boto3
  Downloading boto3-1.20.40-py3-none-any.whl (131 kB)
     |████████████████████████████████| 131 kB 106.9 MB/s            
Collecting awscli
  Downloading awscli-1.22.40-py3-none-any.whl (3.8 MB)
     |████████████████████████████████| 3.8 MB 106.0 MB/s            
Installing collected packages: botocore, boto3, awscli
  Attempting uninstall: botocore
    Found existing installation: botocore 1.23.39
    Uninstalling botocore-1.23.39:
      Successfully uninstalled botocore-1.23.39
  Attempting uninstall: boto3
    Found existing installation: boto3 1.20.39
    Uninstalling boto3-1.20.39:
      Successfully uninstalled boto3-1.20.39
  Attempting uninstall: awscli
    Found existing installation: awscli 1.22.39
    Uninstalling awscli-1.22.39:
      Successfully uninstalled awscli-1.22.39
ERROR: pip's dependency resolver

In [17]:
# Setup clients
import boto3

client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

### SageMaker Setup
To begin, we import the AWS SDK for Python (Boto3) and set up our environment, including an IAM role and an S3 bucket to store our data.

In [18]:
import boto3
import sagemaker
from sagemaker.estimator import Estimator

boto_session = boto3.session.Session()
region = boto_session.region_name
print(region)

sagemaker_session = sagemaker.Session()
base_job_prefix = "xgboost-example"
role = sagemaker.get_execution_role()
print(role)

default_bucket = sagemaker_session.default_bucket()
s3_prefix = base_job_prefix

training_instance_type = "ml.m5.xlarge"

us-east-1
arn:aws:iam::474422712127:role/sagemaker-role-BYOC


Retrieve the Abalone dataset from a publicly hosted S3 bucket.

In [19]:
# retrieve data
!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/train_csv/abalone_dataset1_train.csv .

download: s3://sagemaker-sample-files/datasets/tabular/uci_abalone/train_csv/abalone_dataset1_train.csv to ./abalone_dataset1_train.csv


Upload the Abalone dataset to the default S3 bucket.

In [20]:
# upload data to S3
!aws s3 cp abalone_dataset1_train.csv s3://{default_bucket}/xgboost-regression/train.csv

upload: ./abalone_dataset1_train.csv to s3://sagemaker-us-east-1-474422712127/xgboost-regression/train.csv
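
Before training, it can help to sanity-check the local copy of the file. The built-in XGBoost algorithm expects CSV training data with no header row and the label in the first column. The helper below, `summarize_csv`, is a hypothetical name and a minimal sketch using only the standard library; it counts rows and column widths and fails if the first column is not numeric.

```python
import csv


def summarize_csv(path):
    """Return (row_count, set_of_column_counts) for a headerless CSV file."""
    row_count = 0
    widths = set()
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            float(row[0])  # raises ValueError if the first column is not numeric
            row_count += 1
            widths.add(len(row))
    return row_count, widths
```

For example, `rows, widths = summarize_csv("abalone_dataset1_train.csv")`; a single value in `widths` means every row has the same number of columns.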


## Model Training

Now, we train an ML model using the XGBoost Algorithm. In this example, we use a SageMaker-provided [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) container image and configure an estimator to train our model.

In [21]:
from sagemaker.inputs import TrainingInput

training_path = f"s3://{default_bucket}/xgboost-regression/train.csv"
train_input = TrainingInput(training_path, content_type="text/csv")

In [22]:
model_path = f"s3://{default_bucket}/{s3_prefix}/xgb_model"

# retrieve xgboost image
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=training_instance_type,
)

# Configure Training Estimator
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=training_instance_type,
    instance_count=1,
    output_path=model_path,
    sagemaker_session=sagemaker_session,
    role=role,
)

# Set hyperparameters.
# Note: "reg:linear" is a deprecated alias of "reg:squarederror" in newer XGBoost releases.
xgb_train.set_hyperparameters(
    objective="reg:linear",
    num_round=50,
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
    silent=0,
)

Train the model on the Abalone dataset.

In [23]:
# Fit model
xgb_train.fit({"train": train_input})

2022-01-20 20:08:44 Starting - Starting the training job...
2022-01-20 20:09:07 Starting - Launching requested ML instancesProfilerReport-1642709324: InProgress
......
2022-01-20 20:10:08 Starting - Preparing the instances for training.........
2022-01-20 20:11:39 Downloading - Downloading input data
2022-01-20 20:11:39 Training - Downloading the training image..
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value reg:linear to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
[20:11:56] 2923x8 matrix with 23384 entries loaded from /opt/ml/input/data/train?format=csv&label_column

## Create Serverless Config Object To Deploy Endpoint

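With a SageMaker-managed container, the SageMaker Python SDK handles model creation, endpoint configuration, and endpoint creation in a single `deploy` call once you pass it a serverless configuration object. This is simpler than orchestrating those steps yourself with Boto3.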

In [25]:
from sagemaker.serverless import ServerlessInferenceConfig
serverless_config = ServerlessInferenceConfig(memory_size_in_mb=4096, max_concurrency=3)
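
The two settings above are constrained by the service: memory is chosen in 1 GB increments from 1024 MB to 6144 MB, and the concurrency must be at least 1. The sketch below checks values before any API call is made; `check_serverless_settings` is a hypothetical helper, and the upper bound on concurrency is deliberately left unchecked because it depends on account and region quotas.

```python
# Memory sizes accepted for serverless endpoints, in MB (1 GB increments).
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def check_serverless_settings(memory_size_in_mb, max_concurrency):
    """Raise ValueError for settings the service would be expected to reject."""
    if memory_size_in_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(
            f"memory_size_in_mb must be one of {VALID_MEMORY_SIZES_MB}, got {memory_size_in_mb}"
        )
    # The upper bound on concurrency depends on quotas, so only the lower bound is checked.
    if max_concurrency < 1:
        raise ValueError(f"max_concurrency must be at least 1, got {max_concurrency}")
    # Same shape as the ServerlessConfig block used by the Boto3 endpoint config API.
    return {"MemorySizeInMB": memory_size_in_mb, "MaxConcurrency": max_concurrency}


print(check_serverless_settings(4096, 3))
```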

In [26]:
from time import gmtime, strftime

# Append a UTC timestamp so the endpoint name is unique per run
endpoint_name = "xgboost-serverless-ep-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

In [27]:
xgb_train.deploy(endpoint_name=endpoint_name, serverless_inference_config=serverless_config)

---------!

<sagemaker.predictor.Predictor at 0x7efec7eaad30>
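
For comparison, the same deployment can be broken into the three Boto3 steps that `deploy` wraps: model creation, endpoint configuration, and endpoint creation. The following is a minimal sketch, not the SDK's own implementation; `deploy_serverless_with_boto3` is a hypothetical helper, and reusing one name for the model, the endpoint config, and the endpoint is an assumption made for brevity.

```python
def deploy_serverless_with_boto3(
    sm_client, name, image_uri, model_data_url, role_arn,
    memory_size_in_mb=4096, max_concurrency=3,
):
    """Create a model, a serverless endpoint config, and an endpoint; wait until InService."""
    # 1. Model creation: container image plus the model.tar.gz produced by training.
    sm_client.create_model(
        ModelName=name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
    )
    # 2. Endpoint configuration: ServerlessConfig takes the place of instance type and count.
    sm_client.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_size_in_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    )
    # 3. Endpoint creation, then block until the endpoint is in service.
    sm_client.create_endpoint(EndpointName=name, EndpointConfigName=name)
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=name)
    return name
```

With the objects defined earlier in this notebook, a call might look like `deploy_serverless_with_boto3(client, endpoint_name, image_uri, xgb_train.model_data, role)`.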

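### Endpoint Invocation

Invoke the serverless endpoint with a single CSV record and print the prediction.

In [32]: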
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
    ContentType="text/csv",
)

print(response["Body"].read())

b'4.566554546356201'
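
The request body is a comma-separated record of feature values, and the response body is raw bytes. A small pair of helpers can build the payload and turn the response into floats. This is a sketch with hypothetical helper names (`to_csv_payload`, `parse_predictions`); the parser accepts both comma- and newline-separated values, on the assumption that the separator for multi-record responses may differ between container versions.

```python
def to_csv_payload(features):
    """Serialize one record of numeric features as a text/csv request body."""
    return ",".join(str(value) for value in features).encode("utf-8")


def parse_predictions(body):
    """Parse a response body (bytes) into a list of floats."""
    text = body.decode("utf-8").replace("\n", ",")
    return [float(token) for token in text.split(",") if token.strip()]


payload = to_csv_payload(
    [0.345, 0.224414, 0.131102, 0.042329, 0.279923, -0.110329, -0.099358, 0.0]
)
print(payload)
print(parse_predictions(b"4.566554546356201"))
```

In the cell above, `payload` would be passed as `Body`, and `parse_predictions(response["Body"].read())` would replace the raw print.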


## Cleanup

Delete the endpoint when you are finished with it.

In [15]:
client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': '4863f8af-a00f-46a8-9524-605ee695a76e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '4863f8af-a00f-46a8-9524-605ee695a76e',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Mon, 27 Dec 2021 18:41:45 GMT'},
  'RetryAttempts': 0}}
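
Deleting the endpoint leaves its endpoint configuration and model in place. The sketch below removes all three in one pass; `delete_endpoint_resources` is a hypothetical helper, and it has to run while the endpoint still exists because it looks up the related resource names from the endpoint first.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint together with its endpoint config and model(s)."""
    # Look up the config behind the endpoint, then the models behind the config.
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Delete in order: endpoint, then its config, then the models.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names
```

Used in place of the `delete_endpoint` call above, this would be `delete_endpoint_resources(client, endpoint_name)`.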