# SageMaker Serverless Inference
## Sklearn Regression Example

Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for customers to deploy and scale ML models. Serverless Inference is ideal for workloads which have idle periods between traffic spurts and can tolerate cold starts. Serverless endpoints also automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies.

In this notebook we train a custom Sklearn regression model and then deploy it to a serverless endpoint. We use the California housing dataset, which is hosted in a public S3 bucket.

<b>Update: </b>
SageMaker Serverless Inference is now supported by the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#sagemaker-serverless-inference). This makes it easy to deploy serverless endpoints for models that you also train on SageMaker. Alternatively, you can create serverless endpoints with the general-purpose Boto3 Python SDK; an example notebook is available [here](https://github.com/aws/amazon-sagemaker-examples/blob/master/serverless-inference/Serverless-Inference-Walkthrough.ipynb).
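For comparison with the Boto3 route mentioned above, the following minimal sketch shows how a serverless endpoint configuration differs from an instance-based one: the production variant carries a `ServerlessConfig` block instead of an instance type and count. The helper names, the `AllTraffic` variant name, and the default settings are illustrative assumptions; `create_endpoint_config` and `create_endpoint` are the standard Boto3 SageMaker client calls, and the sketch assumes a SageMaker model named `model_name` already exists.

```python
def serverless_endpoint_config_request(config_name, model_name,
                                       memory_size_mb=4096, max_concurrency=3,
                                       variant_name="AllTraffic"):
    """Build keyword arguments for CreateEndpointConfig with a serverless variant.

    Hypothetical helper: note there is no InstanceType or InitialInstanceCount.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_size_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def create_serverless_endpoint(config_name, model_name, endpoint_name, **kwargs):
    """Create the endpoint config, then the endpoint (needs AWS credentials)."""
    import boto3  # imported here so the request builder has no AWS dependency

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        **serverless_endpoint_config_request(config_name, model_name, **kwargs)
    )
    sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
```

Separating the request builder from the API call keeps the configuration easy to inspect before anything is created in your account.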

<b>Notebook Setting</b>
- <b>SageMaker Classic Notebook Instance</b>: ml.m5.xlarge Notebook Instance & conda_python3 Kernel
- <b>SageMaker Studio</b>: Python 3 (Data Science)
- <b>Regions Available</b>: SageMaker Serverless Inference is currently available in the following regions: US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), Asia Pacific (Tokyo) and Asia Pacific (Sydney)

## Table of Contents
- Setup
- Model Training
- Deployment
- Cleanup

## Setup

To run this notebook, your notebook's IAM role needs <b>SageMaker Full Access</b> (for example, the `AmazonSageMakerFullAccess` managed policy).

Let's start by making sure we have the latest versions of sagemaker, boto3, botocore, and the AWS CLI.

In [1]:
! pip install sagemaker botocore boto3 awscli --upgrade



## SageMaker Setup

To begin, we import the AWS SDK for Python (Boto3) and set up our environment, including an IAM role and an S3 bucket to store our data.

In [2]:
import boto3
import sagemaker

boto_session = boto3.session.Session()
region = boto_session.region_name  # Serverless Inference must be available in this region
print(region)

sagemaker_session = sagemaker.Session()
base_job_prefix = "sklearn-example"
role = sagemaker.get_execution_role()  # IAM role attached to this notebook
print(role)

default_bucket = sagemaker_session.default_bucket()  # sagemaker-<region>-<account-id>
s3_prefix = base_job_prefix

training_instance_type = "ml.m5.xlarge"

us-east-1
arn:aws:iam::474422712127:role/sagemaker-role-BYOC


Retrieve the California Housing dataset from a publicly hosted S3 bucket.

In [3]:
# retrieve data
!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/california_housing/cal_housing.tgz .
!tar -zxf cal_housing.tgz

download: s3://sagemaker-sample-files/datasets/tabular/california_housing/cal_housing.tgz to ./cal_housing.tgz


Create a dataframe from the California housing dataset that we can upload to S3.

In [4]:
import pandas as pd
columns = [
    "longitude",
    "latitude",
    "housingMedianAge",
    "totalRooms",
    "totalBedrooms",
    "population",
    "households",
    "medianIncome",
    "medianHouseValue",
]
df = pd.read_csv("CaliforniaHousing/cal_housing.data", names=columns, header=None)
df.head()

| | longitude | latitude | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome | medianHouseValue |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 |


Split the data for training and inference.

In [5]:
# Split the data roughly 80/20: the first 16,000 rows for training, the rest held out for inference later
train = df.iloc[:16000, :]
test = df.iloc[16000:, :]  # start at row 16000 so no row is skipped
# Write train and test CSVs (header row, no index column)
train.to_csv('train.csv', index=False)
test.to_csv('test.csv', index=False)
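The split above is positional, so if the rows in the source file are ordered (for example, grouped by location), the training and test sets may not share the same distribution. A minimal sketch of a seeded, shuffled split instead, written with the standard library only; the function name, the 0.8 fraction, and the seed are illustrative choices, not part of the original notebook.

```python
import random


def shuffled_split_indices(n_rows, train_frac=0.8, seed=42):
    """Return (train_idx, test_idx): a seeded random permutation of
    range(n_rows) cut at train_frac, so every row lands in exactly one split."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG: reproducible, no global state
    cut = int(n_rows * train_frac)
    return indices[:cut], indices[cut:]
```

With the dataframe above you could then use `train_idx, test_idx = shuffled_split_indices(len(df))`, followed by `train = df.iloc[train_idx]` and `test = df.iloc[test_idx]`. Fixing the seed keeps the split identical across notebook reruns.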

Upload data to S3.

In [6]:
# Upload train.csv to the session's default S3 bucket under the given prefix
prefix = "sklearn-cal-housing"
training_input_path = sagemaker_session.upload_data('train.csv', key_prefix=prefix + '/training')
training_input_path

's3://sagemaker-us-east-1-474422712127/sklearn-cal-housing/training/train.csv'

In [7]:
# Verify the upload by reading the file back from S3 (pandas needs the s3fs package for s3:// paths)
training_data = pd.read_csv(training_input_path)
training_data.head()

| | longitude | latitude | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome | medianHouseValue |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 |


## Model Training

Now we train a custom model using our Sklearn training script. In this example, the script also provides [inference handler functions](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html) that control how the endpoint deserializes requests and serializes responses. Feel free to adjust them to match how you want your endpoint to ingest and respond to data.
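The training script, `train.py`, isn't reproduced here. As a rough sketch of what its inference handlers could look like, the functions below use the handler names and signatures the SageMaker Scikit-learn container looks for (`model_fn`, `input_fn`, `predict_fn`, `output_fn`) and the `{"Input": [...]}` / `{"Output": ...}` JSON shape used in the Inference section. The `model.joblib` file name is a hypothetical assumption, and this `output_fn` returns one value per input row, whereas the notebook's own handler evidently returns a single number.

```python
import json
import os


def model_fn(model_dir):
    """Load the fitted estimator saved by the training step.
    Assumes (hypothetically) it was written with joblib as model.joblib."""
    import joblib  # installed alongside scikit-learn

    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(request_body, request_content_type):
    """Deserialize {"Input": [[feature, ...], ...]} into a list of rows."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return json.loads(request_body)["Input"]


def predict_fn(input_data, model):
    """Run the loaded model on the deserialized rows."""
    return model.predict(input_data)


def output_fn(prediction, content_type):
    """Serialize predictions as {"Output": [...]} JSON, one float per row."""
    return json.dumps({"Output": [float(p) for p in prediction]})
```

Keeping the request and response keys symmetric (`Input` in, `Output` out) makes the client code in the Inference section straightforward to write.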

In [8]:
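# Docs: https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html
from sagemaker.sklearn import SKLearn

sk_estimator = SKLearn(
    entry_point='train.py',          # training script, which also holds the inference handlers
    role=role,
    instance_count=1,
    instance_type='ml.c5.18xlarge',  # note: not the training_instance_type defined during setup
    py_version='py3',
    framework_version='0.23-1',
    script_mode=True,
    hyperparameters={
        'estimators': 20             # passed to train.py as --estimators 20
    },
)

# Launch the training job on the uploaded data ('train' channel)
sk_estimator.fit({'train': training_input_path})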

2022-01-28 00:21:12 Starting - Starting the training job...
2022-01-28 00:21:39 Starting - Launching requested ML instances
ProfilerReport-1643329271: InProgress
......
2022-01-28 00:22:39 Starting - Preparing the instances for training.........
2022-01-28 00:24:00 Downloading - Downloading input data...
2022-01-28 00:24:41 Training - Training image download completed. Training in progress..
2022-01-28 00:24:42,705 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2022-01-28 00:24:42,709 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-28 00:24:42,720 sagemaker_sklearn_container.training INFO     Invoking user training script.
2022-01-28 00:24:43,076 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-28 00:24:43,087 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-28 00:24:43,099 s
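To show how the `estimators` hyperparameter and the `train` channel reach the script, here is a hypothetical skeleton for the training half of `train.py`. In script mode, SageMaker passes each hyperparameter as a command-line flag and exposes paths through `SM_*` environment variables (`SM_MODEL_DIR`, and `SM_CHANNEL_TRAIN` for the `train` channel). The choice of `RandomForestRegressor`, the default of 10 estimators, and the `model.joblib` file name are assumptions made for illustration; the notebook's actual script is not shown.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters (--estimators) and SageMaker-provided paths.
    Fallback paths are the usual in-container locations."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--estimators", type=int, default=10)
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    return parser.parse_args(argv)


def main():
    # Heavy imports stay inside main() so parse_args() works without them.
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor  # assumed model choice

    args = parse_args()
    df = pd.read_csv(os.path.join(args.train, "train.csv"))
    features = df.drop(columns=["medianHouseValue"])
    target = df["medianHouseValue"]
    model = RandomForestRegressor(n_estimators=args.estimators)
    model.fit(features, target)
    # Save where SageMaker packages the model artifact after training
    joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))
```

In the real `train.py`, `main()` would be called under an `if __name__ == "__main__":` guard. The guard matters because the serving container imports the same file to find the inference handlers, and that import must not start a training run.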

## Create Serverless Config Using The SageMaker SDK

Serverless Inference requires two settings: memory size and maximum concurrency. <b>MaxConcurrency</b>, the maximum number of concurrent invocations for a single endpoint, can currently be any value from <b>1 to 50</b>, and <b>MemorySize</b> can be any of the following: <b>1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB</b>. In the SageMaker Python SDK these correspond to the `max_concurrency` and `memory_size_in_mb` arguments of `ServerlessInferenceConfig`.
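A small sketch that checks candidate settings against the limits quoted in the paragraph above before you build the config. The constants encode those limits as stated here; AWS may change them over time, so treat them as a snapshot, and the function name is hypothetical.

```python
# Limits as quoted in this notebook; check the current SageMaker docs before relying on them.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)
MAX_CONCURRENCY_RANGE = range(1, 51)  # 1 to 50 inclusive


def check_serverless_settings(memory_size_in_mb, max_concurrency):
    """Return the settings unchanged, or raise ValueError if either is out of range."""
    if memory_size_in_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(
            f"memory_size_in_mb must be one of {ALLOWED_MEMORY_MB}, got {memory_size_in_mb}"
        )
    if max_concurrency not in MAX_CONCURRENCY_RANGE:
        raise ValueError(f"max_concurrency must be between 1 and 50, got {max_concurrency}")
    return memory_size_in_mb, max_concurrency
```

For example, `memory, concurrency = check_serverless_settings(4096, 3)` followed by `ServerlessInferenceConfig(memory_size_in_mb=memory, max_concurrency=concurrency)` fails fast in the notebook instead of at deployment time.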

In [9]:
from sagemaker.serverless import ServerlessInferenceConfig
serverless_config = ServerlessInferenceConfig(memory_size_in_mb=4096, max_concurrency=3)

## Deploy Serverless Endpoint

In [10]:
from time import gmtime, strftime

# Append a UTC timestamp so each run gets a unique endpoint name
endpoint_name = "sklearn-serverless-ep-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
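Endpoint names have to follow SageMaker's naming rules, which is easy to violate when a longer prefix is combined with a timestamp. The sketch below assumes the rule is: letters, digits, and hyphens only, starting and ending with a letter or digit, at most 63 characters. The helper name is hypothetical, and it separates prefix and timestamp with a hyphen for readability.

```python
import re
from datetime import datetime, timezone

# Assumed naming rule: alphanumerics separated by hyphens, no leading or
# trailing hyphen, at most 63 characters in total.
_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")
_MAX_LEN = 63


def make_endpoint_name(prefix, now=None):
    """Return "<prefix>-<UTC timestamp>", raising ValueError if it breaks the rule."""
    if now is None:
        now = datetime.now(timezone.utc)
    name = f"{prefix}-{now:%Y-%m-%d-%H-%M-%S}"
    if len(name) > _MAX_LEN or not _NAME_RE.match(name):
        raise ValueError(f"invalid endpoint name: {name!r}")
    return name
```

For example, `make_endpoint_name("sklearn-serverless-ep")` produces a name of the same form as the cell above, but raises immediately if the prefix contains an underscore or pushes the name past the length limit.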

In [11]:
# No instance type or count is needed: the serverless config replaces them
sk_estimator.deploy(endpoint_name=endpoint_name, serverless_inference_config=serverless_config)

--------!

<sagemaker.sklearn.model.SKLearnPredictor at 0x7f7973ce5160>

## Inference

Let's invoke the endpoint with a sample data point from our train set.

In [12]:
# Build a request from the first training row, with the label column removed
import json
samp = pd.read_csv('train.csv')
samp = samp.drop(columns=['medianHouseValue'])
samp = samp[:1]
request_body = {"Input": samp.values.tolist()}
payload = json.dumps(request_body)

In [14]:
import boto3
client = boto3.client('sagemaker-runtime')
content_type = "application/json"
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload)
# The response body is a stream: read and decode it, then take the "Output" field set by the inference handler
result = json.loads(response['Body'].read().decode())['Output']
result

446715
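The request above sends a single row. The sketch below factors the request and response handling into two small helpers so the same code can send several rows, for example from `test.csv`. The helper names are hypothetical, and whether the endpoint returns one prediction per row depends on the `output_fn` in `train.py`, which isn't shown here (the single-row call above returned one number).

```python
import json


def build_payload(rows):
    """Serialize feature rows (sequences of numbers) into the {"Input": [...]} body."""
    return json.dumps({"Input": [[float(x) for x in row] for row in rows]})


def parse_output(body_bytes):
    """Decode the endpoint's JSON response body and return its "Output" field."""
    return json.loads(body_bytes.decode("utf-8"))["Output"]
```

Usage against the deployed endpoint: `rows = pd.read_csv('test.csv').drop(columns=['medianHouseValue']).head(5).values.tolist()`, then `response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType="application/json", Body=build_payload(rows))`, and finally `parse_output(response['Body'].read())`.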