# Deploying realtime inference with API requests

By Daniel Marostica

### Predict whether a passenger would have survived the sinking of the RMS Titanic by creating and invoking a SageMaker endpoint

In this example, we use SageMaker's Script Mode. In Script Mode you supply your own Python scripts, including the input/output functions the containers call, so that each step receives and returns data in the right format. The Python files contain more details.
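For orientation: the SageMaker scikit-learn serving container imports your script and looks for `model_fn` and, optionally, `input_fn`, `predict_fn` and `output_fn`. The sketch below shows what the request and response functions could look like for the two content types used later in this notebook. It is a hypothetical example, not the code in `preprocessing.py` or `inference.py`. The `{"instances": [...]}` response shape is copied from the endpoint responses shown further down, and a real handler would more likely build a pandas DataFrame than a list of dicts.

```python
import csv
import io
import json


def _coerce(value):
    """Turn numeric strings such as '22.0' into floats; leave other values unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def input_fn(request_body, request_content_type):
    """Deserialize a request into a list of records (one dict per passenger)."""
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    if request_content_type == "text/csv":
        # Assumes the first line is a header, as in the CSV payload used later.
        reader = csv.DictReader(io.StringIO(request_body))
        return [{key: _coerce(val) for key, val in row.items()} for row in reader]
    if request_content_type == "application/json":
        # Assumes a JSON list of objects, as in the JSON payload used later.
        return json.loads(request_body)
    raise ValueError("Unsupported content type: {}".format(request_content_type))


def output_fn(prediction, accept):
    """Serialize predictions as {"instances": [...]}, the shape the endpoint returns below."""
    return json.dumps({"instances": [float(p) for p in prediction]})
```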

```python
import sagemaker
from sagemaker import get_execution_role
```

```python
sagemaker_session = sagemaker.Session()
role = get_execution_role()
bucket = sagemaker_session.default_bucket()  # the account's default SageMaker bucket for this region (created if missing)
prefix = 'titanic_realtime'  # S3 key prefix ("folder") under which the input data is stored
```

## Uploading the data for training <a class='anchor' id='upload_data'></a>

In production, with large amounts of data, you would typically prepare the data in S3 with services such as AWS Glue, Amazon EMR or Amazon Athena. Here, we use the SageMaker Python SDK to upload a local CSV file to the default bucket.

`train_input` holds the S3 URI of the uploaded CSV file.

```python
train_input = sagemaker_session.upload_data(
    path='titanic.csv',
    bucket=bucket,
    key_prefix='{}/{}'.format(prefix, 'train'),
)
```

## Create a SageMaker scikit-learn estimator <a class='anchor' id='create_sklearn_estimator'></a>

I wrote a script, `preprocessing.py`, that preprocesses the data with a scikit-learn transformer that has to be fitted first.
SageMaker starts a training instance and fits it for us.

```python
from sagemaker.sklearn.estimator import SKLearn

FRAMEWORK_VERSION = '0.23-1'  # scikit-learn container version
script_path = 'preprocessing.py'

sklearn_preprocessor = SKLearn(
    entry_point=script_path,
    role=role,
    framework_version=FRAMEWORK_VERSION,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    sagemaker_session=sagemaker_session,
)
```

```python
# 'train' is the name of the channel set in preprocessing.py's arguments
sklearn_preprocessor.fit({'train': train_input})
```

```
2021-12-07 12:14:00 Starting - Starting the training job...
2021-12-07 12:14:26 Starting - Launching requested ML instances
ProfilerReport-1638879240: InProgress
......
2021-12-07 12:15:26 Starting - Preparing the instances for training.........
2021-12-07 12:16:47 Downloading - Downloading input data...
2021-12-07 12:17:27 Training - Downloading the training image..
2021-12-07 12:17:36,034 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2021-12-07 12:17:36,037 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-12-07 12:17:36,051 sagemaker_sklearn_container.training INFO     Invoking user training script.

2021-12-07 12:17:56 Uploading - Uploading generated training model
2021-12-07 12:17:50,092 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-12-07 12:17:50,105 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus ins
```

## Transform training data <a class='anchor' id='preprocess_train_data'></a>
Once the preprocessor is fitted, the training data has to be transformed with it. SageMaker starts another instance for this batch transform job and applies the fitted preprocessor to the data.

```python
transformer = sklearn_preprocessor.transformer(
    instance_count=1, instance_type='ml.m5.xlarge', assemble_with='Line', accept='text/csv'
)
```

```python
transformer.transform(train_input, content_type='text/csv')
transformer.wait()

preprocessed_train = transformer.output_path  # S3 prefix of the transformed training data
preprocessed_train
```

```
.............................
2021-12-07 12:23:27,232 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-12-07 12:23:27,235 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
2021-12-07 12:23:27,235 INFO - sagemaker-containers - nginx config: 
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr;
worker_rlimit_nofile 4096;
events {
  worker_connections 2048;
}
http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /dev/stdout combined;
  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }
  server {
    listen 8080 deferred;
    client_max_body_size 0;
    keepalive_timeout 3;
    location ~ ^/(ping|invocations|execution-parameters) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;

's3://sagemaker-us-east-1-296025910508/sagemaker-scikit-learn-2021-12-07-12-18-44-092'
```

## Fit a Random Forest model with preprocessed data <a class='anchor' id='training_model'></a>

Now we can fit a model on the preprocessed data, for example the Random Forest Classifier I wrote in `inference.py`. Because it is a scikit-learn model, we use SageMaker's `SKLearn` estimator again, this time passing `max_leaf_nodes` as a hyperparameter.
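A minimal sketch of what such a training script could contain. It assumes that the transformed training data arrives as headerless CSV with the `Survived` label in the first column; this is an assumption, because the notebook does not show the transform output. In Script Mode, SageMaker passes the estimator's `hyperparameters` to the script as command-line arguments, so `max_leaf_nodes=30` arrives as `--max_leaf_nodes 30`. The `model.joblib` file name and `random_state` are illustrative choices, not taken from the real `inference.py`.

```python
import argparse
import os

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train(train_dir, model_dir, max_leaf_nodes):
    """Fit a random forest on headerless CSV files whose first column is the label (assumed layout)."""
    files = sorted(os.path.join(train_dir, f) for f in os.listdir(train_dir))
    data = np.vstack([np.loadtxt(f, delimiter=",", ndmin=2) for f in files])
    labels, features = data[:, 0], data[:, 1:]
    model = RandomForestClassifier(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(features, labels)
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))
    return model


def main(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator arrive as command-line arguments.
    parser.add_argument("--max_leaf_nodes", type=int, default=None)
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR"))
    args = parser.parse_args(argv)
    return train(args.train, args.model_dir, args.max_leaf_nodes)


def model_fn(model_dir):
    """Serving: load the trained forest; the default predict_fn then calls model.predict()."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))

# As in any Script Mode entry point, call main() under `if __name__ == "__main__":`.
```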

```python
from sagemaker.sklearn.estimator import SKLearn

sklearn = SKLearn(
    entry_point='inference.py',
    framework_version='0.23-1',
    instance_count=1,
    instance_type='ml.m5.large',
    role=role,
    sagemaker_session=sagemaker_session,
    hyperparameters={"max_leaf_nodes": 30},  # passed to inference.py as --max_leaf_nodes 30
    base_job_name='sm-training',
)
```

```python
sklearn.fit({'train': preprocessed_train})

training_job_name = sklearn.latest_training_job.name
training_job_name
```

```
2021-12-07 12:23:59 Starting - Starting the training job...
2021-12-07 12:24:29 Starting - Launching requested ML instances
ProfilerReport-1638879839: InProgress
......
2021-12-07 12:25:29 Starting - Preparing the instances for training............
2021-12-07 12:27:30 Downloading - Downloading input data
2021-12-07 12:27:30 Training - Downloading the training image...
2021-12-07 12:28:01 Uploading - Uploading generated training model.
2021-12-07 12:27:50,824 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2021-12-07 12:27:50,826 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-12-07 12:27:50,836 sagemaker_sklearn_container.training INFO     Invoking user training script.
2021-12-07 12:27:51,101 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-12-07 12:27:54,124 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus i

'sm-training-2021-12-07-12-23-59-285'
```

## Setting up the inference pipeline <a class='anchor' id='pipeline_setup'></a>

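The endpoint has to run two steps in sequence: preprocessing, then inference. `PipelineModel` chains both models behind a single endpoint, sending the output of the preprocessing container as the input of the random forest container.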
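Conceptually, the endpoint calls the containers in order and forwards each container's response as the request to the next. The sketch below imitates that flow locally with plain functions. `run_container`, `run_pipeline` and the two toy containers (including the "women survive" rule) are hypothetical stand-ins, not SageMaker APIs. They only illustrate why the first container's output format must be one the second container accepts.

```python
import json


def run_container(handlers, body, content_type, accept):
    """Imitate one serving container: input_fn -> predict_fn -> output_fn."""
    data = handlers["input_fn"](body, content_type)
    prediction = handlers["predict_fn"](data)
    return handlers["output_fn"](prediction, accept)


def run_pipeline(containers, body, content_type):
    """Pass each container's response on to the next container, in order."""
    for handlers in containers:
        body = run_container(handlers, body, content_type, handlers["emits"])
        # The next container receives whatever format this one emitted.
        content_type = handlers["emits"]
    return body


# Toy preprocessing container: JSON records in, CSV feature rows out.
preprocess = {
    "input_fn": lambda body, ctype: json.loads(body),
    "predict_fn": lambda records: [
        [1.0 if r["Sex"] == "female" else 0.0, float(r["Pclass"])] for r in records
    ],
    "output_fn": lambda rows, accept: "\n".join(",".join(str(v) for v in row) for row in rows),
    "emits": "text/csv",
}

# Toy model container: CSV rows in, {"instances": [...]} out.
model = {
    "input_fn": lambda body, ctype: [[float(v) for v in line.split(",")] for line in body.splitlines()],
    "predict_fn": lambda rows: [row[0] for row in rows],
    "output_fn": lambda preds, accept: json.dumps({"instances": preds}),
    "emits": "application/json",
}
```

For example, `run_pipeline([preprocess, model], '[{"Sex":"male","Pclass":3}]', "application/json")` returns `'{"instances": [0.0]}'`.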

```python
from sagemaker.pipeline import PipelineModel
from time import gmtime, strftime

timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())  # current UTC time, used to name the model and the endpoint

scikit_learn_preprocess = sklearn_preprocessor.create_model()  # model from the fitted preprocessor
random_forest = sklearn.create_model()  # model from the trained random forest estimator

model_name = 'inference-pipeline-titanic-' + timestamp_prefix
endpoint_name = 'inference-pipeline-ep-titanic-' + timestamp_prefix

# The order of `models` is the order in which requests flow through the containers.
sm_model = PipelineModel(
    name=model_name, role=role, models=[scikit_learn_preprocess, random_forest])

sm_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)
```

```
-----------------!
```

```python
endpoint_name
```

```
'inference-pipeline-ep-titanic-2021-12-07-12-28-42'
```

## Make a request to our pipeline endpoint <a class='anchor' id='pipeline_inference_request'></a>

Once the endpoint is deployed, we can send it real-time inference requests. I'll demonstrate both a CSV and a JSON request.

### CSV

```python
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer

payload = 'Age,Embarked,Fare,Parch,Pclass,Sex,SibSp,Family_Size\n22.0,S,7.25,0,3,male,1,1'

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=CSVSerializer())

predictor.predict(payload)
```

```
Age,Embarked,Fare,Parch,Pclass,Sex,SibSp,Family_Size
22.0,S,7.25,0,3,male,1,1
b'{"instances": [0.0]}'
```


### JSON

```python
import boto3

payload = '[{"Age":22,"Embarked":"S","Fare":7.25,"Parch":0,"Pclass":3,"Sex":"male","SibSp":1,"Family_Size":1}]'

client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType='application/json')

response.get('Body').read()
```

```
b'{"instances": [0.0]}'
```

A prediction of 1.0 would mean that the passenger probably survived. The endpoint returned 0.0, so, bad news.
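To use the response programmatically, decode the JSON body and read the `instances` list. The helper below is a hypothetical convenience function, not part of the SageMaker SDK. Its label mapping simply restates the reading above (1.0 survived, 0.0 did not).

```python
import json

# Label meaning as described in the text: 1.0 = survived, 0.0 = did not survive.
LABELS = {1.0: "survived", 0.0: "did not survive"}


def decode_predictions(body):
    """Turn a raw response such as b'{"instances": [0.0]}' into readable labels."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return [LABELS[float(p)] for p in json.loads(body)["instances"]]
```

It accepts the value returned by `predictor.predict(payload)` or by `response.get('Body').read()`. Note that the boto3 response body is a stream and can be read only once.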

In this example, the request contains a single passenger. If you send several records, the endpoint returns a list with one result per record.
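To score several passengers in one request, put one record per line (CSV) or one object per list element (JSON). The two helpers below are a hypothetical sketch. The column order and the header line follow the CSV payload above, which the endpoint accepted; other header handling would depend on the script's `input_fn`.

```python
import csv
import io
import json

# Column order of the CSV payload used earlier in this notebook.
COLUMNS = ["Age", "Embarked", "Fare", "Parch", "Pclass", "Sex", "SibSp", "Family_Size"]


def to_csv_payload(records):
    """Build a CSV request body: a header line, then one line per passenger."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue().rstrip("\n")


def to_json_payload(records):
    """Build a JSON request body: a list with one object per passenger."""
    return json.dumps(records)
```

Either body can then be sent as before, for example `predictor.predict(to_csv_payload(records))`.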

## CLI request

Alternatively, you can invoke the endpoint with the AWS Command Line Interface (CLI):

```
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name inference-pipeline-ep-titanic-2021-12-07-12-28-42 \
  --body fileb://test.json \
  --content-type application/json \
  --accept application/json \
  output_file.json
```
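The command reads the request body from `test.json`, and the notebook does not show the contents of that file. Assuming it holds the same record as the boto3 JSON example, one way to create it from Python is shown below. After the command runs, `output_file.json` contains the response body.

```python
import json

# Same record as the boto3 JSON example; test.json is the file name used by the CLI command.
records = [{"Age": 22, "Embarked": "S", "Fare": 7.25, "Parch": 0, "Pclass": 3,
            "Sex": "male", "SibSp": 1, "Family_Size": 1}]

with open("test.json", "w") as f:
    json.dump(records, f)
```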

## Delete Endpoint <a class='anchor' id='delete_endpoint'></a>

Delete the endpoint when you are not using it, because a running endpoint keeps incurring charges.

```python
sm_client = sagemaker_session.boto_session.client('sagemaker')
sm_client.delete_endpoint(EndpointName=endpoint_name)
```
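`delete_endpoint` removes only the endpoint. The endpoint configuration and the pipeline model created by `deploy` stay in the account. As an alternative to the cell above, the sketch below removes all three with standard boto3 SageMaker client calls. The helper name is hypothetical. It reads the model names before deleting anything, because they are only reachable through the endpoint configuration.

```python
def delete_pipeline_resources(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint configuration and models it references.

    sm_client is expected to be a boto3 SageMaker client, e.g. boto3.client('sagemaker').
    """
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    # Collect model names while the configuration still exists.
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names
```

Usage: `delete_pipeline_resources(sm_client, endpoint_name)`. A `PipelineModel` is registered as a single SageMaker model with several containers, so `model_names` should contain one entry.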