# Serving a Model as an HTTPS Endpoint

We need to understand the application and business context to choose between real-time and batch predictions. Are we trying to optimize for latency or throughput? Does the application require our models to scale automatically throughout the day to handle cyclic traffic requirements? Do we plan to compare models in production through A/B tests?

If our application requires low latency, we should deploy the model as a real-time API that serves predictions for single requests over HTTPS. We can deploy, scale, and compare our model prediction servers with SageMaker Endpoints.

<img src="img/sagemaker-architecture.png" width="80%" align="left">

In [2]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

In [53]:
training_job_name='gold-training-master'

In [54]:
print(training_job_name)

gold-training-master


# Copy the Model to the Notebook

In [78]:
!aws s3 cp s3://$bucket/tensorflow-training-2020-06-04-05-39-51-084/output/model.tar.gz ./model.tar.gz

download: s3://sagemaker-us-east-1-835319576252/tensorflow-training-2020-06-04-05-39-51-084/output/model.tar.gz to ./model.tar.gz


In [88]:
!tar -xvzf ./model.tar.gz 

tensorflow/
tensorflow/saved_model/
tensorflow/saved_model/0/
transformers/
transformers/fine-tuned/

In [91]:
ls -al ./

total 261816
drwxrwxr-x 20 ec2-user ec2-user      4096 Aug 17 04:46 ./
drwxrwxr-x 16 ec2-user ec2-user      4096 Aug 12 00:12 ../
-rw-rw-r--  1 ec2-user ec2-user      2371 Aug 14 15:33 00_Overview.ipynb
-rw-rw-r--  1 ec2-user ec2-user      8719 Jul 25 21:29 01_Invoke_SageMaker_Autopilot_Model_From_Athena.ipynb
-rw-rw-r--  1 ec2-user ec2-user     16646 Aug 12 00:07 02_Deploy_Reviews_BERT_PyTorch_REST_Endpoint.ipynb
-rw-rw-r--  1 ec2-user ec2-user      9472 Aug 16 20:36 03_Deploy_Reviews_BERT_TensorFlow_REST_Endpoint.ipynb
-rw-rw-r--  1 ec2-user ec2-user     34818 Aug 12 00:08 04_Perform_AB_Test_Reviews_BERT_TensorFlow_REST_Endpoints.ipynb
-rw-rw-r--  1 ec2-user ec2-user     11328 Aug 10 06:21 05_Deploy_Reviews_BERT_TensorFlow_Batch_Predictions_TSV.ipynb
-rw-rw-r--  1 ec2-user ec2-user     30675 Aug 17 04:45 99_Add_InferenceDotPy_To_Model.ipynb
drwxrwxr-x  2 ec2-user ec2-user      4096 Jul 29 02:17 batch_prediction_output/
drwxr-xr-x  2 

In [97]:
!ls -al model/tensorflow/saved_model/0
#!mv ./tensorflow/saved_model/ saved_model/

total 8
drwxr-xr-x 2 ec2-user ec2-user 4096 Jun  4 05:45 .
drwxr-xr-x 3 ec2-user ec2-user 4096 Jun  4 05:45 ..


In [74]:
!ls ./saved_model/

In [70]:
!saved_model_cli show --all --dir ./saved_model/0/

2020-08-17 04:44:05.379402: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/extras/CUPTI/lib64:/usr/local/cuda-10.0/lib:/usr/local/cuda-10.0/efa/lib:/opt/amazon/efa/lib:/opt/amazon/efa/lib64:/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:
2020-08-17 04:44:05.379484: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/extras/CUPTI/lib64:/usr/local/cuda-10.0/lib:/usr/local/cuda-10.0/ef

# Show `inference.py`

In [9]:
!pygmentize ./code/inference.py

import json
import subprocess
import sys
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'tensorflow==2.1.0'])
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'transformers==2.8.0'])
import tensorflow as tf
from transformers import DistilBertTokenizer

classes=[1, 2, 3, 4,
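A script like this typically ends by turning the model's per-class scores into a star rating. The listing above is truncated, so the following is only a minimal sketch of that step: the helper names and the five-element `ASSUMED_CLASSES` list are assumptions for illustration, not the notebook's actual `inference.py`.

```python
# Hypothetical helpers: NOT taken from the notebook's inference.py.
from typing import List, Sequence

# Assumption: one label per star rating, in the same order as the model's scores.
ASSUMED_CLASSES = [1, 2, 3, 4, 5]


def scores_to_star_rating(scores: Sequence[float],
                          classes: Sequence[int] = ASSUMED_CLASSES) -> int:
    """Return the class label whose score is highest (argmax)."""
    if len(scores) != len(classes):
        raise ValueError("expected exactly one score per class")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best_index]


def batch_to_star_ratings(batch_scores: Sequence[Sequence[float]]) -> List[int]:
    """Apply the argmax mapping to every row in a batch of scores."""
    return [scores_to_star_rating(row) for row in batch_scores]


# Made-up scores for two reviews, only to exercise the helper.
example_ratings = batch_to_star_ratings([[0.05, 0.05, 0.1, 0.2, 0.6],
                                         [0.7, 0.1, 0.1, 0.05, 0.05]])
print(example_ratings)
```

Keeping the mapping in a small pure function makes it easy to check without loading TensorFlow or the tokenizer.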

In [38]:
!gunzip ./model.tar.gz

gzip: ./model.tar.gz: No such file or directory


In [39]:
!ls model.tar

model.tar


In [40]:
!tar -uvf model.tar code/inference.py

code/inference.py


In [66]:
!tar -xzvf model.tar.gz

tensorflow/
tensorflow/saved_model/
tensorflow/saved_model/0/
transformers/
transformers/fine-tuned/

In [68]:
!ls -al ./tensorflow/saved_model/0

total 4260
drwxr-xr-x 4 ec2-user ec2-user    4096 Jun  4 05:45 .
drwxr-xr-x 3 ec2-user ec2-user    4096 Jun  4 05:45 ..
drwxr-xr-x 2 ec2-user ec2-user    4096 Aug 10 04:15 assets
-rw-r--r-- 1 ec2-user ec2-user 4342992 Aug 10 04:15 saved_model.pb
drwxr-xr-x 2 ec2-user ec2-user    4096 Aug 10 04:15 variables


In [44]:
!gzip model.tar

In [45]:
!ls model.tar.gz

model.tar.gz
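
The `gunzip`, `tar -u`, `gzip` sequence above is needed because `tar` cannot update a compressed archive in place. As a rough sketch of the same repackaging in Python, the hypothetical helper `add_file_to_model_archive` below (not part of the notebook) copies every member into a new gzip archive and then adds the script under `code/inference.py`. The demo builds a tiny stand-in archive in a temporary directory, so it runs without the real model.

```python
import io
import os
import tarfile
import tempfile


def add_file_to_model_archive(src_archive: str, dst_archive: str,
                              file_path: str, arcname: str) -> None:
    """Copy all members of src_archive into dst_archive, then add file_path as arcname."""
    with tarfile.open(src_archive, "r:gz") as src, \
            tarfile.open(dst_archive, "w:gz") as dst:
        for member in src.getmembers():
            if member.name == arcname:
                continue  # skip an older copy; the new file is added below
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
        dst.add(file_path, arcname=arcname)


def list_archive(path: str):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the downloaded model.tar.gz (placeholder bytes, not a real model).
    original = os.path.join(workdir, "model.tar.gz")
    payload = b"placeholder"
    with tarfile.open(original, "w:gz") as tar:
        info = tarfile.TarInfo("tensorflow/saved_model/0/saved_model.pb")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    # Stand-in for ./code/inference.py.
    script = os.path.join(workdir, "inference.py")
    with open(script, "w") as handle:
        handle.write("# placeholder inference script\n")

    repacked = os.path.join(workdir, "model-with-code.tar.gz")
    add_file_to_model_archive(original, repacked, script, "code/inference.py")
    archive_members = list_archive(repacked)
    print(archive_members)
```

Writing to a new file instead of modifying the original keeps the downloaded archive intact if the repackaging fails midway.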


In [56]:
!aws s3 cp ./model.tar.gz s3://$bucket/$training_job_name/output/model.tar.gz

upload: ./model.tar.gz to s3://sagemaker-us-east-1-835319576252/gold-training-master/output/model.tar.gz


# Deploy the Model
This will create a default `EndpointConfig` with a single model.  

The next notebook will demonstrate how to perform more advanced `EndpointConfig` strategies to support canary rollouts and A/B testing.
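
To make "a single model" concrete, here is a minimal sketch of what a one-variant endpoint configuration could look like when expressed as arguments to the SageMaker `CreateEndpointConfig` API. The helper name `single_variant_endpoint_config` and the variant name `AllTraffic` are assumptions for illustration; the exact values the SDK generates on our behalf may differ.

```python
from typing import Any, Dict


def single_variant_endpoint_config(config_name: str, model_name: str,
                                   instance_type: str,
                                   instance_count: int) -> Dict[str, Any]:
    """Build keyword arguments for a CreateEndpointConfig call with one variant."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",      # assumed variant name
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "InitialVariantWeight": 1.0,      # the only variant gets all traffic
            }
        ],
    }


config_kwargs = single_variant_endpoint_config(
    config_name="gold-training-master-tf-1597634989",
    model_name="gm-tf-1597634989",
    instance_type="ml.c5.9xlarge",
    instance_count=1,
)
print(config_kwargs)

# With AWS credentials, these arguments could be passed to boto3:
#   boto3.client("sagemaker").create_endpoint_config(**config_kwargs)
```

An A/B test or canary rollout would add further entries to `ProductionVariants` with different weights.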

_Note:  If not using a US-based region, you may need to adapt the container image to your current region using the following table:_

https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html

In [57]:
import time
timestamp = int(time.time())

gold_training_master_model_name = '{}-{}-{}'.format('gm', 'tf', timestamp)

print(gold_training_master_model_name)

gm-tf-1597634989


In [58]:
from sagemaker.tensorflow.serving import Model

gold_training_master = Model(name=gold_training_master_model_name,
                             model_data='s3://{}/{}/output/model.tar.gz'.format(bucket, training_job_name),
                             role=role,                
                             framework_version='2.1.0')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


In [60]:
gold_training_master_endpoint_name = '{}-{}-{}'.format(training_job_name, 'tf', timestamp)

print(gold_training_master_endpoint_name)

gold-training-master-tf-1597634989


In [61]:
# Deploy the Model object created above; wait=False returns before the endpoint is in service
gold_training_master.deploy(endpoint_name=gold_training_master_endpoint_name,
                            initial_instance_count=1, # Use >=2 for higher availability
                            instance_type='ml.c5.9xlarge', # requires enough disk space for tensorflow, transformers, and bert downloads
                            wait=False)

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


In [63]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">SageMaker REST Endpoint</a></b>'.format(region, gold_training_master_endpoint_name)))


In [None]:
waiter = sm.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=gold_training_master_endpoint_name)
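
The waiter above hides a simple polling loop. As a minimal sketch of that idea, the hypothetical `wait_for_status` helper below polls a status callback until it reports `InService`, fails fast on `Failed`, and gives up after a fixed number of attempts. The default interval and attempt count are assumptions, and the demo uses a canned sequence of statuses instead of calling AWS.

```python
import time
from typing import Callable, Iterable


def wait_for_status(get_status: Callable[[], str],
                    target: str = "InService",
                    failure_states: Iterable[str] = ("Failed",),
                    poll_seconds: float = 30.0,
                    max_attempts: int = 120) -> str:
    """Poll get_status() until it returns target; raise on failure or timeout."""
    failed = set(failure_states)
    for _ in range(max_attempts):
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError("endpoint entered state {}".format(status))
        time.sleep(poll_seconds)
    raise TimeoutError("status never reached {}".format(target))


# Demo with canned statuses, so no AWS call is made.
fake_statuses = iter(["Creating", "Creating", "InService"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)

# Against a real endpoint, the callback could be:
#   lambda: sm.describe_endpoint(EndpointName=gold_training_master_endpoint_name)["EndpointStatus"]
```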

# _Wait Until the ^^ Endpoint ^^ is Deployed_

# Simulate a Prediction from an Application

In [None]:
import json
from sagemaker.tensorflow.serving import Predictor

predictor = Predictor(endpoint_name=gold_training_master_endpoint_name,
                      sagemaker_session=sess,
                      content_type='application/json',
                      model_name='saved_model',
                      model_version=0)

# Predict the `star_rating` with Ad Hoc `review_body` Samples

In [None]:
reviews = ["This is great!"]

predicted_classes = predictor.predict(reviews)

for predicted_class, review in zip(predicted_classes, reviews):
    print('[Predicted Star Rating: {}]'.format(predicted_class), review)
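
An application that does not use the SageMaker Python SDK would send the same JSON payload through the SageMaker runtime API. The sketch below only builds the request arguments, so it runs without AWS credentials; the helper name `build_invoke_request` is hypothetical. It also omits the `model_name` and `model_version` settings used by the `Predictor` above, on the assumption that the endpoint's default model is the one we want.

```python
import json
from typing import Any, Dict, List


def build_invoke_request(endpoint_name: str, reviews: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for a sagemaker-runtime invoke_endpoint call."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Accept": "application/json",
        # Same shape the notebook sends: a JSON list of review strings.
        "Body": json.dumps(reviews).encode("utf-8"),
    }


request = build_invoke_request("gold-training-master-tf-1597634989",
                               ["This is great!"])
print(request["Body"])

# With AWS credentials, the request could be sent like this:
#   response = boto3.client("sagemaker-runtime").invoke_endpoint(**request)
#   predicted_classes = json.loads(response["Body"].read())
```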

# Predict the `star_rating` with `review_body` Samples from our TSVs

In [None]:
import csv

df_reviews = pd.read_csv('./data/amazon_reviews_us_Digital_Software_v1_00.tsv.gz', 
                         delimiter='\t', 
                         quoting=csv.QUOTE_NONE,
                         compression='gzip')
df_sample_reviews = df_reviews[['review_body', 'star_rating']].sample(n=100)
df_sample_reviews = df_sample_reviews.reset_index()
df_sample_reviews.shape

In [None]:
import pandas as pd

def predict(review_body):
    return predictor.predict([review_body])[0]

df_sample_reviews['predicted_class'] = df_sample_reviews['review_body'].map(predict)
df_sample_reviews
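
Once the `predicted_class` column is filled in, a quick sanity check is to compare it with the actual `star_rating`. The sketch below is a hypothetical helper, not part of the notebook, and the sample values are made up purely to exercise it.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Return the fraction of positions where predicted equals actual."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("cannot compute accuracy of an empty sample")
    matches = sum(int(p) == int(a) for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Made-up values: three of the four predictions match.
sample_accuracy = accuracy([5, 4, 1, 3], [5, 5, 1, 3])
print(sample_accuracy)

# With the DataFrame above, this could be called as:
#   accuracy(df_sample_reviews['predicted_class'].tolist(),
#            df_sample_reviews['star_rating'].tolist())
```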

# Save for Next Notebook(s)

In [None]:
#%store gold_training_master_model_name

In [None]:
#%store gold_training_master_endpoint_name

In [None]:
#%store

# Delete Endpoint
To save cost, we should delete the endpoint.

In [None]:
# sm.delete_endpoint(
#      EndpointName=gold_training_master_endpoint_name
# )
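
Deleting the endpoint stops the instance charges, but the endpoint configuration and model definitions remain in the account. A minimal sketch of a fuller cleanup, assuming we also want to remove those, is shown below. The helper `delete_endpoint_resources` and the `RecordingClient` stand-in are hypothetical; the stand-in only records which calls were made, so the example runs without AWS.

```python
from typing import Any, Dict, List


def delete_endpoint_resources(sm_client: Any, endpoint_name: str,
                              model_name: str) -> None:
    """Delete an endpoint, then its endpoint config, then the model."""
    # Look up the config name first, while the endpoint still exists.
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    sm_client.delete_model(ModelName=model_name)


class RecordingClient:
    """Stand-in for a boto3 SageMaker client that only records method names."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def describe_endpoint(self, EndpointName: str) -> Dict[str, str]:
        self.calls.append("describe_endpoint")
        # Fake response: reuses the endpoint name as the config name.
        return {"EndpointConfigName": EndpointName}

    def delete_endpoint(self, EndpointName: str) -> None:
        self.calls.append("delete_endpoint")

    def delete_endpoint_config(self, EndpointConfigName: str) -> None:
        self.calls.append("delete_endpoint_config")

    def delete_model(self, ModelName: str) -> None:
        self.calls.append("delete_model")


fake_client = RecordingClient()
delete_endpoint_resources(fake_client,
                          "gold-training-master-tf-1597634989",
                          "gm-tf-1597634989")
recorded_calls = fake_client.calls
print(recorded_calls)

# With the notebook's real client, the call could be:
#   delete_endpoint_resources(sm, gold_training_master_endpoint_name,
#                             gold_training_master_model_name)
```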

In [None]:
// %%javascript
// Jupyter.notebook.save_checkpoint();
// Jupyter.notebook.session.delete();