# Serving a TensorFlow Model as a REST Endpoint with TensorFlow Serving and SageMaker

We need to understand the application and business context to choose between real-time and batch predictions. Are we trying to optimize for latency or throughput? Does the application require our models to scale automatically throughout the day to handle cyclic traffic requirements? Do we plan to compare models in production through A/B tests?

If our application requires low latency, we should deploy the model as a real-time API that serves individual prediction requests over HTTPS. SageMaker Endpoints let us deploy, scale, and compare our model prediction servers.

## Interesting Reads
* [**How Roblox Scaled BERT to Serve 1+ Billion Daily Requests on CPUs**](https://blog.roblox.com/2020/05/scaled-bert-serve-1-billion-daily-requests-cpus/)

<img src="img/sagemaker-architecture.png" width="80%" align="left">

In [1]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

In [2]:
%store -r training_job_name

In [3]:
try:
    training_job_name
    print('[OK]')
except NameError:
    print('+++++++++++++++++++++++++++++++')
    print('[ERROR] Please run the notebooks in the previous TRAIN section before you continue.')
    print('+++++++++++++++++++++++++++++++')

[OK]


# Copy the Model to the Notebook

In [4]:
!aws s3 cp s3://$bucket/$training_job_name/output/model.tar.gz ./model.tar.gz

download: s3://sagemaker-us-east-1-835319576252/tensorflow-training-2021-01-27-02-29-07-903/output/model.tar.gz to ./model.tar.gz


In [5]:
!rm -rf ./model/

In [6]:
!mkdir -p ./model/
!tar -xvzf ./model.tar.gz -C ./model/

code/
code/inference.py
tensorboard/
tensorboard/train/
tensorboard/train/events.out.tfevents.1611715129.ip-10-2-87-218.ec2.internal.97.28316.v2
tensorboard/train/plugins/
tensorboard/train/plugins/profile/
tensorboard/train/plugins/profile/2021_01_27_02_34_24/
tensorboard/train/plugins/profile/2021_01_27_02_34_24/ip-10-2-87-218.ec2.internal.memory_profile.json.gz
tensorboard/train/plugins/profile/2021_01_27_02_34_24/ip-10-2-87-218.ec2.internal.input_pipeline.pb
tensorboard/train/plugins/profile/2021_01_27_02_34_24/ip-10-2-87-218.ec2.internal.tensorflow_stats.pb
tensorboard/train/plugins/profile/2021_01_27_02_34_24/ip-10-2-87-218.ec2.internal.trace.json.gz
tensorboard/train/plugins/profile/2021_01_27_02_34_24/ip-10-2-87-218.ec2.internal.kernel_stats.pb
tensorboard/train/plugins/profile/2021_01_27_02_34_24/ip-10-2-87-218.ec2.internal.overview_page.pb
tensorboard/train/plugins/profile/2021_01_27_02_34_24/ip-10-2-87-218.ec2.internal.xplane.pb
tensorboard/train/events.out.tfevents.16117148
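
The archive can also be inspected from Python instead of the shell. The following is a minimal sketch using only the standard-library `tarfile` module. The helper name `top_level_entries` is hypothetical, and it assumes the archive was downloaded to `./model.tar.gz` as in the cells above.

```python
import posixpath
import tarfile


def top_level_entries(archive_path):
    """Return the sorted top-level names inside a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
    top_level = set()
    for name in names:
        # Normalize './code/inference.py' or 'code/' to a clean relative path.
        normalized = posixpath.normpath(name)
        if normalized != ".":
            top_level.add(normalized.split("/")[0])
    return sorted(top_level)
```

Calling `top_level_entries('./model.tar.gz')` on the archive above should include directories such as `code` and `tensorboard` from the listing.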

In [7]:
!saved_model_cli show --all --dir './model/tensorflow/saved_model/0/'

/bin/sh: 1: saved_model_cli: not found


In [8]:
!saved_model_cli run --dir './model/tensorflow/saved_model/0/' --tag_set serve --signature_def serving_default \
    --input_exprs 'input_ids=np.zeros((1,64));input_mask=np.zeros((1,64))'

/bin/sh: 1: saved_model_cli: not found


# Show `inference.py`

In [9]:
!pygmentize ./code/inference.py

import json
import subprocess
import sys
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'tensorflow==2.3.1'])
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'transformers==4.1.1'])
# Workaround for https://github.com/huggingface/tokenizers/issues/120 and
#                https://github.com/kaushaltrivedi/fast-bert/issues/174
#subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade', 'tokenizers'])

# Deploy the Model
This will create a default `EndpointConfig` with a single model.  

The next notebook will demonstrate how to perform more advanced `EndpointConfig` strategies to support canary rollouts and A/B testing.

_Note:  If not using a US-based region, you may need to adapt the container image to your current region using the following table:_

https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
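
To make the default single-model `EndpointConfig` concrete, here is a hedged sketch of the request that such a configuration corresponds to in the low-level `boto3` `create_endpoint_config` call. The helper `single_variant_config`, the variant name `AllTraffic`, and the example names are illustrative assumptions. The SageMaker Python SDK builds an equivalent structure for us when we call `deploy()`.

```python
def single_variant_config(config_name, model_name, instance_type, instance_count=1):
    """Build keyword arguments for boto3's SageMaker create_endpoint_config call."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # assumed name; any unique string works
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "InitialVariantWeight": 1.0,  # a single variant receives all traffic
            }
        ],
    }


config_kwargs = single_variant_config(
    config_name="example-endpoint-config",  # hypothetical name
    model_name="example-model",             # hypothetical name
    instance_type="ml.m5.4xlarge",
)
print(config_kwargs["ProductionVariants"][0]["VariantName"])
```

Passing these arguments as `sm.create_endpoint_config(**config_kwargs)` would create the configuration. An A/B test adds further entries to `ProductionVariants`, each with its own model and weight.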

In [10]:
import time

timestamp = int(time.time())

tensorflow_model_name = '{}-{}-{}'.format(training_job_name, 'tf', timestamp)

print(tensorflow_model_name)

tensorflow-training-2021-01-27-02-29-07-903-tf-1611724324
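
Because names built this way grow with the training job name, it can be worth checking them before calling the API. The sketch below assumes the SageMaker naming constraint of at most 63 characters, made up of alphanumerics and hyphens. `is_valid_sagemaker_name` is a hypothetical helper, and the limits should be confirmed against the current API reference.

```python
import re

# Assumed constraint: starts with an alphanumeric, hyphens only between
# alphanumerics (no trailing hyphen), at most 63 characters in total.
_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
_MAX_NAME_LENGTH = 63


def is_valid_sagemaker_name(name):
    """Return True if name satisfies the assumed length and character rules."""
    return len(name) <= _MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_sagemaker_name("tensorflow-training-2021-01-27-02-29-07-903-tf-1611724324"))
```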


In [11]:
from sagemaker.tensorflow.estimator import TensorFlow

estimator = TensorFlow.attach(training_job_name=training_job_name)


2021-01-27 02:53:02 Starting - Preparing the instances for training
2021-01-27 02:53:02 Downloading - Downloading input data
2021-01-27 02:53:02 Training - Training image download completed. Training in progress.
2021-01-27 02:53:02 Uploading - Uploading generated training model
2021-01-27 02:53:02 Completed - Training job completed


In [12]:
# requires enough disk space for tensorflow, transformers, and bert downloads
instance_type = 'ml.m5.4xlarge'

In [13]:
from sagemaker.tensorflow.model import TensorFlowModel

tensorflow_model = TensorFlowModel(name=tensorflow_model_name,
                                   source_dir='code',
                                   entry_point='inference.py',
                                   model_data='s3://{}/{}/output/model.tar.gz'.format(bucket, training_job_name),
                                   role=role,
                                   framework_version='2.3.1')

In [14]:
tensorflow_endpoint_name = '{}-{}-{}'.format(training_job_name, 'tf', timestamp)

print(tensorflow_endpoint_name)

tensorflow-training-2021-01-27-02-29-07-903-tf-1611724324


In [15]:
tensorflow_model.deploy(endpoint_name=tensorflow_endpoint_name,
                        initial_instance_count=1, # Should use >=2 for high(er) availability 
                        instance_type=instance_type,
                        wait=False)

update_endpoint is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


<sagemaker.tensorflow.model.TensorFlowPredictor at 0x7fb0e04cda50>

In [16]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">SageMaker REST Endpoint</a></b>'.format(region, tensorflow_endpoint_name)))


# _Wait Until the Endpoint is Deployed_

In [None]:
%%time

waiter = sm.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=tensorflow_endpoint_name)
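
The boto3 waiter hides the polling loop. As a rough sketch of what such a loop does, the function below polls a status callable until the endpoint reaches `InService` or fails. `wait_until_in_service` and its `get_status` parameter are hypothetical. In this notebook, `get_status` could wrap `sm.describe_endpoint(EndpointName=tensorflow_endpoint_name)['EndpointStatus']`.

```python
import time


def wait_until_in_service(get_status, poll_seconds=30, max_attempts=60):
    """Poll get_status() until it returns 'InService'; raise on failure or timeout."""
    for attempt in range(max_attempts):
        status = get_status()
        if status == "InService":
            return attempt + 1  # number of polls it took
        if status == "Failed":
            raise RuntimeError("Endpoint deployment failed")
        time.sleep(poll_seconds)
    raise TimeoutError("Endpoint did not reach InService in time")


# Usage with a stand-in status source instead of a real endpoint.
fake_statuses = iter(["Creating", "Creating", "InService"])
polls = wait_until_in_service(lambda: next(fake_statuses), poll_seconds=0)
print(polls)
```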

# _Wait Until the ^^ Endpoint ^^ is Deployed_

In [None]:
tensorflow_endpoint_arn = sm.describe_endpoint(EndpointName=tensorflow_endpoint_name)['EndpointArn']
print(tensorflow_endpoint_arn)

# Show the Experiment Tracking Lineage

In [None]:
from sagemaker.lineage.visualizer import LineageTableVisualizer

lineage_table_viz = LineageTableVisualizer(sess)
lineage_table_viz_df = lineage_table_viz.show(endpoint_arn=tensorflow_endpoint_arn)
lineage_table_viz_df

# Test the Deployed Model

In [None]:
import json
from sagemaker.tensorflow.model import TensorFlowPredictor

predictor = TensorFlowPredictor(endpoint_name=tensorflow_endpoint_name,
                                sagemaker_session=sess,
                                model_name='saved_model',
                                model_version=0,
                                content_type='application/jsonlines',
                                accept_type='application/jsonlines')

### Wait for the Endpoint to Be Ready to Serve Predictions

In [None]:
import time

time.sleep(30)

# Predict the `star_rating` with Ad Hoc `review_body` Samples

In [None]:
inputs = [
    {"features": ["This is great!"]},
    {"features": ["This is bad."]}
]

predicted_classes_str = predictor.predict(inputs)
predicted_classes = predicted_classes_str.splitlines()

for predicted_class_json, input_data in zip(predicted_classes, inputs):
    predicted_class = json.loads(predicted_class_json)['predicted_label']
    print('Predicted star_rating: {} for review_body "{}"'.format(predicted_class, input_data["features"][0]))
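
The request and response bodies here are JSON Lines: one JSON document per line. The following sketch shows that encoding and decoding with the standard library only. The helper names and the sample response string are made up for illustration. Only the `features` and `predicted_label` keys come from the cells above.

```python
import json


def to_json_lines(records):
    """Serialize an iterable of dicts as one JSON document per line."""
    return "\n".join(json.dumps(record) for record in records)


def parse_predicted_labels(response_body):
    """Extract 'predicted_label' from each non-empty line of a JSON Lines response."""
    return [
        json.loads(line)["predicted_label"]
        for line in response_body.splitlines()
        if line.strip()
    ]


request_body = to_json_lines([
    {"features": ["This is great!"]},
    {"features": ["This is bad."]},
])
print(request_body)

# Illustrative response only; real values depend on the trained model.
sample_response = '{"predicted_label": 5}\n{"predicted_label": 1}\n'
print(parse_predicted_labels(sample_response))
```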

# Predict the `star_rating` with `review_body` Samples from our TSVs

In [None]:
import csv

df_reviews = pd.read_csv('./data/amazon_reviews_us_Digital_Software_v1_00.tsv.gz', 
                         delimiter='\t', 
                         quoting=csv.QUOTE_NONE,
                         compression='gzip')
df_sample_reviews = df_reviews[['review_body', 'star_rating']].sample(n=5)
df_sample_reviews = df_sample_reviews.reset_index()
df_sample_reviews.shape

In [None]:
import pandas as pd

def predict(review_body):
    # Send a single review to the endpoint as a one-element batch.
    inputs = [
        {"features": [review_body]}
    ]
    predicted_classes_str = predictor.predict(inputs)
    # The response is JSON Lines; one input yields one line.
    predicted_classes_json = predicted_classes_str.splitlines()
    predicted_class = json.loads(predicted_classes_json[0])['predicted_label']
    return predicted_class

df_sample_reviews['predicted_class'] = df_sample_reviews['review_body'].map(predict)
df_sample_reviews.head(5)
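
Mapping `predict` over the column sends one HTTPS request per review. Since the endpoint accepts several inputs per request, as in the ad hoc example earlier, a batched variant may reduce round trips. This is a sketch under that assumption. `batched` and `predict_many` are hypothetical helpers, and `predict_fn` stands in for a function that takes a list of input dicts and returns one label per input.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_many(review_bodies, predict_fn, batch_size=2):
    """Call predict_fn once per batch and return the labels in input order."""
    labels = []
    for batch in batched(list(review_bodies), batch_size):
        inputs = [{"features": [review_body]} for review_body in batch]
        labels.extend(predict_fn(inputs))
    return labels


# Stand-in predictor that labels each input with the length of its text.
def fake_predict(inputs):
    return [len(item["features"][0]) for item in inputs]


print(predict_many(["good", "bad", "okay!"], fake_predict, batch_size=2))
```

With the real endpoint, `predict_fn` would call `predictor.predict(inputs)` and parse one `predicted_label` per response line.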

# Save for Next Notebook(s)

In [None]:
%store tensorflow_model_name

In [None]:
%store tensorflow_endpoint_name

In [None]:
%store tensorflow_endpoint_arn

In [None]:
%store

# Release Resources
To save cost, we should delete the endpoint.

In [None]:
# sm.delete_endpoint(
#      EndpointName=tensorflow_endpoint_name
# )
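
Deleting the endpoint alone leaves its `EndpointConfig` and model behind. A fuller cleanup might look like the sketch below, which assumes `sm_client` is a boto3 SageMaker client like `sm` above. `delete_endpoint_resources` is a hypothetical helper, and it should only be run once later notebooks no longer need the endpoint.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint config, and the models that config references."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Look everything up first, then delete from the endpoint down to the models.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names
```

For this notebook the call would be `delete_endpoint_resources(sm, tensorflow_endpoint_name)`.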

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>

In [None]:
%%javascript

try {
    Jupyter.notebook.save_checkpoint();
    Jupyter.notebook.session.delete();
}
catch(err) {
    // NoOp
}