# Deploying our BERT PyTorch Model with TorchServe and Amazon SageMaker

We will deploy our BERT PyTorch Model as a REST Endpoint on SageMaker using TorchServe (https://github.com/pytorch/serve/).

TorchServe can be used for many types of inference in production settings. It provides an easy-to-use command line interface and uses REST-based APIs to handle prediction requests.

<img src="../img/torchserve.png" width="90%">
  

More information on how to deploy Hugging Face Transformers with TorchServe:
* https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers
* https://medium.com/analytics-vidhya/deploy-huggingface-s-bert-to-production-with-pytorch-serve-27b068026d18 

In [None]:
!pip install -q transformers==2.8.0
!pip install -q torch==1.5.0 --upgrade --ignore-installed
!pip install -q torchserve==0.1.1 
!pip install -q torch-model-archiver==0.1.1

In [None]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

In [None]:
#!pip install ./src_torchserve/serve/model-archiver/

# Retrieve Transformer/PyTorch Model

In [None]:
#%store -r s3_pytorch_model_path
#print(s3_pytorch_model_path)

In [None]:
%store -r s3_transformer_pytorch_model_path
print(s3_transformer_pytorch_model_path)

In [None]:
local_model_dir = './models/transformers/pytorch/'

In [None]:
!aws s3 cp --recursive $s3_transformer_pytorch_model_path $local_model_dir

# Create TorchServe Model Archive File

A key feature of TorchServe is the ability to package all model artifacts into a single model archive file. A separate command line interface (CLI), torch-model-archiver, takes the following model artifacts and packages them into a .mar file: a model checkpoint file in the case of TorchScript, or a model definition file and a state_dict file in the case of eager mode, plus any optional assets required to serve the model. The resulting .mar file can be redistributed and served by anyone using TorchServe; the TorchServe server CLI loads it to serve the model.

You can find more information here: https://github.com/pytorch/serve/blob/master/model-archiver/README.md
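
To check what ended up in an archive, it can help to list its contents. The sketch below assumes the archiver's default output format, in which the .mar file is a ZIP archive (worth verifying for your torch-model-archiver version). `list_mar_contents` is a hypothetical helper, not part of TorchServe.

```python
import zipfile


def list_mar_contents(mar_path):
    """Return the sorted file names stored in a .mar archive.

    Assumes the archive was written in the default (ZIP) format.
    """
    with zipfile.ZipFile(mar_path) as archive:
        return sorted(archive.namelist())
```

Once the archive has been created in the model store below, calling `list_mar_contents('./model_store/DistilBertForSequenceClassification.mar')` should show the serialized model, the handler, and the extra files passed to the archiver.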

## Create a Model Store

In [None]:
!mkdir -p ./model_store

In [None]:
#transformer_pytorch_model_name = 'pytorch_model.bin'

In [None]:
%store -r transformer_pytorch_model_name

## Create Model Archive File (.mar)

In [None]:
torchserve_model_name = 'DistilBertForSequenceClassification'

In [None]:
!torch-model-archiver \
    --model-name $torchserve_model_name \
    --export-path ./model_store \
    --version 1.0 \
    --serialized-file $local_model_dir/$transformer_pytorch_model_name \
    --handler ./src_torchserve/Transformer_handler_generalized.py \
    --extra-files "./models/transformers/pytorch/config.json,./src_torchserve/setup_config.json,./src_torchserve/Seq_classification_artifacts/index_to_name.json"

In [None]:
!ls ./model_store/*.mar

# Start TorchServe to serve the model

After you archive and store the model, use the torchserve command to serve the model.

# *Note: TorchServe requires Java 11, which is not installed by default on SageMaker Notebook Instances*

In [None]:
# %%bash

# sudo amazon-linux-extras install java-openjdk11

In [None]:
# %%bash 

# torchserve \
# --start \
# --model-store ./model_store \
# --models distilbert-pytorch=DistilBertForSequenceClassification.mar &

After you execute the torchserve command above, TorchServe runs on your host, listening for inference requests.

## To test the model server, send a request to the server's predictions API.

In [None]:
# !curl -X POST http://127.0.0.1:8080/predictions/distilbert-pytorch -T ./src_torchserve/Seq_classification_artifacts/sample_text.txt
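
The same request can be sent from Python. The sketch below uses only the standard library and assumes TorchServe's inference API is listening on 127.0.0.1:8080 with the model registered as `distilbert-pytorch`, as in the commands above. `build_prediction_request` and `predict` are hypothetical helpers.

```python
import urllib.request


def build_prediction_request(model_name, text, host="127.0.0.1", port=8080):
    """Build a POST request for TorchServe's /predictions/<model_name> route."""
    url = "http://{}:{}/predictions/{}".format(host, port, model_name)
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def predict(model_name, text):
    """Send the request and return the response body as text.

    Requires a running TorchServe instance.
    """
    request = build_prediction_request(model_name, text)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `predict("distilbert-pytorch", "This is a wonderful product!")` sends the text as the request body, much like the `curl` command does with the sample file.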

# Prepare the Model for SageMaker Deployment

To deploy the model to a SageMaker REST endpoint, we need to upload our .mar file to S3 and build a TorchServe model container. 

## Upload TorchServe Model Archive File to S3

In [None]:
torchserve_mar = '{}.mar'.format(torchserve_model_name)
print(torchserve_mar)

In [None]:
s3_torchserve_mar = 's3://{}/models/torchserve/{}'.format(bucket, torchserve_mar)
print(s3_torchserve_mar)

In [None]:
!aws s3 cp ./model_store/$torchserve_mar $s3_torchserve_mar
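
The upload can also be done from Python rather than through the AWS CLI. This is a minimal sketch assuming a boto3 S3 client is passed in; `split_s3_uri` and `upload_file` are hypothetical helpers, while the client's `upload_file(Filename, Bucket, Key)` method is part of boto3.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key/path' into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Not an S3 URI: {}".format(s3_uri))
    return parsed.netloc, parsed.path.lstrip("/")


def upload_file(s3_client, local_path, s3_uri):
    """Upload a local file to the given S3 URI using a boto3 S3 client."""
    bucket_name, key = split_s3_uri(s3_uri)
    s3_client.upload_file(local_path, bucket_name, key)
```

For example, `upload_file(boto3.client('s3'), './model_store/' + torchserve_mar, s3_torchserve_mar)` would mirror the `aws s3 cp` command above.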

In [None]:
%store s3_torchserve_mar

## TAR the .mar model archive file and upload to S3

In [None]:
!tar cvfz ./models/{torchserve_model_name}.tar.gz \
    ./model_store/{torchserve_model_name}.mar
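
For reference, the archive can also be built with Python's `tarfile` module, which makes the path stored inside the tarball explicit. This is a sketch, not the notebook's method: `make_model_tarball` is a hypothetical helper, and whether the .mar file should sit at the archive root or under `model_store/` (as the `tar` command above stores it) depends on what the container's Dockerfile expects.

```python
import os
import tarfile


def make_model_tarball(mar_path, tar_path, arcname=None):
    """Write a gzip-compressed tarball containing a single .mar file.

    arcname is the path recorded inside the archive; it defaults to the
    file's base name, which places the .mar at the archive root.
    """
    if arcname is None:
        arcname = os.path.basename(mar_path)
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(mar_path, arcname=arcname)
    return tar_path
```

Passing `arcname="model_store/" + torchserve_mar` would reproduce the directory layout of the shell command.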


In [None]:
s3_torchserve_tar = 's3://{}/models/torchserve/{}.tar.gz'.format(bucket, torchserve_model_name)
print(s3_torchserve_tar)

In [None]:
!aws s3 cp ./models/{torchserve_model_name}.tar.gz $s3_torchserve_tar

In [None]:
%store s3_torchserve_tar

## Create an Amazon ECR registry
Create a new Docker container registry for our TorchServe container images.   
Ignore any error if the registry already exists; this is expected.

In [None]:
registry_name = 'torchserve'
!aws ecr create-repository --repository-name {registry_name}
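
The "already exists" case can also be handled explicitly instead of ignoring the error. The following is a minimal sketch assuming a boto3 ECR client is passed in; `ensure_repository` is a hypothetical helper, while `create_repository` and `RepositoryAlreadyExistsException` belong to the boto3 ECR client.

```python
def ensure_repository(ecr_client, repository_name):
    """Create the ECR repository unless it already exists.

    Returns True if the repository was created, False if it was already there.
    """
    try:
        ecr_client.create_repository(repositoryName=repository_name)
        return True
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        return False
```

A call such as `ensure_repository(boto3.client('ecr', region_name=region), registry_name)` can then be re-run safely.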

## Build a TorchServe Docker container and push it to Amazon ECR

In [None]:
image_label = 'v2'
image = f'{account_id}.dkr.ecr.{region}.amazonaws.com/{registry_name}:{image_label}'

In [None]:
!docker build -t {registry_name}:{image_label} -f ./src_torchserve/Dockerfile ./src_torchserve
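# Note: `aws ecr get-login` exists only in AWS CLI v1; AWS CLI v2 replaces it with `aws ecr get-login-password`.
!$(aws ecr get-login --no-include-email --region {region})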
!docker tag {registry_name}:{image_label} {image}
!docker push {image}

## Create SageMaker Endpoint and Deploy TorchServe Model Container

In [None]:
print(s3_torchserve_tar)

In [None]:
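# Note: this cell uses the SageMaker Python SDK v1 API; in SDK v2, `image` became `image_uri` and `RealTimePredictor` became `Predictor`.
from sagemaker.model import Model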
from sagemaker.predictor import RealTimePredictor

sm_model_name = 'distilbert-pytorch'

torchserve_model = Model(model_data = s3_torchserve_tar, 
                         image = image,
                         role  = role,
                         predictor_cls=RealTimePredictor,
                         name  = sm_model_name)

In [None]:
import time

torchserve_endpoint_name = '{}-endpoint-'.format(sm_model_name) + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print(torchserve_endpoint_name)

predictor = torchserve_model.deploy(instance_type='ml.c5.4xlarge',
                                    initial_instance_count=1,
                                    endpoint_name = torchserve_endpoint_name)

In [None]:
print(torchserve_endpoint_name)

In [None]:
%store torchserve_endpoint_name

# _Wait Until the ^^ Endpoint ^^ is Deployed_
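
One way to wait programmatically is to poll the endpoint status. The sketch below assumes a boto3 SageMaker client such as the `sm` client created at the top of this notebook; `wait_for_endpoint` is a hypothetical helper built on the client's `describe_endpoint` call.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=120):
    """Poll describe_endpoint until the endpoint reaches a settled state.

    Returns the status once it is 'InService', 'Failed', or 'OutOfService'.
    Raises TimeoutError if none of these is reached within max_polls checks.
    """
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status in ("InService", "Failed", "OutOfService"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("Endpoint {} did not become ready in time".format(endpoint_name))
```

For example, `wait_for_endpoint(sm, torchserve_endpoint_name)` returns once the endpoint is usable or has failed. boto3 also provides a built-in waiter, `sm.get_waiter('endpoint_in_service')`, which serves the same purpose.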

## Run A Sample Prediction

In [None]:
predicted_classes = predictor.predict("This is a wonderful product!")
print(predicted_classes.decode('utf-8'))
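
The endpoint can also be invoked without the SageMaker SDK predictor, for example from an application that only has boto3 available. This is a minimal sketch assuming a boto3 `sagemaker-runtime` client is passed in; the `text/plain` content type is an assumption about what the handler accepts, and `invoke_text` is a hypothetical helper.

```python
def invoke_text(runtime_client, endpoint_name, text, content_type="text/plain"):
    """Send text to a SageMaker endpoint and return the decoded response body."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=text.encode("utf-8"),
    )
    # The response body is a stream; read it fully before decoding.
    return response["Body"].read().decode("utf-8")
```

A call would look like `invoke_text(boto3.client('sagemaker-runtime', region_name=region), torchserve_endpoint_name, "This is a wonderful product!")`.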