# Deploy our BERT PyTorch Model with TorchServe and Amazon SageMaker

We will deploy our BERT PyTorch Model as a REST Endpoint on SageMaker using TorchServe (https://github.com/pytorch/serve/).

TorchServe can be used for many types of inference in production settings. It provides an easy-to-use command line interface and uses REST-based APIs to handle prediction requests.

<img src="./img/torchserve.png" width="90%">
  

More information on how to deploy Huggingface Transformers with TorchServe:
* https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers
* https://medium.com/analytics-vidhya/deploy-huggingface-s-bert-to-production-with-pytorch-serve-27b068026d18 

In [None]:
!pip install -q transformers==2.8.0
!pip install -q torch==1.5.0 --upgrade --ignore-installed
!pip install -q torch-model-archiver==0.1.1

In [None]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

# Copy the Transformer PyTorch Model from S3 to Local

In [None]:
%store -r transformer_pytorch_model_s3_uri

In [None]:
print(transformer_pytorch_model_s3_uri)

In [None]:
local_model_dir = './models/transformers/pytorch/'

In [None]:
!aws s3 cp --recursive $transformer_pytorch_model_s3_uri $local_model_dir

# Retrieve Transformer PyTorch Model Name (.bin) Created During Training

In [None]:
%store -r transformer_pytorch_model_name

In [None]:
print(transformer_pytorch_model_name)

# Create TorchServe Model Archive File (.mar)

https://github.com/pytorch/serve/blob/master/model-archiver/README.md

A key feature of TorchServe is the ability to package all model artifacts into a single model archive file. A separate command line interface (CLI), `torch-model-archiver`, takes a model checkpoint, or a model definition file with a `state_dict`, and packages it into a `.mar` file. This file can then be redistributed and served by anyone using TorchServe. The archiver accepts the following model artifacts: a model checkpoint file for TorchScript models, or a model definition file plus a `state_dict` file for eager-mode models, along with any optional assets required to serve the model. TorchServe's server CLI then uses the resulting `.mar` file to serve the model.

We need to pass the following:
* `--handler`:  Python code that converts the `review_body` into BERT tokens (request handler) and returns the `star_rating` prediction of 1-5 (response handler)
* `config.json`:  the model configuration used by the Huggingface transformers library, saved with the model in a previous notebook
* `setup_config.json`:  BERT-specific settings such as the `max seq length` and the `number of output classes` (1-5)
* `Seq_classification_artifacts/index_to_name.json`:  BERT-specific mapping of response index (0-4) to class name (1-5 star rating) for our output classes
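To make the index-to-class mapping concrete, here is a minimal sketch of what such a mapping could look like. The actual contents of `index_to_name.json` in `./src_torchserve/` are not shown in this notebook, so the string keys and values below, and the helper `to_star_rating`, are assumptions for illustration only.

```python
import json

# Hypothetical mapping: model output index (0-4) -> star rating (1-5).
# The real index_to_name.json may use different key or value types.
index_to_name = {str(index): str(index + 1) for index in range(5)}

# Show the mapping in the JSON form it would take on disk.
print(json.dumps(index_to_name, indent=2))


def to_star_rating(predicted_index):
    """Translate a predicted class index into a star rating (illustrative helper)."""
    return int(index_to_name[str(predicted_index)])
```

The offset of one exists because the classifier emits zero-based class indices while star ratings start at 1.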

In [None]:
torchserve_model_name = 'reviews-distilbert-pytorch'

In [None]:
print(torchserve_model_name)

In [None]:
!mkdir -p ./model_store

In [None]:
!torch-model-archiver -f \
    --model-name model \
    --export-path ./model_store/ \
    --version 1.0 \
    --serialized-file $local_model_dir/$transformer_pytorch_model_name \
    --handler ./src_torchserve/Transformer_handler_generalized.py \
    --extra-files "./models/transformers/pytorch/config.json,./src_torchserve/setup_config.json,./src_torchserve/Seq_classification_artifacts/index_to_name.json"

In [None]:
!ls -al ./model_store/

# Start TorchServe locally to serve the model

After you archive and store the model, use the torchserve command to serve the model.
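As a rough sketch of this step, the helper below assembles a `torchserve` start command for the archive created above. The function name `torchserve_start_command` is hypothetical; the flags `--start`, `--ncs`, `--model-store`, and `--models` belong to the TorchServe CLI, but check them against the version you have installed. The sketch only builds and prints the command so it can run without TorchServe present.

```python
def torchserve_start_command(model_store, model_name, mar_file):
    """Build the argument list for starting TorchServe with one model loaded."""
    return [
        "torchserve",
        "--start",
        "--ncs",  # skip config snapshots so each start uses only the given arguments
        "--model-store", model_store,
        "--models", f"{model_name}={mar_file}",
    ]


# The names match the archive created by torch-model-archiver in this notebook.
cmd = torchserve_start_command("./model_store", "model", "model.mar")
print(" ".join(cmd))
```

In a notebook cell the printed command can be run with a leading `!`, and `torchserve --stop` shuts the server down again. By default TorchServe serves predictions at `http://127.0.0.1:8080/predictions/<model name>`.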

# Prepare the Model for SageMaker Deployment

To deploy the model to a SageMaker REST endpoint, we need to upload our .mar file to S3 and build a TorchServe model container. 

In [None]:
# A .mar file is a zip archive; list its contents to verify what was packaged
!unzip -l ./model_store/model.mar

# Upload TorchServe Model Archive File to S3

In [None]:
torchserve_mar = 'model.mar'

# Tar the `.mar` Archive File as `model.tar.gz` and Upload to S3
Per TorchServe convention, the `.mar` file must be under `./model_store/` in the `.tar` archive.

In [None]:
!tar -cvzf ./model.tar.gz \
    ./model_store/$torchserve_mar
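The layout requirement can be checked from Python before uploading. The sketch below is an illustration under assumptions: it builds a throwaway archive containing a placeholder `.mar` file (not the real model) and lists its member paths. The helper name `list_tar_members` is hypothetical.

```python
import os
import posixpath
import tarfile
import tempfile


def list_tar_members(tar_path):
    """Return the normalized member paths stored in a .tar.gz archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return [posixpath.normpath(member.name) for member in tar.getmembers()]


# Build a throwaway archive with a placeholder .mar so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as workdir:
    store_dir = os.path.join(workdir, "model_store")
    os.makedirs(store_dir)
    mar_path = os.path.join(store_dir, "model.mar")
    with open(mar_path, "wb") as f:
        f.write(b"placeholder")  # stand-in for the real model archive

    tar_path = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname keeps the stored path relative, like the tar command above
        tar.add(mar_path, arcname="./model_store/model.mar")

    members = list_tar_members(tar_path)

print(members)
```

Pointing `list_tar_members` at the real `./model.tar.gz` should show `model_store/model.mar` among the members if the archive was built as described.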

In [None]:
torchserve_tar_s3_uri = 's3://{}/models/torchserve/model.tar.gz'.format(bucket)

# Upload `model.tar.gz` to S3

In [None]:
!aws s3 cp ./model.tar.gz $torchserve_tar_s3_uri

In [None]:
!tar -xvzf ./model.tar.gz

In [None]:
print(torchserve_tar_s3_uri)

## Create an Amazon ECR registry
Create a new Docker container registry for our TorchServe container images.   
If the registry already exists, the command below reports an error that can safely be ignored.

In [None]:
registry_name = 'torchserve'

In [None]:
!aws ecr create-repository --repository-name {registry_name}
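The same "create it, but tolerate an existing one" behavior can be made explicit in Python. This is a minimal sketch: the helper `ensure_ecr_repository` is hypothetical, and it expects an ECR client such as the one returned by `boto3.client('ecr')`, whose `create_repository` call raises `RepositoryAlreadyExistsException` when the name is taken.

```python
def ensure_ecr_repository(ecr_client, repository_name):
    """Create the repository if needed.

    Returns True when the repository was newly created and False when it
    already existed. Any other error is left to propagate.
    """
    try:
        ecr_client.create_repository(repositoryName=repository_name)
        return True
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # The repository is already there, which is fine for this notebook.
        return False
```

In this notebook it could be called as `ensure_ecr_repository(boto3.client('ecr'), registry_name)`. Passing the client in as an argument keeps the helper easy to exercise without AWS credentials.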

## Build a TorchServe Docker container and push it to Amazon ECR

In [None]:
image_tag = 'torch-1.5.0-1.0.0'
image_uri = f'{account_id}.dkr.ecr.{region}.amazonaws.com/{registry_name}:{image_tag}'

In [None]:
!docker build -t {registry_name}:{image_tag} -f ./docker/Dockerfile ./docker

!$(aws ecr get-login --no-include-email --region {region})

!docker tag {registry_name}:{image_tag} {image_uri}

!docker push {image_uri}

## Create SageMaker Endpoint and Deploy TorchServe Model Container

In [None]:
print(torchserve_tar_s3_uri)

In [None]:
from sagemaker.model import Model
from sagemaker.predictor import RealTimePredictor

torchserve_model = Model(model_data=torchserve_tar_s3_uri, 
                         image=image_uri,
                         role=role,
                         predictor_cls=RealTimePredictor,
                         name=torchserve_model_name)

In [None]:
import time

endpoint_name = '{}-endpoint-'.format(torchserve_model_name) + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

print(endpoint_name)

predictor = torchserve_model.deploy(instance_type='ml.m5.large',
                                    initial_instance_count=1,
                                    endpoint_name=endpoint_name)

# _Wait Until the ^^ Endpoint ^^ is Deployed_
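If the deploy call returns before the endpoint is ready, the endpoint status can be polled. The sketch below is one possible approach, not part of the original notebook: `wait_for_endpoint` is a hypothetical helper, and the polling interval and limit are arbitrary assumptions. It relies on the SageMaker `describe_endpoint` API, which reports an `EndpointStatus` such as `Creating`, `InService`, or `Failed`.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=60,
                      sleep=time.sleep):
    """Poll the endpoint status until it is InService.

    Raises RuntimeError if creation fails and TimeoutError if the endpoint
    is still not ready after max_polls checks.
    """
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        sleep(poll_seconds)  # still creating or updating; check again later
    raise TimeoutError(f"Endpoint {endpoint_name} not ready after {max_polls} polls")
```

With the `sm` client created at the top of this notebook, the call would be `wait_for_endpoint(sm, endpoint_name)`. boto3 also offers a built-in waiter, `sm.get_waiter('endpoint_in_service')`, for the same purpose.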

## Run A Sample Prediction

In [None]:
predicted_classes = predictor.predict("This is a wonderful product!")
print(predicted_classes.decode('utf-8'))

In [None]:
# sm.delete_endpoint(
#     EndpointName=endpoint_name
# )

In [None]:
%%javascript
Jupyter.notebook.save_checkpoint();
Jupyter.notebook.session.delete();