# Fine-tune and host a PyTorch BERT model on SageMaker


Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models. The SageMaker Python SDK provides open source APIs and containers that make it easy to train and deploy models in SageMaker with several different machine learning and deep learning frameworks.

For this example, we use an Amazon SageMaker Notebook Instance for running the code. For information on how to use Amazon SageMaker Notebook Instances, see the AWS documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html).

In [None]:
import os

import numpy as np
import pandas as pd
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/DEMO-pytorch-bert'

role = sagemaker.get_execution_role()

# Prepare training data

We use the Corpus of Linguistic Acceptability (CoLA) (https://nyu-mll.github.io/CoLA/), a dataset of 10,657 English sentences drawn from published linguistics literature and labeled as grammatical or ungrammatical. We download and unzip the data using the following code:

### Download data

In [None]:
if not os.path.exists('./cola_public_1.1.zip'):
    !curl -o ./cola_public_1.1.zip https://nyu-mll.github.io/CoLA/cola_public_1.1.zip
if not os.path.exists('./cola_public/'):
    !unzip cola_public_1.1.zip

### Get sentence and label

Let us take a quick look at our data. First we read in the training data. The only two columns we need are the sentence itself and its label. 

In [None]:
df = pd.read_csv('./cola_public/raw/in_domain_train.tsv',
                 sep='\t',header=None, usecols=[1,3], names=['label','sentence'])
sentences = df.sentence.values
labels = df.label.values

In [None]:
print(sentences[20:25])
print(labels[20:25])
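
The column selection above can be illustrated without the dataset on disk. The following is a minimal sketch, assuming the four-column tab-separated layout of the raw CoLA files (source, label, original annotation, sentence). The two rows are made-up placeholders, not taken from the corpus, and `parse_cola_rows` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in the assumed CoLA layout: source, label, author annotation, sentence.
SAMPLE_TSV = "src1\t1\t\tThe cat sat on the mat.\nsrc1\t0\t*\tCat the mat on sat.\n"


def parse_cola_rows(text):
    """Return (label, sentence) pairs from tab-separated CoLA-style text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Keep only columns 1 (label) and 3 (sentence), mirroring usecols=[1, 3].
    return [(int(row[1]), row[3]) for row in reader]


pairs = parse_cola_rows(SAMPLE_TSV)
for label, sentence in pairs:
    print(label, sentence)
```

A label of 1 marks a sentence as acceptable and 0 as unacceptable.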

We then split the dataset for training and testing before uploading both to Amazon S3 for later use. The SageMaker Python SDK provides a helpful method, `upload_data`, for uploading to Amazon S3:

In [None]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(df)
train.to_csv('./cola_public/train.csv', index=False)
test.to_csv('./cola_public/test.csv', index=False)
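
When neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the rows, and the split changes on every run unless `random_state` is set. The sketch below is a plain-Python illustration of a seeded, shuffled 75/25 split. `shuffled_split` is a hypothetical helper written for this example, not scikit-learn's implementation.

```python
import math
import random


def shuffled_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows with a fixed seed, then split off the test share."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(8))
train_rows, test_rows = shuffled_split(rows)
print(len(train_rows), len(test_rows))
```

In the notebook itself, passing `random_state` to `train_test_split` (for example `train_test_split(df, random_state=0)`) gives the same kind of reproducibility.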

The training and test data are then uploaded to S3:

In [None]:
inputs_train = sagemaker_session.upload_data('./cola_public/train.csv', bucket=bucket, key_prefix=prefix)
inputs_test = sagemaker_session.upload_data('./cola_public/test.csv', bucket=bucket, key_prefix=prefix)
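
`upload_data` returns the S3 URI of the uploaded object, which is what `inputs_train` and `inputs_test` hold. As a rough sketch of the URI shape for a single file (bucket, then key prefix, then the local file name), where `expected_s3_uri` is a hypothetical helper and `example-bucket` is a placeholder bucket name:

```python
import posixpath


def expected_s3_uri(local_path, bucket, key_prefix):
    """Compose the URI shape s3://<bucket>/<key_prefix>/<file name>."""
    filename = posixpath.basename(local_path)
    return "s3://{}/{}".format(bucket, posixpath.join(key_prefix, filename))


uri = expected_s3_uri("./cola_public/train.csv", "example-bucket", "sagemaker/DEMO-pytorch-bert")
print(uri)
```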

# Run training

To start, we use the PyTorch estimator class to train our model. When creating our estimator, we make sure to specify a few things:

* entry_point: the name of our PyTorch script. The script loads data from the input channels, configures training with hyperparameters, trains and saves a model, and loads and runs the model during inference.
* source_dir: the location of our training scripts and `requirements.txt` file. `requirements.txt` lists the packages you want to use with your script.


We use the PyTorch-Transformers library (https://pytorch.org/hub/huggingface_pytorch-transformers), which contains PyTorch implementations and pre-trained model weights for many NLP models, including BERT.

Our training script should save model artifacts learned during training to a file path called `model_dir`, as stipulated by the SageMaker PyTorch image. Upon completion of training, model artifacts saved in `model_dir` will be uploaded to S3 by SageMaker and will become available in S3 for deployment.

We save this script in a file named `train_deploy.py` and put the file in a directory named `code`. The full training script can be viewed in the `code` folder.

In [None]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='train_deploy.py',
                    source_dir='code',
                    role=role,
                    framework_version='1.3.1',
                    train_instance_count=2,  # this script only supports distributed training on GPU instances.
                    train_instance_type='ml.p3.2xlarge',
                    hyperparameters={
                        'epochs': 1,
                        'num_labels':2,
                        'backend': 'gloo'
                    })

estimator.fit({'training': inputs_train, 'testing':inputs_test})
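
Inside the training container, SageMaker passes each hyperparameter to the entry point as a command-line argument, exposes each channel named in `fit()` through an `SM_CHANNEL_<NAME>` environment variable, and provides the model output directory in `SM_MODEL_DIR`. The sketch below shows one way a script might read them. It is an assumption about the general shape of such a script, not the contents of `train_deploy.py`; the argument names `--model-dir`, `--train-dir`, and `--test-dir` and the fallback paths are hypothetical.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters: names match the keys of the estimator's hyperparameters dict.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--num_labels", type=int, default=2)
    parser.add_argument("--backend", type=str, default=None)
    # Locations supplied by SageMaker through environment variables.
    # The fallback paths are placeholders for running outside SageMaker.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--train-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "./data/train"))
    parser.add_argument("--test-dir", default=os.environ.get("SM_CHANNEL_TESTING", "./data/test"))
    return parser.parse_args(argv)


args = parse_args(["--epochs", "1", "--num_labels", "2", "--backend", "gloo"])
print(args.epochs, args.num_labels, args.backend, args.model_dir)
```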

# Host

After training our model, we host it on an Amazon SageMaker Endpoint. To make the endpoint load the model and serve predictions, we implement a few methods in `train_deploy.py`.

* `model_fn()`: function defined to load the saved model and return a model object that can be used for model serving. The SageMaker PyTorch model server loads our model by invoking model_fn.
* `input_fn()`: deserializes and prepares the prediction input. In this example, our request body is first serialized to JSON and then sent to the model serving endpoint. Therefore, in `input_fn()`, we first deserialize the JSON-formatted request body and then return the input as a `torch.tensor`, as required for BERT.
* `predict_fn()`: performs the prediction and returns the result.
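
To make the deserialization step concrete, here is a minimal sketch of an `input_fn` that handles only the JSON part. It is not the version in `train_deploy.py`: tokenization and conversion to a `torch.tensor` are left out because they depend on the tokenizer used in that script.

```python
import json


def input_fn(request_body, request_content_type):
    """Deserialize a JSON request body into a Python object (here, a sentence)."""
    if request_content_type == "application/json":
        return json.loads(request_body)
    # Reject anything the endpoint is not prepared to parse.
    raise ValueError("Unsupported content type: {}".format(request_content_type))


sentence = input_fn(json.dumps("Somebody just left - guess who."), "application/json")
print(sentence)
```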

To deploy our endpoint, we call `deploy()` on our PyTorch estimator object, passing in our desired number of instances and instance type:


In [None]:
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.g4dn.xlarge',
                             endpoint_name='g4dn-xlarge')

We then configure the predictor to use `application/json` for the content type when sending requests to our endpoint:

In [None]:
from sagemaker.predictor import json_deserializer, json_serializer

predictor.content_type = 'application/json'
predictor.accept = 'application/json'
predictor.serializer = json_serializer
predictor.deserializer = json_deserializer

In [None]:
result = predictor.predict('Somebody just left - guess who.')
print(np.argmax(result, axis=1))
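
`np.argmax(result, axis=1)` picks, for each input sentence, the index of the largest score; in CoLA, label 1 means acceptable and 0 unacceptable. The following is a small plain-Python sketch of the same step, assuming the endpoint returns one row of two scores per sentence. The numbers are invented for illustration and `to_labels` is a hypothetical helper.

```python
LABEL_NAMES = {0: "unacceptable", 1: "acceptable"}


def to_labels(scores):
    """Return the name of the highest-scoring class for each row of scores."""
    labels = []
    for row in scores:
        best_index = max(range(len(row)), key=lambda i: row[i])
        labels.append(LABEL_NAMES[best_index])
    return labels


example_scores = [[-1.2, 2.3], [0.9, -0.4]]  # invented values, not model output
predicted = to_labels(example_scores)
print(predicted)
```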

# Use a model that has already been trained

If you want to reuse a previously trained model, you can create a `PyTorchModel` from existing model artifacts.

In [None]:
from sagemaker.pytorch.model import PyTorchModel 

pytorch_model = PyTorchModel(model_data='<S3 location>/model.tar.gz',
                             role=role,
                             framework_version='1.3.1',
                             source_dir='code',
                             entry_point='train_deploy.py')

# Endpoint names may contain only alphanumeric characters and hyphens, so no dot is used here.
predictor = pytorch_model.deploy(instance_type='ml.p2.8xlarge', initial_instance_count=20, endpoint_name='p2-8xlarge')

# Elastic Inference

Amazon Elastic Inference (https://aws.amazon.com/machine-learning/elastic-inference/) lets you attach the right amount of GPU-powered inference acceleration to any Amazon SageMaker (https://aws.amazon.com/sagemaker/) or EC2 (http://aws.amazon.com/ec2) instance, or Amazon ECS (http://aws.amazon.com/ecs) task. Elastic Inference has supported PyTorch since March 2020.

To use Elastic Inference, we must convert our trained model to TorchScript. The location of the model artifacts is `estimator.model_data`. 

In [None]:
estimator.model_data

First we create a folder for the trained model, then download the `model.tar.gz` file into it and extract the archive.

In [None]:
%%sh -s $estimator.model_data
mkdir model
aws s3 cp $1 model/ 
tar xvzf model/model.tar.gz --directory ./model

The following code converts our model into TorchScript format.

In [None]:
import subprocess
import torch
from transformers import BertForSequenceClassification

model_torchScript = BertForSequenceClassification.from_pretrained('model/', torchscript=True)
device = 'cpu'
for_jit_trace_input_ids = [0] * 64
for_jit_trace_attention_masks = [0] * 64
for_jit_trace_input = torch.tensor([for_jit_trace_input_ids])
for_jit_trace_masks = torch.tensor([for_jit_trace_attention_masks])

# Creating the trace
traced_model = torch.jit.trace(model_torchScript, [for_jit_trace_input.to(device), for_jit_trace_masks.to(device)])
torch.jit.save(traced_model, 'traced_bert.pt')

subprocess.call(['tar', '-czvf', 'traced_bert.tar.gz', 'traced_bert.pt'])
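
The `tar` call above relies on a `tar` binary being available on the path. A portable alternative is sketched here with the standard-library `tarfile` module. `make_tarball` is a hypothetical helper, and the demo writes a dummy file into a temporary directory rather than touching the real `traced_bert.pt`.

```python
import os
import tarfile
import tempfile


def make_tarball(archive_path, file_path):
    """Write file_path into a gzip-compressed tar archive at its top level."""
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(file_path, arcname=os.path.basename(file_path))
    return archive_path


# Stand-in for the traced model file, created in a scratch directory.
workdir = tempfile.mkdtemp()
dummy_model = os.path.join(workdir, "traced_bert.pt")
with open(dummy_model, "wb") as handle:
    handle.write(b"placeholder bytes")

archive_path = make_tarball(os.path.join(workdir, "traced_bert.tar.gz"), dummy_model)
with tarfile.open(archive_path, "r:gz") as archive:
    member_names = archive.getnames()
print(member_names)
```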

Next we upload the TorchScript model to S3 and deploy it using Elastic Inference. Loading the TorchScript model and using it for prediction requires small changes in our model loading and prediction functions, so we create a new script, `deploy_ei.py`, that differs slightly from `train_deploy.py`. The `accelerator_type` parameter (here `ml.eia2.xlarge`) is how we attach the Elastic Inference accelerator to our endpoint.

In [None]:
import sagemaker
from sagemaker.pytorch import PyTorchModel

sagemaker_session = sagemaker.Session()

instance_type = 'm5.large'
accelerator_type = 'eia2.xlarge'

# TorchScript model
tar_filename = 'traced_bert.tar.gz'

# Returns S3 bucket URL
print('Upload tarball to S3')
model_data = sagemaker_session.upload_data(path=tar_filename)

endpoint_name = 'bert-ei-traced-{}-{}'.format(instance_type, accelerator_type).replace('.', '').replace('_', '')

pytorch = PyTorchModel(model_data=model_data,
                       role=role,
                       source_dir='code',
                       framework_version='1.3.1',
                       py_version='py3',
                       entry_point='deploy_ei.py',
                       sagemaker_session=sagemaker_session)

# Function will exit before endpoint is finished creating
predictor = pytorch.deploy(initial_instance_count=1,
                           instance_type='ml.' + instance_type,
                           accelerator_type='ml.' + accelerator_type,
                           endpoint_name=endpoint_name,
                           wait=False)
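
The `.replace()` calls in the cell above strip dots and underscores because endpoint names accept only alphanumeric characters and hyphens. The sketch below reproduces that naming step and checks the result. The pattern and the 63-character limit are stated here as assumptions about the SageMaker naming constraint, and both helper functions are hypothetical.

```python
import re

# Assumed constraint: alphanumerics separated by hyphens, at most 63 characters.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")


def build_endpoint_name(instance_type, accelerator_type):
    """Build the endpoint name the same way as the deployment cell."""
    name = "bert-ei-traced-{}-{}".format(instance_type, accelerator_type)
    return name.replace(".", "").replace("_", "")


def is_valid_endpoint_name(name):
    """Check a name against the assumed length and character constraints."""
    return len(name) <= 63 and ENDPOINT_NAME_PATTERN.fullmatch(name) is not None


name = build_endpoint_name("m5.large", "eia2.xlarge")
print(name, is_valid_endpoint_name(name))
print(is_valid_endpoint_name("my.endpoint"))
```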

Don't forget to delete the endpoints afterwards to avoid ongoing charges.

In [None]:
predictor.delete_endpoint()