# Finetuning PyTorch BERT with NGC
The BERT family of models is a powerful set of natural language understanding (NLU) models based on the Transformer architecture introduced in the paper Attention Is All You Need: https://arxiv.org/abs/1706.03762

These models are pre-trained without labels on massive text corpora, a process that requires an enormous amount of time and compute. Luckily for us, BERT models are built for transfer learning: a pre-trained model can be finetuned to perform many different NLU tasks, such as question answering, sentiment analysis, document summarization, and more.

For this tutorial, we are going to download a BERT base model, finetune it on the Stanford Question Answering Dataset (SQuAD), and walk through the steps necessary to deploy it to a SageMaker endpoint.
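
For orientation, SQuAD v1.1 files are nested JSON: articles contain paragraphs, and each paragraph carries a context plus a list of questions whose answers are given as character offsets into that context. The sketch below walks a tiny hand-written sample in that layout. The sample text and the `iter_squad_examples` helper are illustrative assumptions, not part of this tutorial's code.

```python
# A tiny hand-written sample in the SQuAD v1.1 layout (illustrative data only).
sample = {
    "version": "1.1",
    "data": [{
        "title": "Steve",
        "paragraphs": [{
            "context": "Danielle loves her cat, Steve.",
            "qas": [{
                "id": "q1",
                "question": "Who loves Steve?",
                "answers": [{"text": "Danielle", "answer_start": 0}],
            }],
        }],
    }],
}


def iter_squad_examples(squad):
    """Yield (question, context, answer_text, answer_start) tuples."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield qa["question"], context, answer["text"], answer["answer_start"]


examples = list(iter_squad_examples(sample))
for question, context, text, start in examples:
    # answer_start is a character offset into the context string
    assert context[start:start + len(text)] == text
    print(f"{question} -> {text}")
```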

In [None]:
!wget https://api.ngc.nvidia.com/v2/models/nvidia/bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt -O bert_base.pt

In [19]:
import collections
import math
import torch
import os, tarfile, json
import time, datetime
from io import StringIO
import numpy as np
import sagemaker
from sagemaker.pytorch import estimator, PyTorchModel, PyTorchPredictor, PyTorch
from sagemaker.utils import name_from_base
import boto3
from file_utils import PYTORCH_PRETRAINED_BERT_CACHE
from modeling import BertForQuestionAnswering, BertConfig, WEIGHTS_NAME, CONFIG_NAME
from tokenization import (BasicTokenizer, BertTokenizer, whitespace_tokenize)
from types import SimpleNamespace
from helper_funcs import *

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket() # or replace with the name of your own S3 bucket
prefix = 'bert_pytorch_ngc'
runtime_client = boto3.client('runtime.sagemaker')

with open('s3_bucket.txt','w') as f:
    f.write(f's3://{bucket}')
with open('hyperparameters.json', 'r') as f:
    params = json.load(f)
params['save_to_s3'] = bucket
with open('hyperparameters.json', 'w') as f:
    json.dump(params, f)

## Create our training Docker container

Now we are going to create a custom Docker container based on the NGC BERT container and push it to AWS Elastic Container Registry (ECR).

In [None]:
%%sh

# The name of our algorithm
algorithm_name=bert-ngc-torch-train

chmod +x train
chmod +x serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

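# Log Docker in to ECR. `aws ecr get-login` was removed in AWS CLI v2,
# so pipe the password from get-login-password into docker login instead.
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"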

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

# If the push fails with "not authorized to perform ecr:InitiateLayerUpload",
# the role running this notebook is missing ECR push permissions.
docker push ${fullname}

## Instantiate the model

Now we are going to instantiate our model. Here we specify our hyperparameters for training as well as the number of GPUs we are going to use. The ml.p3.16xlarge instances contain 8 V100 (Volta) GPUs, making them ideal for heavy-duty deep learning training.
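
To get a feel for how instance count, GPUs per instance, and batch size interact, here is a small arithmetic sketch. It assumes that `train_batch_size` is applied per GPU and that no gradient accumulation is used. Neither is confirmed by this tutorial, and the feature count passed in is a placeholder rather than a measured value.

```python
def global_batch_size(instance_count, gpus_per_instance, per_gpu_batch):
    """Examples consumed per optimizer step across all GPUs.

    Assumes the batch size is per GPU and there is no gradient accumulation.
    """
    return instance_count * gpus_per_instance * per_gpu_batch


def steps_per_epoch(num_features, batch):
    """Optimizer steps per epoch, rounding up for the final partial batch."""
    return -(-num_features // batch)


# 2 instances x 8 GPUs x batch size 8, the values used in this tutorial's cells
batch = global_batch_size(instance_count=2, gpus_per_instance=8, per_gpu_batch=8)
print(batch)  # 128

# 88_000 is a placeholder feature count, not a measured value
print(steps_per_epoch(num_features=88_000, batch=batch))  # 688
```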

Once we have set our hyperparameters, we will instantiate a SageMaker Estimator to run our training job. We specify the Docker image we just pushed to ECR, as well as an entry point that tells the container what to do when it starts up. Our Docker container has two commands, train and serve. When we launch a training job, SageMaker runs our Docker container behind the scenes and tells it to run the train command.
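
The train/serve convention can be pictured as a small dispatcher that looks at the first argument passed to the container. The sketch below is a hypothetical illustration of that idea only: the container in this tutorial ships separate `train` and `serve` executables, and the function names and return values here are made up for the example.

```python
def run_train():
    # Placeholder: a real container would launch the finetuning script here.
    return "training"


def run_serve():
    # Placeholder: a real container would start the inference server here.
    return "serving"


COMMANDS = {"train": run_train, "serve": run_serve}


def dispatch(argv):
    """Run the handler named by the first container argument."""
    if len(argv) < 2 or argv[1] not in COMMANDS:
        raise SystemExit("usage: entrypoint [train|serve]")
    return COMMANDS[argv[1]]()


# A training job passes "train"; a hosting endpoint passes "serve".
print(dispatch(["entrypoint", "train"]))
print(dispatch(["entrypoint", "serve"]))
```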

In [None]:
account=!aws sts get-caller-identity --query Account --output text

# Get the region defined in the current configuration
region=!aws configure get region

algoname = 'bert-ngc-torch-train'

fullname="{}.dkr.ecr.{}.amazonaws.com/{}".format(account[0], region[0], algoname)

fullname

In [5]:
# set our hyperparameters
hyperparameters = {'bert_model': 'bert-base-uncased',  'num_train_epochs': 1, 
                   'vocab_file': '/workspace/bert/data/bert_vocab.txt',
                   'config_file':'/workspace/bert/bert_config.json', 
                  'output_dir': '/opt/ml/model',
                  'train_file': '/workspace/bert/data/squad/v1.1/train-v1.1.json',
                  'num_gpus':8, 'train_batch_size':8, 'max_seq_length':512, 'doc_stride':128, 'seed':1,
                  'learning_rate':3e-5,
                  'save_to_s3':bucket}

# instantiate model
torch_model = PyTorch( role=role,
                      train_instance_count=2,
                      train_instance_type='ml.p3.16xlarge',
                      entry_point='transform_script.py',
                      image_name=fullname,
                      framework_version='1.4.0',
                      hyperparameters=hyperparameters
                     )
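
The `max_seq_length` and `doc_stride` hyperparameters above govern how passages longer than the model's input are split into overlapping windows. The sketch below shows the usual sliding-window idea under simplifying assumptions: it ignores the room taken by the question and special tokens, and `sliding_windows` is a hypothetical helper, not the function the training script uses.

```python
def sliding_windows(num_tokens, max_len, stride):
    """Return (start, length) spans that cover num_tokens with overlap."""
    spans = []
    start = 0
    while start < num_tokens:
        length = min(max_len, num_tokens - start)
        spans.append((start, length))
        if start + length == num_tokens:
            break  # the last window reached the end of the passage
        start += min(length, stride)
    return spans


# A 700-token passage with max_len=512 and stride=128
print(sliding_windows(num_tokens=700, max_len=512, stride=128))
# [(0, 512), (128, 512), (256, 444)]
```

Each window starts `stride` tokens after the previous one, so an answer that straddles a window boundary still appears whole in a neighbouring window.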


## Fine-tune the model

If you use an instance with 4 GPUs and a batch size of 3, finetuning for 2 epochs takes about 15 minutes, and each additional epoch adds roughly 7 minutes. We recommend a training instance with at least 4 GPUs, although you will likely get better performance with an ml.p3.16xlarge or ml.p3dn.24xlarge instance.
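
As a rough planning aid, the timing figures quoted above can be extrapolated linearly. This is only a back-of-the-envelope sketch: it assumes the 4-GPU, batch-size-3 setup described in the text, and actual times will vary with instance type and configuration.

```python
def estimated_minutes(num_epochs, base_minutes=15, base_epochs=2, minutes_per_extra_epoch=7):
    """Linear extrapolation from the timings quoted in the text."""
    if num_epochs < base_epochs:
        raise ValueError("the quoted timings only cover runs of 2 or more epochs")
    return base_minutes + (num_epochs - base_epochs) * minutes_per_extra_epoch


print(estimated_minutes(2))  # 15
print(estimated_minutes(4))  # 29
```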

In [None]:
torch_model.fit()

## Deploy our trained model

Now that we've finetuned our base BERT model, what now? Let's deploy our trained model to an endpoint and ask it some questions!

In [8]:
endpoint_name = 'bert-endpoint-byoc'

#     bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', initial_instance_count=1, 
#                           endpoint_name=endpoint_name)

model_data = f's3://{bucket}/model.tar.gz'

torch_model = PyTorchModel(model_data=model_data,
                       role=role,
                      entry_point='transform_script.py',
                      framework_version='1.4.0')
bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', initial_instance_count=1, 
                          endpoint_name=endpoint_name)

---------------!

Now that our endpoint has been deployed, let's send it some requests! 

In [15]:
%%time
context='Danielle is a girl who really loves her cat, Steve. Steve is a large cat with a very furry belly. He gets very excited by the prospect of eating chicken covered in gravy.'
question='who loves Steve?'  # 'What kind of food does Steve like?'

pass_in_data = {'context':context, 'question':question}
json_data = json.dumps(pass_in_data)


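import ast  # used to parse the endpoint's response safely

if model_data:
    response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                           ContentType='application/json',
                                           Body=json_data)
    # literal_eval only parses Python literals; eval would execute arbitrary code
    response = ast.literal_eval(response['Body'].read().decode('utf-8'))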
    doc_tokens = context.split()
    tokenizer = BertTokenizer('vocab', do_lower_case=True, max_len=512)
    query_tokens = tokenizer.tokenize(question)
    feature = preprocess_tokenized_text(doc_tokens, 
                                        query_tokens, 
                                        tokenizer, 
                                        max_seq_length=384, 
                                        max_query_length=64)
    tensors_for_inference, tokens_for_postprocessing = feature
    response = get_predictions(doc_tokens, tokens_for_postprocessing, 
                             response[0], response[1], n_best_size=1, 
                             max_answer_length=64, do_lower_case=True, 
                             can_give_negative_answer=True, 
                             null_score_diff_threshold=-11.0)

#response = bert_end.predict(json.dumps(pass_in_data), initial_args={'ContentType':'application/json'}) 

# print result
print(f'{question} : {response[0]["text"]}')

who loves Steve? : Danielle
CPU times: user 61.3 ms, sys: 0 ns, total: 61.3 ms
Wall time: 121 ms
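
The cell above passes `response[0]` and `response[1]` to `get_predictions`. Assuming these are per-token start and end scores, as is standard for BERT question answering, the core of the post-processing is picking the highest-scoring valid span. The sketch below illustrates that step with made-up scores. `best_span` is a hypothetical helper and omits the n-best and no-answer handling that `get_predictions` is configured with.

```python
def best_span(start_logits, end_logits, max_answer_length):
    """Return (start, end, score) for the highest-scoring valid span."""
    best = None
    for start, start_score in enumerate(start_logits):
        # The end may not precede the start, and the span length is capped.
        last = min(len(end_logits), start + max_answer_length)
        for end in range(start, last):
            score = start_score + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


# Made-up scores for a five-token context (illustrative only)
tokens = ["danielle", "loves", "her", "cat", "steve"]
start_logits = [4.0, 0.1, 0.2, 0.3, 1.0]
end_logits = [3.5, 0.2, 0.1, 0.4, 2.0]

start, end, score = best_span(start_logits, end_logits, max_answer_length=3)
print(tokens[start:end + 1], score)  # ['danielle'] 7.5
```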


In [None]:
!rm bert_base.pt
!rm s3_bucket.txt
bert_end.delete_endpoint()