# Finetuning PyTorch BERT with NGC
The BERT family of models is a powerful set of natural language understanding (NLU) models based on the transformer architecture introduced in the paper Attention Is All You Need: https://arxiv.org/abs/1706.03762

These models are first pre-trained without labels on massive sets of text data, a process that requires an enormous amount of time and compute. Luckily for us, BERT models are built for transfer learning: a pre-trained model can be finetuned to perform many different NLU tasks, such as question answering, sentiment analysis, document summarization, and more.

For this tutorial, we are going to download a BERT base model, finetune it on the Stanford Question Answering Dataset (SQuAD), and walk through the steps necessary to deploy it to a SageMaker endpoint.

In [None]:
!wget https://api.ngc.nvidia.com/v2/models/nvidia/bert_base_pyt_amp_ckpt_pretraining_lamb/versions/1/files/bert_base.pt -O bert_base.pt

In [153]:
import collections
import math
import random
import torch
import os, tarfile, json
import time, datetime
from io import StringIO
import numpy as np
import sagemaker
from sagemaker.pytorch import estimator, PyTorchModel, PyTorchPredictor, PyTorch
from sagemaker.utils import name_from_base
import boto3
from model_utils.file_utils import PYTORCH_PRETRAINED_BERT_CACHE
from model_utils.modeling import BertForQuestionAnswering, BertConfig, WEIGHTS_NAME, CONFIG_NAME
from model_utils.tokenization import (BasicTokenizer, BertTokenizer, whitespace_tokenize)
from types import SimpleNamespace
from model_utils.helper_funcs import *

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket()  # or replace with the name of your own S3 bucket
prefix = 'bert_pytorch_ngc'
runtime_client = boto3.client('runtime.sagemaker')


## Create our training docker container

Now we are going to create a custom Docker container based on the NGC BERT container and push it to AWS Elastic Container Registry (ECR). To perform this operation, our SageMaker execution role needs access to ECR, which can be configured in IAM.
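As a rough illustration of what "access to ECR" can mean in practice, the sketch below builds an IAM policy document in Python and prints it as JSON. The action names are standard ECR IAM actions, but the statement layout and the wildcard `Resource` are assumptions made for brevity, not the configuration used for this notebook. Scope the resource to your own repository before using anything like this.

```python
import json

# Hypothetical policy for creating an ECR repository and pushing an image.
# The wildcard Resource is an assumption for brevity; narrow it to your
# repository ARN in real use.
ecr_push_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:DescribeRepositories",
                "ecr:CreateRepository",
                "ecr:BatchCheckLayerAvailability",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload",
                "ecr:PutImage",
            ],
            "Resource": "*",
        }
    ],
}

# Serialize the policy so it can be pasted into the IAM console.
policy_json = json.dumps(ecr_push_policy, indent=2)
print(policy_json)
```

The printed JSON can be attached to the execution role as an inline policy.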

In [144]:
%%sh

# The name of our algorithm
algorithm_name=bert-ngc-torch-train

chmod +x train
chmod +x serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

# If the push fails with "not authorized to perform ecr:InitiateLayerUpload",
# the execution role is missing ECR push permissions in IAM
docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  1.939GB
Step 1/15 : ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.12-py3
Step 2/15 : FROM ${FROM_IMAGE_NAME}
 ---> be021446e08c
Step 3/15 : RUN apt-get update && apt-get install -y pbzip2 pv bzip2 cabextract nginx wget
 ---> Using cache
 ---> 57fa0c288af0
Step 4/15 : ENV BERT_PREP_WORKING_DIR /workspace/bert/data
 ---> Using cache
 ---> 18a2d20ca0b5
Step 5/15 : WORKDIR /workspace
 ---> Using cache
 ---> f351886be037
Step 6/15 : RUN git clone https://github.com/attardi/wikiextractor.git
 ---> Using cache
 ---> c25298e5adad
Step 7/15 : RUN git clone https://github.com/soskek/bookcorpus.git
 ---> Using cache
 ---> 899be14d9667
Step 8/15 : WORKDIR /workspace/bert
 ---> Using cache
 ---> e3a5b294a784
Step 9/15 : RUN pip install --upgrade --no-cache-dir pip  && pip install --no-cache-dir  gevent flask pathlib gunicorn tqdm boto3 requests six ipdb h5py html2text nltk progressbar onnxruntime git+https://github.com/NVIDIA/dllogger
 ---> Us

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



## Download and inspect the data

To get an idea of what the Stanford Question Answering Dataset contains, let's download it locally and look at it.

In [None]:
!cd data/squad/ && bash squad_download.sh


In [35]:
# load the v2.0 dev set

with open('data/squad/v2.0/dev-v2.0.json', 'r') as f:
    squad_data = json.load(f)


Now that we've loaded some of the data, you can use the block below to look at a random context, question, and answer.

In [37]:
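ind = random.randrange(len(squad_data['data']))  # pick a random article
sq = squad_data['data'][ind]
qa = sq['paragraphs'][0]['qas'][0]  # first question of the first paragraph
print('Paragraph title: ',sq['title'], '\n')
print(sq['paragraphs'][0]['context'],'\n')
print('Question:', qa['question'])
# SQuAD v2.0 includes unanswerable questions, whose answers list is empty
print('Answer:', qa['answers'][0]['text'] if qa['answers'] else '(no answer)')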


Paragraph title:  Force 

Philosophers in antiquity used the concept of force in the study of stationary and moving objects and simple machines, but thinkers such as Aristotle and Archimedes retained fundamental errors in understanding force. In part this was due to an incomplete understanding of the sometimes non-obvious force of friction, and a consequently inadequate view of the nature of natural motion. A fundamental error was the belief that a force is required to maintain motion, even at a constant velocity. Most of the previous misunderstandings about motion and force were eventually corrected by Galileo Galilei and Sir Isaac Newton. With his mathematical insight, Sir Isaac Newton formulated laws of motion that were not improved-on for nearly three hundred years. By the early 20th century, Einstein developed a theory of relativity that correctly predicted the action of forces on objects with increasing momenta near the speed of light, and also provided insight into the forces pr
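The nested layout used above (`data` → `paragraphs` → `qas` → `answers`) can be easier to follow when flattened. The sketch below walks that structure with a small generator. The `sample` dictionary is a tiny hand-written stand-in that mimics the SQuAD v2.0 layout, not real dataset content, and `iter_questions` is a hypothetical helper name.

```python
# Tiny hand-written sample that mimics the SQuAD v2.0 JSON layout.
# It is illustrative only and not taken from the dataset.
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Example",
            "paragraphs": [
                {
                    "context": "Steve is a large cat.",
                    "qas": [
                        {"id": "q1", "question": "Who is a large cat?",
                         "is_impossible": False,
                         "answers": [{"text": "Steve", "answer_start": 0}]},
                        {"id": "q2", "question": "Where does Steve live?",
                         "is_impossible": True,
                         "answers": []},
                    ],
                }
            ],
        }
    ],
}


def iter_questions(squad):
    """Yield (title, context, question, answer_text_or_None) for every question."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                # Unanswerable v2.0 questions carry an empty answers list.
                answer = answers[0]["text"] if answers else None
                yield article["title"], paragraph["context"], qa["question"], answer


rows = list(iter_questions(sample))
unanswerable = sum(1 for row in rows if row[3] is None)
print(f"{len(rows)} questions, {unanswerable} without an answer")
```

Passing `squad_data` instead of `sample` should give the same kind of summary for the dev set loaded earlier.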

## View BERT input

BERT needs us to transform our text data into a numeric representation known as tokens. There are a variety of tokenizers available; we are going to use a tokenizer specially designed for BERT that we will instantiate with our vocabulary file. Let's take a look at the transformed question and context that we will supply to BERT for inference.

In [38]:
doc_tokens = sq['paragraphs'][0]['context'].split()
tokenizer = BertTokenizer('vocab', do_lower_case=True, max_len=512)
query_tokens = tokenizer.tokenize(sq['paragraphs'][0]['qas'][0]['question'])

feature = preprocess_tokenized_text(doc_tokens, 
                                    query_tokens, 
                                    tokenizer, 
                                    max_seq_length=384, 
                                    max_query_length=64)

tensors_for_inference, tokens_for_postprocessing = feature
tokens_for_postprocessing
tensors_for_inference

namespace(input_ids=[101, 2054, 4145, 2106, 17586, 1999, 16433, 2224, 2000, 2817, 3722, 6681, 1029, 102, 17586, 1999, 16433, 2109, 1996, 4145, 1997, 2486, 1999, 1996, 2817, 1997, 17337, 1998, 3048, 5200, 1998, 3722, 6681, 1010, 2021, 24762, 2107, 2004, 17484, 1998, 7905, 14428, 6155, 6025, 8050, 10697, 1999, 4824, 2486, 1012, 1999, 2112, 2023, 2001, 2349, 2000, 2019, 12958, 4824, 1997, 1996, 2823, 2512, 1011, 5793, 2486, 1997, 15012, 1010, 1998, 1037, 8821, 14710, 3193, 1997, 1996, 3267, 1997, 3019, 4367, 1012, 1037, 8050, 7561, 2001, 1996, 6772, 2008, 1037, 2486, 2003, 3223, 2000, 5441, 4367, 1010, 2130, 2012, 1037, 5377, 10146, 1012, 2087, 1997, 1996, 3025, 24216, 2015, 2055, 4367, 1998, 2486, 2020, 2776, 13371, 2011, 21514, 14891, 9463, 2072, 1998, 2909, 7527, 8446, 1012, 2007, 2010, 8045, 12369, 1010, 2909, 7527, 8446, 19788, 4277, 1997, 4367, 2008, 2020, 2025, 5301, 1011, 2006, 2005, 3053, 2093, 3634, 2086, 1012, 2011, 1996, 2220, 3983, 2301, 1010, 15313, 2764, 1037, 3399, 1997, 2
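In the output above, `input_ids` begins with 101 (the `[CLS]` token in the standard uncased BERT vocabulary) and the question is closed by 102 (`[SEP]`). The sketch below shows how a question and context are commonly packed into one fixed-length sequence. It is not the implementation of `preprocess_tokenized_text`; the function name `pack_inputs` and the returned `segment_ids` and `input_mask` lists are assumptions based on the usual BERT input convention.

```python
def pack_inputs(query_ids, doc_ids, max_seq_length, cls_id=101, sep_id=102, pad_id=0):
    """Build input_ids, segment_ids and input_mask for one question/context pair.

    Assumes the question alone fits into max_seq_length - 3 positions.
    """
    # Reserve three positions for [CLS] and the two [SEP] tokens.
    room_for_doc = max(0, max_seq_length - len(query_ids) - 3)
    doc_ids = doc_ids[:room_for_doc]
    input_ids = [cls_id] + query_ids + [sep_id] + doc_ids + [sep_id]
    # Segment 0 covers [CLS] + question + first [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(query_ids) + 2) + [1] * (len(doc_ids) + 1)
    # The mask marks real tokens with 1 and padding with 0.
    input_mask = [1] * len(input_ids)
    padding = max_seq_length - len(input_ids)
    return (input_ids + [pad_id] * padding,
            segment_ids + [0] * padding,
            input_mask + [0] * padding)


# Two question ids and three context ids taken from the start of the output above.
ids, segments, mask = pack_inputs([2054, 4145], [17586, 1999, 16433], max_seq_length=10)
print(ids)       # [101, 2054, 4145, 102, 17586, 1999, 16433, 102, 0, 0]
print(segments)  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
print(mask)      # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

With `max_seq_length=384`, as in the cell above, the same layout applies and the remaining positions are filled with padding.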

## Send data to S3

In [None]:
!aws s3 cp --recursive data/squad s3://{bucket}/{prefix}

In [133]:
s3train = f's3://{bucket}/{prefix}/v1.1/train-v1.1.json'

train = sagemaker.session.s3_input(s3train, distribution='FullyReplicated', 
                        content_type=None, s3_data_type='S3Prefix')

data_channels = {'train': train}

's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.


## Instantiate the model

Now we are going to instantiate our model. Here we specify our hyperparameters for training as well as the number of GPUs we are going to use. The ml.p3.16xlarge and ml.p3dn.24xlarge instances each contain 8 V100 (Volta) GPUs, making them ideal for heavy-duty deep learning training.

Once we have set our hyperparameters, we will instantiate a SageMaker Estimator that we will use to run our training job. We specify the Docker image we just pushed to ECR as well as an entrypoint giving instructions for what operations our container should perform when it starts up. Our Docker container has two commands, train and serve. When we instantiate a training job, behind the scenes SageMaker is running our Docker container and telling it to run the train command.
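For a custom container, SageMaker makes the hyperparameters available inside the training container as a JSON file at `/opt/ml/input/config/hyperparameters.json`, with every value stored as a string. The sketch below shows one way a `train` command might turn that file into command-line arguments. The `train` script in this repository is not shown here, so `hyperparameters_to_args` is a hypothetical helper, and the file is simulated in a temporary directory.

```python
import json
import os
import tempfile


def hyperparameters_to_args(path):
    """Read a SageMaker-style hyperparameters file and build CLI arguments."""
    with open(path) as f:
        params = json.load(f)
    args = []
    # Sort the keys so the argument order is deterministic.
    for key, value in sorted(params.items()):
        args.extend([f"--{key}", str(value)])
    return args


# Simulate the file locally; inside a training container the documented
# location is /opt/ml/input/config/hyperparameters.json.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hyperparameters.json")
    with open(path, "w") as f:
        json.dump({"num_train_epochs": "2", "learning_rate": "3e-05"}, f)
    cli_args = hyperparameters_to_args(path)

print(cli_args)
```

The resulting list could then be appended to the command that launches the training script.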

In [None]:
account=!aws sts get-caller-identity --query Account --output text

# Get the region defined in the current configuration
region=!aws configure get region

algoname = 'bert-ngc-torch-train'

fullname="{}.dkr.ecr.{}.amazonaws.com/{}".format(account[0], region[0], algoname)

fullname

In [149]:
# set our hyperparameters
hyperparameters = {'bert_model': 'bert-base-uncased',  'num_train_epochs': 2, 
                   'vocab_file': '/workspace/bert/data/bert_vocab.txt',
                   'config_file':'/workspace/bert/bert_config.json', 
                  'output_dir': '/opt/ml/model',
                  'train_file': '/opt/ml/input/data/train/train-v1.1.json', #'/workspace/bert/data/squad/v1.1/train-v1.1.json',
                  'num_gpus':8, 'train_batch_size':32, 'max_seq_length':512, 'doc_stride':128, 'seed':1,
                  'learning_rate':3e-5,
                  'save_to_s3':bucket}

# instantiate model
torch_model = PyTorch( role=role,
                      train_instance_count=1,
                      train_instance_type='ml.p3dn.24xlarge',
                      entry_point='transform_script.py',
                      image_name=fullname,
                      framework_version='1.5.0',
                      hyperparameters=hyperparameters
                     )


Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
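The `max_seq_length` and `doc_stride` hyperparameters work together: when a context is longer than the model can read at once, it is typically split into overlapping windows, with each new window starting `doc_stride` tokens after the previous one. The sketch below follows the windowing logic commonly found in reference BERT SQuAD scripts. Whether the training script in this container does exactly the same is an assumption, and `sliding_windows` is a hypothetical name.

```python
def sliding_windows(num_doc_tokens, max_tokens_for_doc, doc_stride):
    """Return (start, length) spans that cover a document with overlap."""
    spans = []
    start = 0
    while start < num_doc_tokens:
        # Take as many tokens as fit, or whatever is left of the document.
        length = min(max_tokens_for_doc, num_doc_tokens - start)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reached the end of the document
        start += min(length, doc_stride)
    return spans


# A 1000-token context with a 12-token question: 512 - 12 - 3 = 497 tokens
# remain for the context once [CLS] and two [SEP] tokens are accounted for.
spans = sliding_windows(num_doc_tokens=1000, max_tokens_for_doc=512 - 12 - 3, doc_stride=128)
print(spans)  # [(0, 497), (128, 497), (256, 497), (384, 497), (512, 488)]
```

A smaller `doc_stride` produces more overlapping windows, and therefore more training examples per long paragraph.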


## Fine-tune the model

If you use an instance with 4 GPUs and a batch size of 4, this process takes about 15 minutes for this particular finetuning task with 2 epochs. Each additional epoch adds roughly another 7 minutes. We recommend a training instance with at least 4 GPUs, although you will likely get better performance with one of the ml.p3.16xlarge or ml.p3dn.24xlarge instances.

In [150]:
torch_model.fit(data_channels)

2020-08-21 02:08:52 Starting - Starting the training job...
2020-08-21 02:09:00 Starting - Launching requested ML instances......
2020-08-21 02:10:13 Starting - Preparing the instances for training......
2020-08-21 02:11:17 Downloading - Downloading input data
== PyTorch ==

NVIDIA Release 19.12 (build 9142930)
PyTorch Version 1.4.0a0+a5b4d78

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2019 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Beng

## Deploy our trained model

Now that we've finetuned our base BERT model, what now? Let's deploy our trained model to an endpoint and ask it some questions!

In [75]:
endpoint_name = 'bert-endpoint-byoc-150'

# if deploying from a model you trained in the same session 
#     bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', initial_instance_count=1, 
#                           endpoint_name=endpoint_name)

model_data = f's3://{bucket}/model.tar.gz'

# We are going to use a SageMaker serving container
torch_model = PyTorchModel(model_data=model_data,
                       role=role,
                      entry_point='transform_script.py',
                      framework_version='1.5.0')
bert_end = torch_model.deploy(instance_type='ml.g4dn.4xlarge', initial_instance_count=1, 
                          endpoint_name=endpoint_name)

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


-----------------!

Now that our endpoint has been deployed, let's send it some requests! 

In [83]:
%%time

context='Danielle is a girl who really loves her cat, Steve. Steve is a large cat with a very furry belly. He gets very excited by the prospect of eating chicken covered in gravy.'
question='who loves Steve?'  # 'What kind of food does Steve like?'

pass_in_data = {'context':context, 'question':question}
json_data = json.dumps(pass_in_data)


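import ast  # literal_eval parses Python literals without executing code

if model_data:
    response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                           ContentType='application/json',
                                           Body=json_data)
    # the endpoint returns a string holding a Python literal; parse it safely
    response = ast.literal_eval(response['Body'].read().decode('utf-8'))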
    doc_tokens = context.split()
    tokenizer = BertTokenizer('vocab', do_lower_case=True, max_len=512)
    query_tokens = tokenizer.tokenize(question)
    feature = preprocess_tokenized_text(doc_tokens, 
                                        query_tokens, 
                                        tokenizer, 
                                        max_seq_length=384, 
                                        max_query_length=64)
    tensors_for_inference, tokens_for_postprocessing = feature
    response = get_predictions(doc_tokens, tokens_for_postprocessing, 
                             response[0], response[1], n_best_size=1, 
                             max_answer_length=64, do_lower_case=True, 
                             can_give_negative_answer=True, 
                             null_score_diff_threshold=-11.0)

#response = bert_end.predict(json.dumps(pass_in_data), initial_args={'ContentType':'application/json'}) 

# print result
print(f'{question} : {response[0]["text"]}')

who loves Steve? : Danielle
CPU times: user 53.3 ms, sys: 0 ns, total: 53.3 ms
Wall time: 118 ms
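The endpoint returns two score lists, passed above as `response[0]` and `response[1]`, which are turned into a text answer by `get_predictions`. That helper's code is not shown here, so the sketch below only illustrates the general idea used in reference BERT SQuAD scripts: pick the start/end pair with the highest combined score, and compare it against the score of the `[CLS]` position to decide whether to answer at all. The function name `best_span` and the toy logits are assumptions.

```python
def best_span(start_logits, end_logits, max_answer_length, null_score_diff_threshold):
    """Pick the highest-scoring answer span, or None for 'no answer'."""
    # Position 0 is the [CLS] token; its score stands in for "no answer".
    null_score = start_logits[0] + end_logits[0]
    best = None
    best_score = float("-inf")
    for i in range(1, len(start_logits)):
        # The end index may not precede the start or exceed the length limit.
        for j in range(i, min(i + max_answer_length, len(end_logits))):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    # Predict "no answer" when the null score beats the best span by more
    # than the threshold.
    if best is None or null_score - best_score > null_score_diff_threshold:
        return None
    return best


# Toy logits for a 5-token sequence (illustrative values, not model output).
start_logits = [0.1, 0.2, 5.0, 0.3, 0.1]
end_logits = [0.2, 0.1, 0.4, 4.0, 0.2]
print(best_span(start_logits, end_logits, 3, 0.0))    # (2, 3)
print(best_span(start_logits, end_logits, 3, -11.0))  # None
```

With a threshold of -11.0, a span is only returned when it outscores the null option by more than 11, so the second call yields no answer even though tokens 2 to 3 are the best span.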


In [None]:
!rm bert_base.pt
!rm s3_bucket.txt
bert_end.delete_endpoint()