# Maximizing NLP model performance using automatic model tuning in Amazon SageMaker

# Introduction

This notebook shows how to fine-tune natural language processing (NLP) models in Amazon SageMaker and perform automatic model tuning with hyperparameter optimization. We use Hugging Face's [pytorch-transformers](https://github.com/huggingface/pytorch-transformers) library and its example code to build and train the models.

This demo uses two datasets. One is the MRPC data from the General Language Understanding Evaluation ([GLUE](https://gluebenchmark.com/tasks/)) benchmark, and the other is the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) 1.1 data for question answering.

More Amazon SageMaker hyperparameter tuning notebook examples can be found [here](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/hyperparameter_tuning).

From this blog post:  https://aws.amazon.com/blogs/machine-learning/maximizing-nlp-model-performance-with-automatic-model-tuning-in-amazon-sagemaker/

# Data and training script preparation

### Download data and code

GLUE data can be downloaded using this [script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e).

In [1]:
# Download all GLUE data to a local folder

!python download_glue_data.py --data_dir glue_data --tasks all

Downloading and extracting CoLA...
	Completed!
Downloading and extracting SST...
	Completed!
Processing MRPC...
Local MRPC data not specified, downloading data from https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt
	Completed!
Downloading and extracting QQP...
	Completed!
Downloading and extracting STS...
	Completed!
Downloading and extracting MNLI...
	Completed!
Downloading and extracting SNLI...
	Completed!
Downloading and extracting QNLI...
	Completed!
Downloading and extracting RTE...
	Completed!
Downloading and extracting WNLI...
	Completed!
Downloading and extracting diagnostic...
	Completed!


Training scripts can be downloaded by cloning the [pytorch-transformers](https://github.com/huggingface/pytorch-transformers) repository. The `examples` folder contains the training script `run_glue.py` for GLUE data and `run_squad.py` for SQuAD data.

In [2]:
# Download GitHub code to local machine

!git clone https://github.com/huggingface/pytorch-transformers.git

Cloning into 'pytorch-transformers'...
remote: Enumerating objects: 143, done.
remote: Counting objects: 100% (143/143), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 22228 (delta 70), reused 122 (delta 62), pack-reused 22085
Receiving objects: 100% (22228/22228), 13.18 MiB | 47.01 MiB/s, done.
Resolving deltas: 100% (15949/15949), done.


### Modify scripts for Amazon SageMaker use

To avoid editing the scripts inside the git folder, we copied the relevant Python scripts from `./pytorch-transformers/examples/` to `./train_scripts/`.

We made minimal changes to `run_glue.py` and `run_squad.py` to make them work with the Amazon SageMaker PyTorch framework. The changes can be found by searching the comments for `'for SageMaker use'`. They mostly concern how arguments are passed to the Python script. In Amazon SageMaker, the easiest way to pass input arguments is as hyperparameters of the training job. Here are some examples of the changes made to the script:

The original `run_glue.py` treats the `do_train` argument as a Boolean flag that triggers model training:
```Python
parser.add_argument("--do_train", action='store_true', help="Whether to run training.")
```

We've modified the `do_train` argument to accept string inputs:
```Python
parser.add_argument("--do_train", type=str2bool, nargs='?', const=True, default=False, help="Whether to run training.")
```

with the function `str2bool()` defined in this way:

```Python
def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')
```
        
We do this because a Boolean argument cannot be passed to an Amazon SageMaker training job implicitly (as a bare flag), as the original format expected; instead, we must pass an explicit value along with the `do_train` parameter. Similar changes were applied to the `run_squad.py` script. We also made a minor change in `utils_glue.py` so that the data can be read with Python 3. Finally, the script now prints the model evaluation results so that they appear in the CloudWatch logs.
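
As a small check of the argument handling described above, the sketch below wires `str2bool` into a stand-alone parser and parses the kind of command line a training job receives when `do_train` is supplied with an explicit value. The `parse` helper and the sample argument lists are illustrative only and are not part of `run_glue.py`.

```python
import argparse


def str2bool(v):
    """Convert a command-line string such as 'True' or 'no' to a bool."""
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')


def parse(argv):
    # Hypothetical helper: a parser that knows only the --do_train argument.
    parser = argparse.ArgumentParser()
    parser.add_argument("--do_train", type=str2bool, nargs='?', const=True,
                        default=False, help="Whether to run training.")
    return parser.parse_args(argv)


# Explicit value, which is how a hyperparameter arrives at the script.
print(parse(["--do_train", "True"]).do_train)
# A bare flag still works because of nargs='?' and const=True.
print(parse(["--do_train"]).do_train)
# An absent flag falls back to default=False.
print(parse([]).do_train)
```

Keeping `nargs='?'` with `const=True` means the modified script still accepts the original bare `--do_train` form when it is run outside SageMaker.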

### Create requirements.txt for installing dependent packages in PyTorch container

We need to create a `requirements.txt` file in the same directory (`./train_scripts/`) as the training scripts. The file should list the packages required by the training script that are not pre-installed in the Amazon SageMaker PyTorch container. We need to install three packages for this demo:

*pytorch_transformers* <br>
*tensorboardX* <br>
*scikit-learn*

A `requirements.txt` file is a text file that lists the packages to install with pip. When we launch training jobs, the Amazon SageMaker container automatically looks for a `requirements.txt` file in the script source folder and uses `pip install` to install all packages listed in that file.
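
One way to create the file from the notebook is sketched below. It writes the three package names listed above into `./train_scripts/requirements.txt`; no versions are pinned, which is an assumption made for brevity, and you may prefer to pin the versions you tested with.

```python
from pathlib import Path

# Packages named in this notebook; versions are intentionally left unpinned here.
packages = ["pytorch_transformers", "tensorboardX", "scikit-learn"]

script_dir = Path("train_scripts")
script_dir.mkdir(exist_ok=True)  # no-op if the scripts were already copied here

requirements_path = script_dir / "requirements.txt"
requirements_path.write_text("\n".join(packages) + "\n")

print(requirements_path.read_text())
```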

# Environment setup

In [8]:
import os
import boto3
import sagemaker
import numpy as np
import matplotlib.pyplot as plt

from time import gmtime, strftime 
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner
from sagemaker.pytorch import PyTorch

role = sagemaker.get_execution_role() # we are using the notebook instance role for training in this example

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket() # you can specify a bucket name here

# Example 1: fine tune MRPC dataset

## Upload data to S3

In [9]:
task_name = 'MRPC'
s3_prefix = 'sagemaker/pytorch-transfomers/' + task_name

# data path in SageMaker notebook instance. Here we use the glue data MRPC for model fine tuning
data_dir = os.path.join(os.path.join(os.getcwd(), 'glue_data'), task_name)

# upload data to S3
inputs_glue = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=s3_prefix)
print('input spec (in this case, just an S3 path): {}'.format(inputs_glue))

input spec (in this case, just an S3 path): s3://sagemaker-us-east-1-835319576252/sagemaker/pytorch-transfomers/MRPC


## Train model

In [10]:
# data path for the SageMaker PyTorch container. We don't need to create our own container.
container_data_dir = '/opt/ml/input/data/training'
container_model_dir = '/opt/ml/model'

# input arguments for the training script and initial values for some hyperparameters
parameters = {
    'model_type': 'bert',
    'model_name_or_path' : 'bert-base-uncased',
    'task_name': task_name,
    'data_dir': container_data_dir,
    'output_dir': container_model_dir,
    'num_train_epochs': 1,
    'per_gpu_train_batch_size': 64,
    'per_gpu_eval_batch_size': 64,
    'save_steps': 150,
    'logging_steps': 150,
    'do_train': True,
    'do_eval': True,
    'do_lower_case': True
    # you can add more input arguments here
}
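
In script mode, the SageMaker PyTorch container hands each hyperparameter to the entry-point script as a `--name value` command-line pair, which is why the Boolean arguments needed the `str2bool` change. The sketch below imitates that conversion for a few of the entries above. `to_argv` is a hypothetical helper written for illustration, not the container's actual code.

```python
def to_argv(hyperparameters):
    """Flatten a dict into ['--key', 'value', ...], with every value as a string."""
    argv = []
    for name, value in hyperparameters.items():
        argv.extend(["--" + name, str(value)])
    return argv


# A small subset of the hyperparameters defined above.
sample = {
    'model_type': 'bert',
    'num_train_epochs': 1,
    'do_train': True,
}

print(to_argv(sample))
```

Note that `True` reaches the script as the string `'True'`, so the script has to convert it back to a Boolean itself.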

In [11]:
# Amazon SageMaker PyTorch framework

train_instance_type = 'ml.p3.2xlarge'

glue_estimator = PyTorch(entry_point='run_glue.py',
                    source_dir = './train_scripts/', # the local directory stores all relevant scripts for modeling
                    hyperparameters=parameters,
                    role=role,
                    framework_version='1.1.0',
                    train_instance_count=1,
                    train_instance_type=train_instance_type
                   )

In [12]:
# check input data's s3 path
inputs_glue

's3://sagemaker-us-east-1-835319576252/sagemaker/pytorch-transfomers/MRPC'

In [None]:
# launch model training job
glue_estimator.fit({'training': inputs_glue})

2020-03-18 17:43:50 Starting - Starting the training job...
2020-03-18 17:43:52 Starting - Launching requested ML instances......
2020-03-18 17:44:56 Starting - Preparing the instances for training..

## Automatic model tuning - hyperparameter optimization

SageMaker uses the training job CloudWatch logs to extract metrics for hyperparameter optimization, processing the logs with a simple regular expression.

For example, the `glue_estimator` training log has this printout for the model evaluation results:

*Evaluation result =  {'acc_': 0.8455882352941176, 'f1_': 0.8941176470588236, 'acc_and_f1_': 0.8698529411764706}*

Here, we want to use F1 score as the optimization metric.

In [None]:
# step 1: define optimization metric

metric_definitions = [{'Name': 'f1_score',
                       'Regex': '\'f1_\': ([0-9\\.]+)'}]
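
Before launching a tuning job, it can help to test the regular expression locally. The sketch below applies the same pattern to the sample log line quoted above using Python's `re` module. It only approximates how the service scans the logs, but it confirms that the pattern captures the `f1_` value and does not also match `acc_and_f1_`.

```python
import re

# Sample log line copied from the evaluation printout above.
log_line = ("Evaluation result =  {'acc_': 0.8455882352941176, "
            "'f1_': 0.8941176470588236, 'acc_and_f1_': 0.8698529411764706}")

# Same pattern as in metric_definitions.
regex = '\'f1_\': ([0-9\\.]+)'

matches = re.findall(regex, log_line)
print(matches)
```

The leading quote in the pattern is what keeps `'acc_and_f1_'` from matching, since there the characters before `f1_` are `and_` rather than a quote.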

In [None]:
# step 2: define the hyperparameter range. Here we only tune the learning rate. 

hyperparameter_ranges = {
        'learning_rate': ContinuousParameter(5e-06, 5e-04, scaling_type="Logarithmic")       
    }

objective_metric_name = 'f1_score'
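
To illustrate what `scaling_type="Logarithmic"` means for a range that spans two orders of magnitude, the sketch below draws learning rates uniformly in log space over the same bounds. This is not the Bayesian search that SageMaker performs; it only shows that on a logarithmic scale about half of the candidates fall below 5e-05, whereas a linear scale would place almost all of them above it.

```python
import math
import random

low, high = 5e-06, 5e-04


def sample_log_uniform(rng):
    # Draw uniformly in log10 space, then map back to the original scale.
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10 ** exponent


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(rng) for _ in range(10000)]

geometric_mid = math.sqrt(low * high)  # midpoint of the range on a log scale
share_below = sum(s < geometric_mid for s in samples) / len(samples)
print(f"share of samples below {geometric_mid:.0e}: {share_below:.2f}")
```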

In [None]:
# step 3: launch the hyperparameter tuning job

tuner = HyperparameterTuner(glue_estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            strategy = 'Bayesian',
                            objective_type = 'Maximize',
                            max_jobs = 5,
                            max_parallel_jobs = 5,
                            early_stopping_type = 'Auto')



In [None]:
# we can track the tuning job progress in the SageMaker console by the tuning_job_name
glue_tuning_job_name = "pt-bert-mrpc-bs-{}".format(strftime("%d-%H-%M-%S", gmtime())) 

# launch model tuning job
tuner.fit({'training': inputs_glue}, job_name = glue_tuning_job_name)
tuner.wait()
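
Tuning job names are subject to a length limit (32 characters, to the best of our knowledge; please confirm against the current API reference), so a long prefix combined with a timestamp can be rejected. The sketch below rebuilds the name in the same way and checks its length before submission. `MAX_NAME_LENGTH` is an assumed constant introduced for this check.

```python
from time import gmtime, strftime

MAX_NAME_LENGTH = 32  # assumed limit; confirm against the current API reference

prefix = "pt-bert-mrpc-bs-"
job_name = prefix + strftime("%d-%H-%M-%S", gmtime())

print(job_name, len(job_name))
assert len(job_name) <= MAX_NAME_LENGTH, "tuning job name is too long"
```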

## Optional: check hyperparameter tuning results

In [None]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc('xtick', labelsize=12) 
matplotlib.rc('ytick', labelsize=12) 

In [None]:
tuner_metrics = sagemaker.HyperparameterTuningJobAnalytics(glue_tuning_job_name)
hpo_report = tuner_metrics.dataframe().sort_values(['FinalObjectiveValue'], ascending=False)

hpo_report['job_id'] = len(hpo_report) - hpo_report.index
hpo_report.sort_values(by='job_id', inplace=True)

In [None]:
# the value of the best learning rate is extracted from the 'Hyperparameter tuning jobs' console

best_lr = 6.470088521571402e-05 # update this value
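
As an alternative to copying the value from the console, the best learning rate can be read from the tuning report itself. The sketch below works on plain row dictionaries, such as those returned by `hpo_report.to_dict('records')`. `best_learning_rate` is a hypothetical helper, and the example rows are placeholders, not results from this notebook.

```python
def best_learning_rate(records):
    """Return the learning rate of the row with the highest objective value."""
    finished = [
        r for r in records
        # Skip rows without a final metric (None, or NaN which is never equal to itself).
        if r.get('FinalObjectiveValue') is not None
        and r['FinalObjectiveValue'] == r['FinalObjectiveValue']
    ]
    best = max(finished, key=lambda r: r['FinalObjectiveValue'])
    return float(best['learning_rate'])


# Placeholder rows for illustration only; these are not results from this notebook.
example_records = [
    {'learning_rate': 1e-05, 'FinalObjectiveValue': 0.81},
    {'learning_rate': 6e-05, 'FinalObjectiveValue': 0.89},
    {'learning_rate': 4e-04, 'FinalObjectiveValue': None},
]

print(best_learning_rate(example_records))
```

With the real report, the call would be `best_lr = best_learning_rate(hpo_report.to_dict('records'))`.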

In [None]:
plt.figure(figsize=(6,4))
x = hpo_report['learning_rate']
y = hpo_report['FinalObjectiveValue']
plt.scatter(x, y, alpha=0.8)

line_x = [best_lr, best_lr]
line_y = [0, 1]
plt.plot(line_x, line_y, linestyle='--', linewidth=1, color='orange')

plt.xlim(5e-6, 6e-4)
plt.xscale('log')
plt.ylim(0.75, 0.95)
plt.xlabel('Learning rate', fontsize=14)
plt.ylabel('F1 score', fontsize=14)
plt.title('MRPC: F1 score curve over learning rate', fontsize=14)
plt.grid()
#plt.savefig('figures/MRPC_F1_learning_rate.png', dpi=200, transparent=True, bbox_inches='tight')
plt.show()

plt.figure(figsize=(6,4))
x = hpo_report['job_id']
y = hpo_report['FinalObjectiveValue']
plt.plot(x, y, alpha=0.8, linestyle='-', marker='o')
plt.ylim(0.75, 0.95)
plt.ylabel('F1 score', fontsize=14)
plt.xlabel('Training job order index', fontsize=14)
plt.title('MRPC: F1 score history', fontsize=14)
plt.grid()
#plt.savefig('figures/MRPC_F1_job_order.png', dpi=200, transparent=True, bbox_inches='tight')
plt.show()

plt.figure(figsize=(6,4))
x = len(hpo_report) - hpo_report.index
y = hpo_report['learning_rate']

line_y = [best_lr, best_lr]
line_x = [0, 40]
plt.plot(x, y, alpha=0.8, linestyle='-', marker='o')
plt.plot(line_x, line_y, linestyle='--', linewidth=1, color='orange')

plt.xlim(0, 31)
plt.ylim(5e-6, 6e-4)
plt.yscale('log')
plt.ylabel('Learning rate', fontsize=14)
plt.xlabel('Training job order index', fontsize=14)
plt.title('MRPC: learning rate search history', fontsize=14)
plt.grid()
#plt.savefig('figures/MRPC_lr_job_order.png', dpi=200, transparent=True, bbox_inches='tight')
plt.show()

# Example 2: fine tune SQuAD dataset

## Download SQuAD dataset


In [None]:
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json -P squad_data/
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -P squad_data/

## Upload data to S3

In [None]:
task_name = 'squad'
s3_prefix = 'sagemaker/pytorch-transfomers/' + task_name

# data path in SageMaker notebook instance. Here we use the SQuAD 1.1 data for model fine tuning
data_dir = os.path.join(os.getcwd(), 'squad_data')

# upload data to S3
inputs_squad = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=s3_prefix)

print('input spec (in this case, just an S3 path): {}'.format(inputs_squad))

## Train model

In [None]:
# data path for the SageMaker PyTorch container. We don't need to create our own container.
container_data_dir = '/opt/ml/input/data/training'
container_model_dir = '/opt/ml/model'

# input arguments for the training script and initial values for some hyperparameters
parameters = {
    'model_type': 'bert',
    'model_name_or_path' : 'bert-base-uncased',
    'train_file': container_data_dir+'/train-v1.1.json', # specify dataset version
    'predict_file': container_data_dir+'/dev-v1.1.json',
    'output_dir': container_model_dir,
    'learning_rate': 5e-5,
    'per_gpu_train_batch_size': 16,
    'per_gpu_eval_batch_size': 16,
    'num_train_epochs': 1,
    'max_seq_length': 384,
    'doc_stride': 128,
    'save_steps': 10000,
    'logging_steps': 10000,
    'do_train': True,
    'do_eval': True,
    'do_lower_case': True,
    'version_2_with_negative': False # False is for the 1.1 dataset. True is for SQuAD 2.0. 
}


In [None]:
# Amazon SageMaker PyTorch framework

train_instance_type = 'ml.p3.2xlarge'

squad_estimator = PyTorch(entry_point='run_squad.py',
                    source_dir = './train_scripts/',  # the local directory stores all relevant scripts for modeling
                    hyperparameters=parameters,
                    role=role,
                    framework_version='1.1.0',
                    train_instance_count=1,
                    train_instance_type=train_instance_type
                   )

In [None]:
# check input data's s3 path
inputs_squad

In [None]:
# launch model training job
squad_estimator.fit({'training': inputs_squad})

## Automatic model tuning - hyperparameter optimization

SageMaker uses the training job CloudWatch logs to extract metrics for hyperparameter optimization, processing the logs with a simple regular expression.

For example, the `squad_estimator` training log has this printout for the model evaluation results:

*Evaluation result ={'exact': 80.71901608325449, 'f1': 88.0493020797288, 
                     'total': 10570, 'HasAns_exact': 80.71901608325449, 
                     'HasAns_f1': 88.0493020797288, 'HasAns_total': 10570}*

Here, we want to use F1 score as the optimization metric.

In [None]:
# step 1: define optimization metric

metric_definitions = [{'Name': 'f1_score',
                       'Regex': '\'f1\': ([0-9\\.]+)'}]
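
The SQuAD printout contains both `'f1'` and `'HasAns_f1'`, so it is worth confirming that the pattern picks up only the first. The sketch below runs the same regular expression over the sample log line from above, joined onto one line. It is a local approximation of the log scan, not the service's implementation.

```python
import re

# Sample log line copied from the evaluation printout above (joined into one line).
log_line = ("Evaluation result ={'exact': 80.71901608325449, 'f1': 88.0493020797288, "
            "'total': 10570, 'HasAns_exact': 80.71901608325449, "
            "'HasAns_f1': 88.0493020797288, 'HasAns_total': 10570}")

# Same pattern as in metric_definitions.
regex = '\'f1\': ([0-9\\.]+)'

matches = re.findall(regex, log_line)
print(matches)
```

The captured value is on a 0 to 100 scale, unlike the MRPC F1 score, which is reported between 0 and 1.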

In [None]:
# step 2: define the hyperparameter range. Here we only tune the learning rate. 

hyperparameter_ranges = {
        'learning_rate': ContinuousParameter(1e-05, 5e-04, scaling_type="Logarithmic")       
    }

objective_metric_name = 'f1_score'
objective_type = 'Maximize'

In [None]:
# step 3: launch the hyperparameter tuning job

tuner = HyperparameterTuner(squad_estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            strategy = 'Bayesian',
                            objective_type = 'Maximize',
                            max_jobs = 5,
                            max_parallel_jobs = 5,
                            early_stopping_type = 'Auto')

# we can track the tuning job progress in the SageMaker console by the tuning_job_name
squad_tuning_job_name = "pt-squad1-bs-{}".format(strftime("%d-%H-%M-%S", gmtime()))

# launch model tuning job
tuner.fit({'training': inputs_squad}, job_name=squad_tuning_job_name)
tuner.wait()

In [None]:
squad_tuning_job_name

## Optional: check hyperparameter tuning results

In [None]:
tuner_metrics = sagemaker.HyperparameterTuningJobAnalytics(squad_tuning_job_name)
tuner_metrics.dataframe().sort_values(['FinalObjectiveValue'], ascending=False).head(10)

hpo_report = tuner_metrics.dataframe().sort_values(['FinalObjectiveValue'], ascending=False)
hpo_report['job_id'] = len(hpo_report) - hpo_report.index
hpo_report.sort_values(by='job_id', inplace=True)

In [None]:
# the value of the best learning rate is extracted from the 'Hyperparameter tuning jobs' console

best_lr = 5.7330400829294637e-05 # update this value

In [None]:
plt.figure(figsize=(6,4))
x = hpo_report['learning_rate']
y = hpo_report['FinalObjectiveValue']
plt.scatter(x, y, alpha=0.8)

line_x = [best_lr, best_lr]
line_y = [0, 100]  # SQuAD F1 is reported on a 0-100 scale
plt.plot(line_x, line_y, linestyle='--', linewidth=1, color='orange')

plt.xlim(5e-6, 6e-4)
plt.xscale('log')
plt.ylim(75, 95)  # SQuAD F1 is reported on a 0-100 scale
plt.xlabel('Learning rate', fontsize=14)
plt.ylabel('F1 score', fontsize=14)
plt.title('SQuAD: F1 score curve over learning rate', fontsize=14)
plt.grid()
#plt.savefig('figures/SQUAD_F1_learning_rate.png', dpi=200, transparent=True, bbox_inches='tight')
plt.show()

plt.figure(figsize=(6,4))
x = hpo_report['job_id']
y = hpo_report['FinalObjectiveValue']
plt.plot(x, y, alpha=0.8, linestyle='-', marker='o')
plt.ylim(75, 95)  # SQuAD F1 is reported on a 0-100 scale
plt.ylabel('F1 score', fontsize=14)
plt.xlabel('Training job order index', fontsize=14)
plt.title('SQuAD: F1 score history', fontsize=14)
plt.grid()
#plt.savefig('figures/SQUAD_F1_job_order.png', dpi=200, transparent=True, bbox_inches='tight')
plt.show()

plt.figure(figsize=(6,4))
x = len(hpo_report) - hpo_report.index
y = hpo_report['learning_rate']

line_y = [best_lr, best_lr]
line_x = [0, 40]
plt.plot(x, y, alpha=0.8, linestyle='-', marker='o')
plt.plot(line_x, line_y, linestyle='--', linewidth=1, color='orange')

plt.xlim(0, 31)
plt.ylim(5e-6, 6e-4)
plt.yscale('log')
plt.ylabel('Learning rate', fontsize=14)
plt.xlabel('Training job order index', fontsize=14)
plt.title('SQuAD: learning rate search history', fontsize=14)
plt.grid()
#plt.savefig('figures/SQUAD_lr_job_order.png', dpi=200, transparent=True, bbox_inches='tight')
plt.show()