# RoBERTa Fine-Tuning

References:  
    - https://arxiv.org/abs/1907.11692  
    - https://huggingface.co/transformers/model_doc/roberta.html  
    - https://github.com/huggingface/transformers/pull/1275/files  
    - https://github.com/huggingface/transformers/tree/master/examples/token-classification  
    - https://www.kaggle.com/debanga/huggingface-tokenizers-cheat-sheet  
    - https://github.com/billpku/NLP_In_Action  
    - https://androidkt.com/name-entity-recognition-with-bert-in-tensorflow/  
    - https://github.com/smart-patrol/sagemaker-bert

In [1]:
import pandas as pd
import math
import numpy as np
import sagemaker
import torch
import torch.nn.functional as F
import os
import json
from sagemaker.pytorch import PyTorch

In [2]:
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket()

In [3]:
with open("utils/objects/data_directories_roberta.json", "r") as f:
    data_directories = json.load(f)
display(data_directories)

{'train_data_directory': 's3://sagemaker-eu-west-1-087816224558/named_entity_recognition/roberta_data/train_roberta.csv',
 'test_data_directory': 's3://sagemaker-eu-west-1-087816224558/named_entity_recognition/roberta_data/test_roberta.csv'}

## Training script

Uncomment the cell below to display the training script.

In [None]:
# ! pygmentize source_roberta/train_roberta.py

## Define hyperparameters

Based on the RoBERTa paper and the Hugging Face documentation (both listed in the references), I chose the following set of hyperparameters to fine-tune RoBERTa. BERT-family publications suggest trying larger batch sizes; however, training with a batch size of 64 took too long on the current machine and I had to terminate it. In the end I extended training by one more epoch, which improved the F1 score by almost 1 point compared to the recommended 3 epochs.

In [4]:
hyperparameters = {'epochs': 4,
                   'n_tags': 20,
                   'max_len': 45,
                   'batch-size': 32
                  }
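
SageMaker passes each entry of this dictionary to the entry-point script as a command-line argument (`--epochs 4`, `--batch-size 32`, and so on). The training script itself is not shown here, so the following is only a minimal sketch of how such a script might read them; the function name `parse_hyperparameters` and the default values are assumptions made for illustration.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Hypothetical helper: parse hyperparameters passed as CLI arguments."""
    parser = argparse.ArgumentParser()
    # Defaults mirror the dictionary above; the real script may differ.
    parser.add_argument("--epochs", type=int, default=4)
    parser.add_argument("--n_tags", type=int, default=20)
    parser.add_argument("--max_len", type=int, default=45)
    # argparse exposes "--batch-size" as the attribute "batch_size"
    parser.add_argument("--batch-size", type=int, default=32)
    # parse_known_args tolerates extra arguments the script does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments the container would pass for the dictionary above
args = parse_hyperparameters(
    ["--epochs", "4", "--n_tags", "20", "--max_len", "45", "--batch-size", "32"]
)
print(args.epochs, args.n_tags, args.max_len, args.batch_size)
```

Note that the hyphen in `batch-size` becomes an underscore on the parsed object, so the script would read it as `args.batch_size`.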

I used the Adam optimizer with weight decay (AdamW), with the parameters specified in the Hugging Face fine-tuning script for token classification.  
Initialization of AdamW (snippet from the training script):
>```python
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {"params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
         "weight_decay": 0.01},
        {"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
         "weight_decay": 0.0}
    ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=5e-5, eps=1e-8)
```  
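
To make the grouping concrete, here is a small sketch that applies the same substring test to a handful of parameter names. The names are illustrative, written in the style of a Hugging Face RoBERTa model rather than read from a real checkpoint, and only plain strings are used so the example runs without PyTorch.

```python
# Illustrative parameter names (assumed for this example)
param_names = [
    "roberta.embeddings.word_embeddings.weight",
    "roberta.embeddings.LayerNorm.weight",
    "roberta.embeddings.LayerNorm.bias",
    "roberta.encoder.layer.0.attention.self.query.weight",
    "roberta.encoder.layer.0.attention.self.query.bias",
    "classifier.weight",
    "classifier.bias",
]

no_decay = ["bias", "LayerNorm.weight"]

# Same test as in the snippet above: a parameter is excluded from weight
# decay if its name contains any of the no_decay substrings.
decayed = [n for n in param_names if not any(nd in n for nd in no_decay)]
not_decayed = [n for n in param_names if any(nd in n for nd in no_decay)]

print("weight_decay=0.01:", decayed)
print("weight_decay=0.0: ", not_decayed)
```

In this toy list only the three plain weight matrices fall into the decayed group; every bias and the LayerNorm weight end up in the group with `weight_decay` set to 0.0.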

I also prevent exploding gradients by gradient clipping:
>```python
    torch.nn.utils.clip_grad_norm_(parameters=model.parameters(),
                                   max_norm=1.0)
```
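
As a rough illustration of what norm-based clipping does, the sketch below reimplements the idea in plain Python: compute the global L2 norm over all gradients and, if it exceeds `max_norm`, scale every gradient by the same factor. The function `clip_grad_norm` and the toy gradient values are assumptions for this example; the training script uses the PyTorch utility shown above.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Hypothetical pure-Python version of global-norm gradient clipping.

    grads is a list of gradient vectors (one list of floats per parameter).
    Returns the rescaled gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Scale down only when the norm exceeds max_norm; never scale up
    clip_coef = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * clip_coef for g in vec] for vec in grads]
    return clipped, total_norm


# Two toy "parameters" whose combined gradient norm is 5.0
grads = [[3.0], [4.0]]
clipped, norm_before = clip_grad_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)

# Gradients already below the threshold are left unchanged
small, _ = clip_grad_norm([[0.3], [0.4]], max_norm=1.0)
print(small)
```

Because all gradients share one scaling factor, the direction of the update is preserved and only its magnitude is capped.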

## Define estimator

In [5]:
estimator = PyTorch(entry_point="train_roberta.py",
                    source_dir="source_roberta",
                    role=role,
                    framework_version='1.1.0',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    hyperparameters = hyperparameters
                   )

Training on a K80 GPU instance:

In [6]:
estimator.fit({'training': data_directories['train_data_directory'],
               'validation': data_directories['test_data_directory']})

2020-06-06 17:15:12 Starting - Starting the training job...
2020-06-06 17:15:14 Starting - Launching requested ML instances......
2020-06-06 17:16:21 Starting - Preparing the instances for training.........
2020-06-06 17:18:07 Downloading - Downloading input data
2020-06-06 17:18:07 Training - Downloading the training image...
2020-06-06 17:18:40 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-06-06 17:18:41,398 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2020-06-06 17:18:41,424 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-06-06 17:18:42,043 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-06-06 17:18:42,286 sagemaker-containers INFO     Module train_roberta does not provide a setu

In [None]:
print(estimator.model_data)