# BertLargeUncased - PyTorch
This notebook shows how to compile a pre-trained BERT Large (PyTorch) model for AWS Inferentia (inf1 instances) using the Neuron SDK. The original implementation is provided by HuggingFace.

**Reference:** https://huggingface.co/bert-large-uncased

## 1) Install dependencies

In [None]:
# Point pip at the Neuron repository
%pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
# now restart the kernel

In [None]:
# Install Neuron PyTorch
%pip install -U torch-neuron==1.10.1.2.2.0.0 neuron-cc[tensorflow] "protobuf<4" "transformers==4.6.0"
# use --force-reinstall if the modules fail to load
# now restart the kernel again
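
After restarting the kernel, it can help to confirm which package versions were actually picked up. The following is a minimal sketch that uses only the standard library and assumes Python 3.8 or newer; the helper name `installed_version` is hypothetical, and the package names are the ones from the install command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the pip command above
for name in ("torch-neuron", "neuron-cc", "transformers", "protobuf"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If any of the packages is reported as not installed, the kernel is probably running in a different environment from the one `%pip` installed into.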

## 2) Initialize libraries and prepare input samples

In [None]:
import torch
import torch.neuron
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Build the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")

# Setup some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

max_length = 128
paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")
not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")

# Convert example inputs to a format that is compatible with TorchScript tracing
example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']
example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']
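
To make the three tensors above less opaque, here is a simplified pure-Python sketch of how a sentence pair is laid out and padded to a fixed length. It is not the tokenizer's implementation: `encode_pair` is a hypothetical helper, the word token ids are made up, and truncation is left out. The special-token ids (101, 102, 0) are those of the BERT uncased vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token ids in the BERT uncased vocabulary


def encode_pair(ids_a, ids_b, max_length):
    """Lay out a pair as [CLS] A [SEP] B [SEP] and pad it to max_length.

    Simplified: assumes the pair already fits and performs no truncation.
    """
    tokens = [CLS_ID] + ids_a + [SEP_ID] + ids_b + [SEP_ID]
    if len(tokens) > max_length:
        raise ValueError("pair is longer than max_length; truncation is not sketched here")
    segment_a = len(ids_a) + 2  # [CLS] + A + [SEP]
    segment_b = len(ids_b) + 1  # B + [SEP]
    pad = max_length - len(tokens)
    return {
        "input_ids": tokens + [PAD_ID] * pad,
        "attention_mask": [1] * len(tokens) + [0] * pad,
        "token_type_ids": [0] * segment_a + [1] * segment_b + [0] * pad,
    }


# Hypothetical token ids, only for illustration
enc = encode_pair([2000, 2001, 2002], [3000, 3001], max_length=10)
for key, value in enc.items():
    print(key, value)
```

Padding every example to the same `max_length` keeps the input shapes identical from call to call, which matters because the model is later traced with one set of example inputs.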

## 3) Load a pre-trained model and check if it is .jit traceable

In [None]:
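print('Loading a pre-trained model')
# Note: bert-large-uncased has no fine-tuned classification head, so the classifier weights are newly initialized
model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased", return_dict=False)
model.eval()

y = model(**paraphrase)  # warm up the model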
try:
    traced_model = torch.jit.trace(model, example_inputs_paraphrase)
    print("Cool! Model is jit traceable")
except Exception as e:
    print(f"Oops. Something went wrong. Model is not traceable: {e}")
# If the model is jit traceable, the next step is to compile it with the Neuron SDK

## 4) Analyze & compile the model for Inferentia with NeuronSDK

In [None]:
torch.neuron.analyze_model(model, example_inputs_paraphrase)

In [None]:
neuron_model = torch.neuron.trace(model, example_inputs_paraphrase)
neuron_model.save("neuron_bert_large_uncased.pt")

### 4.1) Verify the optimized model

In [None]:
y = neuron_model(*example_inputs_paraphrase) # warmup
%timeit neuron_model(*example_inputs_paraphrase)
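
`%timeit` reports an average; when tail latency matters, percentiles can be collected by hand. The sketch below is one possible way to do that with the standard library. The helper `measure_latency` and its parameters are hypothetical, the p99 value is an approximate nearest-rank estimate, and a stand-in workload is used so the snippet runs without Neuron hardware.

```python
import statistics
import time


def measure_latency(fn, n_warmup=3, n_runs=50):
    """Call fn() repeatedly and return latency statistics in milliseconds."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        # Approximate nearest-rank 99th percentile
        "p99_ms": samples[min(len(samples) - 1, int(0.99 * len(samples)))],
    }


# Stand-in workload; in this notebook one would instead pass
# lambda: neuron_model(*example_inputs_paraphrase)
stats = measure_latency(lambda: sum(range(10_000)))
print(stats)
```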

## 5) A simple test to check the predictions

In [7]:
import torch
import torch.neuron
# Load the compiled TorchScript model back
model_neuron = torch.jit.load('neuron_bert_large_uncased.pt')

# Verify the TorchScript works on both example inputs
paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)
not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)

classes = ['not paraphrase', 'paraphrase']
paraphrase_prediction = paraphrase_classification_logits_neuron[0][0].argmax().item()
not_paraphrase_prediction = not_paraphrase_classification_logits_neuron[0][0].argmax().item()
print('BERT says that "{}" and "{}" are {}'.format(sequence_0, sequence_2, classes[paraphrase_prediction]))
print('BERT says that "{}" and "{}" are {}'.format(sequence_0, sequence_1, classes[not_paraphrase_prediction]))

BERT says that "The company HuggingFace is based in New York City" and "HuggingFace's headquarters are situated in Manhattan" are paraphrase
BERT says that "The company HuggingFace is based in New York City" and "Apples are especially bad for your health" are not paraphrase
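
The cell above only reports the `argmax` of the logits. To see how confident a prediction is, the logits can be turned into probabilities with a softmax. The following is a minimal pure-Python sketch; the logit values are made up for illustration and are not real model output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ['not paraphrase', 'paraphrase']
example_logits = [-0.5, 1.5]  # made-up values, not real model output
probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(classes[best], [round(p, 3) for p in probs])
```

In this notebook, the input list could be obtained with `paraphrase_classification_logits_neuron[0][0].tolist()`.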
