---
Author: **`Crispen Gari`**

Date: **`2021-09-09`**

Topic: **`Part of Speech Tagging (PoS) with transformers.`**

Language: **`Python`**

Library: **`Pytorch`**

Main: **`Natural Language Processing (NLP)`**

----

### Transformers in PoS Tagging (Fine Tuning)

In this notebook we are going to perform PoS tagging with a different tag set from the previous notebook: we will use the PTB (Penn Treebank) tags instead of the UD (Universal Dependencies) tags. We are again going to fine-tune the BERT model, as we did in the previous notebook.

Almost everything else remains the same, since this notebook is a clone of the previous one.


### Imports

In [1]:
!pip install transformers


In [2]:
import torch
from torch import nn
from torch.nn import functional as F

from torchtext.legacy import data, datasets
from transformers import BertTokenizer, BertModel

import numpy as np

import os, time, random, functools

from prettytable import PrettyTable

torch.__version__

'1.9.0+cu102'

### Seeds and Device

In [3]:

SEED = 42

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

torch.backends.cudnn.deterministic = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

### BERT Tokenizer
This tokenizer defines how the text is processed for the model, but most importantly it contains the vocabulary that the BERT model was trained with. We will be using the `bert-base-uncased` tokenizer, which was trained on lowercased text.

In [4]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")


In order to use pretrained models for NLP the vocabulary used needs to exactly match that of the pretrained model.

Another thing that we need to do is make sure the input sequence is formatted in the same way in which the BERT model was trained.

BERT was trained on sequences that begin with a ``[CLS]`` token. Example:

```py
text = ["i", "love", "python", "ai"]

# will be
text = ["[CLS]", "i", "love", "python", "ai"]
```

Along with making our vocabularies match we also need to make sure our padding and unk tokens match those used in the pretrained model. By default `TorchText` uses `<pad>` and `<unk>`, but the BERT model uses `[PAD]` and `[UNK]`.

Let's have a look at BERT's special tokens.





In [5]:
init_token = tokenizer.cls_token
pad_token = tokenizer.pad_token
unk_token = tokenizer.unk_token

print(init_token, pad_token, unk_token)

[CLS] [PAD] [UNK]


We are mainly interested in the actual integer representations of the special tokens. This is because we aren't using TorchText's vocabulary module, but using the one provided by the pretrained model.

We can find those by executing the following cell

In [6]:
init_token_idx = tokenizer.cls_token_id
pad_token_idx = tokenizer.pad_token_id
unk_token_idx = tokenizer.unk_token_id

print(init_token_idx, pad_token_idx, unk_token_idx)

101 0 100


Another thing to note is that our pretrained model was trained on sequences with a maximum length of `512` tokens. We need to make sure that our sequences do not exceed this length.

In [7]:
max_input_length = tokenizer.max_model_input_sizes["bert-base-uncased"]
max_input_length

512

Next we need to create some helper functions.

The first helper function cuts a sequence of tokens to the maximum length specified by our pretrained model and then converts the tokens into indexes by passing them through the vocabulary. This is what we will use on the input sequences we want to tag.

Note that we actually cut the tokens to `max_input_length - 1`. This is because we want to add the `[CLS]` token at the beginning of the sequence.

In [8]:
def cut_and_convert_to_ids(tokens, tokenizer, max_input_length=512):
  tokens = tokens[:max_input_length - 1]
  return tokenizer.convert_tokens_to_ids(tokens)

The second helper function simply cuts the sequence to the maximum length. This is used for our tags. We do not pass the tags through the pretrained model's vocabulary, as that vocabulary was only built for English sentences and not for part-of-speech tags. We will be building the tag vocabulary ourselves.

In [9]:
def cut_to_max_length(tokens, max_input_length):
  return tokens[:max_input_length - 1]
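
To see what these two helpers do without loading the real tokenizer, here is a minimal sketch that uses a tiny hypothetical vocabulary (`toy_vocab`) and a hypothetical `toy_convert_tokens_to_ids` function in place of `tokenizer.convert_tokens_to_ids`. The ids for `[PAD]` and `[UNK]` mirror the ones printed above; the other ids are made up for illustration.

```python
# Toy stand-in for the pretrained vocabulary (hypothetical ids, except [PAD]=0 and [UNK]=100).
toy_vocab = {"[PAD]": 0, "[UNK]": 100, "i": 5, "love": 6, "python": 7}

def toy_convert_tokens_to_ids(tokens):
    # Tokens missing from the vocabulary fall back to the [UNK] id.
    return [toy_vocab.get(t, toy_vocab["[UNK]"]) for t in tokens]

def cut_and_convert(tokens, max_input_length):
    # Leave one position free for the [CLS] token that is prepended later.
    return toy_convert_tokens_to_ids(tokens[:max_input_length - 1])

def cut_only(tags, max_input_length):
    # Tags are cut to the same length but are not converted to ids here.
    return tags[:max_input_length - 1]

tokens = ["i", "love", "python", "ai"]
tags = ["PRP", "VBP", "NN", "NN"]

ids = cut_and_convert(tokens, max_input_length=4)
cut_tags = cut_only(tags, max_input_length=4)
print(ids)       # [5, 6, 7]
print(cut_tags)  # ['PRP', 'VBP', 'NN']
```

Both lists end up with the same length, which is what keeps tokens and tags aligned after cutting.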


We need to pass the above two functions to the `Field`, the TorchText abstraction that handles a lot of the data processing for us. We make use of Python's `functools.partial`, which lets us pass functions that already have some of their arguments supplied.

In [10]:
text_preprocessor = functools.partial(
    cut_and_convert_to_ids,
    tokenizer=tokenizer,
    max_input_length =max_input_length
)
tag_preprocessor = functools.partial(
    cut_to_max_length,
    max_input_length =max_input_length
)
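
As a small standalone illustration of what `functools.partial` does here, the sketch below pre-fills `max_input_length` on a copy of the cutting helper, so the resulting callable takes only the sequence. The name `cut_to_four` and the sample tags are made up for the example.

```python
import functools

def cut_to_max_length(tokens, max_input_length):
    # Same logic as the helper above: keep room for one leading special token.
    return tokens[:max_input_length - 1]

# Pre-fill max_input_length so the result takes a single argument,
# which is the call shape a preprocessing hook expects.
cut_to_four = functools.partial(cut_to_max_length, max_input_length=4)

result = cut_to_four(["DT", "NN", "VBZ", "JJ", "."])
print(result)  # ['DT', 'NN', 'VBZ']
```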

### Fields

For the `TEXT` field, which will be processing the sequences we want to tag, we first tell TorchText that we do not want to use a vocabulary with `use_vocab = False`. As our model is uncased, all text should also be lowercased, which `lower=True` would do; note that the cell below does not pass it, which is why many capitalized tokens in the printed example further down map to `[UNK]` (id `100`). The `preprocessing` argument is a function applied to sequences after they have been tokenized, but before they are numericalized. As we have set `use_vocab` to false, they will never actually be numericalized by TorchText, and as we are using TorchText's POS datasets they have also already been tokenized, so the function is simply applied to the sequence of tokens. This is where our helper functions from above come in handy: `text_preprocessor` both numericalizes our data using the pretrained model's vocabulary and cuts it to the maximum length. The remaining three arguments define the special tokens required by the pretrained model.

For the ``PTB_TAGS`` field, we need to ensure the length of our tags matches the length of our text sequence. As we have added a ``[CLS]`` token to the beginning of the text sequence, we need to do the same with the sequence of tags. We do this by adding a ``<pad>`` token to the beginning which we will later tell our model to not use when calculating losses or accuracy. We won't have unknown tags in our sequence of tags, so we set the ``unk_token`` to None. Finally, we pass our ``tag_preprocessor`` defined above, which simply cuts the tags to the maximum length our pretrained model can handle.

In [12]:
TEXT = data.Field(
    use_vocab=False,
    preprocessing = text_preprocessor,
    init_token = init_token_idx,
    pad_token = pad_token_idx,
    unk_token= unk_token_idx
)

PTB_TAGS = data.Field(unk_token=None,
                     init_token="<pad>",
                     preprocessing=tag_preprocessor
                    )
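
The alignment between text and tags can be sketched in plain Python. This is only an illustration of what the two fields are configured to do, not TorchText's own code; the three token ids and their tags are taken from the first training example printed further down, and `CLS_ID` is the `[CLS]` id printed earlier.

```python
CLS_ID = 101  # id of [CLS], as printed earlier

token_ids = [1996, 14512, 2012]   # three ids from the first training example
tags = ["DT", "NN", "IN"]         # their PTB tags

# The text sequence gets [CLS] in front, the tag sequence gets "<pad>" in front.
text_with_init = [CLS_ID] + token_ids
tags_with_init = ["<pad>"] + tags

# Both sequences now have the same length, and position 0 is the one
# the loss and accuracy will later ignore.
pairs = list(zip(text_with_init, tags_with_init))
print(pairs)
```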

Then we will define which fields defined above correspond to which fields in the dataset.

In [14]:
fields = (("text", TEXT), (None, None), ("tags", PTB_TAGS))

We will then load the data.

In [None]:
train_data, valid_data, test_data = datasets.UDPOS.splits(
    fields
)

Now we can check our examples and see that they are already numericalised.

In [16]:
print(vars(train_data.examples[0]))

{'text': [100, 1011, 100, 1024, 100, 2749, 2730, 100, 100, 2632, 1011, 100, 1010, 1996, 14512, 2012, 1996, 8806, 1999, 1996, 2237, 1997, 100, 1010, 2379, 1996, 100, 3675, 1012], 'tags': ['NNP', 'HYPH', 'NNP', ':', 'JJ', 'NNS', 'VBD', 'NNP', 'NNP', 'NNP', 'HYPH', 'NNP', ',', 'DT', 'NN', 'IN', 'DT', 'NN', 'IN', 'DT', 'NN', 'IN', 'NNP', ',', 'IN', 'DT', 'JJ', 'NN', '.']}


Next we are going to build the vocabulary of the tags.

In [17]:
PTB_TAGS.build_vocab(train_data)

In [18]:
print(PTB_TAGS.vocab.stoi)

defaultdict(None, {'<pad>': 0, 'NN': 1, 'IN': 2, 'DT': 3, 'NNP': 4, 'PRP': 5, 'JJ': 6, 'RB': 7, '.': 8, 'VB': 9, 'NNS': 10, ',': 11, 'CC': 12, 'VBD': 13, 'VBP': 14, 'VBZ': 15, 'CD': 16, 'VBN': 17, 'VBG': 18, 'MD': 19, 'TO': 20, 'PRP$': 21, '-RRB-': 22, '-LRB-': 23, 'WDT': 24, 'WRB': 25, ':': 26, '``': 27, "''": 28, 'WP': 29, 'RP': 30, 'UH': 31, 'POS': 32, 'HYPH': 33, 'JJR': 34, 'NNPS': 35, 'JJS': 36, 'EX': 37, 'NFP': 38, 'GW': 39, 'ADD': 40, 'RBR': 41, '$': 42, 'PDT': 43, 'RBS': 44, 'SYM': 45, 'LS': 46, 'FW': 47, 'AFX': 48, 'WP$': 49, 'XX': 50})
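
The mapping above puts `<pad>` at index 0 and the most frequent tags right after it. A rough sketch of how such a vocabulary could be built by hand is shown below; the ordering rule (descending frequency, ties broken alphabetically) is an assumption about the legacy TorchText behaviour, and the two tag sequences are invented for the example.

```python
from collections import Counter

# Two made-up tag sequences standing in for the training data.
tag_sequences = [
    ["DT", "NN", "IN", "DT", "NN"],
    ["NNP", "VBD", "DT", "NN"],
]

counts = Counter(tag for seq in tag_sequences for tag in seq)

# Special tokens first, then tags by descending frequency (ties alphabetical).
ordered_tags = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
itos = ["<pad>"] + [tag for tag, _ in ordered_tags]
stoi = {tag: idx for idx, tag in enumerate(itos)}

print(itos)  # ['<pad>', 'DT', 'NN', 'IN', 'NNP', 'VBD']
```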


### Iterators

As in the previous notebook, we are going to define our iterators.

In [19]:
BATCH_SIZE = 32

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    device= device,
    batch_size= BATCH_SIZE
)

### Model
Next up is defining our model. The model is relatively simple, with all of the complicated parts contained inside the BERT module, which we do not have to worry about. We can think of BERT as an embedding layer; all we do is add a linear layer on top of these embeddings to predict the tag for each token in the input sequence.

![img](https://camo.githubusercontent.com/4b9ff887ad76b826189f0721505dc1cc248492a8/68747470733a2f2f6769746875622e636f6d2f62656e747265766574742f7079746f7263682d706f732d74616767696e672f626c6f622f6d61737465722f6173736574732f706f732d626572742e706e673f7261773d31)

One thing to note is that we do not define an `embedding_dim` for our model: it is the size of the output of the pretrained BERT model and we cannot change it. Thus, we simply get the `embedding_dim` from the model's `hidden_size` attribute.

BERT also wants sequences with the batch element first, hence we permute our input sequence before passing it to BERT.

In [20]:
class BERTPoSTagger(nn.Module):
  def __init__(self, bert, output_dim, dropout=.5):
    super(BERTPoSTagger, self).__init__()

    self.bert = bert
    embedding_dim = bert.config.to_dict()['hidden_size']

    self.fc = nn.Linear(embedding_dim, output_dim)
    self.dropout = nn.Dropout(dropout)

  def forward(self, text):
    # text = [sent len, batch size]
    text = text.permute(1, 0)
    # text = [batch size, sent len]
    embedded = self.dropout(self.bert(text)[0])
    # embedded = [batch size, seq len, emb dim]
    embedded = embedded.permute(1, 0, 2)
    # embedded = [sent len, batch size, emb dim]
    out = self.fc(self.dropout(embedded))
    # out = [sent len, batch size, output_dim]
    return out

Next, we load the actual pretrained BERT uncased model - before we only loaded the tokenizer associated with the model.

In [21]:
bert = BertModel.from_pretrained('bert-base-uncased')


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


### Model Instance

In [22]:
OUTPUT_DIM = len(PTB_TAGS.vocab)
DROPOUT = 0.25

model = BERTPoSTagger(bert,
                      OUTPUT_DIM, 
                      DROPOUT)
model

BERTPoSTagger(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True

### Counting model parameters

In [23]:

def count_trainable_params(model):
  # Returns a tuple: (total parameters, trainable parameters).
  return sum(p.numel() for p in model.parameters()), sum(p.numel() for p in model.parameters() if p.requires_grad)

n_params, trainable_params = count_trainable_params(model)
print(f"Total number of parameters: {n_params:,}\nTotal trainable parameters: {trainable_params:,}")

Total number of parameters: 109,521,459
Total trainable parameters: 109,521,459
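
As a quick sanity check on this number, the arithmetic below splits the total into the linear tagging layer and everything else. It assumes a hidden size of 768 (visible in the printed model) and 51 output classes (the number of entries in the tag vocabulary printed earlier, including `<pad>`); the variable names are only for illustration.

```python
hidden_size = 768            # BERT-base hidden size, as shown in the printed model
output_dim = 51              # entries in the tag vocabulary, including <pad>
total_params = 109_521_459   # total reported above

# A linear layer has one weight per (input, output) pair plus one bias per output.
fc_params = hidden_size * output_dim + output_dim
bert_params = total_params - fc_params

print(f"{fc_params:,}")    # 39,219
print(f"{bert_params:,}")  # 109,482,240
```

Almost all of the parameters therefore sit in the pretrained BERT module, and only a small fraction belongs to the new tagging layer.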


Next, we define our optimizer. When fine-tuning, you usually want a lower learning rate than normal, because we don't want to drastically change the pretrained parameters, which may cause the model to forget what it has learned. This phenomenon is called catastrophic forgetting.

We pick ``5e-5 (0.00005)`` as it is one of the three values recommended in the BERT paper. Again, there may be better values for this dataset.

In [24]:

LEARNING_RATE = 5e-5

optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)

### Criterion

In [25]:
TAG_PAD_IDX = PTB_TAGS.vocab.stoi[PTB_TAGS.pad_token]
criterion = nn.CrossEntropyLoss(ignore_index = TAG_PAD_IDX)
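
To make the effect of `ignore_index` concrete, here is a pure-Python sketch of a mean cross-entropy that skips targets equal to the padding index. It is only meant to show the idea and is not PyTorch's implementation; the function name `cross_entropy_ignore` and the sample logits are hypothetical.

```python
import math

def cross_entropy_ignore(logits, targets, ignore_index):
    # logits: list of rows of raw scores; targets: list of class ids.
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padded positions contribute nothing to the loss
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[target])
    # Average over the non-ignored positions only (assumes at least one exists).
    return sum(losses) / len(losses)

logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = [0, 2]  # the first target is the padding index

loss = cross_entropy_ignore(logits, targets, ignore_index=0)
print(round(loss, 4))  # 1.0986, i.e. log(3) from the second row only
```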

Model and criterion to device

In [26]:
model = model.to(device)
criterion = criterion.to(device)

Categorical accuracy function

In [27]:
def categorical_accuracy(preds, y, tag_pad_idx):
  max_preds = preds.argmax(dim = 1, keepdim = True)
  non_pad_elements = (y != tag_pad_idx).nonzero()
  correct = max_preds[non_pad_elements].squeeze(1).eq(y[non_pad_elements])
  return correct.sum() / y[non_pad_elements].shape[0]
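
The same masking idea can be shown on plain Python lists. The sketch below is a simplified stand-in for the tensor version above, with a hypothetical name (`masked_accuracy`) and made-up predictions; it assumes at least one non-pad position.

```python
def masked_accuracy(pred_ids, gold_ids, pad_idx):
    # Keep only the positions whose gold tag is not padding.
    pairs = [(p, g) for p, g in zip(pred_ids, gold_ids) if g != pad_idx]
    correct = sum(p == g for p, g in pairs)
    return correct / len(pairs)

preds = [3, 1, 2, 4, 0]
gold = [0, 1, 2, 1, 0]   # positions 0 and 4 are padding

acc = masked_accuracy(preds, gold, pad_idx=0)
print(round(acc, 3))  # 0.667: two of the three non-pad positions are correct
```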

Train and evaluate functions

In [28]:
def train(model, iterator,  optimizer, criterion, tag_pad_idx):
  epoch_loss = 0
  epoch_acc = 0
  model.train()
  for batch in iterator:
    text = batch.text # text = [sent len, batch size]
    tags = batch.tags # tags = [sent len, batch size]

    optimizer.zero_grad()

    predictions = model(text)
    # predictions = [sent len, batch size, output dim]
    predictions = predictions.view(-1, predictions.shape[-1])
    # predictions = [sent len * batch size, output dim]
    tags = tags.view(-1) # tags = [sent len * batch size]
    loss = criterion(predictions, tags)
    acc = categorical_accuracy(predictions, tags, tag_pad_idx)
    
    loss.backward()
    optimizer.step()
    epoch_loss += loss.item()
    epoch_acc += acc.item()

  return epoch_loss / len(iterator), epoch_acc / len(iterator)


def evaluate(model, iterator, criterion, tag_pad_idx):
  epoch_loss = 0
  epoch_acc = 0
  model.eval()

  with torch.no_grad():
    for batch in iterator:
      text = batch.text # text = [sent len, batch size]
      tags = batch.tags # tags = [sent len, batch size]
      predictions = model(text)
      # predictions = [sent len, batch size, output dim]
      predictions = predictions.view(-1, predictions.shape[-1])
      # predictions = [sent len * batch size, output dim]
      tags = tags.view(-1) # tags = [sent len * batch size]
      loss = criterion(predictions, tags)
      acc = categorical_accuracy(predictions, tags, tag_pad_idx)
      
      epoch_loss += loss.item()
      epoch_acc += acc.item()
  return epoch_loss / len(iterator), epoch_acc / len(iterator)

### Training loop

Next we will run the training loop. First we define two helper functions that help us visualize each training epoch.

1. Time to string function



In [29]:
def hms_string(sec_elapsed):
  h = int(sec_elapsed / (60 * 60))
  m = int((sec_elapsed % (60 * 60)) / 60)
  s = sec_elapsed % 60
  return "{}:{:>02}:{:>05.2f}".format(h, m, s)

2. Visualize training epoch

In [30]:
def visualize_training(start, end, train_loss, train_accuracy, val_loss, val_accuracy, title):
  data = [
       ["Training", f'{train_loss:.3f}', f'{train_accuracy:.3f}', f"{hms_string(end - start)}" ],
       ["Validation", f'{val_loss:.3f}', f'{val_accuracy:.3f}', "" ],       
  ]
  table = PrettyTable(["CATEGORY", "LOSS", "ACCURACY", "ETA"])
  table.align["CATEGORY"] = 'l'
  table.align["LOSS"] = 'r'
  table.align["ACCURACY"] = 'r'
  table.align["ETA"] = 'r'
  table.title = title
  for row in data:
    table.add_row(row)
  print(table)
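
For reference, this is how the time formatting helper behaves on a couple of values. The function is repeated so the snippet runs on its own, and the inputs are arbitrary durations in seconds.

```python
def hms_string(sec_elapsed):
    # Split a duration in seconds into hours, minutes and fractional seconds.
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

short_run = hms_string(252.25)
long_run = hms_string(3725.5)
print(short_run)  # 0:04:12.25
print(long_run)   # 1:02:05.50
```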

In [31]:
N_EPOCHS = 10
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
  start = time.time()
  train_loss, train_acc = train(model, train_iterator, optimizer,
                                criterion, TAG_PAD_IDX)
  valid_loss, valid_acc = evaluate(model,
                                   valid_iterator,
                                   criterion, TAG_PAD_IDX)
  
  title = f"EPOCH: {epoch+1:02}/{N_EPOCHS:02} {'saving best model...' if valid_loss < best_valid_loss else 'not saving...'}"
  if valid_loss < best_valid_loss:
      best_valid_loss = valid_loss
      torch.save(model.state_dict(), 'best-model.pt')
  end = time.time()
  visualize_training(start, end, train_loss, train_acc,
                     valid_loss, valid_acc, title)

+--------------------------------------------+
|     EPOCH: 01/10 saving best model...      |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
| Training   | 0.770 |    0.810 | 0:04:12.25 |
| Validation | 0.726 |    0.820 |            |
+------------+-------+----------+------------+
+--------------------------------------------+
|     EPOCH: 02/10 saving best model...      |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
| Training   | 0.296 |    0.920 | 0:04:10.76 |
| Validation | 0.620 |    0.837 |            |
+------------+-------+----------+------------+
+--------------------------------------------+
|     EPOCH: 03/10 saving best model...      |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
| Training   

KeyboardInterrupt: ignored

### Evaluating the best model.

In [32]:
def visualize_test(start, end, test_loss,
                       test_accuracy, title):
  data = [
       ["test", f'{test_loss:.3f}', f'{test_accuracy:.3f}', f"{hms_string(end - start)}" ],       
  ]
  table = PrettyTable(["CATEGORY", "LOSS", "ACCURACY", "ETA"])
  table.align["CATEGORY"] = 'l'
  table.align["LOSS"] = 'r'
  table.align["ACCURACY"] = 'r'
  table.align["ETA"] = 'r'
  table.title = title
  for row in data:
    table.add_row(row)
  print(table)
  

model.load_state_dict(torch.load('best-model.pt'))

start = time.time()
test_loss, test_acc = evaluate(model, test_iterator, criterion, tag_pad_idx=TAG_PAD_IDX)
end = time.time()

visualize_test(start, end, test_loss, test_acc, "MODEL EVALUATION SUMMARY")


+------------------------------------------+
|         MODEL EVALUATION SUMMARY         |
+----------+-------+----------+------------+
| CATEGORY |  LOSS | ACCURACY |        ETA |
+----------+-------+----------+------------+
| test     | 0.648 |    0.824 | 0:00:04.52 |
+----------+-------+----------+------------+


### Model Inference

We'll now see how to use our model to tag actual sentences. This is similar to the inference function from the previous notebook with the tokenization changed to match the format of our pretrained model.

If we pass in a string, we need to split it into individual tokens, which we do with the `tokenize` function of the tokenizer. Afterwards, we numericalize the tokens the same way we did before, using `convert_tokens_to_ids`. Then, we add the `[CLS]` token index to the beginning of the sequence.

**Note:** if we forget to add the `[CLS]` token our results will not be good!


In [33]:
import en_core_web_sm
nlp = en_core_web_sm.load()
def tag_sentence(model, device, sentence, tokenizer, text_field,
                 tag_field):
  model.eval()

  if isinstance(sentence, str):
    tokens = tokenizer.tokenize(sentence)
  else:
    tokens = sentence

  if text_field.lower:
    tokens = [t.lower() for t in tokens]

  numericalized_tokens = tokenizer.convert_tokens_to_ids(tokens)
  numericalized_tokens = [text_field.init_token] + numericalized_tokens

  unk_idx = text_field.unk_token
  unks = [t for t, n in zip(tokens, numericalized_tokens) if n == unk_idx]
  token_tensor = torch.LongTensor(numericalized_tokens)
  token_tensor = token_tensor.unsqueeze(-1).to(device)
  predictions = model(token_tensor)
  top_predictions = predictions.argmax(-1)
  predicted_tags = [tag_field.vocab.itos[t.item()] for t in top_predictions]
  predicted_tags = predicted_tags[1:]
  return tokens, predicted_tags, unks
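
One detail worth checking in a function like `tag_sentence` is that the list of tokens and the list of ids stay aligned when unknown tokens are collected. In the cell above the `[CLS]` id is prepended before the two lists are zipped, so they are offset by one position. The following is a minimal sketch, not the notebook's implementation, of collecting unknown tokens before the special token is added. The `toy_vocab` dictionary and the `find_unknown_tokens` helper are hypothetical stand-ins for the real tokenizer.

```python
UNK_ID = 100  # id of [UNK], as printed earlier
CLS_ID = 101  # id of [CLS], as printed earlier

# Hypothetical miniature vocabulary used only for this illustration.
toy_vocab = {"will": 2097, "be": 2022}

def find_unknown_tokens(tokens, vocab, unk_id=UNK_ID):
    ids = [vocab.get(t, unk_id) for t in tokens]
    # Compare tokens and ids BEFORE prepending [CLS], so both lists are aligned.
    unks = [t for t, i in zip(tokens, ids) if i == unk_id]
    model_input = [CLS_ID] + ids
    return model_input, unks

model_input, unks = find_unknown_tokens(["This", "will", "be", "Fun"], toy_vocab)
print(model_input)  # [101, 100, 2097, 2022, 100]
print(unks)         # ['This', 'Fun']
```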


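Taking a single example from the train set. Note that the `TEXT` field has already converted this example to ids, so `sentence` is a list of integers rather than token strings; `tag_sentence` still passes it through `convert_tokens_to_ids`, which is why nearly every entry is reported as unknown below.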

In [34]:
example_index = 1

sentence = vars(train_data.examples[example_index])['text']
actual_tags = vars(train_data.examples[example_index])['tags']
print(sentence)


[1031, 100, 4288, 1997, 1037, 9768, 29307, 2097, 2022, 4786, 2149, 4390, 2005, 2086, 2000, 2272, 1012, 1033]


In [35]:
tokens, pred_tags, unks = tag_sentence(model, 
                                       device, 
                                       sentence,
                                       tokenizer, 
                                       TEXT, 
                                       PTB_TAGS)

print(unks) 

[100, 4288, 1997, 1037, 9768, 29307, 2097, 2022, 4786, 2149, 4390, 2005, 2086, 2000, 2272, 1012, 1033]


In [None]:
print("Pred. Tag\tActual Tag\tCorrect?\tToken\n")

for token, pred_tag, actual_tag in zip(tokens, pred_tags, actual_tags):
    correct = '✔' if pred_tag == actual_tag else '✘'
    print(f"{pred_tag}\t\t{actual_tag}\t\t{correct}\t\t{token}")

Taking a single example from the validation set

In [None]:
example_index = 1

sentence = vars(valid_data.examples[example_index])['text']
actual_tags = vars(valid_data.examples[example_index])['tags']
print(sentence)

In [None]:
tokens, pred_tags, unks = tag_sentence(model, 
                                       device, 
                                       sentence,
                                       tokenizer, 
                                       TEXT, 
                                       PTB_TAGS)

In [None]:
print("Pred. Tag\tActual Tag\tCorrect?\tToken\n")

for token, pred_tag, actual_tag in zip(tokens, pred_tags, actual_tags):
    correct = '✔' if pred_tag == actual_tag else '✘'
    print(f"{pred_tag}\t\t{actual_tag}\t\t{correct}\t\t{token}")

Taking a single example from the test set

In [None]:
example_index = 1

sentence = vars(test_data.examples[example_index])['text']
actual_tags = vars(test_data.examples[example_index])['tags']
print(sentence)

In [None]:
tokens, pred_tags, unks = tag_sentence(model, 
                                       device, 
                                       sentence,
                                       tokenizer, 
                                       TEXT, 
                                       PTB_TAGS)

In [None]:
print("Pred. Tag\tActual Tag\tCorrect?\tToken\n")

for token, pred_tag, actual_tag in zip(tokens, pred_tags, actual_tags):
    correct = '✔' if pred_tag == actual_tag else '✘'
    print(f"{pred_tag}\t\t{actual_tag}\t\t{correct}\t\t{token}")

Using our own sentence

In [None]:
sentence = 'The Queen will deliver a speech about the conflict in North Korea at 1pm tomorrow.'

tokens, tags, unks = tag_sentence(model, 
                                  device, 
                                  sentence,
                                  tokenizer,
                                  TEXT, 
                                  PTB_TAGS)

print(unks)

In [None]:
print("Pred. Tag\tToken\n")
for token, tag in zip(tokens, tags):
    print(f"{tag}\t\t{token}")