---
Author: **`Crispen Gari`**

Date: **`2020-09-08`**

Topic: **`Part of Speech (PoS) Tagging`**

Library: **`PyTorch`**

Language: **`Python`**

Main: **`Natural Language Processing (NLP)`**

---

### BiLSTM for POS Tagging

In this notebook we clone the previous notebook and use it as the base for today's task: building a PoS tagger with a bidirectional LSTM (BiLSTM). The only thing we change is the tag set: we will predict ``PTB`` tags instead of ``UD`` tags, which we already covered in the previous notebook.

The rest of the notebook stays the same as the cloned notebook; wherever something changes, I will highlight it.

### Imports

In [1]:
import torch
from torch.nn import functional as F
from torch import nn

from torchtext.legacy import data, datasets

import spacy
import numpy as np

import time, os, random
from prettytable import PrettyTable

torch.__version__

'1.9.0+cu102'

### Seeds and Device

In [2]:
SEED = 42

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

torch.backends.cudnn.deterministic = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

### Helper function that will visualise data

In [3]:
def tabulate(column_names, data, title="VISUALIZING SETS EXAMPLES"):
  table = PrettyTable(column_names)
  table.title= title
  for row in data:
    table.add_row(row)
  print(table)

### Data preparation

This time around we will be working with PTB tags.

### Fields

In [4]:
TEXT = data.Field(lower=True)
UD_TAGS = data.Field(unk_token = None)
PTB_TAGS = data.Field(unk_token = None)

In [5]:
fields = (("text", TEXT), (None, None), ("ptbtags", PTB_TAGS))

Now we will load our data

In [6]:
train_data, valid_data, test_data = datasets.UDPOS.splits(fields)

Checking examples

In [7]:
column_names = ["SUBSET", "EXAMPLE(s)"]
row_data = [
        ["training", f"{len(train_data):,}"],
        ['validation', f"{len(valid_data):,}"],
        ['test', f"{len(test_data):,}"]
]
tabulate(column_names, row_data)

+-----------------------------+
|  VISUALIZING SETS EXAMPLES  |
+--------------+--------------+
|    SUBSET    |  EXAMPLE(s)  |
+--------------+--------------+
|   training   |    12,543    |
|  validation  |    2,002     |
|     test     |    2,077     |
+--------------+--------------+


Printing examples

In [8]:
print(vars(train_data.examples[0]))

{'text': ['al', '-', 'zaman', ':', 'american', 'forces', 'killed', 'shaikh', 'abdullah', 'al', '-', 'ani', ',', 'the', 'preacher', 'at', 'the', 'mosque', 'in', 'the', 'town', 'of', 'qaim', ',', 'near', 'the', 'syrian', 'border', '.'], 'ptbtags': ['NNP', 'HYPH', 'NNP', ':', 'JJ', 'NNS', 'VBD', 'NNP', 'NNP', 'NNP', 'HYPH', 'NNP', ',', 'DT', 'NN', 'IN', 'DT', 'NN', 'IN', 'DT', 'NN', 'IN', 'NNP', ',', 'IN', 'DT', 'JJ', 'NN', '.']}


In [9]:
print(vars(train_data.examples[0]).get("text"))

['al', '-', 'zaman', ':', 'american', 'forces', 'killed', 'shaikh', 'abdullah', 'al', '-', 'ani', ',', 'the', 'preacher', 'at', 'the', 'mosque', 'in', 'the', 'town', 'of', 'qaim', ',', 'near', 'the', 'syrian', 'border', '.']


In [10]:
print(vars(train_data.examples[0]).get("ptbtags"))

['NNP', 'HYPH', 'NNP', ':', 'JJ', 'NNS', 'VBD', 'NNP', 'NNP', 'NNP', 'HYPH', 'NNP', ',', 'DT', 'NN', 'IN', 'DT', 'NN', 'IN', 'DT', 'NN', 'IN', 'NNP', ',', 'IN', 'DT', 'JJ', 'NN', '.']


### Vocabulary mapping
Next we will build a mapping of tokens to integers. We are going to set `min_freq` to 2 so that tokens that appear fewer than 2 times in the training corpus are left out of the vocabulary and are mapped to `<unk>` at lookup time.
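
To make the effect of `min_freq` concrete, here is a minimal pure-Python sketch of the idea. It is not torchtext's implementation: the helper names `build_vocab` and `numericalize` and the toy corpus are hypothetical and only illustrate how rare tokens end up as `<unk>`.

```python
from collections import Counter

def build_vocab(sentences, min_freq=2, specials=("<unk>", "<pad>")):
    """Map tokens to integer ids, keeping only tokens seen at least min_freq times."""
    counts = Counter(token for sentence in sentences for token in sentence)
    itos = list(specials)
    # Most frequent first; ties are broken alphabetically so the order is deterministic.
    for token, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            itos.append(token)
    stoi = {token: index for index, token in enumerate(itos)}
    return stoi, itos

def numericalize(tokens, stoi, unk_token="<unk>"):
    """Replace every out-of-vocabulary token with the id of <unk>."""
    return [stoi.get(token, stoi[unk_token]) for token in tokens]

# Hypothetical toy corpus: "the", "cat" and "sat" appear twice, the rest once.
toy_corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["a", "cat", "ran"],
]
stoi, itos = build_vocab(toy_corpus, min_freq=2)
ids = numericalize(["the", "dog", "ran"], stoi)

print(itos)  # ['<unk>', '<pad>', 'cat', 'sat', 'the']
print(ids)   # [4, 0, 0]
```

Note that "dog" and "ran" do occur in the toy corpus, yet they are still numericalized as `<unk>` because they fall below the frequency threshold.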

We are also going to load the pretrained GloVe word embeddings. Using pretrained word vectors usually improves the performance of the model. In our case we are going to use `glove.6B.100d`, which was trained on about 6 billion tokens and represents each word as a 100-dimensional vector.

_`unk_init`_ is used to initialize the embeddings of tokens that are not in the pretrained embedding vocabulary. By default those embeddings are set to zeros; however, it is better not to have them all initialized to the same value, so we initialize them from a **Normal/Gaussian** distribution.

In [11]:
MIN_FREQ = 2
TEXT.build_vocab(train_data, min_freq=MIN_FREQ,
                 vectors = "glove.6B.100d",
                 unk_init = torch.Tensor.normal_
                 )

UD_TAGS.build_vocab(train_data)
PTB_TAGS.build_vocab(train_data)
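
The role of `unk_init` can be sketched without torchtext. The snippet below is an illustration under stated assumptions: `build_embedding_matrix`, the 3-dimensional toy vectors and the toy vocabulary are all hypothetical, and `random.gauss(0, 1)` stands in for `torch.Tensor.normal_`, which samples from a normal distribution with mean 0 and standard deviation 1 by default.

```python
import random

def build_embedding_matrix(itos, pretrained, dim, seed=42):
    """Copy pretrained vectors where available, sample N(0, 1) vectors otherwise."""
    rng = random.Random(seed)
    matrix = []
    for token in itos:
        if token in pretrained:
            matrix.append(list(pretrained[token]))
        else:
            # Token missing from the pretrained table: give it its own random vector
            # instead of sharing an all-zero vector with every other missing token.
            matrix.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
    return matrix

# Hypothetical 3-dimensional "pretrained" vectors and vocabulary.
toy_pretrained = {"the": [0.1, 0.2, 0.3], "cat": [0.4, 0.5, 0.6]}
toy_itos = ["<unk>", "<pad>", "the", "cat", "zaman"]
matrix = build_embedding_matrix(toy_itos, toy_pretrained, dim=3)

for token, vector in zip(toy_itos, matrix):
    print(token, [round(v, 3) for v in vector])
```

In this sketch the special tokens also receive random vectors, which is one reason the `<pad>` row is explicitly reset to zeros later in the notebook.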

Let's check how many tokens are in our vocabulary.

In [12]:
column_names = ["FIELD", "TOKEN(s)"]
row_data = [
        ["TEXT", f"{len(TEXT.vocab):,}"],
        ['PTB_TAGS', f"{len(PTB_TAGS.vocab):,}"]
]
tabulate(column_names, row_data, title="VOCABULARY SIZES")

+---------------------+
|   VOCABULARY SIZES  |
+----------+----------+
|  FIELD   | TOKEN(s) |
+----------+----------+
|   TEXT   |  8,866   |
| PTB_TAGS |    51    |
+----------+----------+


Checking the most common words in the vocabulary.


In [13]:
print(TEXT.vocab.freqs.most_common(10))

[('the', 9076), ('.', 8640), (',', 7021), ('to', 5137), ('and', 5002), ('a', 3782), ('of', 3622), ('i', 3379), ('in', 3112), ('is', 2239)]


We can check the vocabularies of our tags

In [14]:
print(PTB_TAGS.vocab.itos)

['<pad>', 'NN', 'IN', 'DT', 'NNP', 'PRP', 'JJ', 'RB', '.', 'VB', 'NNS', ',', 'CC', 'VBD', 'VBP', 'VBZ', 'CD', 'VBN', 'VBG', 'MD', 'TO', 'PRP$', '-RRB-', '-LRB-', 'WDT', 'WRB', ':', '``', "''", 'WP', 'RP', 'UH', 'POS', 'HYPH', 'JJR', 'NNPS', 'JJS', 'EX', 'NFP', 'GW', 'ADD', 'RBR', '$', 'PDT', 'RBS', 'SYM', 'LS', 'FW', 'AFX', 'WP$', 'XX']


Checking most common tags

In [15]:

print(PTB_TAGS.vocab.freqs.most_common(10))

[('NN', 26915), ('IN', 20724), ('DT', 16817), ('NNP', 12449), ('PRP', 12193), ('JJ', 11591), ('RB', 10831), ('.', 10317), ('VB', 9476), ('NNS', 8438)]


We can also check tag percentages that are in our training data set as follows:

In [16]:
def tabulate_percentage(column_names, data, title="TAGS STATISTICS"):
  table = PrettyTable(column_names)
  table.title= title
  table.align[column_names[0]] = 'l'
  table.align[column_names[1]] = 'r'
  table.align[column_names[2]] = 'r'
  for row in data:
    table.add_row(row)
  print(table)

def tag_percentage(tag_counts):
  total_count = sum([count for tag, count in tag_counts])
  tag_counts_percentages = [
      (tag, count, count/total_count) for tag, count in tag_counts
  ]
  return tag_counts_percentages

column_names = ["Tag", "Count", "Percentage"]
row_data = []


In [17]:
for tag, count, percent in tag_percentage(PTB_TAGS.vocab.freqs.most_common()):
  row_data.append([
    tag, f"{count:,}", f"{percent * 100:3.1f}%"
  ])

tabulate_percentage(column_names, row_data )

+-----------------------------+
|       TAGS STATISTICS       |
+-------+--------+------------+
| Tag   |  Count | Percentage |
+-------+--------+------------+
| NN    | 26,915 |      13.2% |
| IN    | 20,724 |      10.1% |
| DT    | 16,817 |       8.2% |
| NNP   | 12,449 |       6.1% |
| PRP   | 12,193 |       6.0% |
| JJ    | 11,591 |       5.7% |
| RB    | 10,831 |       5.3% |
| .     | 10,317 |       5.0% |
| VB    |  9,476 |       4.6% |
| NNS   |  8,438 |       4.1% |
| ,     |  8,062 |       3.9% |
| CC    |  6,706 |       3.3% |
| VBD   |  5,402 |       2.6% |
| VBP   |  5,374 |       2.6% |
| VBZ   |  4,578 |       2.2% |
| CD    |  3,998 |       2.0% |
| VBN   |  3,967 |       1.9% |
| VBG   |  3,330 |       1.6% |
| MD    |  3,294 |       1.6% |
| TO    |  3,286 |       1.6% |
| PRP$  |  3,068 |       1.5% |
| -RRB- |  1,008 |       0.5% |
| -LRB- |    973 |       0.5% |
| WDT   |    948 |       0.5% |
| WRB   |    869 |       0.4% |
| :     |    866 |       0.4% |
...

### Creating an iterator

We are going to use the `BucketIterator` to create iterators for the train, validation and test sets. It batches sentences of similar length together, which keeps the amount of padding in each batch small.

In [18]:
BATCH_SIZE = 128

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size = BATCH_SIZE,
    device=device
)
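
The benefit of bucketing can be shown with a small pure-Python sketch. This is only the general idea, not how `BucketIterator` is implemented internally; the helpers `chunk`, `pad_batch` and `count_padding` and the toy sentence lengths are hypothetical.

```python
def chunk(examples, batch_size):
    """Split a list of sentences into consecutive batches."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]

def pad_batch(batch, pad_token="<pad>"):
    """Pad every sentence in the batch to the length of the longest one."""
    longest = max(len(sentence) for sentence in batch)
    return [sentence + [pad_token] * (longest - len(sentence)) for sentence in batch]

def count_padding(batches, pad_token="<pad>"):
    """Count how many padding tokens were added across all batches."""
    return sum(token == pad_token
               for batch in batches for sentence in batch for token in sentence)

# Hypothetical sentences with lengths 2, 9, 3, 8, 2 and 9.
toy = [["w"] * n for n in (2, 9, 3, 8, 2, 9)]

# Batching in the original order mixes short and long sentences.
naive = [pad_batch(b) for b in chunk(toy, 2)]
# Sorting by length first puts similar lengths in the same batch.
bucketed = [pad_batch(b) for b in chunk(sorted(toy, key=len), 2)]

naive_pad = count_padding(naive)
bucketed_pad = count_padding(bucketed)
print(naive_pad, bucketed_pad)  # 19 5
```

Fewer padding tokens means less wasted computation in the LSTM for the same set of sentences.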

### Model building

We are going to build a bidirectional LSTM (BiLSTM) model.


In [19]:
class BiLSTMPOSTagger(nn.Module):
  def __init__(self,
               input_dim,
               embedding_dim,
               hidden_dim,
               output_dim,
               pad_idx,
               n_layers =2,
               bidirectional=True,
               dropout=.5,
               ):
    super(BiLSTMPOSTagger, self).__init__()
    self.embedding = nn.Embedding(input_dim, embedding_dim,
                                  padding_idx=pad_idx)
    self.lstm = nn.LSTM(embedding_dim,
                        hidden_dim,
                        num_layers=n_layers,
                        bidirectional = bidirectional,
                        dropout = dropout if n_layers > 1 else 0
                        )
    self.fc = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim,
                        output_dim )
    self.dropout = nn.Dropout(dropout)

  def forward(self, text):
    # text = [sent len, batch size]
    embedded = self.dropout(self.embedding(text))
    
    # embedded = [sent len, batch size, emb dim]
    outputs, (hidden, cell) = self.lstm(embedded)
    """
    outputs holds the backward and forward hidden states 
    in the final layer
    
    hidden and cell are the backward and forward hidden
    and cell states at the final time-step
    
    output = [sent len, batch size, hid dim * n directions]
    hidden/cell = [n layers * n directions, batch size, hid dim]
    
    we use our outputs to make a prediction of what the tag should be
    """
    out = self.fc(self.dropout(outputs))
    # out [sent len, batch size, output dim]
    return out

### Model training

In [20]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 128
OUTPUT_DIM = len(PTB_TAGS.vocab)

N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.25
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = BiLSTMPOSTagger(
    INPUT_DIM, 
    EMBEDDING_DIM, 
    HIDDEN_DIM, 
    OUTPUT_DIM, 
    PAD_IDX,
    n_layers= N_LAYERS, 
    bidirectional = BIDIRECTIONAL, 
    dropout= DROPOUT, 
)
model

BiLSTMPOSTagger(
  (embedding): Embedding(8866, 100, padding_idx=1)
  (lstm): LSTM(100, 128, num_layers=2, dropout=0.25, bidirectional=True)
  (fc): Linear(in_features=256, out_features=51, bias=True)
  (dropout): Dropout(p=0.25, inplace=False)
)

We are then going to initialize the model weights from a normal distribution with mean 0 and standard deviation 0.1.

In [21]:
def init_weights(m):
  for name, param in m.named_parameters():
    nn.init.normal_(param.data, mean=0, std=0.1)

model.apply(init_weights)

BiLSTMPOSTagger(
  (embedding): Embedding(8866, 100, padding_idx=1)
  (lstm): LSTM(100, 128, num_layers=2, dropout=0.25, bidirectional=True)
  (fc): Linear(in_features=256, out_features=51, bias=True)
  (dropout): Dropout(p=0.25, inplace=False)
)

Next we are going to count the model parameters.

In [22]:
def count_trainable_params(model):
  # returns (all parameters, parameters that require gradients)
  total = sum(p.numel() for p in model.parameters())
  trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
  return total, trainable

n_params, trainable_params = count_trainable_params(model)
print(f"Total number of parameters: {n_params:,}\nTotal trainable parameters: {trainable_params:,}")

Total number of parameters: 1,530,491
Total trainable parameters: 1,530,491
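
As a sanity check, the total of 1,530,491 can be reproduced by hand from the layer sizes printed above. The sketch below assumes the standard PyTorch LSTM layout (four gates per direction, each with input weights, recurrent weights and two bias vectors); the helper `lstm_layer_params` is a hypothetical name used only for this illustration.

```python
def lstm_layer_params(input_size, hidden_size, bidirectional=True):
    """Parameters of one LSTM layer: 4 gates x (W_ih + W_hh + b_ih + b_hh)."""
    per_direction = 4 * (input_size * hidden_size
                         + hidden_size * hidden_size
                         + 2 * hidden_size)
    return per_direction * (2 if bidirectional else 1)

vocab_size, emb_dim, hid_dim, n_tags = 8866, 100, 128, 51

embedding = vocab_size * emb_dim                   # 886,600
lstm_1 = lstm_layer_params(emb_dim, hid_dim)       # 235,520 (input is the embedding)
lstm_2 = lstm_layer_params(2 * hid_dim, hid_dim)   # 395,264 (input is both directions of layer 1)
fc = 2 * hid_dim * n_tags + n_tags                 # 13,107 (weights + bias)

total = embedding + lstm_1 + lstm_2 + fc
print(f"{total:,}")  # 1,530,491
```

More than half of the parameters sit in the embedding layer, which is why starting from pretrained vectors matters for this model.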


We will then initialize the model's embedding layer with the pretrained word vectors.

In [23]:
pretrained_embeddings = TEXT.vocab.vectors
model.embedding.weight.data.copy_(pretrained_embeddings)

tensor([[ 1.9269,  1.4873,  0.9007,  ...,  0.1233,  0.3499,  0.6173],
        [ 0.7262,  0.0912, -0.3891,  ...,  0.0821,  0.4440, -0.7240],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [-0.6808,  0.5419, -1.5231,  ..., -1.1103, -0.5245, -0.9152],
        [-0.5972,  0.0471, -0.2406,  ..., -0.9446, -0.1126, -0.2260],
        [-1.8684, -1.3026, -0.8013,  ...,  0.2404,  0.4319, -1.3682]])

We will then set the embedding of the `<pad>` token to zeros.

In [24]:
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data

tensor([[ 1.9269,  1.4873,  0.9007,  ...,  0.1233,  0.3499,  0.6173],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [-0.6808,  0.5419, -1.5231,  ..., -1.1103, -0.5245, -0.9152],
        [-0.5972,  0.0471, -0.2406,  ..., -0.9446, -0.1126, -0.2260],
        [-1.8684, -1.3026, -0.8013,  ...,  0.2404,  0.4319, -1.3682]])

### Optimizer

We are going to use the Adam optimizer with its default settings.

In [25]:
optimizer = torch.optim.Adam(model.parameters())

### Criterion/Loss Function

Next we are going to define our loss function.

Even though we have no `<unk>` tokens within our tag vocab, we still have `<pad>` tokens. This is because all sentences within a batch need to be the same size. However, we don't want to calculate the loss when the target is a `<pad>` token, as we aren't training our model to recognize padding tokens.

We handle this by setting the `ignore_index` in our loss function to the index of the padding token in our tag vocabulary.


In [26]:
TAG_PAD_IDX = PTB_TAGS.vocab.stoi[PTB_TAGS.pad_token]
criterion = nn.CrossEntropyLoss(ignore_index = TAG_PAD_IDX)
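
What `ignore_index` does can be sketched in plain Python. This is an illustrative re-implementation of the idea (mean negative log-likelihood over non-ignored targets), not PyTorch's code; the function name `cross_entropy_ignore` and the toy logits are hypothetical.

```python
import math

def cross_entropy_ignore(logits, targets, ignore_index):
    """Mean cross-entropy over positions whose target is not ignore_index."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padded position: contributes nothing to the loss
        # log-sum-exp, shifted by the max for numerical stability
        m = max(row)
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_norm - row[target])
    # Assumes at least one non-ignored target in the batch.
    return sum(losses) / len(losses)

# Three positions, three classes; class 0 plays the role of <pad>.
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [9.0, 0.0, 0.0]]
targets = [1, 1, 0]  # the last position is padding

loss_masked = cross_entropy_ignore(logits, targets, ignore_index=0)
# Same result as if the padded position were not in the batch at all.
loss_two = cross_entropy_ignore(logits[:2], targets[:2], ignore_index=0)
print(round(loss_masked, 4), round(loss_two, 4))
```

Because padded positions are skipped entirely, whatever the model predicts there has no influence on the gradients.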

### Model and criterion to GPU if available

In [27]:
model = model.to(device)
criterion = criterion.to(device)

We don't want to calculate the accuracy over `<pad>` tokens, as we are not interested in predicting them. So we are going to create a function that calculates the accuracy over non-padded tokens only.

In [28]:
def categorical_accuracy(preds, y, tag_pad_idx):
  max_preds = preds.argmax(dim = 1, keepdim = True)
  non_pad_elements = (y != tag_pad_idx).nonzero()
  correct = max_preds[non_pad_elements].squeeze(1).eq(y[non_pad_elements])
  return correct.sum() / y[non_pad_elements].shape[0]
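
A tiny worked example shows why the padding mask matters. The sketch below uses plain lists of already-argmaxed tag ids; `masked_accuracy` and the toy ids are hypothetical and only mirror the logic of `categorical_accuracy` above.

```python
def masked_accuracy(pred_ids, gold_ids, pad_idx):
    """Accuracy over positions whose gold tag is not the padding index."""
    pairs = [(p, g) for p, g in zip(pred_ids, gold_ids) if g != pad_idx]
    correct = sum(p == g for p, g in pairs)
    return correct / len(pairs)

gold = [4, 1, 3, 0, 0]  # the last two positions are padding (index 0)
pred = [4, 1, 2, 0, 7]

acc = masked_accuracy(pred, gold, pad_idx=0)
naive = sum(p == g for p, g in zip(pred, gold)) / len(gold)
print(f"masked: {acc:.3f}, unmasked: {naive:.3f}")  # masked: 0.667, unmasked: 0.600
```

The unmasked figure depends on how the model happens to behave on padding, so it can drift up or down with the amount of padding in a batch, while the masked figure only reflects real tokens.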

### Training and evaluation functions

The training and evaluation functions remain essentially unchanged; the only difference is that the targets now come from `batch.ptbtags`.

In [29]:
def train(model, iterator,  optimizer, criterion, tag_pad_idx):
  epoch_loss = 0
  epoch_acc = 0
  model.train()
  for batch in iterator:
    text = batch.text # text = [sent len, batch size]
    tags = batch.ptbtags # tags = [sent len, batch size]

    optimizer.zero_grad()

    predictions = model(text)
    # predictions = [sent len, batch size, output dim]
    predictions = predictions.view(-1, predictions.shape[-1])
    # predictions = [sent len * batch size, output dim]
    tags = tags.view(-1) # tags = [sent len * batch size]
    loss = criterion(predictions, tags)
    acc = categorical_accuracy(predictions, tags, tag_pad_idx)
    
    loss.backward()
    optimizer.step()
    epoch_loss += loss.item()
    epoch_acc += acc.item()

  return epoch_loss / len(iterator), epoch_acc / len(iterator)


def evaluate(model, iterator, criterion, tag_pad_idx):
  epoch_loss = 0
  epoch_acc = 0
  model.eval()

  with torch.no_grad():
    for batch in iterator:
      text = batch.text # text = [sent len, batch size]
      tags = batch.ptbtags # tags = [sent len, batch size]
      predictions = model(text)
      # predictions = [sent len, batch size, output dim]
      predictions = predictions.view(-1, predictions.shape[-1])
      # predictions = [sent len * batch size, output dim]
      tags = tags.view(-1) # tags = [sent len * batch size]
      loss = criterion(predictions, tags)
      acc = categorical_accuracy(predictions, tags, tag_pad_idx)
      
      epoch_loss += loss.item()
      epoch_acc += acc.item()
  return epoch_loss / len(iterator), epoch_acc / len(iterator)

### Training loop

We are going to use two helper functions to help us visualize each training epoch.

1. Time to string function.

In [30]:
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

2. Visualize a training epoch.

In [31]:
def visualize_training(start, end, train_loss, train_accuracy, val_loss, val_accuracy, title):
  data = [
       ["Training", f'{train_loss:.3f}', f'{train_accuracy:.3f}', f"{hms_string(end - start)}" ],
       ["Validation", f'{val_loss:.3f}', f'{val_accuracy:.3f}', "" ],       
  ]
  table = PrettyTable(["CATEGORY", "LOSS", "ACCURACY", "ETA"])
  table.align["CATEGORY"] = 'l'
  table.align["LOSS"] = 'r'
  table.align["ACCURACY"] = 'r'
  table.align["ETA"] = 'r'
  table.title = title
  for row in data:
    table.add_row(row)
  print(table)
  

In [32]:

N_EPOCHS = 10
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
  start = time.time()
  train_loss, train_acc = train(model, train_iterator, optimizer,
                                criterion, TAG_PAD_IDX)
  valid_loss, valid_acc = evaluate(model,
                                   valid_iterator,
                                   criterion, TAG_PAD_IDX)
  
  title = f"EPOCH: {epoch+1:02}/{N_EPOCHS:02} {'saving best model...' if valid_loss < best_valid_loss else 'not saving...'}"
  if valid_loss < best_valid_loss:
      best_valid_loss = valid_loss
      torch.save(model.state_dict(), 'best-model.pt')
  end = time.time()
  visualize_training(start, end, train_loss, train_acc,
                     valid_loss, valid_acc, title)
  

+--------------------------------------------+
|     EPOCH: 01/10 saving best model...      |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
| Training   | 1.914 |    0.484 | 0:00:06.89 |
| Validation | 1.028 |    0.722 |            |
+------------+-------+----------+------------+
+--------------------------------------------+
|     EPOCH: 02/10 saving best model...      |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
| Training   | 0.666 |    0.814 | 0:00:06.57 |
| Validation | 0.675 |    0.811 |            |
+------------+-------+----------+------------+
+--------------------------------------------+
|     EPOCH: 03/10 saving best model...      |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
...

### Evaluating the best model

In [33]:
def visualize_test(start, end, test_loss,
                       test_accuracy, title):
  data = [
       ["test", f'{test_loss:.3f}', f'{test_accuracy:.3f}', f"{hms_string(end - start)}" ],       
  ]
  table = PrettyTable(["CATEGORY", "LOSS", "ACCURACY", "ETA"])
  table.align["CATEGORY"] = 'l'
  table.align["LOSS"] = 'r'
  table.align["ACCURACY"] = 'r'
  table.align["ETA"] = 'r'
  table.title = title
  for row in data:
    table.add_row(row)
  print(table)
  

model.load_state_dict(torch.load('best-model.pt'))

start = time.time()
test_loss, test_acc = evaluate(model, test_iterator, criterion, tag_pad_idx=TAG_PAD_IDX)
end = time.time()

visualize_test(start, end, test_loss, test_acc, "MODEL EVALUATION SUMMARY")

+------------------------------------------+
|         MODEL EVALUATION SUMMARY         |
+----------+-------+----------+------------+
| CATEGORY |  LOSS | ACCURACY |        ETA |
+----------+-------+----------+------------+
| test     | 0.484 |    0.862 | 0:00:00.13 |
+----------+-------+----------+------------+


### Model inference

Now we are ready to create our `tag_sentence` function that will:

* put the model into evaluation mode
* tokenize the sentence with spaCy if it is a string rather than a list of tokens
* lowercase the tokens if the `Field` did
* numericalize the tokens using the vocabulary
* find out which tokens are not in the vocabulary, i.e. are ``<unk>`` tokens
* convert the numericalized tokens into a tensor and add a batch dimension
* feed the tensor into the model
* get the predictions over the sentence
* convert the predictions into readable tags


In [34]:
import en_core_web_sm
nlp = en_core_web_sm.load()
def tag_sentence(model, device, sentence, text_field, tag_field):
  model.eval()

  if isinstance(sentence, str):
    tokens = [token.text for token in nlp.tokenizer(sentence)]
  else:
    tokens = [token for token in sentence]

  if text_field.lower:
    tokens = [t.lower() for t in tokens]

  numericalized_tokens = [text_field.vocab.stoi[t] for t in tokens]
  unk_idx = text_field.vocab.stoi[text_field.unk_token]
  unks = [t for t, n in zip(tokens, numericalized_tokens) if n == unk_idx]
  token_tensor = torch.LongTensor(numericalized_tokens)
  token_tensor = token_tensor.unsqueeze(-1).to(device)
  predictions = model(token_tensor)
  top_predictions = predictions.argmax(-1)
  predicted_tags = [tag_field.vocab.itos[t.item()] for t in top_predictions]
  return tokens, predicted_tags, unks


Taking a single example from our training data.

In [35]:
example_index = 1

sentence = vars(train_data.examples[example_index])['text']
actual_tags = vars(train_data.examples[example_index])['ptbtags']
print(sentence)


['[', 'this', 'killing', 'of', 'a', 'respected', 'cleric', 'will', 'be', 'causing', 'us', 'trouble', 'for', 'years', 'to', 'come', '.', ']']


In [37]:
tokens, pred_tags, unks = tag_sentence(model, 
                                       device, 
                                       sentence, 
                                       TEXT, 
                                       PTB_TAGS)

print(unks) # 'respected' and 'cleric' are unknown (<unk>) tokens

['respected', 'cleric']


We can now check how well the model does on this sentence. It mis-tagged "respected", one of the two tokens that occur fewer than `MIN_FREQ` times in the training corpus and are therefore seen by the model as `<unk>`.

In [38]:
print("Pred. Tag\tActual Tag\tCorrect?\tToken\n")

for token, pred_tag, actual_tag in zip(tokens, pred_tags, actual_tags):
    correct = '✔' if pred_tag == actual_tag else '✘'
    print(f"{pred_tag}\t\t{actual_tag}\t\t{correct}\t\t{token}")

Pred. Tag	Actual Tag	Correct?	Token

-LRB-		-LRB-		✔		[
DT		DT		✔		this
NN		NN		✔		killing
IN		IN		✔		of
DT		DT		✔		a
NN		JJ		✘		respected
NN		NN		✔		cleric
MD		MD		✔		will
VB		VB		✔		be
VBG		VBG		✔		causing
PRP		PRP		✔		us
NN		NN		✔		trouble
IN		IN		✔		for
NNS		NNS		✔		years
TO		TO		✔		to
VB		VB		✔		come
.		.		✔		.
-RRB-		-RRB-		✔		]


Next we will make our own sentence and test this out.

In [40]:
sentence = 'The Queen will deliver a speech about the conflict in North Korea at 1pm tomorrow.'

tokens, tags, unks = tag_sentence(model, 
                                  device, 
                                  sentence, 
                                  TEXT, 
                                  PTB_TAGS)

print(unks) # we don't have any unknown tokens here

[]


In [41]:
print("Pred. Tag\tToken\n")
for token, tag in zip(tokens, tags):
    print(f"{tag}\t\t{token}")

Pred. Tag	Token

DT		the
NN		queen
MD		will
VB		deliver
DT		a
NN		speech
IN		about
DT		the
NN		conflict
IN		in
NNP		north
NNP		korea
IN		at
CD		1
NN		pm
NN		tomorrow
.		.


### Conclusion

Next we will look at PoS tagging using fine-tuned transformers.
