# Assignment 4 - Named Entity Recognition (NER)

Welcome to the fourth programming assignment of the course. In this assignment, you will learn to build more complicated models with PyTorch. By completing this assignment, you will be able to: 

- Design the architecture of a neural network, train it, and test it. 
- Process features and represent them
- Understand word padding
- Implement LSTMs
- Test with your own sentence



<a name="0"></a>
## Introduction

We start by defining named entity recognition (NER). NER is a subtask of information extraction that locates and classifies named entities in a text. The named entities could be organizations, persons, locations, times, etc. 

For example:

<img src = 'images/ner.png' width="width" height="height" style="width:600px;height:150px;"/>

Is labeled as follows: 

- French: geopolitical entity
- Morocco: geographic entity 
- Christmas: time indicator

Everything else that is labeled with an `O` is not considered to be a named entity. In this assignment, you will train a named entity recognition system that can be trained in a few seconds (on a GPU) and gets around 75% accuracy. Then, you will load a version of the same model that was trained for a longer period of time, and evaluate it to get 96% accuracy! Finally, you will be able to test your named entity recognition system with your own sentence.

In [None]:
import os 
import numpy as np
import pandas as pd
import random as rnd

from sklearn.metrics import accuracy_score

import torch
import torch.optim as optim
from torch import nn
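from torch.utils.data import Dataset
from torch.utils.data import DataLoader

from utils import get_params, get_vocab

# set random seeds to make this notebook easier to replicate
# (DataLoader shuffling and weight initialization use torch's generator)
rnd.seed(33)
np.random.seed(33)
torch.manual_seed(33)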

<a name="1"></a>
## 1 - Exploring the Data

We will be using a dataset from Kaggle, which we will preprocess for you. The original data consists of four columns: the sentence number, the word, the part of speech of the word, and the tags.  A few tags you might expect to see are: 

* geo: geographical entity
* org: organization
* per: person 
* gpe: geopolitical entity
* tim: time indicator
* art: artifact
* eve: event
* nat: natural phenomenon
* O: outside any named entity (filler word)


In [None]:
# display original kaggle data
data = pd.read_csv("data/ner_dataset.csv", encoding="ISO-8859-1")
# read only the first line (one sentence and its labels) of the small training files
with open('data/small/train/sentences.txt', 'r') as f:
    train_sents = f.readline()
with open('data/small/train/labels.txt', 'r') as f:
    train_labels = f.readline()
print('SENTENCE:', train_sents)
print('SENTENCE LABEL:', train_labels)
print('ORIGINAL DATA:\n', data.head(5))
del(data, train_sents, train_labels)

<a name="1-1"></a>
### 1.1 - Importing the Data

In this part, we will import the preprocessed data and explore it.

In [None]:
vocab, tag_map = get_vocab('data/large/words.txt', 'data/large/tags.txt')
t_sentences, t_labels, t_size = get_params(vocab, tag_map, 'data/large/train/sentences.txt', 'data/large/train/labels.txt')
v_sentences, v_labels, v_size = get_params(vocab, tag_map, 'data/large/val/sentences.txt', 'data/large/val/labels.txt')
test_sentences, test_labels, test_size = get_params(vocab, tag_map, 'data/large/test/sentences.txt', 'data/large/test/labels.txt')

`vocab` is a dictionary that maps each word string to a unique number. Given a sentence, you can represent it as an array of numbers by translating each word with this dictionary. The dictionary also contains a `<PAD>` token. 

When training an LSTM on batches, all the input sentences in a batch must have the same length. To accomplish this, you fix the sentence length to a certain number and fill the remaining positions with the generic `<PAD>` token. 
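As a minimal sketch of this encoding and padding, assuming a toy vocabulary (the words, their ids, and the `encode_and_pad` helper are hypothetical; the real mapping comes from `get_vocab`), a sentence can be turned into ids and right-padded to a fixed length. Unknown words fall back to an `UNK` id, as the sentence-prediction function at the end of this notebook also does.

```python
def encode_and_pad(sentence, vocab, max_len, pad_token="<PAD>", unk_token="UNK"):
    """Map each space-separated word to its id and right-pad the list to max_len."""
    ids = [vocab.get(word, vocab[unk_token]) for word in sentence.split(" ")]
    if len(ids) > max_len:
        raise ValueError("sentence is longer than max_len")
    # fill the empty positions with the pad id
    return ids + [vocab[pad_token]] * (max_len - len(ids))


# hypothetical toy vocabulary for illustration only
toy_vocab = {"Sharon": 0, "flew": 1, "to": 2, "Miami": 3, "UNK": 4, "<PAD>": 5}
print(encode_and_pad("Sharon flew to Miami", toy_vocab, 6))  # [0, 1, 2, 3, 5, 5]
print(encode_and_pad("Sharon flew to Paris", toy_vocab, 6))  # [0, 1, 2, 4, 5, 5]
```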

In [None]:
# vocab translates from a word to a unique number
print('vocab["the"]:', vocab["the"])
# Pad token
print('padded token:', vocab['<PAD>'])

The `tag_map` is a dictionary that maps each possible tag to a number. Run the cell below to see the classes you will be predicting. The prefixes in the tags mean:
* I: Token is inside an entity.
* B: Token begins an entity.

In [None]:
print(tag_map)

If you had the sentence 

**"Sharon flew to Miami on Friday"**

The tags would look like:

```
Sharon B-per
flew   O
to     O
Miami  B-geo
on     O
Friday B-tim
```

where you would have three tokens beginning with B-, since there are no multi-token entities in the sequence. But if you added Sharon's last name to the sentence:

**"Sharon Floyd flew to Miami on Friday"**

```
Sharon B-per
Floyd  I-per
flew   O
to     O
Miami  B-geo
on     O
Friday B-tim
```

your tags would change to show "Sharon" as B-per and "Floyd" as I-per, where I- indicates a token inside a multi-token entity.
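To make the B-/I- convention concrete, here is a minimal sketch that groups tagged tokens back into whole entities. The `bio_to_entities` helper is hypothetical and not part of the assignment; treating a stray `I-` tag with no matching open entity as the start of a new entity is a design choice of this sketch.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # a new entity starts: close the open one first
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # same type as the open entity: extend it
            current_tokens.append(token)
        else:
            # an "O" tag closes any open entity
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = "Sharon Floyd flew to Miami on Friday".split()
tags = ["B-per", "I-per", "O", "O", "B-geo", "O", "B-tim"]
print(bio_to_entities(tokens, tags))
# [('Sharon Floyd', 'per'), ('Miami', 'geo'), ('Friday', 'tim')]
```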

In [None]:
# Exploring information about the data
print('The number of outputs (tags in tag_map) is', len(tag_map))
# The number of vocabulary tokens (including <PAD>)
g_vocab_size = len(vocab)
print(f"Num of vocabulary words: {g_vocab_size}")
print('The training size is', t_size)
print('The validation size is', v_size)
print('An example of the first sentence is', t_sentences[0])
print('An example of its corresponding label is', t_labels[0])

As you can see, each sentence has already been encoded as a list of numbers, one per word. There are also 16 possible tags (excluding the 'O' tag), as shown in the tag map.


<a name="1-2"></a>
### 1.2 - Data Generator

In Python, a generator is a function that behaves like an iterator: each time it is advanced (for example with `next`), it returns the next item in a sequence. Here is a [link](https://wiki.python.org/moin/Generators) to review Python generators. 

In many AI applications it is very useful to have a data generator. You will now implement a data generator for our NER application.
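As a minimal sketch of how such a generator can keep its state between calls (the `batch_generator` name and signature are hypothetical, not the assignment's API), the function below stores its position in a local `index`, walks the data through a separate list of indexes so the original list is never reordered, and wraps around to the beginning, reshuffling, when it runs past the end:

```python
import random


def batch_generator(items, batch_size, shuffle=True, seed=None):
    """Yield lists of batch_size items forever, reshuffling at the start of each pass.

    A batch that crosses the end of the data wraps around to the beginning.
    """
    rng = random.Random(seed)
    order = list(range(len(items)))  # traversal order; items itself is never modified
    index = 0                        # local state kept between calls to next()
    while True:
        batch = []
        for _ in range(batch_size):
            if index == 0 and shuffle:
                rng.shuffle(order)   # new random order for every full pass
            batch.append(items[order[index]])
            index = (index + 1) % len(items)
        yield batch


gen = batch_generator(["a", "b", "c", "d", "e"], 2, shuffle=False)
print(next(gen), next(gen), next(gen))  # ['a', 'b'] ['c', 'd'] ['e', 'a']
```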

<a name="ex-1"></a>
### Exercise 1 - data_generator

**Instructions:** Implement a dataset class that takes in `x`, `y`, and `pad`, where $x$ is a large list of sentences, $y$ is a list of the tags associated with those sentences, and `pad` is the padding value. The class needs to implement:

- `__init__`: initializes the sentences, labels, and mask for the entire dataset.
- `__getitem__`: takes an index and returns the corresponding sentence, label, and mask.
- `__len__`: returns the size of the dataset.
 

`X` and `Y` are arrays of dimension (`dataset_size, max_len`), where `max_len` is the length of the longest sentence in the data passed to `NerDataSet`. You will pad the `X` and `Y` examples with the `pad` argument. The batches produced by `generate_batches` therefore have shape (`batch_size, max_len`), and if `shuffle=True`, the data will be traversed in a random order.

**Details:**

Within `__init__`, the work is split into two `for` loops:

1. The first (already provided) stores the data samples in temporary lists and finds the maximum length of the sentences they contain.

2. The second moves the elements from the temporary lists into NumPy arrays pre-filled with pad values, and sets the mask to 1 at every position that holds a real word.

There are two features useful for defining this dataset and its batch generator:

1. The NumPy `full` function to fill the NumPy arrays with a pad value. See [full function documentation](https://numpy.org/doc/1.18/reference/generated/numpy.full.html).

2. The PyTorch `DataLoader` used in `generate_batches`, which tracks the current position in the dataset for you. With `shuffle=True` it draws the indexes in a random order on each pass, leaving the stored arrays untouched, and with `drop_last=False` the final batch is smaller when the dataset size is not a multiple of `batch_size`.
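On toy data, `np.full` and the copy loop combine roughly as follows. This is a minimal sketch: `pad_with_mask` is a hypothetical helper, it copies each row with a slice instead of looping over every word (the result is the same), and it marks real words with 1 and padding with 0 in the mask, as `NerDataSet` does.

```python
import numpy as np


def pad_with_mask(sequences, pad):
    """Pack variable-length integer sequences into a padded array plus a 0/1 mask."""
    max_len = max(len(seq) for seq in sequences)
    # start from arrays that are entirely padding / entirely masked out
    padded = np.full((len(sequences), max_len), pad)
    mask = np.full((len(sequences), max_len), 0)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq  # copy the real tokens
        mask[i, :len(seq)] = 1      # mark those positions as real
    return padded, mask


X, M = pad_with_mask([[4, 8, 15], [16, 23], [42]], pad=99)
print(X)
# [[ 4  8 15]
#  [16 23 99]
#  [42 99 99]]
print(M)
# [[1 1 1]
#  [1 1 0]
#  [1 0 0]]
```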

In [None]:
class NerDataSet(Dataset):
    def __init__(self,x,y,pad):
        
        self._dataset_size = len(x)
        max_length = 0

        # create empty buffers (plain Python lists) for the sentences and labels
        buffer_x = []
        buffer_y = []
        
        for sentence,label in zip(x,y):
            max_length = max(max_length,len(sentence))
            # add the sentence to the buffer
            buffer_x.append(sentence)
            # add the label to the buffer
            buffer_y.append(label)

        
        ### START CODE HERE (Replace instances of 'None' with your code) ###
        
        # create X,Y, NumPy arrays of size (dataset_size, max_len) 'full' of pad value
        # use self._dataset_size , calculated max_length and fill the value with pad
        X = None
        # use self._dataset_size , calculated max_length and fill the value with pad
        Y = None
        # use self._dataset_size , calculated max_length and fill the value with 0
        MASK = None

        # copy values from lists to NumPy arrays. Use the buffered values
        for i in range(self._dataset_size):
            
            # get the example (sentence as a tensor)
            # in `buffer_x` at the `i` index
            x_i = None
            
            # get the example (label as a tensor)
            # in `buffer_y` at the `i` index
            y_i = None
            
            # walk through each word in x_i
            for j in range(len(x_i)):
                
                # store the word in x_i at position j into X
                X[i,j] = None
                
                # store the label in y_i at position j into Y
                Y[i,j] = None

                MASK[i,j] = 1
        ### END CODE HERE ###
        
        self.sentences = X
        self.labels = Y
        self.mask = MASK
        
    
    def __len__(self):
        return self._dataset_size
        
    def __getitem__(self, index):

        ### START CODE HERE (Replace instances of 'None' with your code) ###
        # return the sentence stored in self.sentences at the given index
        sentence = None
        # return the labels stored in self.labels at the given index
        label = None
        # return the mask stored in self.mask at the given index
        mask = None
        ### END CODE HERE ###
        
        return {'sentence':sentence ,  'label': label, 'mask': mask}

In [None]:
def generate_batches(dataset, batch_size, shuffle=True,drop_last=True, device="cpu"):
    # load dataset as dataloader
    dataloader = DataLoader(dataset = dataset, batch_size=batch_size,shuffle=shuffle,drop_last=drop_last)

    
    for data_dict in dataloader:
        batch = {}
        ### START CODE HERE (Replace instances of 'None' with your code) ###
        # iterate over data_dict.items() and copy each entry of the current batch (the keys returned
        # by __getitem__: 'sentence', 'label', 'mask') into the batch dictionary, optionally with .to(device)
        
        
        ### END CODE HERE ###
        yield batch
    
    

In [None]:
batch_size = 5
mini_sentences = t_sentences[0: 8]
mini_labels = t_labels[0: 8]
mini_dataset = NerDataSet(mini_sentences,mini_labels,vocab['<PAD>'])
dg = generate_batches(mini_dataset,batch_size,False,False,"cpu")

batch1 = next(dg)
batch2 = next(dg)
X1 , Y1, mask1 = batch1['sentence'], batch1['label'], batch1['mask']
X2 , Y2, mask2 = batch2['sentence'], batch2['label'], batch2['mask']
print(Y1.size(), X1.size(), Y2.size(), X2.size())
print(X1[0][:], "\n", Y1[0][:] , "\n", mask1[0][:])

**Expected output:**   
```
torch.Size([5, 30]) torch.Size([5, 30]) torch.Size([3, 30]) torch.Size([3, 30])

[    0     1     2     3     4     5     6     7     8     9    10    11
    12    13    14     9    15     1    16    17    18    19    20    21
 35180 35180 35180 35180 35180 35180] 
 [    0     0     0     0     0     0     1     0     0     0     0     0
     1     0     0     0     0     0     2     0     0     0     0     0
 35180 35180 35180 35180 35180 35180]
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 0])
```

<a name="2"></a>
## 2 - Building the Model

You will now implement a model that determines the tags of sentences like the following:
<table>
    <tr>
        <td>
<img src = 'images/ner1.png' width="width" height="height" style="width:500px;height:150px;"/>
        </td>
    </tr>
</table>

The model architecture will be as follows: 

<img src = 'images/ner2.png' width="width" height="height" style="width:600px;height:250px;"/>


Concretely, your inputs will be sentences represented as tensors that are fed to a model with:

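* An Embedding layer,
* An LSTM layer,
* A Dense (linear) layer, and
* A log softmax layer. In this implementation, `forward` returns the raw scores and the log softmax is applied inside `nn.CrossEntropyLoss` during training.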

Good news! We won't make you implement the LSTM cell drawn above. You will be in charge of the overall architecture of the model.

<a name="ex-2"></a>
### Exercise 2 - NER

**Instructions:** Implement the initialization step and the forward function of your Named Entity Recognition system.  
Use the `help` function, e.g. `help(nn.Linear)`, for more information on a layer.
   


-  [nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html): Initializes the embedding, a lookup table with one row per vocabulary word and one column per model dimension. 
    - `nn.Embedding(vocab_size, d_feature)`.
    - `vocab_size` is the number of unique words in the given vocabulary.
    - `d_feature` is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
    

-  [nn.LSTM](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html): The LSTM layer. 
    - `nn.LSTM(input_size, hidden_size)` builds an LSTM layer with hidden state and cell sizes equal to `hidden_size`. `input_size` must equal the embedding size `d_feature`. By default the layer expects inputs shaped `(seq_len, batch, input_size)`; pass `batch_first=True` to use `(batch, seq_len, input_size)`.



-  [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html): A dense layer.
    - `nn.Linear(in_features, out_features)`: `in_features` must equal the LSTM hidden size, and `out_features` is the number of units chosen for this dense layer (here, the number of tags).
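A minimal sketch of how one batch flows through these three layers, with hypothetical sizes chosen only for illustration. It passes `batch_first=True` because the batches built by `NerDataSet` are shaped `(batch_size, max_len)`; without it, `nn.LSTM` would treat the first axis as time. If you use this flag in your `NER` class, the printed model will also show `batch_first=True` on the LSTM line.

```python
import torch
from torch import nn

# hypothetical sizes for illustration only
vocab_size, model_dim, hidden_size, n_tags = 1000, 50, 17, 17
batch_size, max_len = 4, 30

emb = nn.Embedding(vocab_size, model_dim)
lstm = nn.LSTM(model_dim, hidden_size, batch_first=True)  # input is (batch, seq, feature)
linear = nn.Linear(hidden_size, n_tags)

sentences = torch.randint(0, vocab_size, (batch_size, max_len))  # random word ids
embeds = emb(sentences)        # (4, 30, 50): one vector per word
lstm_out, _ = lstm(embeds)     # (4, 30, 17): one hidden state per word
tag_scores = linear(lstm_out)  # (4, 30, 17): one score per tag per word
print(embeds.shape, lstm_out.shape, tag_scores.shape)
# torch.Size([4, 30, 50]) torch.Size([4, 30, 17]) torch.Size([4, 30, 17])
```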

  

In [None]:
class NER(nn.Module):
    def __init__(self,tags,vocab_size,model_dim,hidden_size):
        super(NER, self).__init__()
        ### START CODE HERE (Replace instances of 'None' with your code) ###
        self.emb = None # Embedding layer
        self.lstm = None # LSTM layer
        self.linear = None # Linear layer with len(tags) units
        ### END CODE HERE ###
        

    def forward(self,sentences):
        ### START CODE HERE (Replace instances of 'None' with your code) ###
        embeds = None # send sentences to the emb layer
        
        lstm_out, _ = None # send embeds to the LSTM and get the output 
        
        tag_space = None # send lstm_out to linear layer
        
        return tag_space
        ### END CODE HERE ###

In [None]:
# initializing your model
model = NER(tags=tag_map,vocab_size=len(vocab), model_dim=50,hidden_size=17)
# display your model
print(model)

**Expected output:**  
```
NER(
  (emb): Embedding(35181, 50)
  (lstm): LSTM(50, 17)
  (linear): Linear(in_features=17, out_features=17, bias=True)
)
```  


<a name="3"></a>
## 3 - Train the Model 

In this section you will train your model. The training loop is also where you implement the prediction step.


The batches are represented as follows:

<img src = 'images/ner3.png' width="width" height="height" style="width:600px;height:250px;"/>


To compare the predictions with the labels, you need to flatten these matrices. First flatten the mask, then use it to remove the padding positions, since they are not part of the prediction.

<img src = 'images/ner4.png' width="width" height="height" style="width:600px;height:250px;"/>
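The same flatten-then-mask steps can be sketched in NumPy on made-up numbers: `reshape(-1)` plays the role of `view(-1)`, boolean indexing does the masking, and `np.mean` stands in for `accuracy_score`. This only illustrates the idea; the exercise itself works on PyTorch tensors.

```python
import numpy as np

# made-up scores for 2 sentences x 3 positions x 4 tags
y_pred = np.array([
    [[2.0, 0.1, 0.1, 0.1], [0.1, 3.0, 0.1, 0.1], [0.0, 0.0, 0.0, 9.0]],
    [[0.1, 0.1, 2.0, 0.1], [0.0, 0.0, 0.0, 9.0], [0.0, 0.0, 0.0, 9.0]],
])
labels = np.array([[0, 1, 99], [2, 99, 99]])  # 99 is the pad label
mask = np.array([[1, 1, 0], [1, 0, 0]])       # 1 = real word, 0 = padding

keep = mask.reshape(-1) > 0                             # flattened mask as booleans, shape (6,)
flat_pred = y_pred.reshape(-1, y_pred.shape[-1])[keep]  # (3, 4): padded rows removed
flat_labels = labels.reshape(-1)[keep]                  # (3,)
pred_tags = np.argmax(flat_pred, axis=1)                # best tag for each real word
accuracy = np.mean(pred_tags == flat_labels)
print(pred_tags, flat_labels, accuracy)  # [0 1 2] [0 1 2] 1.0

# without the mask, the padded positions drag the score down
unmasked = np.mean(np.argmax(y_pred, axis=2).reshape(-1) == labels.reshape(-1))
print(unmasked)  # 0.5
```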

In [None]:
def train_model(model, train_dataset , eval_dataset,train_step = 1, batch_size = 64 , verbos = False , save_model = False):
    loss_func = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    train_loss, eval_loss = [], []
    
    for epoch in range(train_step):
        
        train_generator = generate_batches(train_dataset,batch_size=batch_size,shuffle=True,drop_last=False)
        eval_generator = generate_batches(eval_dataset,batch_size=batch_size,shuffle=True,drop_last=False)

        # train
        train_runnign_loss = 0.0
        train_runnign_acc = 0.0
        model.train()
        for batch_index, batch_dict in enumerate(train_generator):
            
            optimizer.zero_grad()

            ### START CODE HERE (Replace instances of 'None' with your code) ###

            # run the model on the sentences from batch_dict
            y_pred = model(None)

            # get the labels from batch_dict
            labels = None
            
            # turn the mask into booleans (True for real words, False for padding) so the
            # padded positions, whose label is the large pad index not in tag_map, can be removed;
            # to do that simply use mask > 0
            mask_batch = None

            # to compare predictions with labels, flatten the mask with view(-1)
            mask_batch = None

            # flatten y_pred with view(-1, len(tag_map)) and index it with mask_batch to remove the padding
            batch_y_pred = None

            # flatten the labels with view(-1) and index them with mask_batch to remove the padding
            batch_y_lables = None
            
            loss = loss_func(batch_y_pred,batch_y_lables.long())
            
            loss_batch = loss.item()
            
            train_runnign_loss += (loss_batch - train_runnign_loss) / (batch_index + 1)

            # compute the accuracy with accuracy_score; first detach batch_y_pred and convert it
            # to NumPy with batch_y_pred.detach().numpy()
            # hint: use np.argmax over the tag axis (axis=1) of batch_y_pred
            acc =  accuracy_score(None)
            
            ### END CODE HERE ###
            
            train_runnign_acc += (acc - train_runnign_acc) / (batch_index+1)

            if epoch == 0 or epoch == train_step // 2 or epoch == train_step-1:
                if verbos:
                    print( "(Prediction,Real) wrong Predictions : \n " , [(p,r) for p,r in zip( np.argmax( batch_y_pred.detach().numpy() , axis=1), batch_y_lables.numpy()) if p!=r ] , "\n accuracy: \n" , acc)
            
            loss.backward()
            optimizer.step()
        
        #evaluation
        eval_running_loss = 0.0
        eval_running_acc = 0.0
        model.eval()
        for batch_index, batch_dict in enumerate(eval_generator):
    
            ### START CODE HERE (Replace instances of 'None' with your code) ###

            # run the model on the sentences from batch_dict
            y_pred = model(None)

            # get the labels from batch_dict
            labels = None
            
            # turn the mask into booleans (True for real words, False for padding) so the
            # padded positions, whose label is the large pad index not in tag_map, can be removed;
            # to do that simply use mask > 0
            mask_batch = None

            # to compare predictions with labels, flatten the mask with view(-1)
            mask_batch = None

            # flatten y_pred with view(-1, len(tag_map)) and index it with mask_batch to remove the padding
            batch_y_pred = None

            # flatten the labels with view(-1) and index them with mask_batch to remove the padding
            batch_y_lables = None
            
            loss = loss_func(batch_y_pred,batch_y_lables.long())
            loss_batch = loss.item()
            
            eval_running_loss += (loss_batch - eval_running_loss) / (batch_index + 1)

            # compute the accuracy with accuracy_score; first detach batch_y_pred and convert it
            # to NumPy with batch_y_pred.detach().numpy()
            # hint: use np.argmax over the tag axis (axis=1) of batch_y_pred
            acc =  accuracy_score(None)

            ### END CODE HERE ###
            
            eval_running_acc += (acc - eval_running_acc) / (batch_index+1)
            
        train_loss.append(train_runnign_loss)
        eval_loss.append(eval_running_loss)
        
        print('step %s train loss %.3f  acc %.2f' % (epoch+ 1 , train_runnign_loss, train_runnign_acc ) , end = '\r')
        if epoch == 0 or epoch == train_step // 2 or epoch == train_step-1:
            print('step %s: train loss %.3f acc %.2f'% (epoch+1,train_runnign_loss , train_runnign_acc))
            print('step %s: eval loss %.3f  acc %.2f'% (epoch+1,eval_running_loss, eval_running_acc))
    if save_model:
        torch.save(model.state_dict(),'model/model.pkl.gz')
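`train_model` keeps the loss and accuracy as running means: after batch k (counting from 1), the update `m += (x - m) / k` makes `m` equal to the average of the first k batch values without storing them. A small check, using a hypothetical `running_mean` helper written only for this illustration:

```python
def running_mean(values):
    """Return the mean of values using the incremental update from train_model."""
    mean = 0.0
    for i, value in enumerate(values):
        # after processing i + 1 values, mean equals their arithmetic average
        mean += (value - mean) / (i + 1)
    return mean


batch_losses = [4.0, 2.0, 0.0, 2.0]  # made-up per-batch losses
print(running_mean(batch_losses), sum(batch_losses) / len(batch_losses))  # 2.0 2.0
```

Because every batch gets equal weight, a smaller final batch (when `drop_last=False`) counts as much as a full one, so the reported value can differ slightly from a mean taken over all words.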

In [None]:
train_dataset = NerDataSet(t_sentences[:2],t_labels[:2], vocab['<PAD>'])
eval_dataset = NerDataSet(v_sentences[:2],v_labels[:2], vocab['<PAD>'])
model = NER(tags=tag_map,vocab_size=len(vocab), model_dim=100,hidden_size=64)
train_model(model,train_dataset,eval_dataset,20,2, verbos=True)

**Expected output:**   

```

(Prediction,Real) wrong Predictions : 
  [(7, 0), (6, 0), (4, 0), (4, 0), (4, 0), (13, 0), (7, 0), (13, 0), (13, 0), (4, 0), (4, 0), (7, 0), (10, 0), (13, 0), (4, 0), (4, 0), (8, 0), (4, 0), (4, 3), (4, 0), (7, 0), (8, 0), (4, 0), (7, 0), (4, 0), (13, 0), (13, 0), (13, 0), (7, 0), (4, 0), (13, 0), (6, 0), (4, 0), (7, 0), (7, 0), (8, 0), (13, 1), (4, 0), (8, 0), (13, 0), (4, 0), (4, 0), (8, 1), (7, 0), (4, 0), (15, 0), (8, 0), (6, 0), (6, 2), (7, 0), (13, 0), (4, 0), (4, 0), (7, 0)] 
 accuracy: 
 0.0
step 1: train loss 2.825 acc 0.00
step 1: eval loss 2.826  acc 0.00
(Prediction,Real) wrong Predictions : 
  [(13, 0), (13, 0), (13, 1), (7, 0), (13, 0), (4, 0), (7, 0), (4, 0), (1, 0), (10, 0), (13, 0), (13, 0)] 
 accuracy: 
 0.7777777777777778
step 11: train loss 2.523 acc 0.78
step 11: eval loss 2.729  acc 0.24
(Prediction,Real) wrong Predictions : 
  [(4, 0), (1, 0), (13, 0)] 
 accuracy: 
 0.9444444444444444
step 20: train loss 2.141 acc 0.94
step 20: eval loss 2.590  acc 0.41

```

In [None]:
train_steps = 20            # number of training steps (each step is one pass over the training set)
#!rm -f 'model/model.pkl.gz'  # Remove old model.pkl if it exists
train_dataset = NerDataSet(t_sentences,t_labels, vocab['<PAD>'])
eval_dataset = NerDataSet(v_sentences,v_labels, vocab['<PAD>'])

ner_model = NER(tags=tag_map,vocab_size=len(vocab), model_dim=200, hidden_size=64)
train_model(ner_model,train_dataset,eval_dataset,train_steps,128, verbos=False, save_model = True)

**Expected output (approximately):**

```
...
step 1: train loss 0.763 acc 0.84
step 1: eval loss 0.437  acc 0.89
step 11: train loss 0.145 acc 0.96
step 11: eval loss 0.213  acc 0.94
step 20: train loss 0.127 acc 0.96
step 20: eval loss 0.223  acc 0.94
...
```
These values may change between executions, but accuracy should be roughly 90% or higher on both the training and validation sets after these 20 training steps.

We have trained the model for longer, and we provide that trained model to you. This way, you can continue with the rest of the assignment even if you had some trouble up to here, and everybody will get the same outputs for the last example. However, you are free to try your own model as well. 

<a name="4"></a>
## 4 - Compute Accuracy

You will now evaluate your model on the test set. So far, you have seen the accuracy on the training set and on the validation set (labeled eval). To get a good evaluation, you need a mask so that the padding tokens are not counted when computing the accuracy. 

<a name="ex-4"></a>
### Exercise 4 - evaluate_prediction

**Instructions:** Complete the `predict` function below, which takes in your model and evaluates it on the test set. You should get an accuracy of about 95%.  


In [None]:
# Do not change the dimensions or parameters below: they must match the saved model,
# otherwise loading the state dict will fail

ner_model_loaded = NER(tags=tag_map,vocab_size=len(vocab), model_dim=200, hidden_size=64)
ner_model_loaded.load_state_dict(torch.load('model/model.pkl.gz', map_location='cpu'))

In [None]:

def predict(model,dataset,batch_size):
    model.eval()
    test_generator = generate_batches(dataset,batch_size=batch_size,shuffle=True,drop_last=True)
    test_running_acc = 0.0
    for batch_index, batch_dict in enumerate(test_generator):

        ### START CODE HERE (Replace instances of 'None' with your code) ###

        # run the model on the sentences from batch_dict
        y_pred = model(None)

        # get the labels from batch_dict
        labels = None
        
        # turn the mask into booleans (True for real words, False for padding) so the
        # padded positions, whose label is the large pad index not in tag_map, can be removed;
        # to do that simply use mask > 0
        mask_batch = None

        # to compare predictions with labels, flatten the mask with view(-1)
        mask_batch = None

        # flatten y_pred with view(-1, len(tag_map)) and index it with mask_batch to remove the padding
        batch_y_pred = None

        # flatten the labels with view(-1) and index them with mask_batch to remove the padding
        batch_y_lables = None
        
        # compute the accuracy with accuracy_score; first detach batch_y_pred and convert it
        # to NumPy with batch_y_pred.detach().numpy()
        # hint: use np.argmax over the tag axis (axis=1) of batch_y_pred
        acc =  accuracy_score(None)

        ### END CODE HERE ###

        test_running_acc += (acc - test_running_acc) / (batch_index+1)
    
    print('test acc %.2f'% ( test_running_acc))
    

In [None]:
test_dataset = NerDataSet(test_sentences,test_labels, vocab['<PAD>'])
predict(ner_model_loaded,test_dataset,64)

**Expected output:**
```
test acc 0.94

```

In [None]:
# This is the function you will be using to test your own sentence.
# Note: it reuses the name `predict`, replacing the test-set version defined above.
def predict(sentence, model, vocab, tag_map):
    # map each space-separated word to its id, falling back to the 'UNK' id
    s = [vocab[token] if token in vocab else vocab['UNK'] for token in sentence.split(' ')]
    # a batch holding a single sentence, shape (1, len(s))
    batch = torch.tensor([s], dtype=torch.long)
    model.eval()
    with torch.no_grad():
        output = model(batch)  # shape (1, len(s), len(tag_map))
    outputs = np.argmax(output.numpy(), axis=2)
    # invert tag_map so every predicted index maps back to its tag string
    idx_to_tag = {idx: tag for tag, idx in tag_map.items()}
    return [idx_to_tag[idx] for idx in outputs[0]]

In [None]:
# Try the output for the introduction example
#sentence = "Many French citizens are going to visit Morocco for summer"
#sentence = "Sharon Floyd flew to Miami last Friday"

# New york times news:
sentence = "Peter Navarro, the White House director of trade and manufacturing policy of U.S, said in an interview on Sunday morning that the White House was working to prepare for the possibility of a second wave of the coronavirus in the fall, though he said it wouldn’t necessarily come"
predictions = predict(sentence, ner_model_loaded, vocab, tag_map)
for x,y in zip(sentence.split(' '), predictions):
    if y != 'O':
        print(x,y)

**Expected output:**

Your output should look something like this, although there is no guarantee that you will get exactly the same result:
```
Peter B-per
White B-org
House I-org
Sunday B-tim
morning I-tim
White B-org
House I-org

```
  