# Assignment 3 - Named Entity Recognition (NER)

Welcome to the third programming assignment of Course 3. In this assignment, you will learn to build more complicated models with pytorch. By completing this assignment, you will be able to: 

- Design the architecture of a neural network, train it, and test it. 
- Process features and represents them
- Understand word padding
- Implement LSTMs
- Test with your own sentence


## Table of Contents
- [Introduction](#0)
- [1 - Exploring the Data](#1)
    - [1.1 - Importing the Data](#1-1)
    - [1.2 - Data Generator](#1-2)
		- [Exercise 1 - data_generator (UNQ_C1)](#ex-1)
- [2 - Building the Model](#2)
	- [Exercise 2 - NER (UNQ_C2)](#ex-2)
- [3 - Train the Model](#3)
    - [3.1 - Training the Model](#3-1)
        - [Exercise 3 - train_model (UNQ_C3)](#ex-3)
- [4 - Compute Accuracy](#4)
	- [Exercise 4 - evaluate_prediction (UNQ_C4)](#ex-4)
- [5 - Testing with your Own Sentence](#5)

<a name="0"></a>
## Introduction

We first start by defining named entity recognition (NER). NER is a subtask of information extraction that locates and classifies named entities in a text. The named entities could be organizations, persons, locations, times, etc. 

For example:

<img src = 'images/ner.png' width="width" height="height" style="width:600px;height:150px;"/>

Is labeled as follows: 

- French: geopolitical entity
- Morocco: geographic entity 
- Christmas: time indicator

Everything else that is labeled with an `O` is not considered to be a named entity. In this assignment, you will train a named entity recognition system that could be trained in a few seconds (on a GPU) and will get around 75% accuracy. Then, you will load in the exact version of your model, which was trained for a longer period of time. You could then evaluate the trained version of your model to get 96% accuracy! Finally, you will be able to test your named entity recognition system with your own sentence.

In [10]:
import os 
import numpy as np
import pandas as pd
import random as rnd

import torch
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim


from utils import get_params, get_vocab
# set random seeds to make this notebook easier to replicate
rnd.seed(33)


<a name="1"></a>
## 1 - Exploring the Data

We will be using a dataset from Kaggle, which we will preprocess for you. The original data consists of four columns: the sentence number, the word, the part of speech of the word, and the tags.  A few tags you might expect to see are: 

* geo: geographical entity
* org: organization
* per: person 
* gpe: geopolitical entity
* tim: time indicator
* art: artifact
* eve: event
* nat: natural phenomenon
* O: filler word


In [11]:
# display original kaggle data
data         = pd.read_csv("data/ner_dataset.csv", encoding = "ISO-8859-1") 
train_sents  = open('data/small/train/sentences.txt', 'r').readline()
train_labels = open('data/small/train/labels.txt', 'r').readline()

print('SENTENCE:', train_sents)
print('SENTENCE LABEL:', train_labels)
print('ORIGINAL DATA:\n', data.head(5))
del(data, train_sents, train_labels)


SENTENCE: Thousands of demonstrators have marched through London to protest the war in Iraq and demand the withdrawal of British troops from that country .

SENTENCE LABEL: O O O O O O B-geo O O O O O B-geo O O O O O B-gpe O O O O O

ORIGINAL DATA:
     Sentence #           Word  POS Tag
0  Sentence: 1      Thousands  NNS   O
1          NaN             of   IN   O
2          NaN  demonstrators  NNS   O
3          NaN           have  VBP   O
4          NaN        marched  VBN   O


<a name="1-1"></a>
### 1.1 - Importing the Data

In this part, we will import the preprocessed data and explore it.

In [12]:
vocab, tag_map = get_vocab('data/large/words.txt', 'data/large/tags.txt')

t_sentences, t_labels, t_size = get_params(vocab, tag_map, 'data/large/train/sentences.txt', 'data/large/train/labels.txt')
v_sentences, v_labels, v_size = get_params(vocab, tag_map, 'data/large/val/sentences.txt', 'data/large/val/labels.txt')

test_sentences, test_labels, test_size = get_params(vocab, tag_map, 'data/large/test/sentences.txt', 'data/large/test/labels.txt')


`vocab` is a dictionary that translates a word string to a unique number. Given a sentence, you can represent it as an array of numbers translating with this dictionary. The dictionary contains a `<PAD>` token. 

When training an LSTM using batches, all your input sentences must be the same size. To accomplish this, you set the length of your sentences to a certain number and add the generic `<PAD>` token to fill all the empty spaces. 

In [13]:
# vocab translates from a word to a unique number
print('vocab["the"]:', vocab["the"])
# Pad token
print('padded token:', vocab['<PAD>'])


vocab["the"]: 9
padded token: 35180


The `tag_map` is a dictionary that maps the tags that you could have to numbers. Run the cell below to see the possible classes you will be predicting. The prepositions in the tags mean:
* I: Token is inside an entity.
* B: Token begins an entity.

In [14]:
print(tag_map)


{'O': 0, 'B-geo': 1, 'B-gpe': 2, 'B-per': 3, 'I-geo': 4, 'B-org': 5, 'I-org': 6, 'B-tim': 7, 'B-art': 8, 'I-art': 9, 'I-per': 10, 'I-gpe': 11, 'I-tim': 12, 'B-nat': 13, 'B-eve': 14, 'I-eve': 15, 'I-nat': 16}


If you had the sentence 

**"Sharon flew to Miami on Friday"**

The tags would look like:

```
Sharon B-per
flew   O
to     O
Miami  B-geo
on     O
Friday B-tim
```

where you would have three tokens beginning with B-, since there are no multi-token entities in the sequence. But if you added Sharon's last name to the sentence:

**"Sharon Floyd flew to Miami on Friday"**

```
Sharon B-per
Floyd  I-per
flew   O
to     O
Miami  B-geo
on     O
Friday B-tim
```

your tags would change to show first "Sharon" as B-per, and "Floyd" as I-per, where I- indicates an inner token in a multi-token sequence.

In [15]:
# Exploring information about the data
print('The number of outputs is tag_map', len(tag_map))
# The number of vocabulary tokens (including <PAD>)
g_vocab_size = len(vocab)
print()

print(f"Num of vocabulary words:                 {g_vocab_size}")
print()

print('The training size is                    ', t_size)
print()

print('The validation size is                  ', v_size)
print()

print('An example of the first sentence is     ', t_sentences[0])
print()

print('An example of its corresponding label is', t_labels[0])



The number of outputs is tag_map 17

Num of vocabulary words:                 35181

The training size is                     33570

The validation size is                   7194

An example of the first sentence is      [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 9, 15, 1, 16, 17, 18, 19, 20, 21]

An example of its corresponding label is [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0]


So you can see that we have already encoded each sentence into a tensor by converting it into a number. We also have 16 possible tags (excluding the '0' tag), as shown in the tag map.


<a name="1-2"></a>
### 1.2 - Data Generator

In python, a generator is a function that behaves like an iterator. It returns the next item in a pre-defined sequence. Here is a [link](https://wiki.python.org/moin/Generators) to review python generators. 

In many AI applications it is very useful to have a data generator. You will now implement a data generator for our NER application.

<a name="ex-1"></a>
### Exercise 1 - data_generator

**Instructions:** Implement a data generator function that takes in `batch_size, x, y, pad, shuffle` where $x$ is a large list of sentences, and $y$ is a list of the tags associated with those sentences and pad is a pad value. Return a subset of those inputs in a tuple of two arrays `(X,Y)`. 

`X` and `Y` are arrays of dimension (`batch_size, max_len`), where `max_len` is the length of the longest sentence *in that batch*. You will pad the `X` and `Y` examples with the pad argument. If `shuffle=True`, the data will be traversed in a random order.

**Details:**

Use this code as an outer loop
```
while True:  
...  
yield((X,Y))  
```

so your data generator runs continuously. Within that loop, define two `for` loops:  

1. The first stores temporal lists of the data samples to be included in the batch, and finds the maximum length of the sentences contained in it.

2. The second one moves the elements from the temporal list into NumPy arrays pre-filled with pad values.

There are three features useful for defining this generator:

1. The NumPy `full` function to fill the NumPy arrays with a pad value. See [full function documentation](https://numpy.org/doc/1.18/reference/generated/numpy.full.html).

2. Tracking the current location in the incoming lists of sentences. Generators variables hold their values between invocations, so we create an `index` variable, initialize to zero, and increment by one for each sample included in a batch. However, we do not use the `index` to access the positions of the list of sentences directly. Instead, we use it to select one index from a list of indexes. In this way, we can change the order in which we traverse our original list, keeping untouched our original list.  

3. Since `batch_size` and the length of the input lists are not aligned, gathering a batch_size group of inputs may involve wrapping back to the beginning of the input loop. In our approach, it is just enough to reset the `index` to 0. We can re-shuffle the list of indexes to produce different batches each time.

In [16]:
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: data_generator
def data_generator(batch_size, x, y, pad, shuffle=False, verbose=False):
    '''
      Input: 
        batch_size - integer describing the batch size
        x - list containing sentences where words are represented as integers
        y - list containing tags associated with the sentences
        shuffle - Shuffle the data order
        pad - an integer representing a pad character
        verbose - Print information during runtime
      Output:
        a tuple containing 2 elements:
        X - np.ndarray of dim (batch_size, max_len) of padded sentences
        Y - np.ndarray of dim (batch_size, max_len) of tags associated with the sentences in X
    '''
    
    # count the number of lines in data_lines
    num_lines = len(x)
    
    # create an array with the indexes of data_lines that can be shuffled
    lines_index = [*range(num_lines)]
    
    # shuffle the indexes if shuffle is set to True
    if shuffle:
        rnd.shuffle(lines_index)
    
    index = 0 # tracks current location in x, y
    while True:
        buffer_x = [0] * batch_size # Temporal array to store the raw x data for this batch
        buffer_y = [0] * batch_size # Temporal array to store the raw y data for this batch
                
  ### START CODE HERE (Replace instances of 'None' with your code) ###
        
        # Copy into the temporal buffers the sentences in x[index] 
        # along with their corresponding labels y[index]
        # Find maximum length of sentences in x[index] for this batch. 
        # Reset the index if we reach the end of the data set, and shuffle the indexes if needed.
        max_len = 0 
        for i in range(batch_size):
             # if the index is greater than or equal to the number of lines in x
            if index >= num_lines:
                # then reset the index to 0
                index = 0
                # re-shuffle the indexes if shuffle is set to True
                if shuffle:
                    rnd.shuffle(lines_index)
            
            # The current position is obtained using `lines_index[index]`
            # Store the x value at the current position into the buffer_x
            buffer_x[i] = x[lines_index[index]]
            
            # Store the y value at the current position into the buffer_y
            buffer_y[i] = y[lines_index[index]]
            
            lenx = len(x[lines_index[index]]) #length of current x[]
            if lenx > max_len:
                max_len = lenx #max_len tracks longest x[]
            
            # increment index by one
            index += 1


        # create X,Y, NumPy arrays of size (batch_size, max_len) 'full' of pad value
        X = np.full((batch_size, max_len), pad)
        Y = np.full((batch_size, max_len), pad)

        # copy values from lists to NumPy arrays. Use the buffered values
        for i in range(batch_size):
            # get the example (sentence as a tensor)
            # in `buffer_x` at the `i` index
            x_i = buffer_x[i]
            
            # similarly, get the example's labels
            # in `buffer_y` at the `i` index
            y_i = buffer_y[i]
            
            # Walk through each word in x_i
            for j in range(len(x_i)):
                # store the word in x_i at position j into X
                X[i, j] = x_i[j]
                
                # store the label in y_i at position j into Y
                Y[i, j] = y_i[j]

    ### END CODE HERE ###
        if verbose: print("index=", index)
        yield((X,Y))


In [17]:
batch_size     = 5
mini_sentences = t_sentences[0: 8]
mini_labels    = t_labels[0: 8]

dg = data_generator(batch_size, mini_sentences, mini_labels, vocab["<PAD>"], shuffle=False, verbose=True)

X1, Y1 = next(dg)  # index= 5 mean k next iteration par kis index say data uthana shuru krna hay is nay
X2, Y2 = next(dg)

print('Y1.shape: ', Y1.shape)
#print()
print('Y2.shape: ', Y2.shape)
print()
print('X2.shape: ', X2.shape)
#print()
print('X2.shape: ', X2.shape)
print('\n')

print('X1[0][:]: ', X1[0][:], "\n\n", 'Y1[0][:]: ', Y1[0][:])


index= 5
index= 2
Y1.shape:  (5, 30)
Y2.shape:  (5, 30)

X2.shape:  (5, 30)
X2.shape:  (5, 30)


X1[0][:]:  [    0     1     2     3     4     5     6     7     8     9    10    11
    12    13    14     9    15     1    16    17    18    19    20    21
 35180 35180 35180 35180 35180 35180] 

 Y1[0][:]:  [    0     0     0     0     0     0     1     0     0     0     0     0
     1     0     0     0     0     0     2     0     0     0     0     0
 35180 35180 35180 35180 35180 35180]


**Expected output:**   
```
index= 5
index= 2
(5, 30) (5, 30) (5, 30) (5, 30)
[    0     1     2     3     4     5     6     7     8     9    10    11
    12    13    14     9    15     1    16    17    18    19    20    21
 35180 35180 35180 35180 35180 35180] 
 [    0     0     0     0     0     0     1     0     0     0     0     0
     1     0     0     0     0     0     2     0     0     0     0     0
 35180 35180 35180 35180 35180 35180]  
```

## Building the Model

You will now implement the model that will be able to determining the tags of sentences like the following:
<table>
    <tr>
        <td>
<img src = 'images/ner1.png' width="width" height="height" style="width:500px;height:150px;"/>
        </td>
    </tr>
</table>

The model architecture will be as follows: 

<img src = 'images/ner2.png' width="width" height="height" style="width:600px;height:250px;"/>


Concretely, your inputs will be sentences represented as tensors that are fed to a model with:

* An Embedding layer,
* A LSTM layer
* A Dense layer
* A log softmax layer.

Good news! We won't make you implement the LSTM cell drawn above. You will be in charge of the overall architecture of the model.


### Exercise - Named Entity Recognition - NER

**Instructions:** Implement the initialization step and the forward function of your Named Entity Recognition system.  
Please utilize help function e.g. `help(nn.Linear)` for more information on a layer
   

-  nn.Embedding: Initializes the embedding. In this case it is the dimension of the model by the size of the vocabulary. 
    - `nn.Embedding(vocab_size, d_feature)`.
    - `vocab_size` is the number of unique words in the given vocabulary.
    - `d_feature` is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
    

-  nn.LSTM: LSTM layer. 
    - `LSTM(n_units)` Builds an LSTM layer with hidden state and cell sizes equal to `n_units`. In trax, `n_units` should be equal to the size of the embeddings `d_feature`.



-  tl.Linear: A dense layer.
    - `nn.Linear(n_units)`: The parameter `n_units` is the number of units chosen for this dense layer.  


- F.log_softmax(): Log of the output probabilities.


In [90]:
# Define the device (GPU if available, otherwise CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class NER(nn.Module):
    def __init__(self):
        super(NER, self).__init__()

        self.embedding = nn.Embedding(num_embeddings=35181, embedding_dim=50)
        
        self.lstm      = nn.LSTM(input_size=50, hidden_size=64, batch_first=True, num_layers=3)
        
        self.fc        = nn.Linear(in_features=64, out_features=17)


    def forward(self, x, hidden, memmory_cell):                                                             ############################
        embedded_output = self.embedding(x)

        lstm_output, (hidden, memmory_cell) = self.lstm(embedded_output, (hidden, memmory_cell))            ############################

        linear_output = self.fc(lstm_output)

        return F.log_softmax(linear_output, dim=-1), hidden, memmory_cell                                   ############################



model = NER().to(device)########################
print(model)


NER(
  (embedding): Embedding(35181, 50)
  (lstm): LSTM(50, 128, num_layers=3, batch_first=True)
  (fc): Linear(in_features=128, out_features=17, bias=True)
)


## Optimizer and Loss Function

In [91]:
# optimizer and loss function

# import torch.nn as nn
# import torch.optim as optim

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


## Data generator

In [23]:
# Setting random seed for reproducibility and testing
rnd.seed(33)
batch_size = 64

# Create training data
train_generator = data_generator(batch_size, t_sentences, t_labels, vocab['<PAD>'], shuffle=False, verbose=True)


In [24]:
batch_of_inputs, batch_of_labels = next(train_generator)
batch_of_inputs.shape, batch_of_labels.shape


index= 64


((64, 40), (64, 40))

In [25]:
batch_of_labels[0,:]


array([    0,     0,     0,     0,     0,     0,     1,     0,     0,
           0,     0,     0,     1,     0,     0,     0,     0,     0,
           2,     0,     0,     0,     0,     0, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180])

In [26]:
vocab['<PAD>']


35180

In [27]:
batch_of_labels = np.where(batch_of_labels == 35180, 0, batch_of_labels)


In [28]:
batch_of_labels[0,:]


array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [29]:
batch_of_inputs, batch_of_labels = next(train_generator)
batch_of_inputs.shape, batch_of_labels.shape

# note length of sequence change ho rahi hay


index= 128


((64, 41), (64, 41))

In [30]:
len(t_sentences)/64, len(t_sentences)/batch_size

(524.53125, 524.53125)

## Training Loop


In [92]:
batch_size  = 64
# resetting training gen
train_generator = data_generator(batch_size, t_sentences, t_labels, vocab['<PAD>'], shuffle=False, verbose=False)

n_epochs = 10
steps_per_epoch = len(t_sentences)/batch_size


# setting up initial hidden and cell state
num_layers  = 3
batch_size  = 64
hidden_size = 128
# Optionally, create initial hidden and cell states
hidden_state = torch.zeros(num_layers, batch_size, hidden_size).to(device)
memmory_cell = torch.zeros(num_layers, batch_size, hidden_size).to(device)
# hidden_state.shape





for epoch in range(n_epochs):
    model.train()                  # agar evaluation accuracy etc bhi check kr rahay hotay tu i guess model ko dobara is tarin mode main dalna zroori tha
    total_loss=0.0

    i=0
    
    for batch_of_inputs, batch_of_labels in train_generator:
        i+=1
        batch_of_labels = np.where(batch_of_labels==35180, 0, batch_of_labels)
        batch_of_inputs, batch_of_labels = torch.from_numpy(batch_of_inputs).long(), torch.from_numpy(batch_of_labels).long()
        batch_of_inputs, batch_of_labels = batch_of_inputs.to(device), batch_of_labels.to(device)######################################


        optimizer.zero_grad()
        
        # Forward pass
        log_softmax_predictions, _, _ = model(batch_of_inputs, hidden_state, memmory_cell)     # OR log_softmax_predictions, hidden_state, memmory_cell
        
        # Reshaping the outputs and targets to match
        num_classes             =  log_softmax_predictions.size(-1)
        log_softmax_predictions =  log_softmax_predictions.view(-1, num_classes)
        batch_of_labels         =  batch_of_labels.view(-1)

        # Computing the loss
        loss = criterion(log_softmax_predictions, batch_of_labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()


        total_loss += loss.item()

        if i % 250 == 0:
            print('i: ',i)
            print('loss: ', total_loss / (i + 1))
            print()
        if i >= 525:
            break


    avg_loss = total_loss / (i + 1)
    print(f"Epoch [{epoch + 1}/{n_epochs}], Average Loss: {avg_loss}")
    print('\n\n\n')


i:  250
loss:  0.4959373499055308

i:  500
loss:  0.3845687680496665

Epoch [1/10], Average Loss: 0.37729055792829835




i:  250
loss:  0.20794603537278347

i:  500
loss:  0.1934807835343831

Epoch [2/10], Average Loss: 0.19203686054101915




i:  250
loss:  0.14636622301017146

i:  500
loss:  0.13590730039659374

Epoch [3/10], Average Loss: 0.1347477340950372




i:  250
loss:  0.10432186244849664

i:  500
loss:  0.0981997159084755

Epoch [4/10], Average Loss: 0.09750957326687335




i:  250
loss:  0.08253520520856655

i:  500
loss:  0.07867030660341123

Epoch [5/10], Average Loss: 0.07831733099790467




i:  250
loss:  0.07075413137644648

i:  500
loss:  0.06707745376312566

Epoch [6/10], Average Loss: 0.06685994191895533




i:  250
loss:  0.06173164769265044

i:  500
loss:  0.05843341918643601

Epoch [7/10], Average Loss: 0.0582440593935011




i:  250
loss:  0.05444776151047285

i:  500
loss:  0.05122090261169299

Epoch [8/10], Average Loss: 0.05116926897458817




i:  250
loss: 

In [39]:
#torch.save(model, 'named_entity_recogniser.pt')


# Checking the inputs and ouputs of the model

In [None]:
# # Define the device (GPU if available, otherwise CPU)
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# class NER(nn.Module):
#     def __init__(self):
#         super(NER, self).__init__()

#         self.embedding = nn.Embedding(num_embeddings=35181, embedding_dim=50)
        
#         self.lstm      = nn.LSTM(input_size=50, hidden_size=34, batch_first=True, )
        
#         self.fc        = nn.Linear(in_features=34, out_features=17)


#     def forward(self, x, hidden, memmory_cell):                                                             ############################
#         embedded_output = self.embedding(x)

#         lstm_output, (hidden, memmory_cell) = self.lstm(embedded_output, (hidden, memmory_cell))            ############################

#         linear_output = self.fc(lstm_output)

#         return F.log_softmax(linear_output, dim=-1), hidden, memmory_cell                                   ############################



# model = NER().to(device)########################
# print(model)


In [67]:
# generating random sequnces

vocab_size=35181
torch.manual_seed(0)
input_sequences = torch.randint(0, vocab_size, (64, 10))
print('input_sequences: ', input_sequences.shape)


input_sequences:  torch.Size([64, 10])


In [68]:
# embedding layer
embedding_ = nn.Embedding(num_embeddings=vocab_size, embedding_dim=50)
embed_out_ = embedding_(input_sequences)

print('embed_out: ', embed_out_.shape)


embed_out:  torch.Size([64, 10, 50])


In [72]:
# GRU layer

# Defining the LSTM parameters
input_size  = 50  # Number of features in the input
hidden_size = 34 # Number of features in the hidden state
num_layers  = 1   # Number of recurrent layers


# Create an LSTM layer
lstm_ = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)


# Optionally, create initial hidden and cell states
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)


# Forward pass through the LSTM
lstm_out_, (hn, cn) = lstm_(embed_out_, (h0, c0))             ##############################################################################################

# Print the shapes of the outputs
print("Output shape:", lstm_out_.shape)  # Output shape: (batch_size, seq_len, num_directions * hidden_size)
print("hn shape:", hn.shape)          # hn shape: (num_layers * num_directions, batch_size, hidden_size)
print("cn shape:", cn.shape)          # cn shape: (num_layers * num_directions, batch_size, hidden_size)


Output shape: torch.Size([64, 10, 34])
hn shape: torch.Size([1, 64, 34])
cn shape: torch.Size([1, 64, 34])


In [45]:
hn[0,0,:] = 0
hn[0,0,:]

# so ye qaabil e taghaiyyur hain


tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       grad_fn=<SliceBackward0>)

In [75]:
# Fully connected or dense layer
fc_ = nn.Linear(in_features=34, out_features=17)
fc_out_ = fc_(lstm_out_)
fc_out_.shape


torch.Size([64, 10, 17])

In [77]:
# taking log softmax
softmax_out_ = F.softmax(fc_out_, dim=-1)
softmax_out_.shape


torch.Size([64, 10, 17])

In [78]:
softmax_out_[0,0,:]


tensor([0.0550, 0.0607, 0.0531, 0.0604, 0.0530, 0.0571, 0.0703, 0.0601, 0.0656,
        0.0552, 0.0693, 0.0488, 0.0514, 0.0572, 0.0641, 0.0561, 0.0627],
       grad_fn=<SliceBackward0>)

In [80]:
torch.sum(softmax_out_[0,0,:])
# perfect


tensor(1., grad_fn=<SumBackward0>)

In [88]:
# doing the same using the whole model directly
# does the model gives the same output? yes!
a,b,c = model(input_sequences, h0, c0)
a.shape
#perfect


torch.Size([64, 10, 17])

# Accuracy on Test Data


### data generator and mask

In [93]:
batch_size     = len(v_sentences)
test_generator = data_generator(batch_size, v_sentences, v_labels, vocab['<PAD>'], shuffle=False, verbose=False)


In [94]:
len(v_sentences), len(v_labels)

(7194, 7194)

In [95]:
val_inputs, val_labels = next(test_generator)
val_inputs.shape, val_labels.shape


((7194, 73), (7194, 73))

In [96]:
val_inputs[0,:], val_labels[0, :]

(array([ 1020,    68,  5092,    50,     9, 29845,  1677, 18327,  1033,
            9,  4452,    13,   522, 29846,    45, 10314,   223,  6582,
           21, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180]),
 array([    1,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     5,
            0, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 351

In [97]:
mask = np.where(val_labels != 35180, 1, val_labels)
mask[0,:]


array([    1,     1,     1,     1,     1,     1,     1,     1,     1,
           1,     1,     1,     1,     1,     1,     1,     1,     1,
           1, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180])

In [98]:
mask = np.where(mask == 35180, 0, mask)
mask[0,:]


array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0])

In [99]:
val_labels[0,:]


array([    1,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     5,
           0, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
       35180])

In [100]:
val_inputs, val_labels = torch.from_numpy(val_inputs).long().to(device), torch.from_numpy(val_labels).long().to(device)
mask = torch.from_numpy(mask).long().to(device)


### making predictions


In [103]:
# setting up initial hidden and cell state
num_layers  = 3
batch_size  = len(v_sentences)
hidden_size = 64
# Optionally, create initial hidden and cell states
hidden_state = torch.zeros(num_layers, batch_size, hidden_size).to(device)####################################
memmory_cell = torch.zeros(num_layers, batch_size, hidden_size).to(device)#####################################
# hidden_state.shape


In [104]:
with torch.no_grad():
    model.eval()
    predictions, _, _ = model(val_inputs, hidden_state, memmory_cell)
predictions.shape


torch.Size([7194, 73, 17])

#### reshaping predictions, getting class indices from logsoftmax probs

In [105]:
# testing argmax on random data
a = torch.randint(0, 17, (2, 3, 17))
a


tensor([[[ 4,  8,  5,  9, 12,  9,  2,  3,  4, 16,  8,  1,  5,  2,  3, 15, 10],
         [ 1, 12, 14, 11, 12,  3, 13,  2,  4, 14, 16,  8, 16, 13,  4, 11, 10],
         [13, 14, 12,  1,  0,  6, 10,  6,  2, 15,  1,  2,  1,  9, 14, 10,  2]],

        [[14, 10, 16, 15,  3, 12,  8, 11,  5, 13,  4, 16, 16,  3, 15, 15, 16],
         [15,  5, 15,  7,  9, 10,  4,  9,  9,  1,  6,  8, 10,  6, 14, 10, 13],
         [ 0, 16,  9, 14, 10,  7,  8, 10, 16, 13, 13, 12,  4, 10,  0,  3,  3]]])

In [106]:
# testing argmax on random data
torch.argmax(a, dim=-1)


tensor([[ 9, 10,  9],
        [ 2,  0,  1]])

In [107]:
reduced_predictions = torch.argmax(predictions, dim=-1)
reduced_predictions.shape

torch.Size([7194, 73])

In [108]:
# looing at data
reduced_predictions[1,:]


tensor([1, 0, 0, 0, 0, 0, 0, 1, 4, 0, 7, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0], device='cuda:0')

In [109]:
val_labels[1,:]


tensor([    1,     0,     0,     0,     0,     0,     0,     1,     4,     0,
            0,     0,     0,     7,     0,     0,     0,     0,     0,     0,
            8,     0, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180, 35180,
        35180, 35180, 35180], device='cuda:0')

### Accuracy using for loop


In [110]:
list_of_correct_preds = []
list_of_lengths_of_sequences = []

for i in range(len(val_labels)):
    
    length_of_current_sequence      = torch.sum(mask[i,:])
 
    current_number_of_correct_preds = torch.sum( reduced_predictions[i,: length_of_current_sequence] == val_labels[i, :length_of_current_sequence] )
    
    list_of_lengths_of_sequences.append(length_of_current_sequence)
    list_of_correct_preds.append(current_number_of_correct_preds)


In [111]:
correct_preds        = torch.tensor(list_of_correct_preds)
lengths_of_sequences = torch.tensor(list_of_lengths_of_sequences)
correct_preds, lengths_of_sequences


(tensor([18, 20,  9,  ..., 12,  9, 20]),
 tensor([19, 22, 10,  ..., 12,  9, 20]))

In [112]:
accuracy = torch.sum(correct_preds)/torch.sum(lengths_of_sequences)
accuracy


tensor(0.9511)

In [113]:
torch.sum(mask[0,:])

tensor(19, device='cuda:0')

In [114]:
val_labels[0, :torch.sum(mask[0,:])+1], val_labels[0, :torch.sum(mask[0,:])]


(tensor([    1,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     5,     0, 35180],
        device='cuda:0'),
 tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0],
        device='cuda:0'))

In [115]:
torch.sum(torch.tensor([1,2,3,4,5])==torch.tensor([1,2,3,4,6]))


tensor(4)

In [116]:
current_length=19
torch.sum( reduced_predictions[0,: current_length] == val_labels[0, :current_length] )


tensor(18, device='cuda:0')

### Accuracy using matrix operations

In [117]:
# what do val_labels look like?
val_labels

tensor([[    1,     0,     0,  ..., 35180, 35180, 35180],
        [    1,     0,     0,  ..., 35180, 35180, 35180],
        [    5,     0,     0,  ..., 35180, 35180, 35180],
        ...,
        [    0,     0,     0,  ..., 35180, 35180, 35180],
        [    0,     0,     0,  ..., 35180, 35180, 35180],
        [    0,     0,     0,  ..., 35180, 35180, 35180]], device='cuda:0')

In [118]:
# what do preds look like
reduced_predictions

tensor([[1, 0, 0,  ..., 0, 0, 0],
        [1, 0, 0,  ..., 0, 0, 0],
        [1, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], device='cuda:0')

In [119]:
# length of each sequence without padding
lengths_of_sequences = torch.sum(mask, dim=-1)
lengths_of_sequences, lengths_of_sequences.shape #perfect


(tensor([19, 22, 10,  ..., 12,  9, 20], device='cuda:0'), torch.Size([7194]))

In [120]:
# is it working fine on a random sequence?
torch.sum(reduced_predictions[1,:] == val_labels[1,:])


tensor(20, device='cuda:0')

In [121]:
# finally accuracy
accuracy = torch.sum(reduced_predictions == val_labels)/torch.sum(lengths_of_sequences)
accuracy


tensor(0.9511, device='cuda:0')

<a name="5"></a>
## 5 - Testing with your Own Sentence


Below, you can test it out with your own sentence! 

In [122]:
# import torch
# import numpy as np

def predict(sentence, model, vocab, tag_map):
    # Tokenize the sentence and convert tokens to their corresponding indices
    s = [vocab[token] if token in vocab else vocab['UNK'] for token in sentence.split(' ')]
    
    # Create a tensor with the appropriate shape and fill it with the indices
    batch_data = torch.ones((1, len(s)), dtype=torch.long).to(device)
    batch_data[0, :] = torch.tensor(s).to(device)
    
    # Pass the tensor through the model
    with torch.no_grad():
        output, _, _ = model(batch_data, hidden_state, memmory_cell)
    
    # Get the indices of the maximum values along the last dimension
    outputs = torch.argmax(output, dim=2)
    
    # Convert indices to labels using the tag_map
    labels = list(tag_map.keys())
    pred = [labels[idx] for idx in outputs[0]]
    
    return pred


In [123]:
# setting up initial hidden and cell state
num_layers  = 3
batch_size  = 1
hidden_size = 128
# Optionally, create initial hidden and cell states
hidden_state = torch.zeros(num_layers, batch_size, hidden_size).to(device)####################################
memmory_cell = torch.zeros(num_layers, batch_size, hidden_size).to(device)#####################################
# hidden_state.shape


# Try the output for the introduction example
#sentence = "Many French citizens are goin to visit Morocco for summer"
#sentence = "Sharon Floyd flew to Miami last Friday"

# New york times news:
sentence = "Peter Navarro, the White House director of trade and manufacturing policy of U.S, said in an interview on Sunday morning that the White House was working to prepare for the possibility of a second wave of the coronavirus in the fall, though he said it wouldn’t necessarily come"
s = [vocab[token] if token in vocab else vocab['UNK'] for token in sentence.split(' ')]
predictions = predict(sentence, model, vocab, tag_map)
for x,y in zip(sentence.split(' '), predictions):
    if y != 'O':
        print(x,y)

Peter B-per
Navarro, I-per
White B-org
House I-org
U.S, B-per
Sunday B-tim
morning I-tim
White B-org
House I-org
coronavirus B-org
fall, B-gpe
wouldn’t B-per


**Expected Results**

```
Peter B-per
Navarro, I-per
White B-org
House I-org
Sunday B-tim
morning I-tim
White B-org
House I-org
coronavirus B-tim
fall, B-tim
```

### their output


In [27]:
# Try the output for the introduction example
#sentence = "Many French citizens are goin to visit Morocco for summer"
#sentence = "Sharon Floyd flew to Miami last Friday"

# New york times news:
sentence = "Peter Navarro, the White House director of trade and manufacturing policy of U.S, said in an interview on Sunday morning that the White House was working to prepare for the possibility of a second wave of the coronavirus in the fall, though he said it wouldn’t necessarily come"
s = [vocab[token] if token in vocab else vocab['UNK'] for token in sentence.split(' ')]
predictions = predict(sentence, model, vocab, tag_map)
for x,y in zip(sentence.split(' '), predictions):
    if y != 'O':
        print(x,y)

Peter B-per
Navarro, I-per
White B-org
House I-org
Sunday B-tim
morning I-tim
White B-org
House I-org
coronavirus B-tim
fall, B-tim


Peter B-per

Navarro, I-per

White B-org

House I-org

Sunday B-tim

morning I-tim

White B-org

House I-org

coronavirus B-tim

fall, B-tim
