# Tutorial 4: Training Named Entity Recognition Taggers from Scratch using Bi-Directional LSTMs

<img src="https://images.unsplash.com/photo-1583361704493-d4d4d1b1d70a?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1651&q=80">

__Named Entity Recognition (NER)__, also known as entity chunking or entity extraction, is a popular information extraction technique that identifies and segments named entities in text and classifies them into predefined classes.

There are various off-the-shelf solutions that offer named entity extraction (some of which we discussed in the previous units). Yet there are times when the requirements go beyond the capabilities of off-the-shelf classifiers.

In this notebook, we will go through an exercise to build our own NER tagger using bi-directional LSTMs. We will use ``tensorflow.keras`` to develop the model.

## Load Dataset

Named Entity Recognition is a sequence modeling problem at its core. It is closely related to classification: we need a labeled dataset in which every token carries a tag, and we train a classifier to predict that tag.

There are various labeled datasets for NER. We will use a pre-processed version of the __GMB (Groningen Meaning Bank)__ corpus for this notebook. The preprocessed version is available at the following link: [kaggle/ner](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus)

We have provided the dataset in the code repository itself in compressed form, and you can load it directly with pandas as follows.

In [1]:
!nvidia-smi

Thu Jul 22 22:30:55 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
import numpy as np 
import pandas as pd

In [6]:
from google.colab import drive
# drive.mount('/content/drive')
dataset = pd.read_csv("/content/drive/My Drive/NLP_DeepLearning_Course/Week2/ner_dataset2.csv")
dataset.info()


# dataset = pd.read_csv('ner_dataset.csv.gz', 
#                  compression='gzip', 
#                  encoding='ISO-8859-1')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 4 columns):
 #   Column        Non-Null Count    Dtype 
---  ------        --------------    ----- 
 0   Sentence Num  47959 non-null    object
 1   Word          1048575 non-null  object
 2   POS           1048575 non-null  object
 3   Tag           1048575 non-null  object
dtypes: object(4)
memory usage: 32.0+ MB


In [7]:
# 'Sentence Num' is set only on the first token of each sentence; forward-fill it to every row
dataset = dataset.ffill()

In [8]:
dataset.head()

|   | Sentence Num | Word | POS | Tag |
|---|---|---|---|---|
| 0 | Sentence: 1 | Thousands | NNS | O |
| 1 | Sentence: 1 | of | IN | O |
| 2 | Sentence: 1 | demonstrators | NNS | O |
| 3 | Sentence: 1 | have | VBP | O |
| 4 | Sentence: 1 | marched | VBN | O |


The GMB dataset uses IOB (Inside, Outside, Beginning) tagging, a common format for tagging tokens that we discussed earlier. To refresh your memory:

+ __I- prefix__ before a tag indicates that the token is inside a chunk.
+ __B- prefix__ before a tag indicates that the token is the beginning of a chunk.
+ __O tag__ indicates that the token belongs to no chunk (outside).
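
To see how these prefixes combine into entities, here is a minimal sketch that groups IOB-tagged tokens into spans. The helper `iob_to_spans` and the sample sentence are made up for illustration; only the tag names come from the dataset.

```python
def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (entity_type, text) spans (hypothetical helper)."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # A B- tag, or an I- tag of a different type, starts a new chunk
        starts_new = prefix == "B" or (prefix == "I" and label != current_type)
        if prefix == "O" or starts_new:
            # Close whatever chunk is currently open
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            if not current_tokens:
                current_type = label
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Illustrative sentence, not taken from the corpus
tokens = ["Mr.", "Bush", "visited", "London", "on", "27", "November", "."]
tags = ["B-per", "I-per", "O", "B-geo", "O", "B-tim", "I-tim", "O"]
print(iob_to_spans(tokens, tags))
# [('per', 'Mr. Bush'), ('geo', 'London'), ('tim', '27 November')]
```

The `B-` tag is what lets two adjacent entities of the same type stay separate; without it, consecutive `I-` tags would merge into one chunk.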


The tags in this dataset are explained as follows:

+ __geo__ = Geographical Entity
+ __org__ = Organization
+ __per__ = Person
+ __gpe__ = Geopolitical Entity
+ __tim__ = Time indicator
+ __art__ = Artifact
+ __eve__ = Event
+ __nat__ = Natural Phenomenon

Anything outside these classes is tagged as other, denoted by O.



## Prepare Dataset

In [12]:
class SentenceGetter(object):
    
    def __init__(self, dataset):
        self.n_sent = 1
        self.dataset = dataset
        self.empty = False
        agg_func = lambda s: [(w, t) for w,t in zip(s["Word"].values.tolist(),
                                                        s["Tag"].values.tolist())]
        self.grouped = self.dataset.groupby("Sentence Num").apply(agg_func)
        self.sentences = [s for s in self.grouped]
    
    def get_next(self):
        # Return the next sentence, or None once every sentence has been returned
        try:
            s = self.grouped["Sentence: {}".format(self.n_sent)]
            self.n_sent += 1
            return s
        except KeyError:
            return None

In [13]:
getter = SentenceGetter(dataset)
sentences = getter.sentences

In [None]:
sentences[:2]

In [21]:
maxlen = max([len(s) for s in sentences])
print('Maximum sequence length:', maxlen)

Maximum sequence length: 104


In [22]:
# get a list of unique words, add <UNK> for out-of-vocabulary (OOV) words,
# and reserve index 0 for the <PAD> token
words = list(set(dataset["Word"].values))
words.append("<UNK>")  # OOV unknown token
words = ['<PAD>'] + words
print(words[:5])
print(words[-5:])

['<PAD>', 'diversify', 'chanted', 'evening', 'Jean-Jacques']
['Morag', 'Khalis', 'porch', 'chaired', '<UNK>']


In [23]:
n_words = len(words)
n_words

35180

In [24]:
# prepare a list of tags
tags = list(set(dataset["Tag"].values))
n_tags = len(tags)
n_tags

17

In [19]:
# word to index and tag to index mappings
word2idx = {w: i for i, w in enumerate(words)}
tag2idx = {t: i for i, t in enumerate(tags)}
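
As a quick illustration of how such a mapping is used, the sketch below encodes a sentence with a toy vocabulary and falls back to `<UNK>` for unseen words. The vocabulary, the name `toy_word2idx`, and the sample words are made up for illustration; the real `word2idx` above has 35,180 entries.

```python
# Toy vocabulary with the same layout as `words`: <PAD> first, <UNK> last
vocab = ["<PAD>", "Bush", "visited", "London", "<UNK>"]
toy_word2idx = {w: i for i, w in enumerate(vocab)}

unk_idx = toy_word2idx["<UNK>"]
sentence = ["Bush", "visited", "Paris"]  # "Paris" is not in the toy vocabulary

# dict.get returns the fallback index for words missing from the mapping
encoded = [toy_word2idx.get(w, unk_idx) for w in sentence]
print(encoded)  # [1, 2, 4]
```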

### Prepare Uniform Length Sequences

__Padding__: Keras trains on batches stored as fixed-shape arrays, so every sentence in a batch must have the same length. Therefore, every sentence, represented as a list of word indices, must be padded to a common length. We use the length of the longest sentence (104 tokens) and pad the shorter sequences with index 0, which `word2idx` maps to `<PAD>`.
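
The idea of post-padding can be sketched in plain Python. The helper `pad_post` below is a hypothetical stand-in written for illustration, not the Keras function, and the toy sequences are made up. One difference to keep in mind: it cuts over-long sequences at the end, whereas Keras `pad_sequences` truncates from the front by default.

```python
def pad_post(sequences, maxlen, value=0):
    """Right-pad each sequence with `value`, or cut it at `maxlen` if it is longer."""
    return [list(seq[:maxlen]) + [value] * (maxlen - len(seq)) for seq in sequences]


toy = [[5, 3, 8], [7], [4, 4, 4, 4, 4]]
padded = pad_post(toy, maxlen=4)
print(padded)  # [[5, 3, 8, 0], [7, 0, 0, 0], [4, 4, 4, 4]]
```

In this notebook the common length equals the longest sentence, so nothing is ever truncated.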

In [25]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
X = [[word2idx[w[0]] for w in s] for s in sentences]
print(X[:3])
print([len(s) for s in X[:3]])

[[13599, 30684, 26678, 34704, 4982, 14190, 24063, 15106, 13817, 20080, 12722, 19876, 9211, 7525, 13862, 20080, 10935, 30684, 11154, 26164, 5049, 24569, 22231, 33026], [18006, 4373, 1544, 7369, 17104, 15106, 11401, 11296, 15106, 12680, 21497, 22502, 30684, 20080, 4369, 20899, 27526, 23289, 30728, 10605, 21718, 25141, 15017, 6528, 33026], [6418, 8320, 32867, 34106, 4140, 11805, 19876, 20080, 26188, 13305, 32414, 27526, 7156, 7158, 8839, 3519, 29223, 22281, 15106, 34704, 22120, 15106, 12745, 30728, 30681, 13675, 22312, 19876, 20025, 6346, 19546, 33026]]
[24, 25, 32]


In [26]:
X = pad_sequences(maxlen=104, sequences=X, padding="post")
print(X[:3])
print([len(s) for s in X[:3]])

[[13599 30684 26678 34704  4982 14190 24063 15106 13817 20080 12722 19876
   9211  7525 13862 20080 10935 30684 11154 26164  5049 24569 22231 33026
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0]
 [18006  4373  1544  7369 17104 15106 11401 11296 15106 12680 21497 22502
  30684 20080  4369 20899 27526 23289 30728 10605 21718 25141 15017  6528
  33026     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0

### Transform Tags

In [27]:
y = [[tag2idx[w[1]] for w in s] for s in sentences]
print(y[:3])
print([len(s) for s in y[:3]])

[[12, 12, 12, 12, 12, 12, 1, 12, 12, 12, 12, 12, 1, 12, 12, 12, 12, 12, 2, 12, 12, 12, 12, 12], [2, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 4, 12, 12, 12, 8, 12, 12, 12, 12, 12], [12, 12, 4, 12, 12, 12, 12, 12, 1, 12, 12, 12, 12, 12, 8, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 3, 12]]
[24, 25, 32]


In [28]:
y = pad_sequences(maxlen=104, sequences=y, padding="post", value=tag2idx["O"])
print(y[:3])
print([len(s) for s in y[:3]])

[[12 12 12 12 12 12  1 12 12 12 12 12  1 12 12 12 12 12  2 12 12 12 12 12
  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12]
 [ 2 12 12 12 12 12 12 12 12 12 12 12 12 12 12  4 12 12 12  8 12 12 12 12
  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12]
 [12 12  4 12 12 12 12 12  1 12 12 12 12 12  8 12 12 12 12 12 12 12 12 12
  12 12 12 12 12  1  3 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
  12 12 12 12 12 12 12 12]]
[104, 104, 104]


In [29]:
from tensorflow.keras.utils import to_categorical
y = [to_categorical(i, num_classes=n_tags) for i in y]
y[:3]

[array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32),
 array([[0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32),
 array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)]

## Prepare Train-Test Splits

In [30]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=False)

## Build the Model

RNNs are capable of handling different input and output combinations. 

<img src="https://stanford.edu/~shervine/teaching/cs-230/illustrations/rnn-many-to-many-same-ltr.png?2790431b32050b34b80011afead1f232">

[Source: CS-230 Stanford](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks)

In this architecture, we primarily work with three layers: an embedding layer, a bi-directional LSTM, and a ``TimeDistributed`` ``Dense`` layer that produces the output. This is a many-to-many RNN architecture, where we expect an output for every time step of the input sequence.

For example, in the sequence (x1 → y1, x2 → y2, …, xn → yn), x and y are the input and output at each time step. Setting ``return_sequences=True`` makes the LSTM emit an output at every time step rather than only the last one, and the ``TimeDistributed(Dense(...))`` wrapper applies the same fully connected layer to each of those outputs. Without ``return_sequences=True``, the network would produce a single final output per sentence.
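
To make the per-time-step dense operation concrete, here is a minimal NumPy sketch. It is not the Keras implementation: the array sizes are toy values and the random array `h` merely stands in for the Bi-LSTM outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, features, n_classes = 2, 5, 4, 3   # toy sizes, not the notebook's

h = rng.normal(size=(batch, timesteps, features))    # stand-in for Bi-LSTM outputs
W = rng.normal(size=(features, n_classes))           # one weight matrix shared by all steps
b = np.zeros(n_classes)

# The same (W, b) is applied independently at every time step
logits = h @ W + b                                   # shape: (batch, timesteps, n_classes)

# Softmax over the class axis, computed separately for each time step
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

print(probs.shape)                                   # (2, 5, 3)
print(np.allclose(probs.sum(axis=-1), 1.0))          # True
```

Each time step gets its own probability distribution over the classes, which is exactly what a token-level tagger needs.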

In [31]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, TimeDistributed, Dropout, Bidirectional

In [32]:
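# Many-to-many tagger: one softmax over the tag set per time step.
# `inputs` and `x` avoid shadowing the built-in input() and reusing `model` for tensors.
inputs = Input(shape=(maxlen,))  # maxlen is 104 for this dataset
x = Embedding(input_dim=n_words, output_dim=300)(inputs)
x = Dropout(0.1)(x)
x = Bidirectional(LSTM(units=300,
                       return_sequences=True,
                       recurrent_dropout=0.1))(x)
out = TimeDistributed(Dense(n_tags, activation="softmax"))(x)  # softmax output layer



In [33]:
model = Model(inputs, out)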

In [36]:
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

In [37]:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 104)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 104, 300)          10554000  
_________________________________________________________________
dropout (Dropout)            (None, 104, 300)          0         
_________________________________________________________________
bidirectional (Bidirectional (None, 104, 600)          1442400   
_________________________________________________________________
time_distributed (TimeDistri (None, 104, 17)           10217     
Total params: 12,006,617
Trainable params: 12,006,617
Non-trainable params: 0
_________________________________________________________________
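
The parameter counts in this summary can be checked by hand. The short sketch below reproduces them from the layer sizes used above; the variable names are chosen only for this illustration.

```python
n_words, n_tags = 35180, 17        # values printed earlier in this notebook
emb_dim, lstm_units = 300, 300

# Embedding: one 300-dimensional vector per vocabulary entry
embedding = n_words * emb_dim

# Each LSTM direction has 4 gates, each with input weights, recurrent weights, and a bias
one_direction = 4 * (lstm_units * (emb_dim + lstm_units) + lstm_units)
bilstm = 2 * one_direction

# Dense layer on the concatenated forward and backward outputs (600 features)
dense = (2 * lstm_units) * n_tags + n_tags

print(embedding, bilstm, dense, embedding + bilstm + dense)
# 10554000 1442400 10217 12006617
```

Most of the parameters sit in the embedding layer, so the vocabulary size dominates the model size.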



## Train the Model

In [39]:
history = model.fit(X_train, 
                    np.array(y_train), 
                    batch_size=32, 
                    epochs=2, 
                    validation_split=0.2, 
                    verbose=1)

Epoch 1/2
Epoch 2/2


## View predictions on Test Samples

In [44]:
tags = np.array(tags)

In [45]:
pd.set_option('display.max_columns', None)

In [41]:
y_test[0]

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

In [46]:
i = 0
p = model.predict(np.array([X_test[i]]))
p = np.argmax(p, axis=-1)
pred_tags = tags[p[0]]

pd.DataFrame({
    'word': [words[w] for w in X_test[i]],
    'predicted_ne': pred_tags,
    'actual_ne': tags[np.argmax(y_test[i], axis=-1)]
}).T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103
word,Mr.,Bush,also,called,for,greater,use,of,clean,coal,technology,",",solar,and,wind,energy,",",and,nuclear,power,as,alternatives,to,oil,.,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>
predicted_ne,B-per,I-per,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O
actual_ne,B-per,I-per,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O


In [47]:
i = 1
p = model.predict(np.array([X_test[i]]))
p = np.argmax(p, axis=-1)
pred_tags = tags[p[0]]

pd.DataFrame({
    'word': [words[w] for w in X_test[i]],
    'predicted_ne': pred_tags,
    'actual_ne': tags[np.argmax(y_test[i], axis=-1)]
}).T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103
word,In,line,with,IMF,conditions,",",in,2009,",",Belarus,devalued,the,ruble,more,than,40,%,and,tightened,some,fiscal,and,monetary,policies,.,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>,<PAD>
predicted_ne,O,O,O,B-org,O,O,O,B-tim,O,B-geo,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O
actual_ne,O,O,O,B-org,O,O,O,B-tim,O,B-org,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O,O


## Test on a Random Sample

In [48]:
text = """Three more countries have joined an “international grand committee” of parliaments, adding to calls for 
Facebook’s boss, Mark Zuckerberg, to give evidence on misinformation to the coalition. Brazil, Latvia and Singapore 
bring the total to eight different parliaments across the world, with plans to send representatives to London on 27 
November with the intention of hearing from Zuckerberg.”
"""

In [49]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [50]:
# tokenize sentences
text_sents = nltk.sent_tokenize(text)
text_sents

['Three more countries have joined an “international grand committee” of parliaments, adding to calls for \nFacebook’s boss, Mark Zuckerberg, to give evidence on misinformation to the coalition.',
 'Brazil, Latvia and Singapore \nbring the total to eight different parliaments across the world, with plans to send representatives to London on 27 \nNovember with the intention of hearing from Zuckerberg.”']

In [57]:
# generate word token IDs; words missing from the vocabulary map to <UNK>
idx_text = [[word2idx.get(w, word2idx["<UNK>"]) for w in sent.split()] for sent in text_sents]
print(idx_text)

[[3392, 7980, 33481, 34704, 2707, 30728, 35179, 24302, 35179, 30684, 35179, 34485, 15106, 5511, 13917, 35179, 35179, 13242, 35179, 15106, 27649, 26854, 10310, 35179, 15106, 20080, 35179], [35179, 20038, 7525, 4930, 14437, 20080, 8531, 15106, 13811, 23123, 24465, 16424, 20080, 35179, 25801, 11, 15106, 35070, 4784, 15106, 24063, 10310, 28548, 8749, 25801, 20080, 9345, 30684, 28844, 5049, 35179]]
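
Many of the IDs above equal 35179, the `<UNK>` index. One reason is that `str.split()` leaves punctuation attached to words (for example `parliaments,`), whereas the training data stores punctuation marks as separate tokens. A minimal sketch of a punctuation-aware alternative follows, assuming a simple regular expression is good enough; the helper `simple_tokenize` is hypothetical and not part of the original notebook.

```python
import re


def simple_tokenize(sentence):
    """Split into word tokens and single punctuation marks (hypothetical helper)."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sample = "Brazil, Latvia and Singapore bring the total to eight."
print(sample.split()[:2])            # ['Brazil,', 'Latvia']
print(simple_tokenize(sample)[:3])   # ['Brazil', ',', 'Latvia']
```

This regex also splits abbreviations such as `Mr.` and hyphenated names, which the corpus keeps as single tokens, so a tokenizer such as `nltk.word_tokenize` is likely a closer match to the training data.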


In [58]:
# pad sequences
pad_tokens = pad_sequences(idx_text, 
                           maxlen=104, 
                           dtype='int32', 
                           padding='post')
print(pad_tokens)

[[ 3392  7980 33481 34704  2707 30728 35179 24302 35179 30684 35179 34485
  15106  5511 13917 35179 35179 13242 35179 15106 27649 26854 10310 35179
  15106 20080 35179     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0]
 [35179 20038  7525  4930 14437 20080  8531 15106 13811 23123 24465 16424
  20080 35179 25801    11 15106 35070  4784 15106 24063 10310 28548  8749
  25801 20080  9345 30684 28844  5049 35179     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0

In [59]:
# predict NER tags
outputs = model.predict(pad_tokens)
outputs = np.argmax(outputs, axis=-1)
pred_tags = tags[outputs]
pred_tags

array([['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'B-per', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O'],
       ['O', 'B-geo', 'O', 'B-org', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O',
        'B-tim', 'I-tim', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 

In [56]:
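# Display tagged tokens and words
# Split each sentence the same way as for idx_text; zip() stops at the last real token,
# so the predictions for padded positions are dropped
text_tokens = [sent.split() for sent in text_sents]
tagged_tokens = [list(zip(tt, pt)) for tt, pt in zip(text_tokens, pred_tags)]
tagged_tokens_flat = [item for sublist in tagged_tokens for item in sublist]
pd.DataFrame(tagged_tokens_flat, columns=['Word', 'Tag'])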

## Compare with spaCy

In [60]:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')

In [61]:
spacy_text = nlp(text)

In [62]:
displacy.render(spacy_text, style = 'ent', jupyter=True)