<img align=center src="https://rhyme.com/assets/img/logo-dark.png"></img>
<h2 align=center> Named Entity Recognition (NER) using LSTMs with Keras</h2>

### Task 1: Project Overview and Import Modules

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(0)
plt.style.use("ggplot")

import tensorflow as tf
print('Tensorflow version:', tf.__version__)
print('GPU detected:', tf.config.list_physical_devices('GPU'))

2023-09-06 07:47:30.851226: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-06 07:47:30.852778: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-06 07:47:30.886505: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-06 07:47:30.887797: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Tensorflow version: 2.13.0
GPU detected: []


2023-09-06 07:47:32.621013: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:02:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-06 07:47:32.621350: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


### Task 2: Load and Explore the NER Dataset

*Essential info about tagged entities*:
- geo = Geographical Entity
- org = Organization
- per = Person
- gpe = Geopolitical Entity
- tim = Time indicator
- art = Artifact
- eve = Event
- nat = Natural Phenomenon
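
In the dataset, each tag pairs a position prefix with one of these abbreviations (for example `B-geo` on "London" in the rows shown later), and `O` marks tokens outside any entity. The sketch below splits such a tag into its parts. It assumes the usual IOB convention of `B-` (beginning) and `I-` (inside) prefixes; only `B-geo` and `O` are visible in this notebook, so the `I-` prefix is an assumption, though it would be consistent with the 17 unique tags reported later (8 types, 2 prefixes each, plus `O`). `ENTITY_TYPES` and `split_tag` are illustrative names, not part of the dataset or any library.

```python
# Entity-type abbreviations from the list above.
ENTITY_TYPES = {
    "geo": "Geographical Entity",
    "org": "Organization",
    "per": "Person",
    "gpe": "Geopolitical Entity",
    "tim": "Time indicator",
    "art": "Artifact",
    "eve": "Event",
    "nat": "Natural Phenomenon",
}


def split_tag(tag):
    """Split a tag such as 'B-geo' into (prefix, entity type name).

    Assumes IOB-style tags: 'O', or '<prefix>-<abbreviation>'.
    """
    if tag == "O":
        return ("O", None)
    prefix, _, kind = tag.partition("-")
    # Fall back to the raw abbreviation if it is not in the table.
    return (prefix, ENTITY_TYPES.get(kind, kind))


print(split_tag("B-geo"))  # ('B', 'Geographical Entity')
print(split_tag("O"))      # ('O', None)
```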

In [4]:
data = pd.read_csv('ner_dataset.csv', encoding='latin1')
data

Unnamed: 0,Sentence #,Word,POS,Tag
0,Sentence: 1,Thousands,NNS,O
1,,of,IN,O
2,,demonstrators,NNS,O
3,,have,VBP,O
4,,marched,VBN,O
...,...,...,...,...
1048570,,they,PRP,O
1048571,,responded,VBD,O
1048572,,to,TO,O
1048573,,the,DT,O


In [6]:
data = data.ffill()  # forward-fill the sentence ID onto every token row
data


Unnamed: 0,Sentence #,Word,POS,Tag
0,Sentence: 1,Thousands,NNS,O
1,Sentence: 1,of,IN,O
2,Sentence: 1,demonstrators,NNS,O
3,Sentence: 1,have,VBP,O
4,Sentence: 1,marched,VBN,O
...,...,...,...,...
1048570,Sentence: 47959,they,PRP,O
1048571,Sentence: 47959,responded,VBD,O
1048572,Sentence: 47959,to,TO,O
1048573,Sentence: 47959,the,DT,O
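
Only the first token of each sentence carries a `Sentence #` value in the raw file, so a forward fill copies that value down to the tokens that follow. A minimal sketch on a toy frame (the rows here are made up for illustration and fill only the `Sentence #` column, whereas the notebook fills the whole frame):

```python
import pandas as pd

# Toy frame: the sentence ID appears only on the first token of each sentence.
toy = pd.DataFrame({
    "Sentence #": ["Sentence: 1", None, None, "Sentence: 2", None],
    "Word": ["Thousands", "of", "demonstrators", "They", "responded"],
})

# Forward fill propagates the last seen ID to the following rows.
toy["Sentence #"] = toy["Sentence #"].ffill()
print(toy["Sentence #"].tolist())
# ['Sentence: 1', 'Sentence: 1', 'Sentence: 1', 'Sentence: 2', 'Sentence: 2']
```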


In [21]:
data.head(20)

Unnamed: 0,Sentence #,Word,POS,Tag
0,Sentence: 1,Thousands,NNS,O
1,Sentence: 1,of,IN,O
2,Sentence: 1,demonstrators,NNS,O
3,Sentence: 1,have,VBP,O
4,Sentence: 1,marched,VBN,O
5,Sentence: 1,through,IN,O
6,Sentence: 1,London,NNP,B-geo
7,Sentence: 1,to,TO,O
8,Sentence: 1,protest,VB,O
9,Sentence: 1,the,DT,O


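Replace the `Sentence #` values with one label per row. Note that this gives every token its own sentence ID, so the per-sentence grouping restored by the forward fill above is lost.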

In [23]:
rows = len(data)  # len() of a DataFrame is its row count, not its column count
print(f"num rows {rows}")
correct_columns = ["Sentence: " + str(i) for i in range(rows)]
correct_columns

num rows 1048575


['Sentence: 0',
 'Sentence: 1',
 'Sentence: 2',
 'Sentence: 3',
 'Sentence: 4',
 ...

In [24]:
sentence_col_name = data.columns[0]
data[sentence_col_name] = correct_columns
data

Unnamed: 0,Sentence #,Word,POS,Tag
0,Sentence: 0,Thousands,NNS,O
1,Sentence: 1,of,IN,O
2,Sentence: 2,demonstrators,NNS,O
3,Sentence: 3,have,VBP,O
4,Sentence: 4,marched,VBN,O
...,...,...,...,...
1048570,Sentence: 1048570,they,PRP,O
1048571,Sentence: 1048571,responded,VBD,O
1048572,Sentence: 1048572,to,TO,O
1048573,Sentence: 1048573,the,DT,O


Build the vocabulary of unique words and append an `ENDPAD` padding token.

In [29]:
words = list(set(data['Word'].values))
words.append('ENDPAD')
num_words = len(words)

In [27]:
print(f"Unique words: {data['Word'].nunique()}")
print(f"Unique tags: {data['Tag'].nunique()}")

Unique words: 35177
Unique tags: 17


In [31]:
tags = list(set(data["Tag"].values))
num_tags = len(tags)

In [32]:
num_words, num_tags

(35178, 17)

### Task 3: Retrieve Sentences and Corresponding Tags

In [None]:
class SentenceGetter(object):
    def __init__(self, data):
        self.data = data
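
The class above is only a stub. One possible way to complete it, shown as a sketch rather than the notebook's own implementation, is to group the rows by `Sentence #` and collect `(word, POS, tag)` tuples per sentence. This assumes `Sentence #` holds one ID per sentence, as it does right after the forward fill; the `sentences` attribute and the toy frame are illustrative.

```python
import pandas as pd


class SentenceGetter:
    """Group token rows into sentences of (word, pos, tag) tuples."""

    def __init__(self, data):
        self.data = data
        # sort=False keeps sentences in order of first appearance.
        grouped = data.groupby("Sentence #", sort=False)
        self.sentences = [
            list(zip(g["Word"], g["POS"], g["Tag"])) for _, g in grouped
        ]


# Toy rows for illustration only.
toy = pd.DataFrame({
    "Sentence #": ["Sentence: 1"] * 3 + ["Sentence: 2"] * 2,
    "Word": ["marched", "through", "London", "They", "responded"],
    "POS": ["VBN", "IN", "NNP", "PRP", "VBD"],
    "Tag": ["O", "O", "B-geo", "O", "O"],
})

getter = SentenceGetter(toy)
print(len(getter.sentences))   # 2
print(getter.sentences[0][2])  # ('London', 'NNP', 'B-geo')
```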

### Task 4: Define Mappings between Sentences and Tags
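
This task is left empty in the notebook. A minimal sketch of what such mappings could look like, assuming plain dictionaries from each word and tag to an integer index (the names `word2idx`, `tag2idx`, and `idx2tag`, the zero-based indexing, and the toy vocabulary are all assumptions):

```python
# Toy vocabulary and tag set; in the notebook these would be `words` and `tags`.
words = ["Thousands", "of", "demonstrators", "London", "ENDPAD"]
tags = ["O", "B-geo"]

# Map each word and tag to an integer index, plus the reverse tag lookup.
word2idx = {w: i for i, w in enumerate(words)}
tag2idx = {t: i for i, t in enumerate(tags)}
idx2tag = {i: t for t, i in tag2idx.items()}

# Encode one sentence given as (word, tag) pairs.
sentence = [("Thousands", "O"), ("of", "O"), ("London", "B-geo")]
X = [word2idx[w] for w, _ in sentence]
y = [tag2idx[t] for _, t in sentence]
print(X, y)  # [0, 1, 3] [0, 0, 1]
```

Because `list(set(...))` does not guarantee a stable order, sorting `words` and `tags` before building the dictionaries would make the indices reproducible across runs.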

### Task 5: Padding Input Sentences and Creating Train/Test Splits
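
This task is also empty in the notebook. Library helpers would normally handle both steps; the pure-Python sketch below only illustrates the idea. It assumes post-padding to a fixed length with the index of the padding word and the index of the `O` tag, followed by a shuffled hold-out split. The helper `pad_post`, the padding values, the 10% test fraction, and the seed are all illustrative choices.

```python
import random


def pad_post(seq, max_len, pad_value):
    """Truncate or right-pad seq to exactly max_len items."""
    return list(seq[:max_len]) + [pad_value] * max(0, max_len - len(seq))


# Toy index-encoded sentences and tags (hypothetical values).
X = [[0, 1, 2], [3, 4], [5]]
y = [[1, 1, 0], [1, 1], [1]]
max_len, pad_word, pad_tag = 4, 6, 1  # 6: padding word index, 1: index of 'O'

X_pad = [pad_post(s, max_len, pad_word) for s in X]
y_pad = [pad_post(s, max_len, pad_tag) for s in y]
print(X_pad)  # [[0, 1, 2, 6], [3, 4, 6, 6], [5, 6, 6, 6]]

# Shuffled hold-out split with a fixed seed; keep at least one test example.
indices = list(range(len(X_pad)))
random.Random(1).shuffle(indices)
n_test = max(1, round(0.1 * len(indices)))
test_idx, train_idx = indices[:n_test], indices[n_test:]
x_train = [X_pad[i] for i in train_idx]
y_train = [y_pad[i] for i in train_idx]
x_test = [X_pad[i] for i in test_idx]
y_test = [y_pad[i] for i in test_idx]
print(len(x_train), len(x_test))  # 2 1
```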

### Task 6: Build and Compile a Bidirectional LSTM Model

### Task 7: Train the Model

### Task 8: Evaluate Named Entity Recognition Model
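
The notebook stops before evaluation. As a rough sketch of one way to inspect predictions, the code below turns per-token tag probabilities into tag names and measures accuracy only on tokens whose true tag is not `O`, since `O` dominates the data and can inflate plain accuracy. The functions `decode` and `entity_token_accuracy` and all values are hypothetical; no trained model is involved.

```python
def decode(probs, idx2tag):
    """Pick the highest-scoring tag index for each token."""
    return [idx2tag[max(range(len(p)), key=p.__getitem__)] for p in probs]


def entity_token_accuracy(true_tags, pred_tags):
    """Accuracy over tokens whose true tag is not 'O'."""
    pairs = [(t, p) for t, p in zip(true_tags, pred_tags) if t != "O"]
    return sum(t == p for t, p in pairs) / len(pairs) if pairs else 0.0


# Made-up per-token probabilities over two tags, for illustration only.
idx2tag = {0: "O", 1: "B-geo"}
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
true_tags = ["O", "B-geo", "B-geo"]

pred_tags = decode(probs, idx2tag)
print(pred_tags)                                    # ['O', 'B-geo', 'O']
print(entity_token_accuracy(true_tags, pred_tags))  # 0.5
```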