## Named Entity Recognition using Bidirectional Encoder Representations from Transformers (BERT)

Install the simpletransformers library; you can use Hugging Face transformers directly instead if you are more comfortable with it. I tried running this code in a Kaggle notebook, but running the cells threw many errors related to the tokenizers and transformers packages, so I recommend running the NER code on Google Colab.

BERT is a transformer-based model built from only the encoder part of the transformer, with the encoder layers stacked one on top of another. The input first passes through an input embedding layer and a positional embedding; each encoder layer then applies multi-head attention >> layer normalization >> feed-forward network >> layer normalization.

This repeated stack of encoder layers is what is called BERT. Because the encoder's attention is not masked, each token can attend to the tokens on both its left and its right, which is why it is called a bidirectional encoder.
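The layer order described above can be sketched in a few lines of plain Python. This is only a toy illustration under loud assumptions: a single attention head, identity query/key/value projections, and a ReLU standing in for the feed-forward network, with no learned weights at all. The function names are hypothetical and are not part of BERT or any library.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(vec, eps=1e-5):
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def self_attention(x):
    # Assumption: one head, identity Q/K/V projections (real BERT learns these)
    d = len(x[0])
    weights = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        # No causal mask: every position sees tokens to its left AND right
        weights.append(softmax(scores))
    out = [[sum(w * v[j] for w, v in zip(row, x)) for j in range(d)] for row in weights]
    return out, weights

def feed_forward(vec):
    # Stand-in for the two-layer feed-forward network: ReLU only
    return [max(0.0, v) for v in vec]

def encoder_layer(x):
    attn_out, weights = self_attention(x)
    # Residual connection followed by layer normalization
    x = [layer_norm([a + b for a, b in zip(xi, ai)]) for xi, ai in zip(x, attn_out)]
    x = [layer_norm([a + b for a, b in zip(xi, feed_forward(xi))]) for xi in x]
    return x, weights

# Three toy "token embeddings" of dimension 4
tokens = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -0.5, 0.5], [0.5, 0.5, 1.0, 0.0]]
hidden, attn = encoder_layer(tokens)
print(len(hidden), len(hidden[0]))
```

Every row of `attn` has a non-zero weight for every position, including positions to the right of the current token, which is the bidirectional behaviour the text refers to.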

In [1]:
# Install the simpletransformers library
!pip install simpletransformers

Collecting simpletransformers
  Downloading https://files.pythonhosted.org/packages/f8/c0/c3dc5858a966308a9c9bc5344ead07ba9b87ac1b513f2b3c57390a143c0e/simpletransformers-0.51.9-py3-none-any.whl (201kB)
Collecting tokenizers
  Downloading https://files.pythonhosted.org/packages/0f/1c/e789a8b12e28be5bc1ce2156cf87cb522b379be9cadc7ad8091a4cc107c4/tokenizers-0.9.4-cp36-cp36m-manylinux2010_x86_64.whl (2.9MB)
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/e5/2d/6d4ca4bef9a67070fa1cac508606328329152b1df10bdf31fb6e4e727894/sentencepiece-0.1.94-cp36-cp36m-manylinux2014_x86_64.whl (1.1MB)
Collecting streamlit
  Downloading https://files.pythonhosted.org/packages/dd/8d/4c7676d01e90852254e2275fb4639b747274430f2fa066aa94848d3a6ee4/streamlit-0.73.1-py2.py

Import Libraries

In [9]:
# Standard library
import logging

# Data handling and utilities
import numpy as np
import pandas as pd
from tqdm import tqdm, trange

# Deep learning
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from simpletransformers.ner import NERModel, NERArgs

# Splitting, label encoding and metrics
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

torch.__version__



Extract the zip file

In [2]:
from zipfile import ZipFile

# Extract the dataset archive into the current working directory
with ZipFile('/content/drive/MyDrive/archive.zip') as f:
    f.extractall()

Load Data

In [3]:
# Forward-fill missing values so every word row carries its sentence number
data = pd.read_csv("/content/ner_dataset.csv", encoding="latin1").ffill()
data.tail(10)

Unnamed: 0,Sentence #,Word,POS,Tag
1048565,Sentence: 47958,impact,NN,O
1048566,Sentence: 47958,.,.,O
1048567,Sentence: 47959,Indian,JJ,B-gpe
1048568,Sentence: 47959,forces,NNS,O
1048569,Sentence: 47959,said,VBD,O
1048570,Sentence: 47959,they,PRP,O
1048571,Sentence: 47959,responded,VBD,O
1048572,Sentence: 47959,to,TO,O
1048573,Sentence: 47959,the,DT,O
1048574,Sentence: 47959,attack,NN,O


In [4]:
# No null values
data.isnull().any()

Sentence #    False
Word          False
POS           False
Tag           False
dtype: bool

Pre-Processing Text

In [5]:
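# Replace the "Sentence: N" strings with integer sentence ids
data["Sentence #"] = LabelEncoder().fit_transform(data["Sentence #"])
# Rename the columns to the names simpletransformers expects
data.rename(columns={"Sentence #": "sentence_id", "Word": "words", "Tag": "labels"}, inplace=True)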
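A small sketch of what the label encoding step does to the sentence column, written in plain Python so it runs without scikit-learn. `LabelEncoder` assigns integer codes according to the sorted order of the distinct values; the helper `encode_labels` below is a hypothetical stand-in that mimics this, and the sample values are made up for illustration.

```python
def encode_labels(values):
    # Mimics LabelEncoder: codes follow the sorted order of the distinct values
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes

sentence_col = ["Sentence: 1", "Sentence: 1", "Sentence: 2", "Sentence: 10", "Sentence: 10"]
ids, classes = encode_labels(sentence_col)
print(classes)
print(ids)
```

Because the values are strings, "Sentence: 10" sorts before "Sentence: 2". The resulting ids are still unique per sentence, which is all the model needs, but they do not follow the original reading order.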

In [6]:
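X = data[["sentence_id", "words"]]
Y = data["labels"]

# Note: this splits individual word rows, not whole sentences, so the words
# of one sentence can end up in both the training and the test set
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

train_data = pd.DataFrame({"sentence_id": x_train["sentence_id"], "words": x_train["words"], "labels": y_train})
test_data = pd.DataFrame({"sentence_id": x_test["sentence_id"], "words": x_test["words"], "labels": y_test})

# All distinct tags, passed to the model as its label set
label = data["labels"].unique().tolist()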
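The split above works on word rows. An alternative that keeps each sentence intact is to split the sentence ids rather than the rows. The sketch below is one possible way to do that using only the standard library; `split_sentence_ids`, its parameters, and the sample rows are hypothetical and not part of the original notebook, and using it would change the reported scores.

```python
import random

def split_sentence_ids(sentence_ids, test_size=0.2, seed=42):
    """Return (train_ids, test_ids) so that each sentence lands in exactly one set."""
    unique_ids = sorted(set(sentence_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_size))
    return set(unique_ids[n_test:]), set(unique_ids[:n_test])

# Toy rows in the notebook's (sentence_id, words, labels) layout
rows = [
    (0, "Indian", "B-gpe"), (0, "forces", "O"),
    (1, "They", "O"), (1, "responded", "O"),
    (2, "London", "B-geo"), (2, "is", "O"),
    (3, "the", "O"), (3, "attack", "O"),
]
train_ids, test_ids = split_sentence_ids([r[0] for r in rows], test_size=0.25)
train_rows = [r for r in rows if r[0] in train_ids]
test_rows = [r for r in rows if r[0] in test_ids]
print(len(train_rows), len(test_rows))
```

With the notebook's DataFrame, the same idea would be `data[data["sentence_id"].isin(train_ids)]` for the training set and the equivalent filter with `test_ids` for the test set.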

Fine-Tuning BERT

In [7]:

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

train_args = {
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
    "use_early_stopping": True,
    "weight_decay": 0.01,
    "do_lower_case": False,  # keep casing, to match the cased checkpoint
    "num_train_epochs": 1,
    "learning_rate": 1e-4,
    "train_batch_size": 32,
    "eval_batch_size": 32,
}

model = NERModel('bert', 'bert-base-cased', labels=label, args=train_args)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be e


Training time

In [10]:
model.train_model(train_data, eval_data=test_data, acc=accuracy_score)

INFO:simpletransformers.ner.ner_model: Converting to features started.


  0%|          | 0/47958 [00:00<?, ?it/s]

Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/1499 [00:00<?, ?it/s]

INFO:simpletransformers.ner.ner_model: Training of bert model complete. Saved to outputs/.


(1499, 0.19393818622443817)

Model evaluation

In [11]:

results, model_outputs, predictions = model.eval_model(test_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.


  0%|          | 0/46711 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/1460 [00:00<?, ?it/s]

INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.17091481149681423, 'precision': 0.8282020572072707, 'recall': 0.7578326456936565, 'f1_score': 0.7914562714603111}


Performance 

In [12]:
results

{'eval_loss': 0.17091481149681423,
 'f1_score': 0.7914562714603111,
 'precision': 0.8282020572072707,
 'recall': 0.7578326456936565}
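As a quick sanity check on these numbers, the F1 score is the harmonic mean of precision and recall. The snippet below recomputes it from the two reported values; it only uses the figures printed above.

```python
# Values copied from the evaluation results above
precision = 0.8282020572072707
recall = 0.7578326456936565

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.7915
```

The result agrees with the reported `f1_score`. Recall is lower than precision here, so the model misses entities more often than it labels a non-entity as one.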

In [13]:
model_outputs

[[[[-0.4658,
    6.906,
    1.027,
    -0.2878,
    1.526,
    1.9,
    0.5195,
    0.483,
    -1.375,
    -2.13,
    -1.87,
    -2.377,
    -0.2291,
    -2.725,
    -1.045,
    -1.974,
    -3.096]],
  [[9.34,
    1.022,
    -2.145,
    -1.492,
    0.674,
    -0.3503,
    0.581,
    0.1278,
    -1.646,
    -1.752,
    -1.622,
    -3.357,
    0.7354,
    -2.695,
    -2.059,
    -2.35,
    -2.812]],
  [[7.54,
    0.6094,
    -2.336,
    -2.117,
    1.856,
    -0.2944,
    1.384,
    0.672,
    -2.01,
    -1.799,
    -1.749,
    -3.129,
    1.807,
    -2.9,
    -2.445,
    -2.447,
    -2.887]],
  [[7.74,
    0.166,
    -2.598,
    -2.027,
    0.9062,
    -0.848,
    1.254,
    1.249,
    -1.806,
    -2.111,
    -1.783,
    -3.025,
    3.395,
    -2.63,
    -2.658,
    -2.543,
    -2.898]],
  [[8.78,
    0.951,
    -2.05,
    -2.09,
    -0.10474,
    -0.777,
    -0.08453,
    3.102,
    -1.703,
    -2.072,
    -2.178,
    -3.6,
    1.873,
    -2.479,
    -2.621,
    -2.67,
    -2.746]],
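Each innermost list in `model_outputs` holds 17 raw scores (logits) for one token, one per entry in `label`. A minimal sketch of how such a row turns into a prediction: apply softmax to get probabilities and take the index of the largest one. The two rows are copied from the output above; which tag each index stands for depends on the order of `label`, which is not shown in this notebook, so the code reports indices only.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# First two token rows copied from model_outputs above
token_logits = [
    [-0.4658, 6.906, 1.027, -0.2878, 1.526, 1.9, 0.5195, 0.483, -1.375,
     -2.13, -1.87, -2.377, -0.2291, -2.725, -1.045, -1.974, -3.096],
    [9.34, 1.022, -2.145, -1.492, 0.674, -0.3503, 0.581, 0.1278, -1.646,
     -1.752, -1.622, -3.357, 0.7354, -2.695, -2.059, -2.35, -2.812],
]

probs = [softmax(row) for row in token_logits]
# Index of the highest-scoring label for each token
best = [max(range(len(p)), key=p.__getitem__) for p in probs]
print(best)  # [1, 0]
```

The tag for a token is then `label[best_index]`.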
  

Predictions over the test data

In [14]:
predictions

[['B-geo', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O'],
 ['O', 'B-geo', 'B-geo', 'O', 'O', 'O', 'O'],
 ['I-per', 'O', 'O'],
 ['O', 'B-geo', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['B-per',
  'O',
  'O',
  'O',
  'I-per',
  'O',
  'O',
  'B-geo',
  'O',
  'O',
  'O',
  'O',
  'B-geo',
  'O',
  'O',
  'O'],
 ['O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'B-geo', 'O'],
 ['B-geo', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O'],
 ['O', 'B-per', 'O', 'O', 'I-per', 'O'],
 ['O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'B-per'],
 ['B-per', 'O', 'O', 'B-tim', 'O'],
 ['O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'B-geo', 'B-geo', 'B-per', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O'],
 ['O', 'O'],
 ['O', 'O', 'O', 'O'],
 ['O', 'B-geo', 'B-geo'],
 ['B-tim', 'O'],
 ['O', 'O', 'O', 'O'

In [15]:
prediction, model_output = model.predict(["I think you live in Sydney, Right?"])

INFO:simpletransformers.ner.ner_model: Converting to features started.


  0%|          | 0/1 [00:00<?, ?it/s]

Running Prediction:   0%|          | 0/1 [00:00<?, ?it/s]

Prediction over a sample text

In [16]:
prediction

[[{'I': 'O'},
  {'think': 'O'},
  {'you': 'O'},
  {'live': 'O'},
  {'in': 'O'},
  {'Sydney,': 'B-geo'},
  {'Right?': 'O'}]]
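The prediction is a list of one-entry dictionaries per sentence, and because the input was split on whitespace, punctuation stays attached to the word ("Sydney,"). The sketch below shows one way to pull out the tagged entities and strip that punctuation. `extract_entities` is a hypothetical helper, not part of simpletransformers, and it assumes the B-/I- tag scheme seen in the outputs above.

```python
import string

# Copied from the prediction output above
prediction = [[{'I': 'O'}, {'think': 'O'}, {'you': 'O'}, {'live': 'O'},
               {'in': 'O'}, {'Sydney,': 'B-geo'}, {'Right?': 'O'}]]

def extract_entities(sentence_prediction):
    """Group B-/I- tagged words into (text, entity_type) tuples."""
    entities, current = [], None
    for item in sentence_prediction:
        word, tag = next(iter(item.items()))
        if tag == "O":
            current = None
            continue
        word = word.strip(string.punctuation)
        entity_type = tag[2:]
        if tag.startswith("I-") and current is not None and current[1] == entity_type:
            # Continuation of the entity that is currently open
            current[0] += " " + word
        else:
            # A B- tag (or a stray I- tag) starts a new entity
            current = [word, entity_type]
            entities.append(current)
    return [tuple(e) for e in entities]

print(extract_entities(prediction[0]))  # [('Sydney', 'geo')]
```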