<a href="https://colab.research.google.com/github/dbhadore/Named-Entity-Recognition/blob/main/NER_bert_fine_tune_NERDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine Tune BERT for NER

#### Use the NERDA package, which is built on PyTorch and Hugging Face Transformers

In [1]:
!pip install NERDA

Collecting NERDA
  Downloading NERDA-0.9.5-py3-none-any.whl (23 kB)
Collecting pyconll
  Downloading pyconll-3.1.0-py3-none-any.whl (26 kB)
Collecting progressbar
  Downloading progressbar-2.5.tar.gz (10 kB)
Collecting transformers
  Downloading transformers-4.9.2-py3-none-any.whl (2.6 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.45-py3-none-any.whl (895 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
Collecting huggingface-hub==0.0.12
  Downloading huggingface_hub-0.0.12-py3-none-any.whl (37 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Building wheels f

In [2]:
from NERDA.datasets import get_conll_data, download_conll_data 
download_conll_data()
training = get_conll_data('train')
validation = get_conll_data('valid')
test = get_conll_data('test')

Reading https://data.deepai.org/conll2003.zip


##### Explore data

The IOB format (inside, outside, beginning) is a common scheme for tagging tokens in Named Entity Recognition.

* An I- prefix indicates that the token is inside a chunk.
* An O tag indicates that the token belongs to no chunk.
* A B- prefix indicates that the token begins a chunk that immediately follows another chunk of the same type, with no O tags between them.

IOB2 is a widely used variant. It is the same as IOB except that the B- prefix marks the beginning of every chunk, so all chunks start with a B- tag.

This dataset uses the IOB2 format.
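
To make the difference between the two schemes concrete, here is a minimal sketch of a converter from IOB to IOB2 for a single sentence. The function name `iob1_to_iob2` and the sample tag list are illustrative only and are not part of NERDA.

```python
def iob1_to_iob2(tags):
    """Convert one sentence's IOB tags to IOB2 (every chunk starts with B-)."""
    converted = []
    prev_type = None  # entity type of the previous token; None after an O tag
    for tag in tags:
        if tag == 'O':
            converted.append(tag)
            prev_type = None
            continue
        prefix, entity_type = tag.split('-', 1)
        # In plain IOB an I- tag may open a chunk; IOB2 requires B- there.
        if prefix == 'I' and prev_type != entity_type:
            tag = 'B-' + entity_type
        converted.append(tag)
        prev_type = entity_type
    return converted


# Hypothetical IOB tagging of "EU rejects German call to boycott British lamb ."
iob1 = ['I-ORG', 'O', 'I-MISC', 'O', 'O', 'O', 'I-MISC', 'O', 'O']
print(iob1_to_iob2(iob1))
```

The converted list starts every chunk with a B- tag, which is the form the tags in this dataset take.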

In [3]:
print(training.keys())

print('Number of training sentences', len(training['sentences']))
print('Number of validation sentences', len(validation['sentences']))

# Show the first sentence and its IOB2 tags
print([', '.join(x) for x in training['sentences'][:1]])
print([', '.join(x) for x in training['tags'][:1]])

dict_keys(['sentences', 'tags'])
Number of training sentences 14039
Number of validation sentences 3250
['EU, rejects, German, call, to, boycott, British, lamb, .']
['B-ORG, O, B-MISC, O, O, O, B-MISC, O, O']


Entities

* Location
* Organization
* Person
* Miscellaneous

In [4]:
# Collect the distinct tags; set order is arbitrary, so the printed order can vary between runs
tags = list(set([x for sentence in training['tags'] for x in sentence]))
print(tags)

['I-MISC', 'I-PER', 'B-LOC', 'I-ORG', 'B-MISC', 'I-LOC', 'B-ORG', 'O', 'B-PER']


In [5]:
# NERDA takes the outside tag separately (tag_outside), so drop 'O' from the scheme
tags.remove('O')
tag_scheme = tags
print(tag_scheme)

['I-MISC', 'I-PER', 'B-LOC', 'I-ORG', 'B-MISC', 'I-LOC', 'B-ORG', 'B-PER']


In [6]:
bert = 'bert-base-uncased'

In [7]:
from NERDA.models import NERDA

model = NERDA(
    dataset_training = training,
    dataset_validation = validation,
    tag_scheme = tag_scheme, 
    tag_outside = 'O',
    transformer = bert,
    dropout = 0.2,
    hyperparameters = {'epochs' : 4,
                       'train_batch_size': 32,
                       'learning_rate': 1e-5
                       }
)

Device automatically set to: cuda


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



Train Model

In [8]:
model.train()


 Epoch 1 / 4


100%|██████████| 439/439 [09:46<00:00,  1.34s/it]
100%|██████████| 407/407 [00:51<00:00,  7.89it/s]


Train Loss = 0.39443630453597983 Valid Loss = 0.12468820383728776

 Epoch 2 / 4


100%|██████████| 439/439 [09:45<00:00,  1.33s/it]
100%|██████████| 407/407 [00:51<00:00,  7.86it/s]


Train Loss = 0.11496112985718902 Valid Loss = 0.07797759772462573

 Epoch 3 / 4


100%|██████████| 439/439 [09:46<00:00,  1.34s/it]
100%|██████████| 407/407 [00:51<00:00,  7.90it/s]


Train Loss = 0.07649000019687896 Valid Loss = 0.06876358117494064

 Epoch 4 / 4


100%|██████████| 439/439 [09:46<00:00,  1.34s/it]
100%|██████████| 407/407 [00:51<00:00,  7.92it/s]

Train Loss = 0.06167750797607853 Valid Loss = 0.0677250954096551





'Model trained successfully'
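
As a quick sanity check on the training log, the 439 iterations per epoch in the first progress bar follow from the dataset size and the batch size. This is a small sketch using the numbers printed earlier in this notebook; the variable names are illustrative.

```python
import math

num_train_sentences = 14039  # printed when exploring the data
train_batch_size = 32        # from the hyperparameters passed to NERDA

# The last batch is partial, so round up.
batches_per_epoch = math.ceil(num_train_sentences / train_batch_size)
print(batches_per_epoch)  # 439
```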

Evaluate

In [9]:
model.evaluate_performance(test)



| Level | F1-Score |
|---|---|
| I-MISC | 0.622601 |
| I-PER | 0.982305 |
| B-LOC | 0.90979 |
| I-ORG | 0.859438 |
| B-MISC | 0.766046 |
| I-LOC | 0.82197 |
| B-ORG | 0.867514 |
| B-PER | 0.966313 |
| AVG_MICRO | 0.893469 |
| AVG_MACRO | 0.849497 |
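
The macro average is the unweighted mean of the per-tag F1 scores, so it can be recomputed from the values reported above. The sketch below does that by hand; the dictionary is copied from the output and is not produced by a NERDA call. The micro average pools counts across all tags, so it cannot be recovered from the per-tag scores alone.

```python
# Per-tag F1 scores copied from the evaluation output
per_tag_f1 = {
    'I-MISC': 0.622601,
    'I-PER': 0.982305,
    'B-LOC': 0.90979,
    'I-ORG': 0.859438,
    'B-MISC': 0.766046,
    'I-LOC': 0.82197,
    'B-ORG': 0.867514,
    'B-PER': 0.966313,
}

# Macro average: every tag counts equally, regardless of how often it occurs
macro_f1 = sum(per_tag_f1.values()) / len(per_tag_f1)
print(round(macro_f1, 6))  # 0.849497
```

Because rare tags such as I-MISC weigh as much as frequent ones here, the macro average sits below the micro average.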


Predict

In [12]:
import nltk
nltk.download('punkt')
model.predict_text('Dhiman is working on named entity in Bangalore')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


([['Dhiman', 'is', 'working', 'on', 'named', 'entity', 'in', 'Bangalore']],
 [['B-PER', 'O', 'O', 'O', 'O', 'O', 'O', 'B-LOC']])

In [13]:
model.predict_text('Test cricket means everything to Virat Kohli, says Kevin Pietersen')

([['Test',
   'cricket',
   'means',
   'everything',
   'to',
   'Virat',
   'Kohli',
   ',',
   'says',
   'Kevin',
   'Pietersen']],
 [['O', 'O', 'O', 'O', 'O', 'B-PER', 'I-PER', 'O', 'O', 'B-PER', 'I-PER']])

In [14]:
model.predict_text('Alexa gets the voice of Amitabh Bachchan')

([['Alexa', 'gets', 'the', 'voice', 'of', 'Amitabh', 'Bachchan']],
 [['B-PER', 'O', 'O', 'O', 'O', 'B-PER', 'I-PER']])
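
The predictions above are token-level tags. For downstream use it is often handier to have whole entity spans. The following is a minimal sketch that groups B-/I- tags into spans; `extract_entities` is a hypothetical helper, not part of NERDA, and the input is copied from one of the predictions above so the example runs without the trained model.

```python
def extract_entities(tokens, tags):
    """Group IOB2-tagged tokens into (entity text, entity type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag == 'O':
            prefix, entity_type = 'O', None
        else:
            prefix, entity_type = tag.split('-', 1)
        # A B- tag, or an I- tag of a different type, starts a new entity.
        starts_new = prefix == 'B' or (prefix == 'I' and entity_type != current_type)
        if current_tokens and (prefix == 'O' or starts_new):
            entities.append((' '.join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if prefix != 'O':
            current_tokens.append(token)
            current_type = entity_type
    if current_tokens:  # flush an entity that runs to the end of the sentence
        entities.append((' '.join(current_tokens), current_type))
    return entities


# Tokens and tags copied from the second prediction
tokens = ['Test', 'cricket', 'means', 'everything', 'to', 'Virat', 'Kohli',
          ',', 'says', 'Kevin', 'Pietersen']
tags = ['O', 'O', 'O', 'O', 'O', 'B-PER', 'I-PER', 'O', 'O', 'B-PER', 'I-PER']
print(extract_entities(tokens, tags))
# [('Virat Kohli', 'PER'), ('Kevin Pietersen', 'PER')]
```

With a trained model, the same helper could be applied to each sentence of the pair returned by `model.predict_text`, for example `extract_entities(sentences[0], predicted_tags[0])`.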