# Named entity recognition

Named entity recognition (NER) is the task of extracting short fragments of text and classifying them, for example as a person (PER), a location (LOC), or an organization (ORG). Today, we will learn a new framework called FLAIR, which we discussed in our lecture.

First, we will try out a pretrained model from the FLAIR framework.

**Assignment 1**
Please visit the FLAIR website and read the documentation on tagging entities (https://flairnlp.github.io/docs/tutorial-basics/tagging-entities). Use the code provided there to tag some example input.

---
**Done by:** Sofya Aksenyuk, 150284

---

In [1]:
!pip install flair

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting flair
  Downloading flair-0.12.2-py3-none-any.whl (373 kB)
Collecting bpemb>=0.3.2
  Downloading bpemb-0.3.4-py3-none-any.whl (19 kB)
Collecting mpld3==0.3
  Downloading mpld3-0.3.tar.gz (788 kB)
  Preparing metadata (setup.py) ... done
Collecting pytorch-revgrad
  Downloading pytorch_revgrad-0.2.0-py3-none-any.whl (4.6 kB)
Collecting huggingface-hub>=0.10.0
  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)
Collecting janome
  Downloading Janome-0.4.2-py2.py3-none-any.whl (19.7 M

In [2]:
## Here you should paste the code loading the NER model and tagging a given text

from flair.nn import Classifier
from flair.data import Sentence

# load the model
tagger = Classifier.load('ner')

# make a sentence
sentence = Sentence('George Washington went to Washington.')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence)

2023-04-11 15:18:22,236 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
Sentence[6]: "George Washington went to Washington." → ["George Washington"/PER, "Washington"/LOC]
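
The tag dictionary printed above uses the BIOES scheme: `B-`, `I-` and `E-` mark the beginning, inside and end of a multi-token entity, `S-` marks a single-token entity, and `O` marks tokens outside any entity. As a rough illustration of how such token-level tags collapse into the spans shown in the output, here is a minimal pure-Python sketch. The helper `bioes_to_spans` is a hypothetical name, not part of FLAIR, and the per-token tags below are an assumed tagging consistent with the printed result, not output copied from the model.

```python
def bioes_to_spans(tokens, tags):
    """Collapse token-level BIOES tags into (text, label) entity spans."""
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # Single-token entity: emit it immediately.
            spans.append((token, label))
            current_tokens, current_label = [], None
        elif prefix == "B":
            # Start of a multi-token entity.
            current_tokens, current_label = [token], label
        elif prefix in ("I", "E") and label == current_label:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
            if prefix == "E":
                spans.append((" ".join(current_tokens), label))
                current_tokens, current_label = [], None
        else:
            # "O" or an inconsistent tag: discard any unfinished entity.
            current_tokens, current_label = [], None
    return spans


# Assumed per-token tags for the example sentence (illustrative only).
tokens = ["George", "Washington", "went", "to", "Washington", "."]
tags = ["B-PER", "E-PER", "O", "O", "S-LOC", "O"]
print(bioes_to_spans(tokens, tags))
# [('George Washington', 'PER'), ('Washington', 'LOC')]
```

The sketch drops entities whose tags are inconsistent (for example an `I-` tag with no preceding `B-`); a real decoder may handle such cases differently.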


**(Optional)** Of course, most often we would like to train our own tagger. The process is described in detail in a great blog post (if you hit a paywall, open the page in incognito mode).

https://medium.com/thecyphy/training-custom-ner-model-using-flair-df1f9ea9c762

However, training a custom FLAIR model is not required in this lab.


**Assignment 2** Named entity recognition models can also be built with BERT and the HuggingFace transformers library!

To see how we can use transformers to solve a NER problem, we will use a notebook provided by Niels Rogge from HuggingFace (https://github.com/NielsRogge/Transformers-Tutorials). The notebook we will use is uploaded to eKursy alongside this "main" notebook. Please follow the instructions in that other notebook and copy and paste the appropriate cell output as described below.


One of the code cells in that notebook is the following:
```
from seqeval.metrics import classification_report

print(classification_report(labels, predictions))
```
Once you have worked through the tutorial, these two lines will print the evaluation metrics. Copy and paste them into the cell below.

              precision    recall  f1-score   support

         geo       0.54      0.71      0.62       641
         gpe       0.61      0.70      0.65       305
         org       0.47      0.43      0.44       456
         per       0.63      0.50      0.56       349
         tim       0.62      0.67      0.65       286

   micro avg       0.56      0.61      0.58      2037
   macro avg       0.58      0.60      0.59      2037
weighted avg       0.56      0.61      0.58      2037
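
To make the columns of the report above concrete: seqeval scores whole entities rather than individual tokens, so a predicted entity counts as correct only if both its type and its exact boundaries match the gold annotation. The following is a minimal pure-Python sketch of that idea on toy data. The helpers `iob2_spans` and `entity_report` are hypothetical names, the tag sequences are made up for illustration, and the handling of malformed tags is simplified, so this is not seqeval's actual implementation.

```python
from collections import Counter


def iob2_spans(tags):
    """Return the set of (start, end, label) entities in one IOB2 tag sequence."""
    spans, start, label = set(), None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and tag[2:] == label
        if not inside:
            if label is not None:
                spans.add((start, i, label))
            # Simplification: only "B-" opens an entity; a stray "I-" is ignored.
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def entity_report(gold, pred):
    """Per-label (precision, recall, f1, support) from exact span matches."""
    tp, n_pred, n_gold = Counter(), Counter(), Counter()
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = iob2_spans(gold_tags), iob2_spans(pred_tags)
        n_gold.update(label for _, _, label in g)
        n_pred.update(label for _, _, label in p)
        tp.update(label for _, _, label in g & p)
    report = {}
    for label in sorted(set(n_gold) | set(n_pred)):
        precision = tp[label] / n_pred[label] if n_pred[label] else 0.0
        recall = tp[label] / n_gold[label] if n_gold[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = (precision, recall, f1, n_gold[label])
    return report


# Toy data: the "per" entity is predicted with the wrong right boundary.
gold = [["B-per", "I-per", "O", "O", "B-geo", "O"]]
pred = [["B-per", "O", "O", "O", "B-geo", "O"]]
for label, row in entity_report(gold, pred).items():
    print(label, row)
# geo (1.0, 1.0, 1.0, 1)
# per (0.0, 0.0, 0.0, 1)
```

In this toy case the partially matched `per` entity earns no credit, which is one reason entity-level scores tend to be lower than token-level accuracy. The `support` column is the number of gold entities per type; the five per-type supports in the report above sum to the 2037 shown in the average rows.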