## **Transformer-Based Models (BERT, Flair)**

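Transformer models such as BERT use self-attention to relate each token to every other token in a sentence, which lets them infer entity types from context. Flair is an NLP framework that provides pre-trained sequence taggers for the same task. Both are used for NER below.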
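
To make "attention" concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration only: the two-dimensional toy vectors are made up, and the function name `scaled_dot_product_attention` is hypothetical. Real BERT layers add learned query/key/value projections, multiple heads, and many more dimensions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy, hand-picked "embeddings" (an assumption, not real model vectors)
tokens = ["John", "lives", "in"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: the same vectors act as queries, keys, and values
contextual, weights = scaled_dot_product_attention(embeddings, embeddings, embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence; the rows sum to 1.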

**Imports**

In [1]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline




**Load Pre-trained BERT for NER**

In [2]:
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")


Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


**NER Pipeline**

In [3]:
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

Device set to use cpu


**Perform NER**

In [4]:
text = "John lives in New York."
entities = ner_pipeline(text)
print(entities)

[{'entity': 'B-PER', 'score': 0.998917, 'index': 1, 'word': 'John', 'start': 0, 'end': 4}, {'entity': 'B-LOC', 'score': 0.99937576, 'index': 4, 'word': 'New', 'start': 14, 'end': 17}, {'entity': 'I-LOC', 'score': 0.999428, 'index': 5, 'word': 'York', 'start': 18, 'end': 22}]
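
The output above labels "New" and "York" as separate tokens (`B-LOC` followed by `I-LOC`). One way to turn such token-level tags into whole entities is to merge each `B-` token with the `I-` tokens that follow it. The sketch below does this in plain Python on the output shown above. The helper name `merge_bio_entities` is hypothetical, and the code assumes the tokens arrive in text order and that untagged (`O`) tokens are absent or can be skipped.

```python
def merge_bio_entities(tokens, text):
    """Group B-/I- tagged tokens into entity spans using character offsets."""
    spans = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if not label:  # e.g. a bare "O" tag: not part of any entity
            continue
        if prefix == "I" and spans and spans[-1]["label"] == label:
            # Continuation of the previous entity: extend its end offset
            spans[-1]["end"] = tok["end"]
            spans[-1]["scores"].append(tok["score"])
        else:
            spans.append({"label": label, "start": tok["start"],
                          "end": tok["end"], "scores": [tok["score"]]})
    # Report the lowest token score as a conservative span confidence
    return [{"label": s["label"], "text": text[s["start"]:s["end"]],
             "score": min(s["scores"])} for s in spans]


text = "John lives in New York."
# Token-level predictions copied from the pipeline output above
entities = [
    {"entity": "B-PER", "score": 0.998917, "index": 1, "word": "John", "start": 0, "end": 4},
    {"entity": "B-LOC", "score": 0.99937576, "index": 4, "word": "New", "start": 14, "end": 17},
    {"entity": "I-LOC", "score": 0.999428, "index": 5, "word": "York", "start": 18, "end": 22},
]

merged = merge_bio_entities(entities, text)
print([(e["label"], e["text"]) for e in merged])
# [('PER', 'John'), ('LOC', 'New York')]
```

The Hugging Face pipeline can also do this grouping itself if you pass an `aggregation_strategy` argument (for example `"simple"`) when creating it, which is usually the more convenient option.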


## **Boilerplate Code with Flair**

**Imports**

In [7]:
# Flair often uses transformers, so install it too (if you haven't already).
# Keep comments on their own line: pip treats a trailing "#" as a requirement and fails.
!pip install flair
!pip install transformers

from flair.models import SequenceTagger
from flair.data import Sentence





**Load Pre-trained Flair Model**

In [None]:
tagger = SequenceTagger.load("ner")

**Predict Entities**

In [None]:
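# Wrap the raw text in a Sentence object (Flair tokenizes it on construction)
sentence = Sentence("John lives in New York.")
# Run the tagger; predicted labels are attached to the sentence in place
tagger.predict(sentence)

# Print the sentence together with its predicted entity tags
print(sentence.to_tagged_string())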