# Named Entity Recognition (NER) Tool
This notebook demonstrates how to use a pre-trained BERT model for Named Entity Recognition (NER) using the Hugging Face Transformers library.

In [1]:
!pip install transformers torch pandas


## Loading the Model and Tokenizer
We'll use the `dbmdz/bert-large-cased-finetuned-conll03-english` model, a cased BERT-large model fine-tuned for English NER on the CoNLL-2003 dataset.

In [2]:
import torch
from transformers import BertTokenizer, BertForTokenClassification
from transformers import pipeline

# Load pre-trained model and tokenizer
model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name)

# Create NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
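
`BertTokenizer` is the slow, pure-Python tokenizer, and with it the pipeline reports `start` and `end` as `None`. A possible alternative, sketched here and not part of the original notebook, is to load a fast tokenizer through `AutoTokenizer` and ask the pipeline to group word pieces with `aggregation_strategy="simple"`. The function name `build_ner_pipeline` is hypothetical, and the sketch assumes the same model name as above.

```python
def build_ner_pipeline(model_name="dbmdz/bert-large-cased-finetuned-conll03-english"):
    """Build an NER pipeline backed by a fast tokenizer (downloads the model)."""
    # Imported here so that defining the function does not itself need the library
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_name)  # fast tokenizer by default
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    # aggregation_strategy="simple" groups word pieces into whole entities
    return pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")


# Usage (not run here because it downloads the checkpoint):
# grouped_nlp = build_ner_pipeline()
# grouped_nlp("John Doe is a software engineer at Google.")
```

With aggregation enabled, each result carries an `entity_group` key instead of `entity`, so code that reads `entity` would need adjusting before it could use this pipeline.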


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Extracting Entities
The following function uses the NER pipeline to extract named entities from a given text.

In [3]:
def extract_entities(text):
    """
    Extract named entities from text using the pre-trained BERT NER pipeline.

    Args:
        text (str): Input text to extract entities from.

    Returns:
        list[dict]: One dict per tagged token with the keys "word", "entity",
        "start", and "end". "start" and "end" are character offsets, or None
        when the tokenizer cannot provide them.
    """
    ner_results = nlp(text)
    entities = []
    # Keep only the fields of interest from each pipeline result
    for entity in ner_results:
        entities.append({
            "word": entity['word'],
            "entity": entity['entity'],
            "start": entity['start'],
            "end": entity['end']
        })
    return entities
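
Because `extract_entities` reads the global `nlp`, it can only be tried once the large checkpoint has loaded. One possible variation, offered as a sketch and not as the notebook's method, takes the pipeline as a parameter so the same logic can be exercised with a lightweight stand-in. Both `extract_entities_with` and `fake_pipeline` are hypothetical names introduced for illustration; the stand-in simply tags capitalized words and is not a real NER model.

```python
def extract_entities_with(ner_pipeline, text):
    """Same idea as extract_entities, but the pipeline is passed in."""
    keys = ("word", "entity", "start", "end")
    # .get() leaves a field as None if the pipeline did not supply it
    return [{k: item.get(k) for k in keys} for item in ner_pipeline(text)]


def fake_pipeline(text):
    """Hypothetical stand-in: tags every capitalized word as I-MISC."""
    results = []
    position = 0
    for word in text.split():
        start = text.index(word, position)
        position = start + len(word)
        if word[0].isupper():
            results.append({"word": word, "entity": "I-MISC", "score": 1.0,
                            "start": start, "end": position})
    return results


print(extract_entities_with(fake_pipeline, "Ada lives in Paris"))
```

The real pipeline object can be passed in the same way, for example `extract_entities_with(nlp, text)`.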

## Example Usage
Let's test the function with a sample text.

In [4]:
sample_text = """
John Doe is a software engineer at Google. He lives in New York City and graduated from MIT in 2015.
"""
entities = extract_entities(sample_text)
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Start: {entity['start']}, End: {entity['end']}")

Entity: John, Label: I-PER, Start: None, End: None
Entity: Do, Label: I-PER, Start: None, End: None
Entity: ##e, Label: I-PER, Start: None, End: None
Entity: Google, Label: I-ORG, Start: None, End: None
Entity: New, Label: I-LOC, Start: None, End: None
Entity: York, Label: I-LOC, Start: None, End: None
Entity: City, Label: I-LOC, Start: None, End: None
Entity: MIT, Label: I-ORG, Start: None, End: None
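
Two things stand out in this output. `Start` and `End` are `None` because the slow `BertTokenizer` does not give the pipeline character offsets; a fast tokenizer is needed for those. The results are also per word piece, so "Doe" appears as `Do` and `##e`. The sketch below shows, in plain Python, one way such pieces could be merged into whole entities. It is an illustration and not the library's own grouping algorithm. It assumes each raw pipeline result also carries the token position under an `index` key, which `extract_entities` does not keep. The helper name `merge_word_pieces` is hypothetical, and the `index` values in the hand-written list are illustrative.

```python
def merge_word_pieces(tokens):
    """Group adjacent word pieces that share an entity type.

    `tokens` is a list of dicts with "word", "entity" and "index" keys,
    where "index" is the token's position in the tokenized input.
    Returns a list of (text, label) tuples.
    """
    merged = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]       # "I-PER" -> "PER"
        starts_new = tok["entity"].startswith("B-")   # B- marks a new entity
        is_piece = tok["word"].startswith("##")       # continuation of a word
        prev = merged[-1] if merged else None
        adjacent = prev is not None and tok["index"] == prev["last_index"] + 1
        if adjacent and prev["label"] == label and not starts_new:
            # Glue "##e" directly onto the word; separate whole words by a space
            prev["text"] += tok["word"][2:] if is_piece else " " + tok["word"]
            prev["last_index"] = tok["index"]
        else:
            text = tok["word"][2:] if is_piece else tok["word"]
            merged.append({"text": text, "label": label, "last_index": tok["index"]})
    return [(e["text"], e["label"]) for e in merged]


# Hand-written stand-in for the tagged tokens printed above
sample_tokens = [
    {"word": "John", "entity": "I-PER", "index": 1},
    {"word": "Do", "entity": "I-PER", "index": 2},
    {"word": "##e", "entity": "I-PER", "index": 3},
    {"word": "Google", "entity": "I-ORG", "index": 9},
    {"word": "New", "entity": "I-LOC", "index": 14},
    {"word": "York", "entity": "I-LOC", "index": 15},
    {"word": "City", "entity": "I-LOC", "index": 16},
    {"word": "MIT", "entity": "I-ORG", "index": 20},
]
print(merge_word_pieces(sample_tokens))
# [('John Doe', 'PER'), ('Google', 'ORG'), ('New York City', 'LOC'), ('MIT', 'ORG')]
```

Requiring consecutive `index` values keeps two separate mentions of the same type from being fused when untagged tokens sit between them.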


## Conclusion
This notebook demonstrated how to extract named entities from text with a pre-trained BERT model. The model, `dbmdz/bert-large-cased-finetuned-conll03-english`, is fine-tuned on the CoNLL-2003 dataset and, as the example output shows, tags each word piece with labels such as `I-PER` (person), `I-ORG` (organization), and `I-LOC` (location).