# Named Entity Recognition (NER) Tool
This notebook demonstrates how to run Named Entity Recognition (NER) with a pre-trained BERT model from the Hugging Face Transformers library.

In [None]:
!pip install transformers torch pandas

## Loading the Model and Tokenizer
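We'll use the `"dslim/bert-base-NER"` checkpoint, a BERT-base model fine-tuned for NER on the English CoNLL-2003 dataset. It labels four entity types: persons (PER), organizations (ORG), locations (LOC), and miscellaneous (MISC).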

In [27]:
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline


# Load the pre-trained NER model and tokenizer
model_name = "dslim/bert-base-NER"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)




Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:

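# Create the NER pipeline. aggregation_strategy="simple" merges word pieces
# into whole entities and reports each label under the "entity_group" key.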
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
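
To make the effect of `aggregation_strategy="simple"` more concrete, the sketch below merges per-token `B-`/`I-` tags into entity spans by hand. It is only an illustration of the idea, not the library's implementation: the function name `group_bio_tokens` and the token list are hypothetical, and sub-word pieces and confidence scores are ignored.

```python
from typing import Dict, List, Optional


def group_bio_tokens(tokens: List[Dict]) -> List[Dict]:
    """Merge consecutive B-/I- tagged tokens into entity spans (simplified)."""
    groups: List[Dict] = []
    current: Optional[Dict] = None
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            # A non-entity token closes any open span
            current = None
            continue
        prefix, label = tag.split("-", 1)
        # Extend the open span only for an I- tag carrying the same label
        if current is not None and prefix == "I" and current["entity_group"] == label:
            current["word"] += " " + tok["word"]
            current["end"] = tok["end"]
        else:
            current = {
                "entity_group": label,
                "word": tok["word"],
                "start": tok["start"],
                "end": tok["end"],
            }
            groups.append(current)
    return groups


# Hypothetical per-token predictions, written by hand for illustration
tokens = [
    {"word": "John", "entity": "B-PER", "start": 1, "end": 5},
    {"word": "Doe", "entity": "I-PER", "start": 6, "end": 9},
    {"word": "Google", "entity": "B-ORG", "start": 36, "end": 42},
]
grouped = group_bio_tokens(tokens)
print(grouped)
```

The two `PER` tokens collapse into a single span, which is why the aggregated pipeline output can be read one entity at a time.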


## Extracting Entities
The following function uses the NER pipeline to extract named entities from a given text.

In [None]:

def extract_entities(text):
    """
    Extract named entities from the given text.

    Args:
        text (str): Input text to extract entities from.

    Returns:
        list[dict]: One dict per entity with the keys "word" (entity text),
        "entity" (label such as PER, ORG, or LOC), and "start"/"end"
        (character offsets into `text`, end exclusive).
    """
    # With aggregation enabled, the pipeline reports labels under "entity_group"
    return [
        {
            "word": entity["word"],
            "entity": entity["entity_group"],
            "start": entity["start"],
            "end": entity["end"],
        }
        for entity in nlp(text)
    ]
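
A common next step is to bucket the extracted entities by label. The sketch below assumes a list of dicts shaped like the return value of `extract_entities`; the helper name `group_by_label` and the example data are hypothetical and written by hand, not model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[Dict]) -> Dict[str, List[str]]:
    """Map each entity label to the list of entity strings carrying it."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity"]].append(entity["word"])
    return dict(grouped)


# Hand-written example in the same shape as extract_entities() output,
# for the sentence "Ada Lovelace met Charles Babbage in London."
example_entities = [
    {"word": "Ada Lovelace", "entity": "PER", "start": 0, "end": 12},
    {"word": "Charles Babbage", "entity": "PER", "start": 17, "end": 32},
    {"word": "London", "entity": "LOC", "start": 36, "end": 42},
]
by_label = group_by_label(example_entities)
print(by_label)
```

Entities keep their order of appearance within each label, which helps when the result is displayed next to the source text.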

## Example Usage
Let's test the function with a sample text.

In [29]:

# Sample usage
sample_text = """
John Doe is a software engineer at Google. He lives in New York City and graduated from MIT.
"""
entities = extract_entities(sample_text)
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Start: {entity['start']}, End: {entity['end']}")


Entity: John Doe, Label: PER, Start: 1, End: 9
Entity: Google, Label: ORG, Start: 36, End: 42
Entity: New York City, Label: LOC, Start: 56, End: 69
Entity: MIT, Label: ORG, Start: 89, End: 92
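
The `Start` and `End` values are character offsets into the original string, with `End` exclusive. `John Doe` starts at 1 rather than 0 because the triple-quoted `sample_text` begins with a newline. The short check below slices the text with the offsets copied from the output above; it runs without the model, and the `spans` list is just that output retyped by hand.

```python
sample_text = """
John Doe is a software engineer at Google. He lives in New York City and graduated from MIT.
"""

# (word, start, end) triples copied from the printed output
spans = [
    ("John Doe", 1, 9),
    ("Google", 36, 42),
    ("New York City", 56, 69),
    ("MIT", 89, 92),
]

# Slicing with [start:end] should give back exactly the entity text
recovered = [sample_text[start:end] for _, start, end in spans]
for (word, _, _), piece in zip(spans, recovered):
    assert piece == word

print(recovered)
```

Because the offsets index the raw input, they can be used directly to highlight entities in the original text or to replace them in place.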


## Conclusion
This notebook showed how to load a pre-trained BERT NER model, wrap it in a Hugging Face pipeline, and extract named entities from text together with their labels and character offsets.