1. NER with spaCy

First, let's implement Named Entity Recognition (NER) using spaCy, a popular open-source NLP library.

In [1]:
import spacy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")


Implement NER with spaCy

In [2]:
# Sample text
text = "Apple is looking at buying U.K. startup for $1 billion"

# Process the text using the loaded model
doc = nlp(text)

# Print named entities found in the text
print("Named Entities:")
for entity in doc.ents:
    print(f"{entity.text} ({entity.label_})")


Named Entities:
Apple (ORG)
U.K. (GPE)
$1 billion (MONEY)


Explanation:

nlp is the loaded spaCy pipeline; en_core_web_sm includes a pre-trained named entity recognizer (ner) among its components.

doc = nlp(text) processes the input text and returns a Doc object, which contains the entities identified in the text.

doc.ents holds the named entities found, as a tuple of Span objects. For each one we print its text (entity.text) and its type (entity.label_, such as PERSON, ORG, or MONEY).
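Each Span in doc.ents also records where it sits in the original string through start_char and end_char (end-exclusive, so text[start_char:end_char] is the entity text). As a minimal sketch, the hypothetical helper annotate below (not part of spaCy) uses such offsets to mark entities inline. With a real doc, its input could be built as [(e.start_char, e.end_char, e.label_) for e in doc.ents]; the offsets in the example were counted by hand from the sample sentence.

```python
def annotate(text, spans):
    """Return text with each (start, end, label) span rendered as [span](LABEL).

    Assumes non-overlapping, end-exclusive character offsets, the same
    convention as spaCy's Span.start_char / Span.end_char.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])               # plain text before the entity
        pieces.append(f"[{text[start:end]}]({label})")  # the entity itself
        cursor = end
    pieces.append(text[cursor:])                        # trailing plain text
    return "".join(pieces)


text = "Apple is looking at buying U.K. startup for $1 billion"
# Hand-counted offsets for the three entities spaCy found above
spans = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]
print(annotate(text, spans))
# [Apple](ORG) is looking at buying [U.K.](GPE) startup for [$1 billion](MONEY)
```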

2. NER with Hugging Face's Transformers

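Now, let's use Hugging Face's transformers library with a pre-trained transformer model for NER. Here we use dbmdz/bert-large-cased-finetuned-conll03-english, a BERT-large model fine-tuned on the CoNLL-2003 English NER dataset.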

In [3]:
from transformers import pipeline

# Load pre-trained NER model from Hugging Face
ner_model = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")



Implement NER with Hugging Face

In [4]:
# Sample text
text = "Apple is looking at buying U.K. startup for $1 billion"

# Perform NER on the input text
entities = ner_model(text)

# Print the named entities and their respective labels
print("Named Entities and Labels:")
for entity in entities:
    print(f"{entity['word']} ({entity['entity']}) - Confidence: {entity['score']}")


Named Entities and Labels:
Apple (I-ORG) - Confidence: 0.9990183115005493
U (I-LOC) - Confidence: 0.9996721744537354
K (I-LOC) - Confidence: 0.997936487197876


Explanation:

pipeline("ner", model=...) downloads (on first use) and loads the specified pre-trained NER model from the Hugging Face Hub, together with its tokenizer.

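ner_model(text) runs NER on the input text and returns a list with one dictionary per token that received an entity label. Each dictionary contains the token (word), its predicted label (entity), a confidence score (score), and its position (index, start, end).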

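Note that word is a single (sub)word token, not a whole entity span, so "U.K." comes back as U and K. The labels carry an I- (inside) or B- (beginning) prefix over the four CoNLL-2003 entity types PER, ORG, LOC, and MISC; because there is no money type, "$1 billion" is not detected.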
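By default the pipeline omits tokens predicted as O (outside any entity), such as the periods in "U.K.". The pipeline also accepts an aggregation_strategy argument (for example "simple") that groups adjacent tokens into entities and reports an entity_group key instead of entity. To show the idea, here is a minimal sketch of that grouping done by hand. It assumes the dictionaries have the entity, index, start, and end keys of the default (unaggregated) output; the helper name group_entities is hypothetical, and the index values in the sample data are illustrative, not real model output.

```python
def group_entities(tokens):
    """Merge token-level IOB predictions into (label, start, end) spans.

    Assumes O-tagged tokens were already dropped, as the default pipeline does.
    A new span starts on a "B-" tag, on a change of entity type, or when the
    token does not directly follow the previous one (a gap in `index` means an
    O-tagged token sat in between).
    """
    spans = []
    prev_index = None
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")  # "I-LOC" -> ("I", "-", "LOC")
        continues = (
            bool(spans)
            and prefix == "I"
            and spans[-1][0] == label
            and tok["index"] == prev_index + 1
        )
        if continues:
            # Extend the current span to end where this token ends
            spans[-1] = (label, spans[-1][1], tok["end"])
        else:
            # Start a new span at this token
            spans.append((label, tok["start"], tok["end"]))
        prev_index = tok["index"]
    return spans


text = "Apple is looking at buying U.K. startup for $1 billion"
# Words and labels match the output above; the index values are illustrative
predictions = [
    {"entity": "I-ORG", "index": 1, "word": "Apple", "start": 0, "end": 5},
    {"entity": "I-LOC", "index": 6, "word": "U", "start": 27, "end": 28},
    {"entity": "I-LOC", "index": 8, "word": "K", "start": 29, "end": 30},
]
for label, start, end in group_entities(predictions):
    print(label, text[start:end])
```

Here U and K stay separate because the O-tagged period between them breaks the run of indices, whereas two consecutive I-LOC tokens such as "New" and "York" would be merged into one span.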



3. Comparison of spaCy and Hugging Face NER Outputs

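Both spaCy and the Hugging Face model perform NER, but their outputs on the same sentence differ noticeably: spaCy returns whole entity spans labeled with the OntoNotes 5 scheme (which includes types such as GPE and MONEY), while this Hugging Face model returns per-token predictions over the four CoNLL-2003 types.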

spaCy provides a simple interface and ships a small set of official trained pipelines. The Hugging Face Hub, on the other hand, hosts a much broader variety of community fine-tuned models, so you can often find one trained for a specific domain or label set; larger transformer models can also be more accurate, at a higher computational cost.

In [5]:
# spaCy output
print("spaCy NER Output:")
for entity in doc.ents:
    print(f"{entity.text} ({entity.label_})")

# Hugging Face output
print("\nHugging Face NER Output:")
for entity in entities:
    print(f"{entity['word']} ({entity['entity']}) - Confidence: {entity['score']}")


spaCy NER Output:
Apple (ORG)
U.K. (GPE)
$1 billion (MONEY)

Hugging Face NER Output:
Apple (I-ORG) - Confidence: 0.9990183115005493
U (I-LOC) - Confidence: 0.9996721744537354
K (I-LOC) - Confidence: 0.997936487197876


Key Differences:

spaCy gives more concise output: one line per complete entity span (such as "U.K." and "$1 billion"), with labels such as "ORG", "PERSON", and "GPE" (Geopolitical Entity).

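Hugging Face's pipeline gives more detailed per-token output, including confidence scores and I-/B- prefixes, but this model's label set is coarser (PER, ORG, LOC, MISC): "U.K." is split across tokens and "$1 billion" is missed.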
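Because the two libraries tokenize differently and use different label sets, one rough way to line up their outputs is by character offsets rather than by label. The sketch below uses a hypothetical match_spans helper that lists, for each spaCy entity, the Hugging Face spans overlapping it. The spaCy tuples correspond to (e.start_char, e.end_char, e.label_) and the Hugging Face ones to the start, end, and entity keys of the pipeline output; the offsets were counted by hand from the sample sentence.

```python
def overlaps(a, b):
    """True if the half-open character ranges a=(start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]


def match_spans(spacy_spans, hf_spans):
    """Map each spaCy (start, end, label) span to the HF spans that overlap it."""
    return {
        (start, end, label): [h for h in hf_spans if overlaps((start, end), h[:2])]
        for start, end, label in spacy_spans
    }


text = "Apple is looking at buying U.K. startup for $1 billion"
spacy_spans = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]
hf_spans = [(0, 5, "I-ORG"), (27, 28, "I-LOC"), (29, 30, "I-LOC")]

for (start, end, label), found in match_spans(spacy_spans, hf_spans).items():
    hf_labels = [h[2] for h in found] or ["no match"]
    print(f"{text[start:end]} ({label}) -> {hf_labels}")
# Apple (ORG) -> ['I-ORG']
# U.K. (GPE) -> ['I-LOC', 'I-LOC']
# $1 billion (MONEY) -> ['no match']
```

Matching on offsets sidesteps the label-name mismatch (GPE versus LOC) and makes both differences above visible: one spaCy span covered by two Hugging Face fragments, and one spaCy entity with no counterpart.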

Conclusion

Both spaCy and Hugging Face are excellent tools for Named Entity Recognition, and each has its strengths.

spaCy is fast and efficient for common NLP tasks, while Hugging Face gives access to larger transformer models such as BERT and RoBERTa, fine-tuned for many specific tasks and domains.