Assignment 4: Build a Named Entity Recognition (NER) system for extracting entities from real-world text
such as news articles or social media data, and measure its accuracy, precision, recall, and F1-score.

# **Named Entity Recognition (NER) System with Evaluation**

**Step 1: Install & Import Libraries**

In [1]:
!pip install spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [2]:
import spacy
import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

**Step 2: Load Pretrained NER Model**

In [3]:
nlp = spacy.load("en_core_web_sm")

**Step 3: Sample Real-World Text (News + Social Media)**

In [4]:
texts = [
    "Apple Inc. was founded by Steve Jobs in California.",
    "Elon Musk announced new updates at Tesla headquarters.",
    "India won the cricket match held in Mumbai.",
    "Google is launching new AI tools in New York."
]

**Step 4: Perform Named Entity Recognition**

In [5]:
for text in texts:
    doc = nlp(text)
    print("\nText:", text)
    for ent in doc.ents:
        print(ent.text, "→", ent.label_)


Text: Apple Inc. was founded by Steve Jobs in California.
Apple Inc. → ORG
Steve Jobs → PERSON
California → GPE

Text: Elon Musk announced new updates at Tesla headquarters.
Elon Musk → PERSON
Tesla → ORG

Text: India won the cricket match held in Mumbai.
India → GPE
Mumbai → GPE

Text: Google is launching new AI tools in New York.
Google → ORG
AI → GPE
New York → GPE


**Step 5: Create Ground Truth Labels**

In [6]:
true_entities = [
    [("Apple Inc.", "ORG"), ("Steve Jobs", "PERSON"), ("California", "GPE")],
    [("Elon Musk", "PERSON"), ("Tesla", "ORG")],
    [("India", "GPE"), ("Mumbai", "GPE")],
    [("Google", "ORG"), ("New York", "GPE")]
]

**Step 6: Convert Predictions into Comparable Format**

In [7]:
pred_entities = []

for text in texts:
    doc = nlp(text)
    ents = [(ent.text, ent.label_) for ent in doc.ents]
    pred_entities.append(ents)

pred_entities

[[('Apple Inc.', 'ORG'), ('Steve Jobs', 'PERSON'), ('California', 'GPE')],
 [('Elon Musk', 'PERSON'), ('Tesla', 'ORG')],
 [('India', 'GPE'), ('Mumbai', 'GPE')],
 [('Google', 'ORG'), ('AI', 'GPE'), ('New York', 'GPE')]]
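Before scoring, it can help to see exactly where the predictions and the ground truth disagree. The sketch below is one possible way to do that. It hard-codes the two lists shown in Steps 5 and 6 so it runs without spaCy, and `find_errors` is a hypothetical helper name, not part of spaCy or scikit-learn.

```python
# Values copied from the Step 5 and Step 6 cells so this block runs on its own.
true_entities = [
    [("Apple Inc.", "ORG"), ("Steve Jobs", "PERSON"), ("California", "GPE")],
    [("Elon Musk", "PERSON"), ("Tesla", "ORG")],
    [("India", "GPE"), ("Mumbai", "GPE")],
    [("Google", "ORG"), ("New York", "GPE")],
]
pred_entities = [
    [("Apple Inc.", "ORG"), ("Steve Jobs", "PERSON"), ("California", "GPE")],
    [("Elon Musk", "PERSON"), ("Tesla", "ORG")],
    [("India", "GPE"), ("Mumbai", "GPE")],
    [("Google", "ORG"), ("AI", "GPE"), ("New York", "GPE")],
]


def find_errors(true_entities, pred_entities):
    """Hypothetical helper: list spurious and missed entities per text."""
    errors = []
    for idx, (true, pred) in enumerate(zip(true_entities, pred_entities)):
        spurious = sorted(set(pred) - set(true))  # predicted but not in gold
        missed = sorted(set(true) - set(pred))    # in gold but not predicted
        if spurious or missed:
            errors.append(
                {"text_index": idx, "spurious": spurious, "missed": missed}
            )
    return errors


errors = find_errors(true_entities, pred_entities)
for err in errors:
    print(err)
```

On this data the only disagreement is in the fourth text, where the model tags "AI" as a GPE although the ground truth has no such entity.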

**Step 7: Prepare Data for Evaluation Metrics**

In [8]:
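y_true = []
y_pred = []

# Exact-match scoring: an entity counts only if both its text and label match.
for true, pred in zip(true_entities, pred_entities):
    true_set = set(true)
    pred_set = set(pred)

    # Every gold entity is a positive: a hit (TP) if predicted, otherwise a miss (FN).
    for entity in true_set:
        y_true.append(1)
        y_pred.append(1 if entity in pred_set else 0)

    # Predicted entities absent from the gold set are false positives (FP).
    for entity in pred_set - true_set:
        y_true.append(0)
        y_pred.append(1)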
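The binary lists above pool all entity types together. As a complementary view, the same set comparison can be broken down by label. This is a minimal sketch, assuming the same exact-match rule as Step 7. `count_by_label` is a hypothetical helper, and the data is copied from Steps 5 and 6 so the block runs without spaCy.

```python
from collections import Counter

# Values copied from the Step 5 and Step 6 cells so this block runs on its own.
true_entities = [
    [("Apple Inc.", "ORG"), ("Steve Jobs", "PERSON"), ("California", "GPE")],
    [("Elon Musk", "PERSON"), ("Tesla", "ORG")],
    [("India", "GPE"), ("Mumbai", "GPE")],
    [("Google", "ORG"), ("New York", "GPE")],
]
pred_entities = [
    [("Apple Inc.", "ORG"), ("Steve Jobs", "PERSON"), ("California", "GPE")],
    [("Elon Musk", "PERSON"), ("Tesla", "ORG")],
    [("India", "GPE"), ("Mumbai", "GPE")],
    [("Google", "ORG"), ("AI", "GPE"), ("New York", "GPE")],
]


def count_by_label(true_entities, pred_entities):
    """Hypothetical helper: per-label TP / FP / FN counts under exact match."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(true_entities, pred_entities):
        true_set, pred_set = set(true), set(pred)
        for _, label in true_set & pred_set:
            tp[label] += 1
        for _, label in pred_set - true_set:
            fp[label] += 1
        for _, label in true_set - pred_set:
            fn[label] += 1
    return tp, fp, fn


tp, fp, fn = count_by_label(true_entities, pred_entities)
for label in sorted(set(tp) | set(fp) | set(fn)):
    predicted = tp[label] + fp[label]
    gold = tp[label] + fn[label]
    precision = tp[label] / predicted if predicted else 0.0
    recall = tp[label] / gold if gold else 0.0
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

A per-label view like this shows that the single false positive falls entirely on the GPE label, which the pooled scores cannot reveal.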

**Step 8: Calculate Evaluation Metrics**

In [9]:
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print("Accuracy :", accuracy)
print("Precision:", precision)
print("Recall   :", recall)
print("F1-Score :", f1)

Accuracy : 0.9
Precision: 0.9
Recall   : 1.0
F1-Score : 0.9473684210526315
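The reported numbers can be checked by hand from the confusion counts. The sketch below assumes the counts implied by the outputs above: 9 gold entities all matched, 1 spurious prediction ("AI"), and no misses. Note that the Step 7 encoding never produces a pair with `y_true = 0` and `y_pred = 0`, so there are no true negatives. Accuracy therefore reduces to TP / (TP + FP + FN), and here it coincides with precision because FN is 0.

```python
# Counts read off the notebook output: 9 matched gold entities, 1 spurious one.
tp, fp, fn = 9, 1, 0
tn = 0  # the Step 7 encoding never appends a (0, 0) pair, so TN is always 0

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"Accuracy : {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall   : {recall:.4f}")
print(f"F1-Score : {f1:.4f}")
```

These values agree with the scikit-learn output above. With only four short sentences, they are best read as a sanity check of the pipeline and not as an estimate of the model's performance on real news or social media text.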
