# Day 33 – Named Entity Recognition using spaCy
### Extract Entities from Complaints / Tickets / Documents

Today you'll use spaCy's NER system to automate entity extraction from text.

#### Goals:
- Load spaCy NER model
- Extract entities such as PERSON, ORG, GPE, DATE, MONEY, etc.
- Visualize entities using spaCy displacy
- Build a reusable entity extraction function
- Generate a structured, machine-readable entity dictionary

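Structured entities are the raw input for document automation and grievance analysis: once names, organizations, dates, and amounts are pulled out of free text, downstream code can route, filter, and aggregate tickets without re-reading each one.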

In [None]:
import spacy
from spacy import displacy
import pandas as pd
import json

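# Requires the small English pipeline to be installed first:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
print("spaCy model loaded successfully.")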

## 1. Synthetic Document / Complaint Dataset
We simulate real-world grievance/ticket texts.

In [None]:
documents = [
    "I, John Doe, want to report that my laptop was stolen near MG Road on 5th January 2024.",
    "Payment of $250 was deducted twice from my HDFC account. Please help resolve this immediately.",
    "The service center at Hyderabad refused to repair my phone even though it is under warranty.",
    "My name is Ramesh Kumar. I filed a complaint last week but haven't received any update.",
    "Uber charged me ₹560 for a ride I never booked on 14th Feb 2023.",
    "I work at Infosys and need a salary slip for my account verification process.",
    "There is a fraud transaction of Rs 12,000 on my SBI credit card yesterday.",
    "My passport application ID AX92011 has been pending for 3 weeks."
]

df = pd.DataFrame({"document": documents})
df

## 2. Run NER on Each Document
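Let's inspect the entities extracted from the first document. The exact spans and labels depend on the model version, so your output may differ slightly.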

In [None]:
sample_doc = nlp(documents[0])
for ent in sample_doc.ents:
    print(ent.text, "--", ent.label_)

## 3. Visualize Entities with spaCy Displacy

In [None]:
displacy.render(sample_doc, style="ent", jupyter=True)

## 4. Reusable Entity Extraction Function
The function below returns a dictionary that maps each entity label to the list of entity texts found under that label.

In [None]:
def extract_entities(text):
    """Return a dict mapping each entity label to the entity texts found in `text`."""
    doc = nlp(text)
    entity_dict = {}
    for ent in doc.ents:
        # setdefault creates the list the first time a label is seen
        entity_dict.setdefault(ent.label_, []).append(ent.text)
    return entity_dict

# Test on sample
extract_entities(documents[1])
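
The grouped dictionary drops each entity's position in the text. When a later step needs to highlight or redact the original span, it can help to keep character offsets as well. The sketch below is one possible variant, not part of the original notebook: `entities_with_offsets` is a hypothetical helper that accepts any iterable of objects exposing `text`, `label_`, `start_char`, and `end_char` (the attributes spaCy spans provide). The stand-in spans and their values are made up so the example runs without a model.

```python
from types import SimpleNamespace


def entities_with_offsets(ents):
    """Turn span-like objects into plain dicts that keep character offsets.

    `ents` is any iterable whose items expose text, label_, start_char and
    end_char, for example `nlp(text).ents` in this notebook.
    """
    return [
        {"text": e.text, "label": e.label_, "start": e.start_char, "end": e.end_char}
        for e in ents
    ]


# Stand-in spans (hypothetical values, not real model output) so the
# sketch runs without loading a spaCy model.
sentence = "Payment of $250 was deducted twice from my HDFC account."
fake_ents = [
    SimpleNamespace(text="250", label_="MONEY", start_char=12, end_char=15),
    SimpleNamespace(text="HDFC", label_="ORG", start_char=43, end_char=47),
]
records = entities_with_offsets(fake_ents)
```

In the notebook, the same helper could presumably be called as `entities_with_offsets(nlp(documents[1]).ents)`; slicing the source text with `start` and `end` should then give back each entity's text.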

## 5. Apply to All Documents and Create Structured Dataset

In [None]:
df['entities'] = df['document'].apply(extract_entities)
df
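
Once every row has an entity dictionary, a quick tally of labels shows which entity types appear most often across the dataset. The following is a minimal sketch using only the standard library; `count_labels` is a hypothetical helper, and the sample dictionaries are made-up values in the shape `extract_entities` returns, not actual model output.

```python
from collections import Counter


def count_labels(entity_dicts):
    """Count how many entity mentions each label has across all documents."""
    counts = Counter()
    for entity_dict in entity_dicts:
        for label, texts in entity_dict.items():
            counts[label] += len(texts)
    return counts


# Illustrative dictionaries in the shape extract_entities returns
# (made-up values, not real model output).
sample_entities = [
    {"PERSON": ["John Doe"], "DATE": ["5th January 2024"]},
    {"MONEY": ["250"], "ORG": ["HDFC"]},
    {"PERSON": ["Ramesh Kumar"], "DATE": ["last week"]},
]
label_counts = count_labels(sample_entities)
```

On the notebook's DataFrame this could be called as `count_labels(df['entities'])`, since iterating a pandas Series yields its values.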

## 6. Export as JSON (Useful for Automation Pipelines)

In [None]:
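# force_ascii=False keeps characters such as ₹ readable instead of escaping them
json_output = df.to_json(orient='records', indent=2, force_ascii=False)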
print(json_output)
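
Some pipelines prefer JSON Lines (one JSON object per line) over a single JSON array, because each record can be read and processed independently. The sketch below shows one way to produce that format with the standard `json` module; `to_json_lines` is a hypothetical helper, and the records are illustrative values rather than real model output.

```python
import json


def to_json_lines(records):
    """Serialize a list of dicts as JSON Lines: one JSON object per line."""
    # ensure_ascii=False keeps symbols such as the rupee sign readable
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


# Illustrative records (made-up entity values, not real model output).
records = [
    {
        "document": "Uber charged me ₹560 for a ride I never booked.",
        "entities": {"ORG": ["Uber"]},
    },
    {"document": "I work at Infosys.", "entities": {"ORG": ["Infosys"]}},
]
jsonl = to_json_lines(records)

# Each line parses back independently of the others
parsed = [json.loads(line) for line in jsonl.splitlines()]
```

With the notebook's DataFrame, the input could be built with `df.to_dict(orient="records")`.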

## 7. Summary
- Loaded spaCy NER model
- Extracted PERSON, ORG, GPE, MONEY, DATE, CARDINAL, etc.
- Visualized NER outputs
- Created reusable entity extraction function
- Built structured JSON output for automation systems

**Deliverable:** `day33_spacy_ner_extraction.ipynb`