### **Named Entity Recognition with spaCy**
**Goal:** Extract named entities (such as PERSON, ORG, DATE, and GPE) from text using spaCy’s built-in NER model.

**Install and Load spaCy**

In [5]:
# Install once if needed: pip install spacy && python -m spacy download en_core_web_sm
import spacy

# Load spaCy's small English model
nlp = spacy.load("en_core_web_sm")

**Define Input Text**

You can use any article, resume snippet, or news report.

In [6]:
text = """
Apple Inc. is planning to launch the iPhone 15 in September 2023.
Tim Cook, the CEO of Apple, announced this during the annual product keynote.
The company expects a 10% increase in global sales, especially in markets like India and China,
where demand for premium smartphones is rising.
"""

**Apply spaCy NER Pipeline**

In [8]:
# Run the text through the pipeline
doc = nlp(text)

# Display each entity with its label and the label's description
for ent in doc.ents:
    print(f"{ent.text:<30} → {ent.label_} ({spacy.explain(ent.label_)})")

Apple Inc.                     → ORG (Companies, agencies, institutions, etc.)
September 2023                 → DATE (Absolute or relative dates or periods)
Tim Cook                       → PERSON (People, including fictional)
Apple                          → ORG (Companies, agencies, institutions, etc.)
annual                         → DATE (Absolute or relative dates or periods)
10%                            → PERCENT (Percentage, including "%")
India                          → GPE (Countries, cities, states)
China                          → GPE (Countries, cities, states)
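
Often only a few entity types matter for a given task. The following is a minimal sketch of filtering by label, not part of the original notebook. `Ent` is a hypothetical stand-in for spaCy's span objects so the snippet runs without the model, and `filter_entities` is an illustrative helper name; the sample values are copied from the output above.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy entity span (only .text and .label_ are used).
# In the notebook, pass doc.ents to filter_entities instead of sample_ents.
Ent = namedtuple("Ent", ["text", "label_"])

sample_ents = [
    Ent("Apple Inc.", "ORG"),
    Ent("September 2023", "DATE"),
    Ent("Tim Cook", "PERSON"),
    Ent("Apple", "ORG"),
    Ent("annual", "DATE"),
    Ent("10%", "PERCENT"),
    Ent("India", "GPE"),
    Ent("China", "GPE"),
]


def filter_entities(ents, labels):
    """Return (text, label) pairs for entities whose label is in `labels`."""
    wanted = set(labels)
    return [(ent.text, ent.label_) for ent in ents if ent.label_ in wanted]


# Keep only organizations and geopolitical entities, in document order.
orgs_and_places = filter_entities(sample_ents, {"ORG", "GPE"})
print(orgs_and_places)
```

Because the helper only reads `.text` and `.label_`, it should work unchanged on `doc.ents`.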


**Visualize Named Entities**

In [9]:
from spacy import displacy

# Render entities inline in the notebook (jupyter=True);
# outside Jupyter, displacy.serve can show the same view in a browser
displacy.render(doc, style="ent", jupyter=True)
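
Outside a notebook, the same visualization can be saved as a standalone HTML file. The sketch below is an illustration rather than the original workflow: it uses displaCy's manual mode with hand-built entity dicts so that it runs without the model. The `manual_ent` helper and the `entities.html` file name are hypothetical, and the two entities are taken from the output shown earlier.

```python
from pathlib import Path

from spacy import displacy

sentence = "Tim Cook, the CEO of Apple, announced this during the annual product keynote."


def manual_ent(text, phrase, label):
    """Build a displaCy entity dict from the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


# Manual mode takes plain dicts, so no pretrained model is required here.
# In the notebook you would pass `doc` and drop manual=True.
example = {
    "text": sentence,
    "ents": [
        manual_ent(sentence, "Tim Cook", "PERSON"),
        manual_ent(sentence, "Apple", "ORG"),
    ],
    "title": None,
}

# jupyter=False makes render() return the markup as a string;
# page=True wraps it in a complete HTML document.
html = displacy.render(example, style="ent", manual=True, page=True, jupyter=False)

output_path = Path("entities.html")  # hypothetical output file name
output_path.write_text(html, encoding="utf-8")
```

Opening the saved file in a browser should show the highlighted entities in the same style as the inline rendering.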

**Group Entities by Type**

In [10]:
from collections import defaultdict

# Collect entity texts under their label
entities_by_type = defaultdict(list)
for ent in doc.ents:
    entities_by_type[ent.label_].append(ent.text)

# Print the unique entities per label (set ordering is not guaranteed)
for label, ents in entities_by_type.items():
    print(f"{label} ({spacy.explain(label)}): {set(ents)}\n")

ORG (Companies, agencies, institutions, etc.): {'Apple', 'Apple Inc.'}

DATE (Absolute or relative dates or periods): {'September 2023', 'annual'}

PERSON (People, including fictional): {'Tim Cook'}

PERCENT (Percentage, including "%"): {'10%'}

GPE (Countries, cities, states): {'China', 'India'}
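
Sets do not preserve order, which is why `'China'` is printed before `'India'` above even though India comes first in the text. If document order matters, one possible variant is to deduplicate with a dict instead. This is a sketch under the assumption that entities are available as `(text, label)` pairs; `unique_by_label` is an illustrative name, and the pairs are copied from the NER output.

```python
from collections import defaultdict

# (text, label) pairs copied from the NER output above. In the notebook:
# pairs = [(ent.text, ent.label_) for ent in doc.ents]
pairs = [
    ("Apple Inc.", "ORG"),
    ("September 2023", "DATE"),
    ("Tim Cook", "PERSON"),
    ("Apple", "ORG"),
    ("annual", "DATE"),
    ("10%", "PERCENT"),
    ("India", "GPE"),
    ("China", "GPE"),
]


def unique_by_label(pairs):
    """Group entity texts by label, dropping repeats but keeping first-seen order."""
    grouped = defaultdict(dict)
    for text, label in pairs:
        # A dict keeps insertion order and ignores repeated keys,
        # so it behaves like an ordered set here.
        grouped[label][text] = None
    return {label: list(texts) for label, texts in grouped.items()}


for label, texts in unique_by_label(pairs).items():
    print(f"{label}: {texts}")
```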



### 📌 **Summary**

In this project, we built a Named Entity Recognition (NER) workflow on top of **spaCy’s pretrained English pipeline (`en_core_web_sm`)** to automatically identify and classify entities in unstructured text.

On the sample text, the pipeline detected the following entity types:

- **ORG** – Organizations and institutions  
- **PERSON** – Names of individuals  
- **DATE** – Temporal expressions  
- **GPE** – Geopolitical entities (countries, cities, states)  
- **PERCENT** – Percentage values

The extracted entities were visualized with `displacy`, spaCy’s built-in rendering tool, and then grouped by type so that each category lists its unique entities.

---

### 🔍 Key Takeaways:
- Used `spaCy` to build a lightweight NER workflow from a pretrained model, with no hand-written rules.
- Extracted organizations, people, dates, locations, and percentages from plain text in a few lines of code.
- Demonstrated how NER can support downstream tasks like information extraction, content analysis, and knowledge graph construction.
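
As a rough illustration of the knowledge graph idea, entities that appear in the same sentence can be paired up as candidate edges. This is a naive co-occurrence sketch and not something the notebook implements. `cooccurrence_edges` is a hypothetical helper, and the per-sentence grouping below was written by hand from the sample text and its NER output; with a parsed `Doc`, similar groups could be collected from `doc.sents`, though the model's sentence boundaries may differ.

```python
from itertools import combinations

# Entities grouped by the sentence they appear in (hand-built from the sample text).
sentence_entities = [
    [("Apple Inc.", "ORG"), ("September 2023", "DATE")],
    [("Tim Cook", "PERSON"), ("Apple", "ORG"), ("annual", "DATE")],
    [("10%", "PERCENT"), ("India", "GPE"), ("China", "GPE")],
]


def cooccurrence_edges(sentences):
    """Pair up entities that share a sentence; each pair is a candidate graph edge."""
    edges = []
    for ents in sentences:
        # combinations() yields every unordered pair within one sentence.
        edges.extend(combinations(ents, 2))
    return edges


edges = cooccurrence_edges(sentence_entities)
for (source, source_label), (target, target_label) in edges:
    print(f"{source} [{source_label}] -- {target} [{target_label}]")
```

Co-occurrence only suggests that two entities are related; deciding what the relation is would need an additional step such as dependency patterns or a relation extraction model.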

This project showcases the practical utility of pretrained NLP pipelines in processing real-world language data with minimal setup.
