
# **Named Entity Recognition (NER)**

---

## **1. Theory**

### **What is NER?**

* **Definition**: Named Entity Recognition (NER) is the process of identifying and classifying **entities** in text into predefined categories.

* Common categories:

  * **PERSON** → People (e.g., “Albert Einstein”)
  * **ORG** → Organizations (e.g., “Google”)
  * **GPE** → Geopolitical Entities (countries, cities, states)
  * **DATE / TIME** → Temporal expressions
  * **MONEY / PERCENT / CARDINAL / PRODUCT / EVENT**, etc.

* Example:
  Text: *“Apple was founded by Steve Jobs in California in 1976.”*
  NER Output:

  ```
  Apple → ORG
  Steve Jobs → PERSON
  California → GPE
  1976 → DATE
  ```
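
Downstream code usually needs entities as structured spans rather than printed lines. The sketch below is one way to represent the example above, assuming a hypothetical `Entity` tuple and a hypothetical `locate` helper. The annotations are written by hand from the example, not produced by a model.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def locate(text, labeled_phrases):
    """Attach character offsets to (phrase, label) pairs by first occurrence."""
    entities = []
    for phrase, label in labeled_phrases:
        start = text.find(phrase)
        if start == -1:
            continue  # phrase not present in the text; skip it
        entities.append(Entity(phrase, label, start, start + len(phrase)))
    return entities


text = "Apple was founded by Steve Jobs in California in 1976."
# Hand-written annotations taken from the example above (not model output)
annotations = [
    ("Apple", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("California", "GPE"),
    ("1976", "DATE"),
]

entities = locate(text, annotations)
for ent in entities:
    print(ent.text, "→", ent.label, (ent.start, ent.end))
```

Storing offsets means each entity can be recovered by slicing the original string (`text[ent.start:ent.end]`), which is how most NER libraries expose spans.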

---

### **Why is NER important?**

* **Information extraction**: Identify key facts from unstructured text.
* **Knowledge graphs**: Build connections between entities.
* **Chatbots and question answering**: Recognize entities to respond intelligently.
* **Search and retrieval**: Improve indexing and relevance.

---

## **2. Examples**

### **NER with spaCy**

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
text = "Apple was founded by Steve Jobs in California in 1976."

doc = nlp(text)

# Print entities with labels
for ent in doc.ents:
    print(ent.text, "→", ent.label_)

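# Visualization (jupyter=True renders inline in a notebook;
# outside Jupyter, omit it and render() returns the HTML markup as a string)
displacy.render(doc, style="ent", jupyter=True)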
```

**Output:**

```
Apple → ORG
Steve Jobs → PERSON
California → GPE
1976 → DATE
```

* Visualization highlights entities with different colors for each label.

---

### **NER with NLTK (Using Pre-trained Chunking)**

```python
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

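# Download the required resources (one-time step)
nltk.download("punkt")
nltk.download("maxent_ne_chunker")
nltk.download("words")
nltk.download("averaged_perceptron_tagger")
# Note: newer NLTK releases may instead ask for "punkt_tab",
# "averaged_perceptron_tagger_eng", and "maxent_ne_chunker_tab";
# if a LookupError appears, download the resource it names.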

sentence = "Apple was founded by Steve Jobs in California in 1976."
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
ner_tree = ne_chunk(pos_tags)

print(ner_tree)
```

**Output (Tree Structure):**

```
(S
  (ORGANIZATION Apple/NNP)
  was/VBD
  founded/VBN
  by/IN
  (PERSON Steve/NNP Jobs/NNP)
  in/IN
  (GPE California/NNP)
  in/IN
  1976/CD
  ./.)
```

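> Note: NLTK’s `ne_chunk` uses a classical statistical (maximum-entropy) classifier rather than a neural model, and it is generally less accurate than spaCy on modern text. In the output above, `1976` is left untagged, whereas spaCy labels it `DATE`. Exact labels can vary across NLTK versions.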
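
The tree is easier to use once it is flattened into `(text, label)` pairs. The following is a minimal sketch, assuming a hypothetical helper named `extract_entities`. To avoid any model downloads, the tree is built by hand to mirror the output shown above instead of calling `ne_chunk`.

```python
from nltk import Tree

# Hand-built tree mirroring the ne_chunk output above:
# entity chunks are subtrees, other tokens are (word, POS) tuples.
ner_tree = Tree("S", [
    Tree("ORGANIZATION", [("Apple", "NNP")]),
    ("was", "VBD"),
    ("founded", "VBN"),
    ("by", "IN"),
    Tree("PERSON", [("Steve", "NNP"), ("Jobs", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("California", "NNP")]),
    ("in", "IN"),
    ("1976", "CD"),
    (".", "."),
])


def extract_entities(tree):
    """Collect (entity text, label) pairs from the top-level chunks of a tree."""
    entities = []
    for node in tree:
        if isinstance(node, Tree):  # plain tuples are tokens outside any entity
            words = [word for word, _pos in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities


entities = extract_entities(ner_tree)
print(entities)
```

The same helper should work on the `ner_tree` returned by `ne_chunk(pos_tags)`, since that call also produces an `nltk.Tree` with `(word, tag)` leaves.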

---

## **3. Interview-Style Q&A**

### **Basic Level**

**Q1. What is Named Entity Recognition (NER)?**
*A: NER is the task of identifying and classifying entities in text into predefined categories like PERSON, ORG, GPE, DATE, etc.*

**Q2. Give an example of NER.**
*A: In “Apple was founded by Steve Jobs in California in 1976”: Apple → ORG, Steve Jobs → PERSON, California → GPE, 1976 → DATE.*

---

### **Intermediate Level**

**Q3. How does spaCy perform NER?**
*A: spaCy ships pre-trained pipelines whose NER component is a neural, transition-based model. It predicts labeled spans over the tokens of a document, exposes them as `doc.ents`, and is optimized for speed and production use.*

**Q4. What is the difference between NER in NLTK and spaCy?**
*A: NLTK relies on classical statistical chunking over POS-tagged tokens, which is generally less accurate and slower. spaCy uses neural network models, is typically faster and more accurate, and includes built-in visualization (displaCy).*

---

### **Advanced Level**

**Q5. How do Transformer-based models handle NER?**
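*A: Transformer models (like BERT) treat NER as a **token classification task**: each token (or subword) receives a tag, commonly in a BIO-style scheme such as `B-PERSON`, `I-PERSON`, or `O`. Contextual embeddings let them identify entity spans more accurately than rule-based or classical models.*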
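
Token classification produces one tag per token, so a decoding step is needed to turn tags back into entity spans. The sketch below assumes the common BIO scheme (`B-` begins an entity, `I-` continues it, `O` is outside). The function name `bio_to_spans` is hypothetical, and the tags are written by hand for illustration, not predicted by a model.

```python
def bio_to_spans(tokens, tags):
    """Merge per-token BIO tags into (entity text, label) pairs."""
    spans = []
    current_tokens, current_label = [], None

    def flush():
        # Close the entity being built, if any
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and tag[2:] == current_label:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity
            # (treated as outside in this sketch)
            flush()
            current_tokens, current_label = [], None
    flush()
    return spans


tokens = ["Apple", "was", "founded", "by", "Steve", "Jobs",
          "in", "California", "in", "1976", "."]
# Hand-written tags for illustration
tags = ["B-ORG", "O", "O", "O", "B-PERSON", "I-PERSON",
        "O", "B-GPE", "O", "B-DATE", "O"]

spans = bio_to_spans(tokens, tags)
print(spans)
```

In a real Transformer pipeline the tags are predicted per subword, so subwords must also be merged back into words before or during this step.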

**Q6. What are some challenges in NER?**
*A: Ambiguity (“Apple” as fruit vs company), overlapping entities, domain adaptation (legal, medical), multilingual texts, and nested entities.*

---

## **4. Visualization in spaCy**

* `displacy.render(doc, style="ent")` highlights entities with a **different background color** for each label. With the default theme, the colors are approximately:

  * ORG → Light blue
  * PERSON → Purple
  * GPE → Orange
  * DATE → Pale green
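
Colors can be overridden through the `options` argument. The sketch below uses displaCy's manual mode, so it needs no trained model. The entity offsets are written by hand for the example sentence, and the hex colors are arbitrary choices, not defaults.

```python
from spacy import displacy

# Manual mode: supply the text and entity character offsets directly
example = {
    "text": "Apple was founded by Steve Jobs in California in 1976.",
    "ents": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 21, "end": 31, "label": "PERSON"},
        {"start": 35, "end": 45, "label": "GPE"},
        {"start": 49, "end": 53, "label": "DATE"},
    ],
    "title": None,
}

# Arbitrary custom colors; labels not listed keep their default color
options = {"colors": {"ORG": "#8ecae6", "DATE": "#ffb703"}}

# jupyter=False returns the HTML markup as a string instead of displaying it
html = displacy.render(example, style="ent", manual=True,
                       options=options, jupyter=False)
print(html[:200])
```

The returned string can be written to an `.html` file and opened in a browser. With a real pipeline, pass the `doc` directly and drop `manual=True`.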
