# 🏢 Named Entity Recognition (NER)

## 🎯 What is NER?

**Named Entity Recognition (NER)** is the process of identifying and classifying named entities in text into predefined categories such as persons, organizations, locations, dates, and more.

### Quick Example:
```
Text: "Apple Inc. was founded by Steve Jobs in California"

Entities:
- Apple Inc. → ORGANIZATION
- Steve Jobs → PERSON
- California → LOCATION (GPE - Geo-Political Entity)
```
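To make the input and output shape concrete, here is a toy sketch that reproduces the example above with a hand-written lookup table (a gazetteer). It is an illustration only: `GAZETTEER` and `tag_entities` are hypothetical names, and real NER models, including the NLTK chunker used later, are trained on annotated data rather than driven by a fixed list.

```python
# Toy gazetteer: a hand-written lookup table, NOT a trained NER model.
GAZETTEER = {
    "Apple Inc.": "ORGANIZATION",
    "Steve Jobs": "PERSON",
    "California": "GPE",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, label) for each gazetteer hit, in text order."""
    hits = []
    for surface, label in gazetteer.items():
        start = text.find(surface)
        while start != -1:
            hits.append((start, start + len(surface), surface, label))
            start = text.find(surface, start + len(surface))
    return sorted(hits)


text = "Apple Inc. was founded by Steve Jobs in California"
for start, end, surface, label in tag_entities(text):
    print(f"{surface} -> {label} (chars {start}-{end})")
```

Each result carries the matched text, its label, and its character offsets, which is the same kind of information a trained NER system returns.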

---

## 🔑 Common Entity Types

| Entity | Label | Examples |
|--------|-------|----------|
| **Person** | PERSON | Steve Jobs, Elon Musk, Gandhi |
| **Organization** | ORGANIZATION | Apple, Google, UN |
| **Location** | GPE (Geo-Political) | India, New York, Paris |
| **Location** | LOCATION | Mount Everest, Pacific Ocean |
| **Date** | DATE | January 2024, Monday |
| **Time** | TIME | 3 PM, morning |
| **Money** | MONEY | $100, ₹500 |
| **Percent** | PERCENT | 25%, half |
| **Facility** | FACILITY | Eiffel Tower, Heathrow Airport |

---

## 💡 Why NER Matters

| Use Case | How NER Helps |
|----------|---------------|
| **Information Extraction** | Extract key facts from documents |
| **Question Answering** | Find "who", "where", "when" answers |
| **Content Recommendation** | Identify topics and interests |
| **Customer Support** | Route queries based on product/person mentioned |
| **Resume Parsing** | Extract names, companies, skills |
| **News Analysis** | Track mentions of people/organizations |

---

## 📝 Sample Sentence

Let's identify entities in a sentence about the Eiffel Tower:

```python
sentence = "The Eiffel Tower was built from 1887 to 1889 by Gustave Eiffel, whose company specialized in building metal frameworks and structures."
```

**Expected Entities:**
- Eiffel Tower → FACILITY (NLTK's pre-trained chunker labels it ORGANIZATION in Step 3)
- 1887 to 1889 → DATE (left untagged by NLTK in Step 3)
- Gustave Eiffel → PERSON

---

## 📦 Step 1: Tokenization & POS Tagging

NLTK's `ne_chunk()` takes POS-tagged tokens as input, so the process is:
1. Tokenize text into words
2. Apply POS tagging
3. Apply NER on POS-tagged words

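```python
import nltk
from nltk.tokenize import word_tokenize

# If this raises LookupError, download the tokenizer and tagger data first,
# e.g. nltk.download('punkt_tab') and
# nltk.download('averaged_perceptron_tagger_eng') on recent NLTK releases.
words = word_tokenize(sentence)      # split the sentence into tokens
tag_element = nltk.pos_tag(words)    # attach a POS tag to each token
tag_element
```

```
[('The', 'DT'),
 ('Eiffel', 'NNP'),
 ('Tower', 'NNP'),
 ('was', 'VBD'),
 ('built', 'VBN'),
 ('from', 'IN'),
 ('1887', 'CD'),
 ('to', 'TO'),
 ('1889', 'CD'),
 ('by', 'IN'),
 ('Gustave', 'NNP'),
 ('Eiffel', 'NNP'),
 (',', ','),
 ('whose', 'WP$'),
 ('company', 'NN'),
 ('specialized', 'VBD'),
 ('in', 'IN'),
 ('building', 'NN'),
 ('metal', 'NN'),
 ('frameworks', 'NNS'),
 ('and', 'CC'),
 ('structures', 'NNS'),
 ('.', '.')]
```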

**Output:** A list of `(word, POS tag)` tuples
- This is the input format that `nltk.ne_chunk()` expects
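To see why the POS tags matter, note that both entities in this sentence are runs of proper nouns (`NNP`). The sketch below groups consecutive proper-noun tokens into candidate phrases. It is a rough heuristic for illustration, not what `nltk.ne_chunk()` does internally (the chunker is a trained classifier that uses POS tags among its features), and `proper_noun_runs` is a hypothetical helper name.

```python
from itertools import groupby

# Abbreviated copy of the tagged output shown above.
tagged = [
    ("The", "DT"), ("Eiffel", "NNP"), ("Tower", "NNP"), ("was", "VBD"),
    ("built", "VBN"), ("from", "IN"), ("1887", "CD"), ("to", "TO"),
    ("1889", "CD"), ("by", "IN"), ("Gustave", "NNP"), ("Eiffel", "NNP"),
    (",", ","),
]


def proper_noun_runs(tagged_tokens):
    """Join each run of consecutive NNP/NNPS tokens into one candidate phrase."""
    runs = []
    is_proper_noun = lambda pair: pair[1] in ("NNP", "NNPS")
    for is_proper, group in groupby(tagged_tokens, key=is_proper_noun):
        if is_proper:
            runs.append(" ".join(word for word, _tag in group))
    return runs


print(proper_noun_runs(tagged))  # ['Eiffel Tower', 'Gustave Eiffel']
```

The heuristic finds the candidate spans but cannot say whether a span is a person, an organization, or a place. Assigning that label is the job of the NER model in the next steps.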

---

## 📥 Step 2: Download NER Model

```python
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')
```

```
[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     C:\Users\asus\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\asus\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\words.zip.

True
```

**Required:**
- `maxent_ne_chunker_tab` - Pre-trained NE chunker model used by `nltk.ne_chunk()`
- `words` - Corpus of English words

---

## 🔧 Step 3: Apply NER

Using `nltk.ne_chunk()` to identify named entities:

```python
ner_tree = nltk.ne_chunk(tag_element)

print(ner_tree)
```

```
(S
  The/DT
  (ORGANIZATION Eiffel/NNP Tower/NNP)
  was/VBD
  built/VBN
  from/IN
  1887/CD
  to/TO
  1889/CD
  by/IN
  (PERSON Gustave/NNP Eiffel/NNP)
  ,/,
  whose/WP$
  company/NN
  specialized/VBD
  in/IN
  building/NN
  metal/NN
  frameworks/NNS
  and/CC
  structures/NNS
  ./.)
```


**Output Format:** Tree structure (`nltk.Tree`) rooted at `S`
- Non-entities: plain `(word, POS)` tuples, printed as `word/POS`
- Entities: subtrees labeled with the entity type, printed as `(ENTITY_TYPE word/POS ...)`

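Condensed, the output above reads:
```
(S
  The/DT
  (ORGANIZATION Eiffel/NNP Tower/NNP)
  was/VBD
  built/VBN
  ...
  (PERSON Gustave/NNP Eiffel/NNP)
  ...)
```

Note that the pre-trained model labels "Eiffel Tower" as ORGANIZATION rather than FACILITY, and leaves the years 1887 and 1889 as plain `CD` tokens instead of tagging a DATE.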
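Because entities are subtrees and everything else is a plain tuple, pulling the entities out of the result takes a short loop. The following is a minimal sketch, assuming the tree comes from `nltk.ne_chunk()` as in Step 3; `extract_entities` is a hypothetical helper name, not an NLTK function.

```python
def extract_entities(tree):
    """Collect (entity_text, entity_label) pairs from a chunked sentence.

    Entity subtrees expose .label() and .leaves(); ordinary tokens are
    plain (word, POS) tuples and are skipped.
    """
    entities = []
    for node in tree:
        if hasattr(node, "label"):  # only subtrees have a label() method
            text = " ".join(word for word, _pos in node.leaves())
            entities.append((text, node.label()))
    return entities


# Usage, assuming `ner_tree` is the result of nltk.ne_chunk() from Step 3:
# extract_entities(ner_tree)
```

For the tree printed in Step 3, this should give `[('Eiffel Tower', 'ORGANIZATION'), ('Gustave Eiffel', 'PERSON')]`.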

---

## 🎨 Step 4: Visualize Entity Tree

```python
# Opens the tree in a separate GUI (Tkinter) window
ner_tree.draw()
```
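`draw()` needs a display, so it will not work on a headless server. As a text-only alternative, the entities can be marked inline in the sentence. This is a minimal sketch, assuming a tree shaped like the `nltk.ne_chunk()` result above; `render_inline` is a hypothetical helper, not part of NLTK. (NLTK trees also provide `pretty_print()` for an ASCII drawing of the full tree.)

```python
def render_inline(tree):
    """Render a chunked sentence as plain text with [phrase: LABEL] markers."""
    parts = []
    for node in tree:
        if hasattr(node, "label"):  # entity subtree
            phrase = " ".join(word for word, _pos in node.leaves())
            parts.append(f"[{phrase}: {node.label()}]")
        else:  # ordinary (word, POS) tuple
            parts.append(node[0])
    return " ".join(parts)


# Usage, assuming `ner_tree` is the result of nltk.ne_chunk():
# print(render_inline(ner_tree))
```

For the sample sentence, the rendered text starts with `The [Eiffel Tower: ORGANIZATION] was built from ...`. Tokens are joined with single spaces, so punctuation appears as separate tokens.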

---

## 📊 NLTK NER Limitations

### ⚠️ Challenges:
- **Limited entity types** (mainly PERSON, ORGANIZATION, GPE)
- **Context-dependent accuracy** (needs clear proper nouns)
- **English-focused** (better support needed for other languages)
- **Not domain-specific** (generic model)
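One way to work around the limited label set is to supplement the chunker with simple rules. The sketch below is an assumption-laden example rather than a recommended recipe: it treats any four-digit number from 1000 to 2099 as a year, optionally followed by `to <year>`, and `find_year_spans` is a hypothetical helper name.

```python
import re

# Assumed rule: a four-digit year (1000-2099), optionally followed by "to <year>".
YEAR = r"(?:1\d{3}|20\d{2})"
DATE_PATTERN = re.compile(rf"\b{YEAR}(?:\s+to\s+{YEAR})?\b")


def find_year_spans(text):
    """Return (matched_text, 'DATE') pairs for years and 'X to Y' year ranges."""
    return [(match.group(0), "DATE") for match in DATE_PATTERN.finditer(text)]


sentence = "The Eiffel Tower was built from 1887 to 1889 by Gustave Eiffel."
print(find_year_spans(sentence))  # [('1887 to 1889', 'DATE')]
```

A rule like this will also match four-digit numbers that are not years, so it only suits text where that risk is acceptable.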

### Real-World Applications:
- 📰 **News monitoring** - Track mentions of companies and people
- 🔍 **Search engines** - Understand query intent
- 📧 **Email filtering** - Identify contacts/organizations
- 📄 **Document processing** - Extract key information
- 🤖 **Chatbots** - Understand user references
- 📊 **Market intelligence** - Monitor brand mentions

---

<div align="center">

**🎉 NER Complete!**

*Extract meaningful entities from unstructured text*

</div>