### Named Entity Recognition

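Named Entity Recognition (NER) is the task of locating mentions of real-world entities in unstructured text and classifying them into predefined categories. Many of these are proper nouns, but numeric and temporal expressions are usually included as well. Typical categories are: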

- Person names
- Organizations
- Locations
- Dates
- Monetary values
- Time expressions
- Percentages



### 🧠 2. Why NER Is Important in NLP
| Application                  | Description                                      |
| ---------------------------- | ------------------------------------------------ |
| ✅ **Information Extraction** | Extract structured data from raw text            |
| ✅ **Search & Indexing**      | Power semantic search and recommendation systems |
| ✅ **Question Answering**     | Understand "who", "where", "when", etc.          |
| ✅ **Text Summarization**     | Highlight important named entities               |
| ✅ **Sentiment Analysis**     | Associate emotions with specific entities        |

### 🧾 3. Common Named Entity Categories (NLTK)
| Entity Type    | Meaning               | Example               |
| -------------- | --------------------- | --------------------- |
| `PERSON`       | Individual’s name     | "Barack Obama"        |
| `ORGANIZATION` | Group or company      | "Microsoft", "UNICEF" |
| `GPE`          | Geo-political entity  | "India", "New York"   |
| `LOCATION`     | Geographical location | "Mount Everest"       |
| `DATE`         | Date references       | "24th July", "2025"   |
| `TIME`         | Time expressions      | "3 PM", "midnight"    |
| `MONEY`        | Currency amounts      | "\$100", "₹2000"      |
| `PERCENT`      | Percentage values     | "85%"                 |


In [56]:
## 🧰 4. NLTK NER Implementation
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

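# Resource names used by older NLTK releases. Newer releases (3.9+) may instead
# ask for 'punkt_tab', 'averaged_perceptron_tagger_eng' and
# 'maxent_ne_chunker_tab'; the LookupError message names the missing resource.
nltk.download('punkt')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('averaged_perceptron_tagger')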

text = "Apple Inc. was founded by Steve Jobs in California in 1976 with a funding of $1,000."

# Tokenization → POS Tagging → Named Entity Chunking
tokens = word_tokenize(text)
tags = pos_tag(tokens)
tree = ne_chunk(tags)

print(tree)


[nltk_data] Downloading package punkt to C:\Users\Suraj
[nltk_data]     Khodade\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker to C:\Users\Suraj
[nltk_data]     Khodade\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to C:\Users\Suraj
[nltk_data]     Khodade\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Suraj Khodade\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


(S
  (PERSON Apple/NNP)
  (ORGANIZATION Inc./NNP)
  was/VBD
  founded/VBN
  by/IN
  (PERSON Steve/NNP Jobs/NNP)
  in/IN
  (GPE California/NNP)
  in/IN
  1976/CD
  with/IN
  a/DT
  funding/NN
  of/IN
  $/$
  1,000/CD
  ./.)
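
The printed tree mixes plain `(word, tag)` leaves with labeled subtrees, and only the subtrees are entities. Note that this run labels `Apple` as `PERSON`, splits it from `Inc.`, and leaves `1976` and `$1,000` untagged. The sketch below shows one way to pull the labeled chunks out as `(text, label)` pairs. The helper name `extract_entities` is hypothetical, and the tree is rebuilt by hand from part of the output above so that no model download is needed.

```python
from nltk import Tree

def extract_entities(tree):
    """Collect (entity_text, label) pairs from the chunks directly under the root."""
    entities = []
    for node in tree:
        if isinstance(node, Tree):  # labeled chunk, e.g. (GPE California/NNP)
            words = " ".join(word for word, _tag in node.leaves())
            entities.append((words, node.label()))
    return entities

# Hand-built copy of part of the tree printed above (an assumption for
# illustration; in the notebook you would pass `tree` from ne_chunk instead).
sample = Tree("S", [
    Tree("PERSON", [("Apple", "NNP")]),
    Tree("ORGANIZATION", [("Inc.", "NNP")]),
    ("was", "VBD"), ("founded", "VBN"), ("by", "IN"),
    Tree("PERSON", [("Steve", "NNP"), ("Jobs", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("California", "NNP")]),
])

print(extract_entities(sample))
# [('Apple', 'PERSON'), ('Inc.', 'ORGANIZATION'), ('Steve Jobs', 'PERSON'), ('California', 'GPE')]
```

Calling `extract_entities(tree)` on the real `ne_chunk` result should work the same way, since `ne_chunk` returns an `nltk.Tree`.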


### 📊 5. Advantages of NER

| Advantage                          | Description                                       |
| ---------------------------------- | ------------------------------------------------- |
| ✅ **Structure from Unstructured**  | Extract machine-readable metadata from plain text |
| ✅ **Improves Search & Retrieval**  | Enables entity-aware search engines               |
| ✅ **Supports Relation Extraction** | Helps build knowledge graphs                      |
| ✅ **Customizable**                 | Models can be trained to recognize domain-specific entity types |

### ⚠️ 6. Limitations of NLTK’s NER
| Limitation               | Description                                               |
| ------------------------ | --------------------------------------------------------- |
| ❌ **Limited to English** | The bundled model has no multilingual support |
| ❌ **Shallow Parsing**    | Chunks POS-tagged tokens with a pre-trained maximum-entropy classifier rather than a deep model |
| ❌ **Not Contextual**     | Often mislabels ambiguous names (e.g., "Apple" the fruit vs. the company) |
| ❌ **Static Model**       | The bundled chunker cannot be fine-tuned; new entity types require training a separate chunker |
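
One gap is visible in the earlier output: `1976` and `$1,000` come back as plain tokens rather than `DATE` or `MONEY` chunks. A small rule-based pass is a common way to cover numeric entities like these. The sketch below is only an illustration; the patterns and the name `find_numeric_entities` are assumptions, not part of NLTK, and real text would need far broader rules.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "MONEY": re.compile(r"[$₹€£]\s?\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "DATE": re.compile(r"\b(?:1[89]|20)\d{2}\b"),  # four-digit years only
}

def find_numeric_entities(text):
    """Return (matched_text, label) pairs ordered by position in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(), label))
    return [(chunk, label) for _, chunk, label in sorted(hits)]

text = "Apple Inc. was founded by Steve Jobs in California in 1976 with a funding of $1,000."
print(find_numeric_entities(text))
# [('1976', 'DATE'), ('$1,000', 'MONEY')]
```

The results could be merged with the chunker's output, although overlapping matches would then need an explicit resolution rule.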

### 🔄 7. Alternative NER Libraries
| Tool                           | Strength                                          |
| ------------------------------ | ------------------------------------------------- |
| **spaCy**                      | Fast, deep-learning-based, multilingual           |
| **Stanza (Stanford NLP)**      | Accurate contextual recognition                   |
| **Flair**                      | Contextual embeddings for high accuracy           |
| **Transformers (Hugging Face)** | State-of-the-art models such as BERT and RoBERTa, fine-tunable for NER |


In [57]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

In [58]:
sentence = "Hello, world! I am Suraj. I live in India. This is a test sentence."

In [59]:
tagged_sentences = nltk.pos_tag(word_tokenize(sentence))

In [60]:
tagged_sentences

[('Hello', 'NNP'),
 (',', ','),
 ('world', 'NN'),
 ('!', '.'),
 ('I', 'PRP'),
 ('am', 'VBP'),
 ('Suraj', 'NNP'),
 ('.', '.'),
 ('I', 'PRP'),
 ('live', 'VBP'),
 ('in', 'IN'),
 ('India', 'NNP'),
 ('.', '.'),
 ('This', 'DT'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('test', 'NN'),
 ('sentence', 'NN'),
 ('.', '.')]
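
The tags above already hint at entity candidates: proper nouns carry the `NNP` tag. A naive baseline, sketched below with the hypothetical helper `proper_noun_runs`, simply joins consecutive `NNP`/`NNPS` tokens. It also shows why POS tags alone are not enough, because the tagger marked `Hello` as `NNP` and the heuristic picks it up.

```python
from itertools import groupby

def proper_noun_runs(tagged):
    """Join consecutive NNP/NNPS tokens into candidate entity strings."""
    runs = []
    is_proper_noun = lambda pair: pair[1] in ("NNP", "NNPS")
    for is_proper, group in groupby(tagged, key=is_proper_noun):
        if is_proper:
            runs.append(" ".join(word for word, _tag in group))
    return runs

# First part of the tagged output shown above.
tagged = [("Hello", "NNP"), (",", ","), ("world", "NN"), ("!", "."),
          ("I", "PRP"), ("am", "VBP"), ("Suraj", "NNP"), (".", "."),
          ("I", "PRP"), ("live", "VBP"), ("in", "IN"), ("India", "NNP"), (".", ".")]

print(proper_noun_runs(tagged))
# ['Hello', 'Suraj', 'India']
```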

In [61]:
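# Resources used by word_tokenize, pos_tag and ne_chunk; each call returns True
# on success and does nothing further if the package is already installed.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')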



True

In [62]:
ne = nltk.ne_chunk(tagged_sentences, binary=True)

In [63]:
print(ne)

(S
  (NE Hello/NNP)
  ,/,
  world/NN
  !/.
  I/PRP
  am/VBP
  Suraj/NNP
  ./.
  I/PRP
  live/VBP
  in/IN
  (NE India/NNP)
  ./.
  This/DT
  is/VBZ
  a/DT
  test/NN
  sentence/NN
  ./.)
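
With `binary=True` every chunk gets the single label `NE`. In this run `Hello` is flagged while `Suraj` is missed. For downstream processing it is often easier to work with flat IOB tags than with a tree; `nltk.chunk.tree2conlltags` performs that conversion. The sketch below rebuilds the first part of the tree above by hand (an assumption made so that it runs without any model download).

```python
from nltk import Tree
from nltk.chunk import tree2conlltags

# Hand-built copy of the start of the binary NE tree printed above.
ne_tree = Tree("S", [
    Tree("NE", [("Hello", "NNP")]),
    (",", ","), ("world", "NN"), ("!", "."),
    ("I", "PRP"), ("am", "VBP"), ("Suraj", "NNP"), (".", "."),
    ("I", "PRP"), ("live", "VBP"), ("in", "IN"),
    Tree("NE", [("India", "NNP")]),
    (".", "."),
])

# Each item is (word, POS tag, IOB tag); "O" means outside any entity.
iob = tree2conlltags(ne_tree)
flagged = [word for word, _pos, tag in iob if tag != "O"]

print(iob[:3])
# [('Hello', 'NNP', 'B-NE'), (',', ',', 'O'), ('world', 'NN', 'O')]
print(flagged)
# ['Hello', 'India']
```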


In [64]:
ne.draw()  # Opens the named entity tree in a separate Tkinter window (requires a graphical display)