# Named Entity Recognition (NER) with NLTK

## What is Named Entity Recognition (NER)?

**Named Entity Recognition (NER)** is a subtask of **information extraction** that identifies and classifies **named entities** in text into predefined categories. These categories typically include:

- **Person**: Names of people (e.g., "Barack Obama")
- **Organization**: Names of companies or institutions (e.g., "Google")
- **Location**: Names of geographic locations (e.g., "Paris", "Hawaii")
- **Date**, **Time**, **Money**, **Percent**, etc.

NER turns unstructured text into structured data: once entities are identified and typed, they can be indexed, counted, or linked to other records.

## NER with NLTK

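NLTK provides a built-in **Named Entity Chunker**, `nltk.ne_chunk()`, backed by a pre-trained model. It takes a list of part-of-speech-tagged tokens (the output of `nltk.pos_tag()`) and returns an `nltk.Tree` in which each recognized entity is a subtree labeled with its type, such as `PERSON`, `ORGANIZATION`, or `GPE` (geo-political entity). Passing `binary=True` labels every entity simply as `NE` instead of assigning a specific type.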
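
A chunk tree is often easier to work with as a flat list of entities. One option is to convert the tree to (word, POS tag, IOB tag) triples with `nltk.chunk.tree2conlltags()` and then group consecutive `B-`/`I-` tags. The sketch below shows only the grouping step in plain Python. The helper name `group_entities` and the sample triples are assumptions made for illustration; the triples are hand-written, not real chunker output.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (word, POS tag, IOB tag)


def group_entities(triples: Iterable[Triple]) -> List[Tuple[str, str]]:
    """Collect consecutive B-/I- tagged words into (label, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_words: List[str] = []
    for word, _pos, iob in triples:
        if iob == "O":
            prefix, label = "O", None
        else:
            prefix, _, label = iob.partition("-")
        # An I- tag with the same label continues the entity that is open.
        if prefix == "I" and label == current_label and current_words:
            current_words.append(word)
            continue
        # Anything else closes the open entity, if there is one.
        if current_words:
            entities.append((current_label, " ".join(current_words)))
        if prefix in ("B", "I"):
            current_label, current_words = label, [word]
        else:
            current_label, current_words = None, []
    if current_words:
        entities.append((current_label, " ".join(current_words)))
    return entities


# Hand-written triples for illustration only (hypothetical, not chunker output).
sample_triples = [
    ("Barack", "NNP", "B-PERSON"),
    ("Obama", "NNP", "I-PERSON"),
    ("visited", "VBD", "O"),
    ("Paris", "NNP", "B-GPE"),
    (".", ".", "O"),
]
entities = group_entities(sample_triples)
print(entities)
```

With a real tree, the input would be `nltk.chunk.tree2conlltags(tree)` rather than a hand-written list.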

## When to Use NER

- **Information Retrieval**: To improve search results by identifying and prioritizing entities.
- **Question Answering**: For answering questions about specific people, places, or organizations.
- **Content Extraction**: To pull out relevant data like dates, locations, or people for automated analysis.

In [5]:
import nltk
nltk.download('punkt')  # Tokenizer models for splitting text into sentences and words
nltk.download('averaged_perceptron_tagger')  # Model used for part-of-speech tagging
nltk.download('maxent_ne_chunker')  # Named entity chunker model
nltk.download('maxent_ne_chunker_tab')  # Tab-format data files for the same chunker, used by newer NLTK releases
nltk.download('words')  # English word list used by the named entity chunker

In [6]:
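sample_text = "Aryan is an ambivert boy!!"

# Split the text into sentences, then tokenize, POS-tag, and NE-chunk each one
sentences = nltk.sent_tokenize(sample_text)
for sentence in sentences:
    words = nltk.word_tokenize(sentence)      # split the sentence into word tokens
    pos_tags = nltk.pos_tag(words)            # attach a part-of-speech tag to each token
    named_entities = nltk.ne_chunk(pos_tags)  # group tagged tokens into named entity subtrees
    print(named_entities)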

(S (GPE Aryan/NNP) is/VBZ an/DT ambivert/JJ boy/NN !/.)
(S !/.)
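
The second tree, `(S !/.)`, appears because the sample text ends in `!!`: the output shows the final `!` being treated as a sentence of its own. One way to avoid such stray sentences is to collapse repeated marks before tokenizing. The following is a minimal sketch in plain Python; the helper name `collapse_repeated_marks` is an assumption made for illustration and is not part of NLTK.

```python
import re


def collapse_repeated_marks(text: str) -> str:
    """Replace each run of the same '!' or '?' with a single mark."""
    return re.sub(r"([!?])\1+", r"\1", text)


cleaned = collapse_repeated_marks("Aryan is an ambivert boy!!")
print(cleaned)
```

The cleaned string can then be passed to `nltk.sent_tokenize()` in place of `sample_text`. Periods are left untouched so that an ellipsis (`...`) is preserved.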
