# Named Entity Recognition (NER) using NLTK

This notebook provides a detailed explanation and demonstration of Named Entity Recognition (NER) using the Natural Language Toolkit (NLTK) in Python.

## What is Named Entity Recognition?
Named Entity Recognition (NER) is a subtask of information extraction that locates mentions of named entities in text and classifies them into predefined categories such as:
- Person
- Location
- Organization
- Date/Time
- Money
- Percentages

NER is a crucial component in Natural Language Processing (NLP) applications like information retrieval, question answering, and summarization.

## Example Sentence
Consider the following sentence:

> *"The Eiffel Tower was built from 1887 to 1889 by French engineer Gustave Eiffel, whose company specialized in building metal frameworks and structures."*

This sentence contains several named entities:
- `Eiffel Tower`: Location (some label schemes, including the one NLTK documents, also have a Facility type for landmarks)
- `1887 to 1889`: Date/Time
- `Gustave Eiffel`: Person
- `French`: Nationality/Geopolitical Entity (GPE)

We'll use NLTK to identify these entities.
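NER output is often stored as labeled character spans. The sketch below writes the entities listed above in that form; the `(text, label)` layout, the label strings, and the helper name `find_spans` are illustrative assumptions, not part of NLTK.

```python
sentence = (
    "The Eiffel Tower was built from 1887 to 1889 by French engineer "
    "Gustave Eiffel, whose company specialized in building metal "
    "frameworks and structures."
)

# Entities taken from the list above; the label strings are illustrative.
expected = [
    ("Eiffel Tower", "LOCATION"),
    ("1887 to 1889", "DATE"),
    ("Gustave Eiffel", "PERSON"),
    ("French", "GPE"),
]

def find_spans(text, entities):
    """Return (start, end, label) for the first occurrence of each entity."""
    spans = []
    for surface, label in entities:
        start = text.find(surface)
        if start != -1:
            spans.append((start, start + len(surface), label))
    return spans

spans = find_spans(sentence, expected)
for start, end, label in spans:
    print(f"{label:<9} {start:>3}-{end:<3} {sentence[start:end]}")
```

Character offsets keep an annotation tied to the exact stretch of text it covers, which matters when the same word (here `Eiffel`) appears in more than one entity.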

In [1]:
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\vishalrathod\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\vishalrathod\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\vishalrathod\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\vishalrathod\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!


True

## Tokenization
We first tokenize the sentence into words.

In [2]:
sentence = "The Eiffel Tower was built from 1887 to 1889 by French engineer Gustave Eiffel, whose company specialized in building metal frameworks and structures."
tokens = word_tokenize(sentence)
tokens

['The',
 'Eiffel',
 'Tower',
 'was',
 'built',
 'from',
 '1887',
 'to',
 '1889',
 'by',
 'French',
 'engineer',
 'Gustave',
 'Eiffel',
 ',',
 'whose',
 'company',
 'specialized',
 'in',
 'building',
 'metal',
 'frameworks',
 'and',
 'structures',
 '.']
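To make the step concrete, here is a rough regex approximation that happens to give the same tokens for this sentence. It is only a sketch: `simple_tokenize` is a hypothetical helper, and NLTK's `word_tokenize` applies more elaborate rules (for example for contractions and abbreviations) that this pattern does not cover.

```python
import re

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

sentence = (
    "The Eiffel Tower was built from 1887 to 1889 by French engineer "
    "Gustave Eiffel, whose company specialized in building metal "
    "frameworks and structures."
)
tokens = simple_tokenize(sentence)
print(len(tokens), tokens[:5])
```

Note that the comma and the final period become tokens of their own, just as in the NLTK output above.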

## Part-of-Speech Tagging
Next, we assign parts of speech to each token.

In [3]:
tagged_tokens = pos_tag(tokens)
tagged_tokens

[('The', 'DT'),
 ('Eiffel', 'NNP'),
 ('Tower', 'NNP'),
 ('was', 'VBD'),
 ('built', 'VBN'),
 ('from', 'IN'),
 ('1887', 'CD'),
 ('to', 'TO'),
 ('1889', 'CD'),
 ('by', 'IN'),
 ('French', 'JJ'),
 ('engineer', 'NN'),
 ('Gustave', 'NNP'),
 ('Eiffel', 'NNP'),
 (',', ','),
 ('whose', 'WP$'),
 ('company', 'NN'),
 ('specialized', 'VBD'),
 ('in', 'IN'),
 ('building', 'NN'),
 ('metal', 'NN'),
 ('frameworks', 'NNS'),
 ('and', 'CC'),
 ('structures', 'NNS'),
 ('.', '.')]
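The tags matter for NER because runs of proper nouns (`NNP`) are strong entity candidates. The sketch below groups consecutive `NNP`/`NNPS` tokens from the output above; `proper_noun_spans` is a hypothetical helper for illustration and is not how `ne_chunk` works internally.

```python
from itertools import groupby

# Copied from the pos_tag output above.
tagged_tokens = [
    ("The", "DT"), ("Eiffel", "NNP"), ("Tower", "NNP"), ("was", "VBD"),
    ("built", "VBN"), ("from", "IN"), ("1887", "CD"), ("to", "TO"),
    ("1889", "CD"), ("by", "IN"), ("French", "JJ"), ("engineer", "NN"),
    ("Gustave", "NNP"), ("Eiffel", "NNP"), (",", ","), ("whose", "WP$"),
    ("company", "NN"), ("specialized", "VBD"), ("in", "IN"),
    ("building", "NN"), ("metal", "NN"), ("frameworks", "NNS"),
    ("and", "CC"), ("structures", "NNS"), (".", "."),
]

def proper_noun_spans(tagged):
    """Join each run of consecutive proper-noun tokens into one string."""
    spans = []
    for is_proper, group in groupby(tagged, key=lambda pair: pair[1] in ("NNP", "NNPS")):
        if is_proper:
            spans.append(" ".join(word for word, _ in group))
    return spans

print(proper_noun_spans(tagged_tokens))  # ['Eiffel Tower', 'Gustave Eiffel']
```

This heuristic misses `French`, which the tagger marked as an adjective (`JJ`), and it assigns no entity type. Both gaps are what a trained chunker is meant to close.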

## Named Entity Recognition
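Now we apply NER chunking using `ne_chunk`, which takes the POS-tagged tokens and returns an `nltk.Tree`. With `binary=True`, every entity is labeled simply `NE`; with the default `binary=False`, entities receive type labels such as `PERSON`, `ORGANIZATION`, or `GPE`.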

In [4]:
import nltk

# Download required NLTK data
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Check where NLTK is looking for data
print(nltk.data.path)

['C:\\Users\\vishalrathod/nltk_data', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\\nltk_data', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\\share\\nltk_data', 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\\lib\\nltk_data', 'C:\\Users\\vishalrathod\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', 'D:\\nltk_data', 'E:\\nltk_data']


[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\vishalrathod\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\vishalrathod\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\vishalrathod\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\vishalrathod\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


In [5]:
nltk.data.path.append(r'D:\nltk_data')
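In the list printed above, `D:\nltk_data` is already one of the search directories, so this append only adds a duplicate entry. A small guard avoids that; `add_search_dir` is a hypothetical helper, and the short list stands in for the tail of `nltk.data.path`.

```python
def add_search_dir(paths, new_dir):
    """Append new_dir to paths only if it is not already listed."""
    if new_dir not in paths:
        paths.append(new_dir)
    return paths

# Stand-in for the last three entries of nltk.data.path shown above.
search_path = [r"C:\nltk_data", r"D:\nltk_data", r"E:\nltk_data"]
add_search_dir(search_path, r"D:\nltk_data")   # already present, nothing added
add_search_dir(search_path, r"F:\nltk_data")   # new directory, appended
print(search_path)
```

In the notebook the same call would be `add_search_dir(nltk.data.path, r'D:\nltk_data')`.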

In [6]:
from nltk import ne_chunk

# binary=True labels every entity span as NE, without a type
named_entities = ne_chunk(tagged_tokens, binary=True)
print(named_entities)

LookupError: 
**********************************************************************
  Resource maxent_ne_chunker_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('maxent_ne_chunker_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load chunkers/maxent_ne_chunker_tab/english_ace_binary/

  Searched in:
    - 'C:\\Users\\vishalrathod/nltk_data'
    - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\\nltk_data'
    - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\\share\\nltk_data'
    - 'C:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\\lib\\nltk_data'
    - 'C:\\Users\\vishalrathod\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'D:\\nltk_data'
**********************************************************************
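The error says that this NLTK installation looks for the chunker under `maxent_ne_chunker_tab`, while the cells above only downloaded `maxent_ne_chunker`. A minimal sketch of a fix follows. It assumes a recent NLTK release in which the `_tab` and `_eng` resource ids are the ones looked up; only `maxent_ne_chunker_tab` is confirmed by the traceback, and `download_ner_resources` and `chunk_entities` are hypothetical helper names.

```python
# Resource ids the notebook already downloads, plus the variants that newer
# NLTK releases look up (assumption; only maxent_ne_chunker_tab is confirmed
# by the traceback above).
NER_RESOURCES = [
    "punkt", "punkt_tab",
    "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng",
    "maxent_ne_chunker", "maxent_ne_chunker_tab",
    "words",
]

def download_ner_resources(resources=NER_RESOURCES):
    """Download each resource and report which downloads succeeded."""
    import nltk  # imported lazily so defining the helper has no side effects
    return {name: nltk.download(name, quiet=True) for name in resources}

def chunk_entities(tagged_tokens, binary=True):
    """Run ne_chunk after making sure the chunker data is present."""
    from nltk import ne_chunk
    download_ner_resources(["maxent_ne_chunker_tab", "words"])
    return ne_chunk(tagged_tokens, binary=binary)
```

With these helpers, `named_entities = chunk_entities(tagged_tokens)` followed by `print(named_entities)` should replace the failing cell. The downloads need network access on the first run.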


## Visualizing Named Entities
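You can visualize the named entity tree using the `.draw()` method. It opens a Tkinter window, so it needs a desktop session with Tk available, and it only works once `named_entities` has been created successfully.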

In [None]:
# This will open a pop-up window with the tree visualization (if supported)
named_entities.draw()

NameError: name 'named_entities' is not defined
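When no GUI is available, a text view of the entities is often more practical. `nltk.chunk.tree2conlltags` converts a chunk tree into `(word, pos, iob)` triples, and the sketch below groups such triples into entity strings. The `sample` triples are hand-written to show the expected shape and are not output from the chunker; `entities_from_iob` is a hypothetical helper.

```python
def entities_from_iob(triples):
    """Group (word, pos, iob) triples into (entity_text, label) pairs."""
    entities, words, label = [], [], None
    for word, _pos, iob in triples:
        starts_new = iob.startswith("B-") or (iob.startswith("I-") and iob[2:] != label)
        if starts_new or iob == "O":
            if words:  # close the entity collected so far
                entities.append((" ".join(words), label))
            words, label = ([word], iob[2:]) if starts_new else ([], None)
        else:  # an I- tag continuing the current entity
            words.append(word)
    if words:
        entities.append((" ".join(words), label))
    return entities

# Hand-written illustration of the triple format, not real chunker output.
sample = [
    ("The", "DT", "O"),
    ("Eiffel", "NNP", "B-NE"),
    ("Tower", "NNP", "I-NE"),
    ("was", "VBD", "O"),
]
print(entities_from_iob(sample))  # [('Eiffel Tower', 'NE')]
```

In the notebook, the input would come from `tree2conlltags(named_entities)` once the chunking cell runs without error.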

## Summary
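- We used NLTK to set up Named Entity Recognition on a sample sentence.
- Tokenization and POS tagging ran successfully; the chunking step with `ne_chunk` raised a `LookupError` because the `maxent_ne_chunker_tab` resource was not installed.
- Once that resource is downloaded, `ne_chunk(..., binary=True)` marks entity spans with the single label `NE`, while `binary=False` assigns types such as `PERSON`, `ORGANIZATION`, and `GPE`.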

NER is a powerful tool for extracting structured information from unstructured text.