# Named Entity Recognition using NLTK Library in Python
### What is Named Entity Recognition?
To understand Named Entity Recognition in NLP, it helps to first understand what a named entity is.

#### 1. Named Entity
Named entities are typically proper nouns that refer to a specific person, organization, location, date, and so on. Consider the example “Mount Everest is the tallest mountain”. Here, Mount Everest is a named entity of type location because it refers to one specific place.

#### 2. Named Entity Recognition
In information retrieval and natural language processing, Named Entity Recognition (NER) is the process of locating named entities in text and labelling each one with its type.

With NLTK, NER is a two-step process: we first perform part-of-speech (POS) tagging on the text, and then extract the named entities using the POS tags.

### Uses of Named Entity Recognition
Named Entity Recognition is useful in the following areas:

- Academia: students and researchers can extract information from search results more quickly and easily.
- Question answering systems: the machine can locate answers in the data, reducing human effort.
- Content classification: identifying the theme and subject of content speeds up classification and helps suggest content of interest.
- Customer service: user complaints, requests, and questions can be sorted into the relevant categories and filtered by priority keywords.
- E-libraries: books and articles can be categorized by subject, which keeps the collection organized.


In [5]:
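# Example: POS-tag a sentence, then chunk named entities from the tags
import nltk
from nltk import word_tokenize, pos_tag

# Resources used by ne_chunk. word_tokenize and pos_tag also need the
# 'punkt' and 'averaged_perceptron_tagger' resources; download them the
# same way if they are missing.
nltk.download('maxent_ne_chunker')
nltk.download('words')

text = "Microsoft acquired Linkedin for $26.2 billion in a deal aimed to grow the professional networking site."

# Step 1: split the sentence into tokens and tag each one with its part of speech
tokens = word_tokenize(text)
tagged = pos_tag(tokens)
print(tagged)

# Step 2: group the tagged tokens into named-entity chunks
ne_tree = nltk.ne_chunk(tagged)
print(ne_tree)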

[('Microsoft', 'NNP'), ('acquired', 'VBD'), ('Linkedin', 'NNP'), ('for', 'IN'), ('$', '$'), ('26.2', 'CD'), ('billion', 'CD'), ('in', 'IN'), ('a', 'DT'), ('deal', 'NN'), ('aimed', 'VBN'), ('to', 'TO'), ('grow', 'VB'), ('the', 'DT'), ('professional', 'JJ'), ('networking', 'NN'), ('site', 'NN'), ('.', '.')]
(S
  (PERSON Microsoft/NNP)
  acquired/VBD
  (PERSON Linkedin/NNP)
  for/IN
  $/$
  26.2/CD
  billion/CD
  in/IN
  a/DT
  deal/NN
  aimed/VBN
  to/TO
  grow/VB
  the/DT
  professional/JJ
  networking/NN
  site/NN
  ./.)


[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!
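
The tree printed above mixes plain `(word, tag)` leaves with labelled subtrees, so a common next step is to collect only the labelled chunks. The following is a minimal sketch of one way to do that, not the only one. The helper name `extract_entities` is hypothetical, and the tree is built by hand to mirror the first few nodes of the output above so that the snippet runs without downloading any models. The labels are simply whatever the chunker predicted, which is why both companies appear as `PERSON` here.

```python
from nltk import Tree

def extract_entities(tree):
    """Return (entity_text, label) pairs for each labelled chunk under the root.

    `extract_entities` is an illustrative helper, not part of NLTK.
    """
    entities = []
    for node in tree:
        # Labelled chunks are Tree objects; ordinary tokens are (word, tag) tuples
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities

# Hand-built stand-in for the start of the ne_chunk output shown above
example_tree = Tree("S", [
    Tree("PERSON", [("Microsoft", "NNP")]),
    ("acquired", "VBD"),
    Tree("PERSON", [("Linkedin", "NNP")]),
    ("for", "IN"),
])

entities = extract_entities(example_tree)
print(entities)
```

In a notebook session, the same helper could be applied directly to `ne_tree` from the cell above.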


Let us look at one more example, this time using the pre-tagged sentences that ship with the NLTK library, so no separate POS tagging step is needed.

In [7]:
# Example: chunk a sentence that is already POS-tagged in the treebank corpus
nltk.download('treebank')
tagged_sents = nltk.corpus.treebank.tagged_sents()
print(nltk.ne_chunk(tagged_sents[0]))

[nltk_data] Downloading package treebank to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\treebank.zip.


(S
  (PERSON Pierre/NNP)
  (ORGANIZATION Vinken/NNP)
  ,/,
  61/CD
  years/NNS
  old/JJ
  ,/,
  will/MD
  join/VB
  the/DT
  board/NN
  as/IN
  a/DT
  nonexecutive/JJ
  director/NN
  Nov./NNP
  29/CD
  ./.)
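
If a flat, token-per-row view is easier to work with than the nested tree, NLTK's `tree2conlltags` converts a chunk tree into `(word, POS tag, IOB tag)` triples, where `B-`/`I-` mark the beginning and inside of an entity and `O` marks tokens outside any entity. The sketch below assumes a hand-built tree that copies the first few nodes of the output above (including the chunker's own labels), so it runs without the chunker model; the variable names are illustrative.

```python
from nltk import Tree
from nltk.chunk import tree2conlltags

# Hand-built stand-in for the start of the treebank output shown above
pierre_tree = Tree("S", [
    Tree("PERSON", [("Pierre", "NNP")]),
    Tree("ORGANIZATION", [("Vinken", "NNP")]),
    (",", ","),
    ("61", "CD"),
])

# Flatten the tree into (word, pos, iob) rows
iob_rows = tree2conlltags(pierre_tree)
for word, pos, iob in iob_rows:
    print(word, pos, iob)
```

This format is convenient for comparing predictions token by token or for writing the results to a tabular file.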
