```
#############################################
##                                         ##
##  Natural Language Processing in Python  ##
##                                         ##
#############################################

§1 Introduction to Natural Language Processing in Python

§1.3 Named-entity recognition
```

# Named entity recognition

## What is Named Entity Recognition (NER)?

* NER is an NLP task that identifies important named entities in a text, such as:

    * *people*, *places*, *organizations*

    * *dates*, *states*, *works of art*

    * *and other categories*

* NER can be used alongside topic identification, or on its own, to determine the important items in a text or to answer basic natural language understanding questions such as:

    * *who*, *what*, *when*, *where*
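
One way to picture the link between entity categories and these questions is a lookup from entity label to question word. The sketch below is illustrative only: the mapping `QUESTION_FOR_LABEL`, the helper `group_by_question`, and the hand-typed `(label, text)` pairs are assumptions made for this example, not part of any library.

```python
# Assumed mapping from entity labels to the question each one helps answer.
QUESTION_FOR_LABEL = {
    "PERSON": "who",
    "GPE": "where",
    "LOCATION": "where",
    "DATE": "when",
    "TIME": "when",
}


def group_by_question(entities):
    """Group entity texts under the question word their label suggests."""
    answers = {}
    for label, text in entities:
        # Labels without an explicit mapping fall back to "what".
        question = QUESTION_FOR_LABEL.get(label, "what")
        answers.setdefault(question, []).append(text)
    return answers


# Hand-typed (label, text) pairs standing in for the output of an NER step.
entities = [
    ("PERSON", "Ruth Reichl"),
    ("GPE", "New York"),
    ("ORGANIZATION", "MOMA"),
]

answers = group_by_question(entities)
print(answers)
```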

## What is the Stanford CoreNLP library?

* The Stanford CoreNLP library:

    * is a Java-based library

    * can be used from Python via NLTK

    * supports NER as well as coreference resolution and dependency trees

## Using NLTK for Named Entity Recognition:

In [2]:
import nltk

sentence = '''In New York, I like to ride the Metro to visit MOMA and \
some restaurants rated well by Ruth Reichl.'''
tokenized_sent = nltk.word_tokenize(sentence)
tagged_sent = nltk.pos_tag(tokenized_sent)
tagged_sent[:3]

[('In', 'IN'), ('New', 'NNP'), ('York', 'NNP')]

In [7]:
print(nltk.ne_chunk(tagged_sent))

(S
  In/IN
  (GPE New/NNP York/NNP)
  ,/,
  I/PRP
  like/VBP
  to/TO
  ride/VB
  the/DT
  (ORGANIZATION Metro/NNP)
  to/TO
  visit/VB
  (ORGANIZATION MOMA/NNP)
  and/CC
  some/DT
  restaurants/NNS
  rated/VBN
  well/RB
  by/IN
  (PERSON Ruth/NNP Reichl/NNP)
  ./.)
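
The tree printed above nests each named entity as a labeled subtree, while ordinary tokens stay as plain `(word, tag)` tuples. A minimal sketch of pulling out `(label, text)` pairs follows. `extract_entities` and `Chunk` are hypothetical names: `Chunk` is a small stand-in that mimics only the `label()` and `leaves()` methods of `nltk.Tree`, so the sketch runs without downloaded NLTK models. With a real tree you would pass the result of `nltk.ne_chunk(tagged_sent)` instead.

```python
class Chunk(list):
    """Hypothetical stand-in for nltk.Tree: a list of children with a label."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label

    def leaves(self):
        # Only flat chunks are modeled here, so the children are the leaves.
        return list(self)


def extract_entities(tree):
    """Return (label, text) for every labeled chunk directly under the root."""
    entities = []
    for chunk in tree:
        # Plain (word, tag) tuples have no label attribute; entity chunks do.
        if hasattr(chunk, "label"):
            text = " ".join(word for word, _tag in chunk.leaves())
            entities.append((chunk.label(), text))
    return entities


# Part of the tree printed above, rebuilt by hand for illustration.
tree = Chunk("S", [
    ("In", "IN"),
    Chunk("GPE", [("New", "NNP"), ("York", "NNP")]),
    (",", ","),
    ("I", "PRP"),
    Chunk("PERSON", [("Ruth", "NNP"), ("Reichl", "NNP")]),
])

entities = extract_entities(tree)
print(entities)
```

The `hasattr(chunk, "label")` check is the same test the practice exercise uses to tell entity chunks apart from ordinary tagged tokens.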


## Practice exercises for Named Entity Recognition:

$\blacktriangleright$ **Package pre-loading:**

In [10]:
import nltk
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize

$\blacktriangleright$ **Data pre-loading:**

In [11]:
# Read the whole article; the with-block closes the file afterwards
with open('ref1. News article - Uber Apple.txt') as f:
    article = f.read()

$\blacktriangleright$ **NLTK NER practice:**

In [12]:
# Tokenize the article into sentences: sentences
sentences = sent_tokenize(article)

# Tokenize each sentence into words: token_sentences
token_sentences = [word_tokenize(sent) for sent in sentences]

# Tag each tokenized sentence into parts of speech: pos_sentences
pos_sentences = [nltk.pos_tag(sent) for sent in token_sentences] 

# Create the named entity chunks: chunked_sentences
chunked_sentences = nltk.ne_chunk_sents(pos_sentences, binary=True)

# Print every chunk (subtree) labeled 'NE'; plain (word, tag) tuples have no label
for sent in chunked_sentences:
    for chunk in sent:
        if hasattr(chunk, "label") and chunk.label() == "NE":
            print(chunk)

(NE Uber/NNP)
(NE Beyond/NN)
(NE Apple/NNP)
(NE Uber/NNP)
(NE Uber/NNP)
(NE Travis/NNP Kalanick/NNP)
(NE Tim/NNP Cook/NNP)
(NE Apple/NNP)
(NE Silicon/NNP Valley/NNP)
(NE CEO/NNP)
(NE Yahoo/NNP)
(NE Marissa/NNP Mayer/NNP)
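
Because the same entity can be mentioned several times, counting mentions gives a rough sense of what the article is about. The sketch below uses only the standard library: it parses the chunks printed above and tallies them. `chunk_text` is a hypothetical helper. With live trees you would join the words from `chunk.leaves()` rather than parse printed text.

```python
from collections import Counter

# The chunks printed above, copied verbatim.
printed_chunks = """\
(NE Uber/NNP)
(NE Beyond/NN)
(NE Apple/NNP)
(NE Uber/NNP)
(NE Uber/NNP)
(NE Travis/NNP Kalanick/NNP)
(NE Tim/NNP Cook/NNP)
(NE Apple/NNP)
(NE Silicon/NNP Valley/NNP)
(NE CEO/NNP)
(NE Yahoo/NNP)
(NE Marissa/NNP Mayer/NNP)
""".splitlines()


def chunk_text(printed):
    """Turn '(NE Tim/NNP Cook/NNP)' into 'Tim Cook'."""
    tokens = printed.strip("()").split()[1:]  # drop the leading 'NE' label
    # Each token looks like 'word/TAG'; keep only the word.
    return " ".join(token.rsplit("/", 1)[0] for token in tokens)


counts = Counter(chunk_text(line) for line in printed_chunks)
print(counts.most_common(2))
```

Entries such as `Beyond` and `CEO` in the output above suggest that the chunker also produces some false positives, so the counts are best read as a rough signal.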
