# Named Entity Recognition - Exercise

Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that **seeks to locate and classify named entities in text into pre-defined categories** such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. (wikipedia.org)
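
To make "locate and classify" concrete, here is a deliberately naive sketch: a hand-made lexicon (`LEXICON` and `lookup_entities` are hypothetical names) maps token sequences to categories and is matched longest-first against a tokenized sentence. This is a toy dictionary lookup, not how NLTK's chunker works. It cannot recognize any name missing from the lexicon, which is why trained classifiers are used in practice.

```python
# Hypothetical, hand-made lexicon: entity phrase (as a tuple of tokens) -> category.
LEXICON = {
    ("United", "States"): "GPE",
    ("United", "Nations"): "ORGANIZATION",
    ("David",): "PERSON",
    ("Japan",): "GPE",
}
MAX_LEN = max(len(phrase) for phrase in LEXICON)  # longest phrase, in tokens


def lookup_entities(tokens):
    """Locate entities by longest match against LEXICON, then classify them."""
    found, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in LEXICON:
                found.append((" ".join(phrase), LEXICON[phrase]))
                i += n  # skip past the matched entity
                break
        else:
            i += 1  # no entity starts here; move to the next token
    return found


tokens = "David is a clever student from Japan .".split()
print(lookup_entities(tokens))  # [('David', 'PERSON'), ('Japan', 'GPE')]
```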

Let's work through an example using NLTK. It applies concepts you studied in previous lectures, such as tokenization, part-of-speech tagging and chunking.

NLTK has its own named-entity chunker, **nltk.ne_chunk()**.

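```python
import nltk  # needs the punkt, averaged_perceptron_tagger, maxent_ne_chunker and words data (nltk.download; names vary by NLTK version)
```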

```python
text = "United States is a large country. David is a clever student from Japan. Both countries are members of the United Nations."
```

```python
words = nltk.word_tokenize(text)  # split the text into word tokens
tags = nltk.pos_tag(words)        # tag each token with its part of speech
```

```python
# Uncomment to inspect the list of (word, POS-tag) pairs.
# print(tags)
```
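
Part-of-speech tags matter here because most named entities are proper nouns (Penn Treebank tags `NNP`/`NNPS`). The sketch below groups consecutive proper-noun tokens into candidate phrases, the kind of tag-based chunking covered earlier. `proper_noun_chunks` is a hypothetical helper, and the tags are written by hand for illustration, so they may not match `nltk.pos_tag` exactly. Note that it only *locates* candidates: it has no way to tell that David is a person and Japan a place, which is the classification step `ne_chunk` adds.

```python
def proper_noun_chunks(tagged):
    """Group runs of consecutive proper-noun tokens (NNP/NNPS) into phrases.

    Hypothetical helper for illustration, not part of NLTK.
    """
    chunks, current = [], []
    for word, pos in tagged:
        if pos in ("NNP", "NNPS"):
            current.append(word)              # extend the current proper-noun run
        elif current:
            chunks.append(" ".join(current))  # a run just ended; save it
            current = []
    if current:                               # flush a run that ends the sentence
        chunks.append(" ".join(current))
    return chunks


# Tags written by hand for illustration; nltk.pos_tag may differ slightly.
tagged = [("David", "NNP"), ("is", "VBZ"), ("a", "DT"), ("clever", "JJ"),
          ("student", "NN"), ("from", "IN"), ("Japan", "NNP"), (".", ".")]
print(proper_noun_chunks(tagged))  # ['David', 'Japan']
```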

Now we can apply **nltk.ne_chunk()**. From the documentation:

* **nltk.ne_chunk(tagged_tokens, binary=False)**: Use NLTK's currently recommended named entity chunker to chunk the given list of tagged tokens.
    * If `binary=True`, named entities are just tagged as NE; otherwise (the default), the classifier adds category labels such as PERSON, ORGANIZATION, and GPE (geo-political entity).
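
The two modes differ in the labels they attach. As a hedged illustration, the sketch below takes the categorized entities this exercise finds and collapses their labels to `NE`; `collapse_to_binary` is a hypothetical helper. This is only a relabeling: `binary=True` loads a separately trained binary chunker, so its spans are not guaranteed to match the categorized ones.

```python
from collections import Counter

# The categorized entities found in this exercise, as (text, label) pairs.
entities = [("United States", "GPE"), ("David", "PERSON"),
            ("Japan", "GPE"), ("United Nations", "ORGANIZATION")]


def collapse_to_binary(entities):
    """Hypothetical helper: replace every category label with 'NE'."""
    return [(text, "NE") for text, _ in entities]


print(collapse_to_binary(entities))
# [('United States', 'NE'), ('David', 'NE'), ('Japan', 'NE'), ('United Nations', 'NE')]

# Per-category counts are only available from the categorized output.
print(Counter(label for _, label in entities))
```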

```python
chunks = nltk.ne_chunk(tags)  # binary=False by default, so entities get category labels
```

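```python
# ne_chunk returns an nltk.Tree: named entities are subtrees (which have a
# .label() method), while all other tokens remain plain (word, tag) tuples.
for chunk in chunks:
    if hasattr(chunk, "label"):
        # Join the words of the entity's (word, tag) leaves and print its category.
        print(" ".join(word for word, tag in chunk.leaves()), "-->", chunk.label())
```

```text
United States --> GPE
David --> PERSON
Japan --> GPE
United Nations --> ORGANIZATION
```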
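
Chunked output is often flattened into per-token IOB tags (`B-` begins an entity, `I-` continues it, `O` is outside any entity). For real `nltk.Tree` objects, NLTK provides `nltk.chunk.tree2conlltags` for this. The stand-alone sketch below mimics that conversion on a simplified, hypothetical representation (entity chunks as `(label, [(word, tag), ...])`, other tokens as `(word, tag)`) so it runs without NLTK; `to_iob` is a hypothetical name and the POS tags are written by hand.

```python
def to_iob(chunked):
    """Flatten a chunked sentence into (word, pos, iob) triples.

    Hypothetical stand-in for nltk.chunk.tree2conlltags, working on a simplified
    list representation instead of an nltk.Tree.
    """
    triples = []
    for item in chunked:
        if isinstance(item[1], list):        # entity chunk: (label, [(word, pos), ...])
            label, tokens = item
            for i, (word, pos) in enumerate(tokens):
                prefix = "B" if i == 0 else "I"  # B- opens the entity, I- continues it
                triples.append((word, pos, f"{prefix}-{label}"))
        else:                                # ordinary token: (word, pos)
            word, pos = item
            triples.append((word, pos, "O"))
    return triples


# First sentence of the example, chunked by hand (POS tags are illustrative).
sentence = [("GPE", [("United", "NNP"), ("States", "NNP")]),
            ("is", "VBZ"), ("a", "DT"), ("large", "JJ"),
            ("country", "NN"), (".", ".")]
for triple in to_iob(sentence):
    print(triple)
```

The IOB view keeps one row per token, which makes it easy to compare a chunker's output against annotated data token by token.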
