# Named Entity Recognition

Named entity recognition (NER) locates spans of tokens in a text and classifies them into semantic categories such as person, organization, or location.

In [1]:
import spacy

nlp = spacy.load("en_core_web_sm")

In [2]:
def show_ents(doc):
    """Print each entity's text, label, and label description."""
    if doc.ents:
        for ent in doc.ents:
            print(f"{ent.text} - {ent.label_} - {spacy.explain(ent.label_)}")
    else:
        print("No entities found")

In [5]:
doc = nlp("Hi how are you?")

In [6]:
show_ents(doc)

No entities found


In [7]:
doc = nlp("I want to go to Japan next year and visit to the kinkaku-ji temple.")
show_ents(doc)

Japan - GPE - Countries, cities, states
next year - DATE - Absolute or relative dates or periods


## NER Tags
Each entity's tag is available through its `.label_` attribute, and `spacy.explain()` returns a short description of a tag.
<table>
<tr><th>TYPE</th><th>DESCRIPTION</th><th>EXAMPLE</th></tr>
<tr><td>`PERSON`</td><td>People, including fictional.</td><td>*Fred Flintstone*</td></tr>
<tr><td>`NORP`</td><td>Nationalities or religious or political groups.</td><td>*The Republican Party*</td></tr>
<tr><td>`FAC`</td><td>Buildings, airports, highways, bridges, etc.</td><td>*Logan International Airport, The Golden Gate*</td></tr>
<tr><td>`ORG`</td><td>Companies, agencies, institutions, etc.</td><td>*Microsoft, FBI, MIT*</td></tr>
<tr><td>`GPE`</td><td>Countries, cities, states.</td><td>*France, UAR, Chicago, Idaho*</td></tr>
<tr><td>`LOC`</td><td>Non-GPE locations, mountain ranges, bodies of water.</td><td>*Europe, Nile River, Midwest*</td></tr>
<tr><td>`PRODUCT`</td><td>Objects, vehicles, foods, etc. (Not services.)</td><td>*Formula 1*</td></tr>
<tr><td>`EVENT`</td><td>Named hurricanes, battles, wars, sports events, etc.</td><td>*Olympic Games*</td></tr>
<tr><td>`WORK_OF_ART`</td><td>Titles of books, songs, etc.</td><td>*The Mona Lisa*</td></tr>
<tr><td>`LAW`</td><td>Named documents made into laws.</td><td>*Roe v. Wade*</td></tr>
<tr><td>`LANGUAGE`</td><td>Any named language.</td><td>*English*</td></tr>
<tr><td>`DATE`</td><td>Absolute or relative dates or periods.</td><td>*20 July 1969*</td></tr>
<tr><td>`TIME`</td><td>Times smaller than a day.</td><td>*Four hours*</td></tr>
<tr><td>`PERCENT`</td><td>Percentage, including "%".</td><td>*Eighty percent*</td></tr>
<tr><td>`MONEY`</td><td>Monetary values, including unit.</td><td>*Twenty Cents*</td></tr>
<tr><td>`QUANTITY`</td><td>Measurements, as of weight or distance.</td><td>*Several kilometers, 55kg*</td></tr>
<tr><td>`ORDINAL`</td><td>"first", "second", etc.</td><td>*9th, Ninth*</td></tr>
<tr><td>`CARDINAL`</td><td>Numerals that do not fall under another type.</td><td>*2, Two, Fifty-two*</td></tr>
</table>
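
The descriptions in this table can also be looked up in code. Below is a minimal sketch that uses `spacy.explain`, the same helper `show_ents` calls above. The `NER_LABELS` list is copied by hand from the table for illustration; it is not a name that spaCy exports, and the labels a particular model predicts may differ.

```python
import spacy

# Labels copied by hand from the table above (an assumption of this sketch;
# the labels a given model actually predicts may differ).
NER_LABELS = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
]

# spacy.explain returns a short description string for a known label.
descriptions = {label: spacy.explain(label) for label in NER_LABELS}

for label, description in descriptions.items():
    print(f"{label:<12} {description}")
```

With a loaded pipeline, `nlp.get_pipe("ner").labels` should list the labels that particular model was trained on, which is a safer source than a hand-written list.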

## Expanding the default list of entities

When the model fails to recognize an entity, we can create a `Span` for it by hand, assign it a label, and add it to the document's entities. In the sentence above the temple's name was missed, so we add it manually.

In [13]:
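from spacy.tokens import Span

# Look up the hash ID of the existing "ORG" label
ORG = doc.vocab.strings["ORG"]
# Tokens 12-14 are "kinkaku", "-", "ji"; the end index (15) is exclusive
new_ent = Span(doc, 12, 15, label=ORG)
# doc.ents is a tuple, so convert it to a list before appending
doc.ents = list(doc.ents) + [new_ent]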

In [14]:
show_ents(doc)

Japan - GPE - Countries, cities, states
next year - DATE - Absolute or relative dates or periods
kinkaku-ji - ORG - Companies, agencies, institutions, etc.


Notice that `kinkaku-ji` is now recognized as an `ORG` entity.
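
Hard-coded token indices such as `12` and `15` are easy to get wrong. As an alternative sketch, `Doc.char_span` builds the span from character offsets, which here come from a plain `str.find` on the text. The example uses `spacy.blank("en")` (tokenizer only, no trained model) so that it runs without downloading anything; that choice, and the names `blank_nlp`, `blank_doc`, and `target`, are assumptions made for this illustration.

```python
import spacy

# Tokenizer-only English pipeline: no NER component, so the entity list starts empty.
blank_nlp = spacy.blank("en")
text = "I want to go to Japan next year and visit to the kinkaku-ji temple."
blank_doc = blank_nlp(text)

# Locate the substring by character offset instead of counting tokens.
target = "kinkaku-ji"
start_char = text.find(target)
end_char = start_char + len(target)

# char_span returns None if the offsets do not line up with token boundaries.
new_ent = blank_doc.char_span(start_char, end_char, label="ORG")
if new_ent is None:
    raise ValueError(f"{target!r} does not align with token boundaries")

# doc.ents is a tuple, so build a new list before assigning.
blank_doc.ents = list(blank_doc.ents) + [new_ent]

for ent in blank_doc.ents:
    print(ent.text, ent.label_, ent.start, ent.end)
```

The same `char_span` call should work on a `doc` produced by the full `en_core_web_sm` pipeline, as long as the new span does not overlap an entity the model already found.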