# Named Entity Recognition (NER)
spaCy's **`ner`** pipeline component identifies token spans that match a predetermined set of named entity types. The resulting spans are available through the `ents` property of a `Doc` object.

In [1]:
import spacy

In [2]:
nlp = spacy.load('en_core_web_sm')

In [3]:
def show_ents(doc):
    # if there's an entity, print out the entity and its label info
    if doc.ents:
        for ent in doc.ents:
            print(ent.text + ' - ' + ent.label_ + ' - ' + str(spacy.explain(ent.label_)))
    # otherwise, say no entities were found
    else:
        print('No entities found.')

In [4]:
doc = nlp(u'Hi how are you?')

In [5]:
show_ents(doc)

No entities found.


In [6]:
doc = nlp(u"May I go to Washington, DC next May to see the Washington Monument?")

In [7]:
show_ents(doc)

Washington, DC - GPE - Countries, cities, states
next May - DATE - Absolute or relative dates or periods
the Washington Monument - ORG - Companies, agencies, institutions, etc.


## Entity annotations
`Doc.ents` is a tuple of `Span` objects (token spans), each with its own set of annotations.
<table>
<tr><td>`ent.text`</td><td>The original entity text</td></tr>
<tr><td>`ent.label`</td><td>The entity type's hash value</td></tr>
<tr><td>`ent.label_`</td><td>The entity type's string name (e.g. `ORG`)</td></tr>
<tr><td>`ent.start`</td><td>The *token* index of the span's first token in the Doc</td></tr>
<tr><td>`ent.end`</td><td>The *token* index just past the span's last token (exclusive)</td></tr>
<tr><td>`ent.start_char`</td><td>The *character* offset where the entity text starts in the Doc text</td></tr>
<tr><td>`ent.end_char`</td><td>The *character* offset just past the end of the entity text (exclusive)</td></tr>
</table>
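
To make the difference between token indices and character offsets concrete, the following sketch sets two entities by hand and prints each attribute from the table. It uses a blank English pipeline (`spacy.blank("en")`), an assumption made here so the example does not depend on a trained model's predictions, and it assumes a spaCy release in which the `Span` constructor accepts string labels.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank pipeline (tokenizer only), so the entities below are
# set by hand rather than predicted by a trained model.
nlp = spacy.blank("en")
doc = nlp("Can I please have 500 dollars of Microsoft stock?")

# Token indices: 500 -> 4, dollars -> 5, Microsoft -> 7
doc.ents = [Span(doc, 4, 6, label="MONEY"), Span(doc, 7, 8, label="ORG")]

rows = []
for ent in doc.ents:
    rows.append((ent.text, ent.label_, ent.start, ent.end,
                 ent.start_char, ent.end_char))
    print(rows[-1])

# start/end slice the Doc by token; start_char/end_char slice the raw text.
for ent in doc.ents:
    assert doc[ent.start:ent.end].text == ent.text
    assert doc.text[ent.start_char:ent.end_char] == ent.text
```

Both `end` and `end_char` are exclusive, which is why they can be used directly as slice bounds.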



In [8]:
doc = nlp(u"Can I please have 500 dollars of Microsoft stock?")

In [9]:
show_ents(doc)

500 dollars - MONEY - Monetary values, including unit
Microsoft - ORG - Companies, agencies, institutions, etc.


## NER Tags
Tags are accessible through the `.label_` property of an entity.
<table>
<tr><th>TYPE</th><th>DESCRIPTION</th><th>EXAMPLE</th></tr>
<tr><td>`PERSON`</td><td>People, including fictional.</td><td>*Fred Flintstone*</td></tr>
<tr><td>`NORP`</td><td>Nationalities or religious or political groups.</td><td>*The Republican Party*</td></tr>
<tr><td>`FAC`</td><td>Buildings, airports, highways, bridges, etc.</td><td>*Logan International Airport, The Golden Gate*</td></tr>
<tr><td>`ORG`</td><td>Companies, agencies, institutions, etc.</td><td>*Microsoft, FBI, MIT*</td></tr>
<tr><td>`GPE`</td><td>Countries, cities, states.</td><td>*France, UAR, Chicago, Idaho*</td></tr>
<tr><td>`LOC`</td><td>Non-GPE locations, mountain ranges, bodies of water.</td><td>*Europe, Nile River, Midwest*</td></tr>
<tr><td>`PRODUCT`</td><td>Objects, vehicles, foods, etc. (Not services.)</td><td>*Formula 1*</td></tr>
<tr><td>`EVENT`</td><td>Named hurricanes, battles, wars, sports events, etc.</td><td>*Olympic Games*</td></tr>
<tr><td>`WORK_OF_ART`</td><td>Titles of books, songs, etc.</td><td>*The Mona Lisa*</td></tr>
<tr><td>`LAW`</td><td>Named documents made into laws.</td><td>*Roe v. Wade*</td></tr>
<tr><td>`LANGUAGE`</td><td>Any named language.</td><td>*English*</td></tr>
<tr><td>`DATE`</td><td>Absolute or relative dates or periods.</td><td>*20 July 1969*</td></tr>
<tr><td>`TIME`</td><td>Times smaller than a day.</td><td>*Four hours*</td></tr>
<tr><td>`PERCENT`</td><td>Percentage, including "%".</td><td>*Eighty percent*</td></tr>
<tr><td>`MONEY`</td><td>Monetary values, including unit.</td><td>*Twenty Cents*</td></tr>
<tr><td>`QUANTITY`</td><td>Measurements, as of weight or distance.</td><td>*Several kilometers, 55kg*</td></tr>
<tr><td>`ORDINAL`</td><td>"first", "second", etc.</td><td>*9th, Ninth*</td></tr>
<tr><td>`CARDINAL`</td><td>Numerals that do not fall under another type.</td><td>*2, Two, Fifty-two*</td></tr>
</table>
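
The descriptions in this table can also be looked up programmatically with `spacy.explain`, which the `show_ents` helper above already relies on. A minimal sketch (the choice of labels here is arbitrary):

```python
import spacy

# spacy.explain looks up the glossary description for a label string.
labels = ["PERSON", "ORG", "GPE", "MONEY"]
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(label + " - " + str(description))
```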

## Add a Span as a named entity

In [10]:
doc = nlp(u"Tesla to build a U.K. factory for $6 million")

In [11]:
show_ents(doc)

U.K. - GPE - Countries, cities, states
$6 million - MONEY - Monetary values, including unit


The model missed 'Tesla'. Let's add it as an `ORG` entity, since here it is a proper name referring to a company.

In [12]:
from spacy.tokens import Span

In [13]:
ORG = doc.vocab.strings[u"ORG"]

In [14]:
ORG

381

In [15]:
# Span(doc, start token index, end token index (exclusive), label=entity label)
new_ent = Span(doc, 0, 1, label=ORG)

In [16]:
doc.ents = list(doc.ents) + [new_ent]

In [17]:
show_ents(doc)

Tesla - ORG - Companies, agencies, institutions, etc.
U.K. - GPE - Countries, cities, states
$6 million - MONEY - Monetary values, including unit
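
As a compact variant of the steps above, the sketch below passes the label to `Span` as a string instead of looking up its hash first. This assumes a spaCy release in which the `Span` constructor accepts string labels; the hash lookup shown above is the fallback otherwise. It also runs on a blank pipeline (an assumption, so that no trained model is needed), which means `doc.ents` starts out empty.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # assumption: tokenizer only, no trained NER
doc = nlp("Tesla to build a U.K. factory for $6 million")

# Label given as a string; spaCy stores it as a hash internally.
new_ent = Span(doc, 0, 1, label="ORG")
doc.ents = list(doc.ents) + [new_ent]

ent_pairs = [(ent.text, ent.label_) for ent in doc.ents]
print(ent_pairs)

# Entities may not overlap: a second span covering token 0 is rejected.
try:
    doc.ents = list(doc.ents) + [Span(doc, 0, 2, label="ORG")]
    overlap_rejected = False
except ValueError:
    overlap_rejected = True
print(overlap_rejected)
```

Because a token can belong to only one entity, it is worth checking for overlaps before appending to `doc.ents` when the model has already labelled part of the text.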


## Add named entities to all matching spans

In [18]:
doc = nlp(u"Our company created a brand new vacuum cleaner. "
          u"This new vacuum-cleaner is the best in show.")

In [19]:
show_ents(doc)

No entities found.


In [20]:
from spacy.matcher import PhraseMatcher

In [21]:
matcher = PhraseMatcher(nlp.vocab)

In [22]:
phrase_list = ['vacuum cleaner','vacuum-cleaner']

In [23]:
phrase_patterns = [nlp(text) for text in phrase_list]

In [24]:
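# arguments: match ID, on-match callback (None here), then the patterns to find
# (spaCy v2 signature; spaCy v3 expects matcher.add('new_product', phrase_patterns))
matcher.add('new_product',None,*phrase_patterns)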

In [25]:
found_matches = matcher(doc)

In [26]:
found_matches

[(9676102616875934564, 6, 8), (9676102616875934564, 11, 14)]

In [27]:
PROD = doc.vocab.strings[u"PRODUCT"]

In [28]:
new_ents = [Span(doc,match[1],match[2],label=PROD) for match in found_matches]

In [29]:
doc.ents = list(doc.ents) + new_ents

In [30]:
show_ents(doc)

vacuum cleaner - PRODUCT - Objects, vehicles, foods, etc. (not services)
vacuum-cleaner - PRODUCT - Objects, vehicles, foods, etc. (not services)
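
The workflow above can be condensed into one sketch. Two assumptions differ from the cells above: it uses the `matcher.add(key, patterns)` signature expected by spaCy v3 (the `matcher.add(key, None, *patterns)` form is the v2 signature), and it runs on a blank pipeline so the result does not depend on a trained model. `spacy.util.filter_spans` is included as a guard that drops overlapping matches before they are assigned to `doc.ents`.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
from spacy.util import filter_spans

nlp = spacy.blank("en")  # assumption: tokenizer only, no trained NER
doc = nlp("Our company created a brand new vacuum cleaner. "
          "This new vacuum-cleaner is the best in show.")

matcher = PhraseMatcher(nlp.vocab)
patterns = [nlp.make_doc(text) for text in ["vacuum cleaner", "vacuum-cleaner"]]
matcher.add("new_product", patterns)  # spaCy v3 signature (assumed)

# Each match is (match_id, start, end); turn it into a labelled Span.
spans = [Span(doc, start, end, label="PRODUCT") for _, start, end in matcher(doc)]

# filter_spans keeps the longest non-overlapping spans, sorted by position.
doc.ents = list(doc.ents) + filter_spans(spans)

product_texts = [ent.text for ent in doc.ents if ent.label_ == "PRODUCT"]
print(product_texts)
```

Building the patterns with `nlp.make_doc` only runs the tokenizer, which is all a `PhraseMatcher` pattern needs when matching on the exact token text.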


In [31]:
doc = nlp(u"Originally I paid $29.95 for this car toy, but now it is marked down by 10 dollars")

In [32]:
# show all entities mentioned in document object
[ent for ent in doc.ents]

[29.95, 10 dollars]

In [33]:
# show all LABELS of entities mentioned in document object
[ent.label_ for ent in doc.ents]

['MONEY', 'MONEY']

In [34]:
# how many entities of a given type (here MONEY) were found?
len([ent for ent in doc.ents if ent.label_ == "MONEY"])

2
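
To tally every entity type at once rather than one label at a time, `collections.Counter` from the standard library works well. In this sketch the two `MONEY` entities are set by hand on a blank pipeline (an assumption, so the counts do not depend on a trained model); with `en_core_web_sm` loaded, `doc.ents` would be filled in by the `ner` component instead.

```python
from collections import Counter

import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # assumption: tokenizer only, entities set by hand
doc = nlp("Originally I paid $29.95 for this car toy, "
          "but now it is marked down by 10 dollars")

# Token indices: 29.95 -> 4, 10 -> 17, dollars -> 18
doc.ents = [Span(doc, 4, 5, label="MONEY"), Span(doc, 17, 19, label="MONEY")]

# Count how often each entity label occurs in the document.
label_counts = Counter(ent.label_ for ent in doc.ents)
print(label_counts)
```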

# Visualize Named Entities with displacy

In [35]:
from spacy import displacy

In [36]:
doc = nlp(u"Over the last quarter Apple sold nearly 20 thousand iPods for a profit of $6 million. "
          u"By contrast, Sony only sold 8 thousand Walkman music players.")

In [37]:
displacy.render(doc,style='ent',jupyter=True)

In [38]:
for sent in doc.sents:
    displacy.render(nlp(sent.text),style='ent',jupyter=True)
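
Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. The sketch below passes `jupyter=False` for that purpose and, as an assumption, marks `Apple` as an `ORG` by hand on a blank pipeline so that it runs without a trained model.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # assumption: tokenizer only, entity set by hand
doc = nlp("Over the last quarter Apple sold nearly 20 thousand iPods.")
doc.ents = [Span(doc, 4, 5, label="ORG")]  # token 4 is "Apple"

# With jupyter=False the HTML markup is returned rather than displayed.
html = displacy.render(doc, style="ent", jupyter=False)
print(html[:80])
```

The returned string can be embedded in a web page or written to a file.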

## Highlight only certain entity types

In [39]:
options = {'ents':['PRODUCT']}

In [40]:
displacy.render(doc,style='ent',jupyter=True,options=options)

## Customize entity colors

In [41]:
colors = {'ORG':'red'}
options = {'ents':['PRODUCT','ORG'],'colors':colors}

In [42]:
displacy.render(doc,style='ent',jupyter=True,options=options)

## Color along radial gradient

In [43]:
colors = {'ORG':'radial-gradient(yellow,red)'}
options = {'ents':['PRODUCT','ORG'],'colors':colors}

In [44]:
displacy.render(doc,style='ent',jupyter=True,options=options)

## Color along linear gradient (left to right)

In [45]:
colors = {'ORG':'linear-gradient(90deg,yellow,red)'}
options = {'ents':['PRODUCT','ORG'],'colors':colors}

In [46]:
displacy.render(doc,style='ent',jupyter=True,options=options)

## Color along linear gradient (top to bottom)

In [47]:
colors = {'ORG':'linear-gradient(180deg,yellow,red)'}
options = {'ents':['PRODUCT','ORG'],'colors':colors}

In [48]:
displacy.render(doc,style='ent',jupyter=True,options=options)
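
To check what the `ents` and `colors` options do without looking at the rendered output, the markup can be inspected as a string. This sketch sets an `ORG` and a `MONEY` entity by hand on a blank pipeline (an assumption, so no trained model is needed) and highlights only the `ORG`.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # assumption: tokenizer only, entities set by hand
doc = nlp("Apple sold iPods for a profit of $6 million.")

# Token indices: Apple -> 0, $ -> 7, 6 -> 8, million -> 9
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 7, 10, label="MONEY")]

options = {
    "ents": ["ORG"],  # highlight only ORG; MONEY stays plain text
    "colors": {"ORG": "linear-gradient(90deg,yellow,red)"},
}
html = displacy.render(doc, style="ent", jupyter=False, options=options)

# The custom color appears in the markup, and only one entity is marked.
print("linear-gradient(90deg,yellow,red)" in html)
print(html.count("<mark"))
```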

## Visualize in browser

In [49]:
displacy.serve(doc,style='ent',options=options)


    Serving on port 5000...
    Using the 'ent' visualizer


    Shutting down server on port 5000.



## While the server is running, go to http://127.0.0.1:5000/ to view the result in a browser
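
As an alternative to keeping a server running, the visualization can be written to a standalone HTML file and opened in a browser. This sketch passes `page=True` to get a complete HTML page, uses a temporary directory as a stand-in output location, and again sets one entity by hand on a blank pipeline; all three choices are assumptions made for illustration.

```python
import os
import tempfile

import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # assumption: tokenizer only, entity set by hand
doc = nlp("By contrast, Sony only sold 8 thousand Walkman music players.")
doc.ents = [Span(doc, 3, 4, label="ORG")]  # token 3 is "Sony"

# page=True wraps the markup in a full HTML document.
page = displacy.render(doc, style="ent", page=True, jupyter=False)

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.mkdtemp(), "entities.html")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(page)
print(out_path)
```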