##  Named Entity Recognition

Named entity recognition (NER) identifies named entities such as people, organizations, and locations in unstructured text. spaCy also lets you add your own entities to a document.

In [46]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [47]:
def show_ents(doc):
    if doc.ents:
        for ent in doc.ents:
            # ent.label_ is the entity label; spacy.explain returns None for labels it does not know, hence str()
            print(f"{ent.text:35} {ent.label_:10} {str(spacy.explain(ent.label_)):10}")
    else:
        print("No entities found")

In [48]:
doc = nlp(u"I am a student. I am studying in the University of Waterloo. I would like to visit Canada next year. The trip will cost $1000b.")
show_ents(doc)

the University of Waterloo          ORG        Companies, agencies, institutions, etc.
Canada                              GPE        Countries, cities, states
next year                           DATE       Absolute or relative dates or periods


In [49]:
doc1 = nlp(u"Tesla to acquire U.S. startup for $6.5 billion")
show_ents(doc1) #Tesla is not recognized as an entity

U.S.                                GPE        Countries, cities, states
$6.5 billion                        MONEY      Monetary values, including unit


In [50]:
# Creating a custom entity

from spacy.tokens import Span
ORG = doc1.vocab.strings[u'ORG']  # hash ID of the ORG label

# Token offsets are half-open: (0, 1) covers only token 0, "Tesla"
new_ent = Span(doc1, 0, 1, label=ORG)
doc1.ents = list(doc1.ents) + [new_ent]

show_ents(doc1)

Tesla                               ORG        Companies, agencies, institutions, etc.
U.S.                                GPE        Countries, cities, states
$6.5 billion                        MONEY      Monetary values, including unit
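
Assigning to `doc1.ents` only works when the new span does not share tokens with an existing entity; spaCy raises a `ValueError` for conflicting spans. Below is a minimal sketch of a guard for that case. It is written against a hypothetical `FakeSpan` stand-in so it runs without a model; `overlaps` and `add_if_free` are illustrative names, and the token offsets are assumed for the Tesla sentence rather than read from spaCy.

```python
from collections import namedtuple

# Hypothetical stand-in for spaCy's Span: only the token offsets matter here.
FakeSpan = namedtuple("FakeSpan", ["start", "end", "label"])

def overlaps(a, b):
    """Return True if two half-open token ranges [start, end) share a token."""
    return a.start < b.end and b.start < a.end

def add_if_free(existing, candidate):
    """Return existing + [candidate], or an unchanged copy if candidate collides."""
    if any(overlaps(candidate, ent) for ent in existing):
        return list(existing)
    return list(existing) + [candidate]

# Assumed offsets for "Tesla to acquire U.S. startup for $6.5 billion"
current = [FakeSpan(3, 4, "GPE"), FakeSpan(6, 9, "MONEY")]

with_tesla = add_if_free(current, FakeSpan(0, 1, "ORG"))    # token 0 is free
with_clash = add_if_free(current, FakeSpan(3, 5, "ORG"))    # collides with the GPE span

print(len(with_tesla))
print(len(with_clash))
```

Real `Span` objects expose the same `start` and `end` attributes, so the same check should carry over to `doc1.ents`.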


### Adding Multiple NERs

For example, we might want to tag both "vacuum cleaner" and "vacuum-cleaner" as PRODUCT entities. The cells below show one way to do this with a `PhraseMatcher`.

In [51]:
doc2 = nlp(u"Our company created a brand new vacuum cleaner. " u"This new vacuum-cleaner is the best in the business")
show_ents(doc2)

No entities found


In [52]:
# Build a PhraseMatcher for the phrases we want to tag as entities

from spacy.matcher import PhraseMatcher
matcher = PhraseMatcher(nlp.vocab)

phrase_list = ['vacuum cleaner','vacuum-cleaner']
phrase_patterns = [nlp(text) for text in phrase_list]
# spaCy v2 signature; in spaCy v3 this is matcher.add('VACUUM_CLEANER', phrase_patterns)
matcher.add('VACUUM_CLEANER', None, *phrase_patterns)

found_matches = matcher(doc2)
print(found_matches)


[(17204589546155032115, 6, 8), (17204589546155032115, 11, 14)]
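
Each tuple is `(match_id, start, end)`, where `start` and `end` are token offsets and `end` is exclusive. The sketch below shows how those offsets slice the document. It uses a plain Python list as a stand-in for `doc2`; the token list is an assumed tokenization, written out by hand to be consistent with the offsets printed above.

```python
# Assumed tokenization of doc2 (a plain list standing in for the spaCy Doc).
tokens = ["Our", "company", "created", "a", "brand", "new", "vacuum", "cleaner", ".",
          "This", "new", "vacuum", "-", "cleaner", "is", "the", "best", "in", "the",
          "business"]

# The matches printed by the cell above.
found_matches = [(17204589546155032115, 6, 8), (17204589546155032115, 11, 14)]

# Slice the token list with each (start, end) pair; end is exclusive.
matched_tokens = [tokens[start:end] for _match_id, start, end in found_matches]

for span_tokens in matched_tokens:
    print(span_tokens)
```

With a real `Doc`, `doc2[start:end]` returns a `Span`, and `nlp.vocab.strings[match_id]` turns the hash back into the `'VACUUM_CLEANER'` key.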


In [53]:
from spacy.tokens import Span
PROD = doc2.vocab.strings[u'PRODUCT']  # hash ID of the PRODUCT label
# Each match is (match_id, start, end), where start and end are token offsets
new_entity = [Span(doc2, match[1], match[2], label=PROD) for match in found_matches]
doc2.ents = list(doc2.ents) + new_entity

In [54]:
# Counting the number of entities

doc3 = nlp(u"Originally I paid $100 for the vacuum cleaner. I then bought a new vacuum cleaner for 200 dollars.")

print(len(doc3.ents))  # count of all entities
print(len([ent for ent in doc3.ents if ent.label_ == 'MONEY']))  # count of entities labelled MONEY

2
2
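
When several labels are of interest, counting them one at a time gets repetitive. A possible alternative is to tally every label in one pass with `collections.Counter`. The sketch below assumes only that each entity has a `label_` attribute, as spaCy entities do; `label_counts` and `FakeEnt` are hypothetical names, and the sample entities are made-up stand-ins so the code runs without a model.

```python
from collections import Counter, namedtuple

def label_counts(ents):
    """Count entities per label for any iterable of objects with a `label_` attribute."""
    return Counter(ent.label_ for ent in ents)

# Hypothetical stand-ins for spaCy entities (not real model output).
FakeEnt = namedtuple("FakeEnt", ["text", "label_"])
sample_ents = [FakeEnt("100", "MONEY"), FakeEnt("200 dollars", "MONEY")]

counts = label_counts(sample_ents)
print(counts["MONEY"])  # labels that occur are counted
print(counts["ORG"])    # a Counter returns 0 for labels that never occur
```

With a real document, `label_counts(doc3.ents)` should give the same kind of tally.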


### Visualising Named Entities

In [61]:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy
doc4 = nlp(u"Over the last quarter Apple sold nearly 20 million iPhones and over 1 billion iPads for a profit of $1.5 billion. "
           u"By contrast, Sony only sold 8 thousand Walkman music players and over a million portable media players for a loss of $1.5 billion.")
displacy.render(doc4,style='ent',jupyter=True) #style: 'ent' for entities, 'dep' for dependencies

for sent in doc4.sents:
    displacy.render(sent,style='ent',jupyter=True)

# Show only PRODUCT and ORG entities, and highlight ORG in yellow
colors = {'ORG':'yellow'}
options = {'ents':['PRODUCT','ORG'],'colors':colors}
for sent in doc4.sents:
    displacy.render(sent,style='ent',jupyter=True,options=options)

