# Named entity recognition (NER)

Named entity recognition (NER) locates named-entity mentions in unstructured text and classifies them into predefined
categories such as person names, organizations, locations, medical codes, time expressions, quantities,
and percentages.

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [12]:
def show_ents(doc):
    """Print each entity's text, its label, and a short explanation of the label."""
    if doc.ents:
        for ent in doc.ents:
            print(ent.text + '- ' + ent.label_ + '- ' + str(spacy.explain(ent.label_)))
    else:
        print('No entities found.')

In [13]:
doc= nlp(u'Hi how are you?')

In [14]:
show_ents(doc)

No entities found.


In [15]:
doc1= nlp(u'May I go to Washington, DC next May to see the Washington Monument? ')

In [16]:
show_ents(doc1)

Washington, DC- GPE- Countries, cities, states
next May- DATE- Absolute or relative dates or periods
the Washington Monument- ORG- Companies, agencies, institutions, etc.


In [17]:
doc3 = nlp(u"Can I please have 500 dollars of Microsoft stocks? ")

In [18]:
show_ents(doc3)

500 dollars- MONEY- Monetary values, including unit
Microsoft- ORG- Companies, agencies, institutions, etc.


In [37]:
doc4 = nlp(u"Hema to build a U.K. factory for $6 million. ")

In [38]:
show_ents(doc4)

U.K.- GPE- Countries, cities, states
$6 million- MONEY- Monetary values, including unit


# Add a named entity

In [39]:
from spacy.tokens import Span

In [40]:
# look up the integer ID of the ORG label (the vocab is shared by every doc made with nlp)
ORG = doc4.vocab.strings[u"ORG"]

In [41]:
ORG

383

In [44]:
# tokens 0 up to (but not including) 1, i.e. "Hema"
new_ent = Span(doc4, 0, 1, label=ORG)

In [45]:
# add entity to existing doc
doc4.ents = list(doc4.ents) + [new_ent]

In [46]:
show_ents(doc4)

Hema- ORG- Companies, agencies, institutions, etc.
U.K.- GPE- Countries, cities, states
$6 million- MONEY- Monetary values, including unit


# Add multiple named entities

In [48]:
doc5 = nlp(u"Our company created a brand new vacuum cleaner." 
           u"This new vacuum-cleaner is the best in show. ")

In [49]:
show_ents(doc5)

No entities found.


In [51]:
from spacy.matcher import PhraseMatcher

In [52]:
matcher = PhraseMatcher(nlp.vocab)

In [53]:
phrase_list = ["vacuum cleaner", "vacuum-cleaner"]

In [54]:
phrase_patterns = [nlp(text) for text in phrase_list]

In [55]:
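# spaCy v2 signature (the second argument is the on_match callback);
# in spaCy v3 this becomes matcher.add('New Product', phrase_patterns)
matcher.add('New Product', None, *phrase_patterns)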

In [57]:
found_matches= matcher(doc5)
print(found_matches)

[(1006732951835949582, 6, 8), (1006732951835949582, 11, 14)]
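
Each match above is a `(match_id, start, end)` tuple, where `start` and `end` are token indices and `end` is exclusive. The following is a minimal sketch of how those offsets select tokens. It uses plain Python lists rather than a spaCy `Doc`, and the token list is written out by hand as an assumption about how `doc5` was tokenized (it is consistent with the offsets printed above).

```python
# Hand-written token list: an assumption, not output taken from spaCy.
tokens = ["Our", "company", "created", "a", "brand", "new", "vacuum", "cleaner",
          ".", "This", "new", "vacuum", "-", "cleaner", "is", "the", "best",
          "in", "show", "."]

found_matches = [(1006732951835949582, 6, 8), (1006732951835949582, 11, 14)]


def matched_tokens(tokens, matches):
    """Return the token slice covered by each (match_id, start, end) tuple."""
    return [tokens[start:end] for _match_id, start, end in matches]


slices = matched_tokens(tokens, found_matches)
for token_slice in slices:
    print(token_slice)
```

With a real `Doc`, the same slicing (`doc5[start:end]`) yields a `Span`, which is why the offsets can be passed straight to the `Span` constructor.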


In [58]:
from spacy.tokens import Span

In [64]:
PROD = doc5.vocab.strings[u"PRODUCT"]

In [65]:
found_matches

[(1006732951835949582, 6, 8), (1006732951835949582, 11, 14)]

In [66]:
# build one PRODUCT span per match; match[1] and match[2] are the start and end token indices
new_ents = [Span(doc5, match[1], match[2], label=PROD) for match in found_matches]

In [71]:
new_ents

[vacuum cleaner, vacuum-cleaner]

In [72]:
doc5

Our company created a brand new vacuum cleaner.This new vacuum-cleaner is the best in show. 

In [69]:
# add entity to existing doc
doc5.ents = list(doc5.ents) + new_ents

In [70]:
show_ents(doc5)

Our- ORG- Companies, agencies, institutions, etc.
vacuum cleaner- PRODUCT- Objects, vehicles, foods, etc. (not services)
vacuum-cleaner- PRODUCT- Objects, vehicles, foods, etc. (not services)
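
Assigning to `doc.ents` only works when the spans do not overlap; spaCy raises an error if one token ends up in two entities. The following is a rough sketch of one way to resolve overlaps before the assignment, preferring longer spans. It works on hypothetical `(start, end, label)` tuples instead of `Span` objects, and `filter_overlaps` is an illustrative name, not a spaCy function.

```python
def filter_overlaps(spans):
    """Keep the longest spans and drop any span that overlaps a kept one.

    Each span is a hypothetical (start, end, label) tuple of token offsets,
    with `end` exclusive.
    """
    # Longest span first; ties are broken by the earlier start.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    used = set()  # token indices already claimed by a kept span
    for start, end, label in ordered:
        if any(i in used for i in range(start, end)):
            continue  # overlaps a span we already kept
        kept.append((start, end, label))
        used.update(range(start, end))
    return sorted(kept)


# The (7, 8, "ORG") candidate is made up to force an overlap.
candidates = [(6, 8, "PRODUCT"), (7, 8, "ORG"), (11, 14, "PRODUCT")]
filtered = filter_overlaps(candidates)
print(filtered)
```

spaCy also ships `spacy.util.filter_spans`, which applies a similar longest-span-first rule directly to `Span` objects.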


# Count how many times an entity is mentioned

In [73]:
docs = nlp(u"Originally I paid $29.95 for this car toy, but now it is marked down by 10 dollars. ")

In [75]:
# list every entity in the doc
[ent for ent in docs.ents]

[29.95, 10 dollars]

In [76]:
# keep only the entities with a specific label
[ent for ent in docs.ents if ent.label_ == "MONEY"]

[29.95, 10 dollars]

In [77]:
# count how many entities carry a specific label
len([ent for ent in docs.ents if ent.label_ == "MONEY"])

2
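
To count every label at once rather than one label per list comprehension, a `collections.Counter` is a natural fit. This is a minimal sketch that assumes the entities have already been pulled out as `(text, label)` pairs; the pairs below are copied by hand from the output above.

```python
from collections import Counter

# Hypothetical (text, label) pairs, e.g. built with
# [(ent.text, ent.label_) for ent in docs.ents]
entities = [("29.95", "MONEY"), ("10 dollars", "MONEY")]

# Tally how many entities carry each label.
label_counts = Counter(label for _text, label in entities)

print(label_counts["MONEY"])
print(label_counts["ORG"])  # a label that never occurs counts as 0
```

A `Counter` returns 0 for missing keys, so labels that never occur need no special handling.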

# Visualizing named entity recognition

In [78]:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

In [81]:
docc = nlp(u"Over the last quarter Apple sold nearly 20 thousand iPods for a profit of 6 million")

In [82]:
displacy.render(docc,style='ent', jupyter=True)

In [85]:
# adjacent string literals are joined with no separator, so the first one ends with ". "
docy = nlp(u"Over the last quarter Apple sold nearly 20 thousand iPods for a profit of 6 million. "
           u"In contrast Sony only sold 8 thousand walkman music players")

In [86]:
# this renders the whole doc as one long line
displacy.render(docy,style='ent', jupyter=True)

In [89]:
# render each sentence on its own line
for sent in docy.sents:
    displacy.render(nlp(sent.text), style='ent', jupyter=True)

In [90]:
# show only PRODUCT entities
options = {'ents': ['PRODUCT']}

In [91]:
displacy.render(docy,style='ent', jupyter=True, options=options)

In [93]:
# show only PRODUCT and ORG entities
options1 = {'ents': ['PRODUCT', 'ORG']}

In [94]:
displacy.render(docy,style='ent', jupyter=True, options=options1)

In [102]:
# custom color per entity label
colors = {'ORG': 'red'}
options1 = {'ents': ['PRODUCT', 'ORG'], 'colors': colors}

In [103]:
displacy.render(docy,style='ent', jupyter=True, options=options1)

In [104]:
# Radial Gradient

colors= {'ORG':'radial-gradient(yellow,green)'}
options1= {'ents':['PRODUCT', 'ORG'], 'colors':colors }
displacy.render(docy,style='ent', jupyter=True, options=options1)

In [105]:
# Linear Gradient

colors= {'ORG':'linear-gradient(yellow,green,red)'}
options1= {'ents':['PRODUCT', 'ORG'], 'colors':colors }
displacy.render(docy,style='ent', jupyter=True, options=options1)
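
The cells above rebuild the same `options1` dictionary for every color scheme. One possible tidy-up is a small helper that assembles the dictionary from a label-to-color mapping. `ent_options` is a hypothetical name introduced here for illustration; it is not part of displaCy, and it only builds a plain dict with the same `ents` and `colors` keys used above.

```python
def ent_options(colors, extra_labels=()):
    """Build an options dict with 'ents' and 'colors' keys (hypothetical helper).

    colors       -- mapping of entity label to a CSS color or gradient
    extra_labels -- labels to display with their default color
    """
    labels = list(extra_labels)
    # Add every colored label that is not already listed.
    labels += [label for label in colors if label not in labels]
    return {"ents": labels, "colors": dict(colors)}


options = ent_options({"ORG": "linear-gradient(yellow,green,red)"},
                      extra_labels=["PRODUCT"])
print(options)
```

The result can then be passed as `options=options` to `displacy.render`, exactly like `options1` above.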