# 1) Basics of Named Entity Recognition

Named Entity Recognition (NER) is a subtask of information extraction that classifies named entities into predefined categories such as persons, organizations, and locations.

spaCy features an extremely fast statistical entity recognition system that assigns labels to contiguous spans of tokens.

The default model identifies a variety of named and numeric entities, including companies, locations, organizations, and products.

In [0]:
# Official documentation
# https://spacy.io/usage/linguistic-features/#named-entities


In [0]:
# Import spaCy
import spacy

In [0]:
# Load the small English pipeline
nlp = spacy.load(name='en_core_web_sm')

In [0]:
# Create a simple doc object
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [4]:
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_, str(spacy.explain(ent.label_)))

Apple 0 5 ORG Companies, agencies, institutions, etc.
U.K. 27 31 GPE Countries, cities, states
$1 billion 44 54 MONEY Monetary values, including unit
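The two numbers printed for each entity are `ent.start_char` and `ent.end_char`, which are character offsets into the original string with an exclusive end. As a minimal sketch, the offsets from the output above can be checked with plain Python slicing, without spaCy. The `spans` list is hand-copied from that output for illustration and is not produced by the model here.

```python
# The same sentence that was passed to nlp() above
text = "Apple is looking at buying U.K. startup for $1 billion"

# (start_char, end_char, label) triples copied by hand from the printed output
spans = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]

# Slicing with [start:end] recovers each entity's text, because end is exclusive
recovered = [(text[start:end], label) for start, end, label in spans]

for entity_text, label in recovered:
    print(entity_text, label)
```

This is why the offsets are handy for highlighting or annotating the raw text: they index the string directly rather than the token list.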


In [0]:
# Create another doc object
doc_2 = nlp("San Francisco considers banning sidewalk delivery robots")

In [6]:
for ent in doc_2.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_, str(spacy.explain(ent.label_)))

San Francisco 0 13 GPE Countries, cities, states


# 2) Adding Named Entity to Span

In [0]:
doc_3 = nlp("facebook is hiring a new vice president in U.S.")

In [8]:
for ent in doc_3.ents:
    print(ent.text, ent.label_, str(spacy.explain(ent.label_)))

U.S. GPE Countries, cities, states


In [0]:
# The model missed the lowercase "facebook", so we will add it manually as an ORG (company) entity

In [0]:
from spacy.tokens import Span

In [10]:
# Get the hash value of the ORG entity label
ORG = doc_3.vocab.strings['ORG']
print(ORG)

383


In [0]:
# Create a Span for the new entity.
# The arguments 0 and 1 are token indices (end is exclusive), so the span covers only the first token.
new_ent = Span(doc_3, 0, 1, label=ORG)

In [0]:
# Add the entity to the existing Doc object
doc_3.ents = list(doc_3.ents) + [new_ent]

In [13]:
for ent in doc_3.ents:
    print(ent.text, ent.label_, str(spacy.explain(ent.label_)))

facebook ORG Companies, agencies, institutions, etc.
U.S. GPE Countries, cities, states
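The steps above look up the label's hash value first. As a hedged alternative sketch, recent spaCy versions (assumed here: v2.1 or later) also accept the label as a plain string. The example uses `spacy.blank("en")`, which only tokenizes and has no trained entity recognizer, so it runs without downloading a model and `doc_blank.ents` starts out empty. The names `nlp_blank`, `doc_blank`, and `org_span` are illustrative only.

```python
import spacy
from spacy.tokens import Span

# A blank English pipeline: tokenizer only, no trained NER component
nlp_blank = spacy.blank("en")
doc_blank = nlp_blank("facebook is hiring a new vice president in U.S.")

# Token 0 is "facebook"; the end index 1 is exclusive.
# Assumption: this spaCy version accepts a string label instead of a hash.
org_span = Span(doc_blank, 0, 1, label="ORG")

# Append the new span to whatever entities already exist (none for a blank pipeline)
doc_blank.ents = list(doc_blank.ents) + [org_span]

print([(ent.text, ent.label_) for ent in doc_blank.ents])
```

Note that spaCy does not allow overlapping spans in `doc.ents`, so a manually added span must not share tokens with an entity that is already there.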


# 3) Visualizing Named Entities

In [0]:
# Import spaCy
import spacy
# load the English language library
nlp = spacy.load(name='en_core_web_sm')
# Import the displaCy library
from spacy import displacy

In [0]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [16]:
displacy.render(docs=doc, style='ent', jupyter=True)

In [17]:
# View only specific entity types
options = {'ents': ['ORG', 'MONEY']}
displacy.render(docs=doc, style='ent', jupyter=True, options=options)
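Outside a notebook, the same visualization can be captured as an HTML string by passing `jupyter=False`. The sketch below is a rough illustration under two assumptions: it uses displaCy's manual mode (`manual=True`) with a hand-written dictionary, so no trained model is needed, and the entity offsets in `example` are copied from the output in section 1 rather than predicted. The names `example` and `html` are illustrative.

```python
from spacy import displacy

# Hand-written input for displaCy's manual mode; offsets are character positions
example = {
    "text": "Apple is looking at buying U.K. startup for $1 billion",
    "ents": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 27, "end": 31, "label": "GPE"},
        {"start": 44, "end": 54, "label": "MONEY"},
    ],
    "title": None,
}

# Highlight only ORG and MONEY; jupyter=False returns the markup as a string
html = displacy.render(
    example,
    style="ent",
    manual=True,
    jupyter=False,
    options={"ents": ["ORG", "MONEY"]},
)

print(html[:200])  # preview the start of the generated markup
```

The returned string can be written to an `.html` file and opened in a browser.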