Named Entity Recognition (NER) is an NLP technique for identifying specific entities in text, such as people, products, locations, and monetary amounts. It has many practical uses, including document search, recommendations, and customer support ticket routing.

In [1]:
import spacy 

nlp = spacy.load("en_core_web_sm")

In [2]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

Using the 'ner' component

In [15]:
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Tesla Inc | ORG | Companies, agencies, institutions, etc.
Twitter Inc | ORG | Companies, agencies, institutions, etc.
$45 billion | MONEY | Monetary values, including unit
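
For downstream uses such as search or ticket routing, it can help to group the recognized entities by label. The following is a minimal sketch, assuming the (text, label) pairs are copied by hand from the output above; with a live `doc` they could be built as `[(ent.text, ent.label_) for ent in doc.ents]`. The helper name `group_by_label` is hypothetical and not part of spaCy.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group (entity_text, label) pairs into {label: [entity_text, ...]}."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# Pairs copied from the printed output above (an assumption for illustration)
pairs = [("Tesla Inc", "ORG"), ("Twitter Inc", "ORG"), ("$45 billion", "MONEY")]
grouped = group_by_label(pairs)
print(grouped)
```

Keeping the result as a plain dictionary makes it easy to look up, for example, every organization mentioned in a document.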


In [11]:
from spacy import displacy

displacy.render(doc, style="ent")

(displaCy renders the sentence as HTML, highlighting each entity span with its label, e.g. ORG and MONEY.)

List all entity labels

In [12]:
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

In [14]:
doc = nlp("Michael Bloomberg founded Bloomberg Inc in 1982")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))


Michael Bloomberg | PERSON | People, including fictional
Bloomberg Inc | ORG | Companies, agencies, institutions, etc.
1982 | DATE | Absolute or relative dates or periods


[Try the same with Hugging Face](https://huggingface.co/dslim/bert-base-NER?text=Michael+Bloomberg+founded+Bloomberg+in+1982)

In [17]:
doc = nlp("Tesla is going to acquire Twitter for $45 billion")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))


Twitter | ORG | Companies, agencies, institutions, etc.
$45 billion | MONEY | Monetary values, including unit


In [27]:

doc = nlp("Tesla is going to acquire Twitter for $45 billion")

for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", ent.start_char, "|", ent.end_char)

Twitter  |  ORG  |  26 | 33
$45 billion  |  MONEY  |  38 | 49
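
The `start_char` and `end_char` values are character offsets into the original string, with the end offset exclusive, so slicing the text with them recovers each entity. A small sketch, assuming the offsets are copied from the output above rather than read from a live `doc`:

```python
text = "Tesla is going to acquire Twitter for $45 billion"

# (start_char, end_char, label) triples copied from the output above
offsets = [(26, 33, "ORG"), (38, 49, "MONEY")]

# Slicing with [start:end] reproduces the entity text
spans = [(text[start:end], label) for start, end, label in offsets]
print(spans)
```

This is useful when the entities need to be highlighted or replaced in the raw text instead of in the tokenized `doc`.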


Setting custom entities

In [28]:
doc[2:5]

going to acquire

In [29]:
type(doc[2:5])

spacy.tokens.span.Span

In [30]:
from spacy.tokens import Span

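# Token 0 is "Tesla" and token 5 is "Twitter"; label both as ORG
s1 = Span(doc, 0, 1, label="ORG")
s2 = Span(doc, 5, 6, label="ORG")

# default="unmodified" leaves the entity annotations of all other tokens as they are
doc.set_ents([s1, s2], default="unmodified")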

In [31]:
for ent in doc.ents:
  print(ent.text, "|", ent.label_)

Tesla | ORG
Twitter | ORG
$45 billion | MONEY
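
Setting spans by hand works for a single `doc`. As a possible alternative for many texts, spaCy v3 ships a built-in `entity_ruler` component that labels entities from patterns. The sketch below is not the approach used above; it assumes a blank English pipeline (`spacy.blank("en")`, tokenizer only, no trained NER) so that it does not depend on `en_core_web_sm`, and the two patterns are illustrative.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained NER (an assumption for this sketch)
nlp_rules = spacy.blank("en")

# Add spaCy's built-in rule-based entity component and give it exact-phrase patterns
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Tesla"},
    {"label": "ORG", "pattern": "Twitter"},
])

doc_rules = nlp_rules("Tesla is going to acquire Twitter for $45 billion")
ents = [(ent.text, ent.label_) for ent in doc_rules.ents]
print(ents)
```

Because the pipeline here has no statistical NER, only the pattern matches show up as entities; "$45 billion" is not labeled in this sketch.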
