<h2 align='center'>NLP Tutorial: Named Entity Recognition (NER)</h2>

In [1]:
import spacy

In [5]:
nlp = spacy.load("en_core_web_sm")
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [3]:
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Tesla Inc  |  ORG  |  Companies, agencies, institutions, etc.
$45 billion  |  MONEY  |  Monetary values, including unit


In [4]:
from spacy import displacy

displacy.render(doc, style="ent")
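
The rendered visualization is not captured in this export. As a rough sketch of how the same markup could be produced outside a notebook, the block below uses displaCy's manual mode with `jupyter=False`, so that `render` returns an HTML string instead of displaying it. The entity dictionaries are written by hand from the output printed above, and the helper `manual_ent` is a hypothetical name introduced only for this illustration.

```python
from spacy import displacy

text = "Tesla Inc is going to acquire twitter for $45 billion"


def manual_ent(text, phrase, label):
    """Build a displaCy manual-mode entity dict (hypothetical helper)."""
    start = text.index(phrase)  # character offset of the phrase in the text
    return {"start": start, "end": start + len(phrase), "label": label}


# Entities copied by hand from the printed output above (an assumption,
# not a fresh model prediction). They must be listed in order of position.
example = {
    "text": text,
    "ents": [
        manual_ent(text, "Tesla Inc", "ORG"),
        manual_ent(text, "$45 billion", "MONEY"),
    ],
    "title": None,
}

# manual=True: render from the dict instead of a Doc.
# jupyter=False: return the markup as a string rather than displaying it.
html = displacy.render(example, style="ent", manual=True, jupyter=False, page=True)
print(html[:200])
```

The returned string can be written to an `.html` file and opened in a browser.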

<h3>List all the entity labels</h3>

In [6]:
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

The entity labels are also documented on this page: https://spacy.io/models/en
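
For a quick reminder of what a label means without leaving the notebook, `spacy.explain` (already used in the cells above) can be applied to any label. A small sketch, with the label subset chosen arbitrarily for illustration:

```python
import spacy

# A handful of labels from the list above; the selection is arbitrary.
labels = ["ORG", "MONEY", "GPE", "PERSON", "DATE"]

# spacy.explain looks the label up in spaCy's built-in glossary,
# so no trained pipeline needs to be loaded for this.
explanations = {label: spacy.explain(label) for label in labels}

for label, description in explanations.items():
    print(f"{label:8} | {description}")
```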

In [7]:
doc = nlp("Michael Bloomberg founded Bloomberg in 1982")
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
Bloomberg | GPE | Countries, cities, states
1982 | DATE | Absolute or relative dates or periods


In the output above, spaCy mislabeled the second "Bloomberg" (the company) as GPE instead of ORG. Let's try a Hugging Face model on the same sentence.

https://huggingface.co/dslim/bert-base-NER?text=Michael+Bloomberg+founded+Bloomberg+in+1982

Also go through the 3 sample NER examples on that page.

In [8]:
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", ent.start_char, "|", ent.end_char)

Tesla Inc  |  ORG  |  0 | 9
Twitter Inc  |  PERSON  |  30 | 41
$45 billion  |  MONEY  |  46 | 57
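
The `start_char` and `end_char` values are character offsets into the original string, with the end offset exclusive. A minimal check in plain Python, using the offsets copied from the output above (no model is run here):

```python
text = "Tesla Inc is going to acquire Twitter Inc for $45 billion"

# (label, start_char, end_char) triples copied from the printed output above.
reported = [("ORG", 0, 9), ("PERSON", 30, 41), ("MONEY", 46, 57)]

# Slicing the raw string with those offsets recovers each entity's text.
recovered = [(label, text[start:end]) for label, start, end in reported]

for label, surface in recovered:
    print(label, "|", surface)
```

This is handy when entity positions need to be stored or highlighted in the source text rather than in the tokenized `Doc`.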


<h3>Setting custom entities</h3>

In [9]:
doc = nlp("Tesla is going to acquire Twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  PERSON
$45 billion  |  MONEY


In [10]:
s = doc[2:5]
s

going to acquire

In [11]:
type(s)

spacy.tokens.span.Span

In [None]:
from spacy.tokens import Span

s1 = Span(doc, 0, 1, label="ORG")   # custom entity from token span 0:1 => Tesla | ORG
s2 = Span(doc, 5, 6, label="ORG")   # custom entity from token span 5:6 => Twitter | ORG

doc.set_ents([s1, s2], default="unmodified")

In [13]:
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  ORG
$45 billion  |  MONEY
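
Hard-coded token indices such as `0, 1` and `5, 6` are easy to get wrong on longer sentences. One possible alternative, sketched below, is to locate each phrase by character position and let `doc.char_span` build the span. The sketch assumes a blank English pipeline (`spacy.blank("en")`) so that it runs without downloading a model, which means there are no predicted entities such as MONEY to preserve; `org_span` is a hypothetical helper name.

```python
import spacy

# Blank pipeline: tokenizer only, no trained NER (an assumption made so the
# example runs without en_core_web_sm).
nlp = spacy.blank("en")
doc = nlp("Tesla is going to acquire Twitter for $45 billion")


def org_span(doc, phrase):
    """Return an ORG span covering the first occurrence of phrase (hypothetical helper)."""
    start = doc.text.index(phrase)
    # char_span returns None if the offsets do not line up with token boundaries.
    return doc.char_span(start, start + len(phrase), label="ORG")


spans = [org_span(doc, "Tesla"), org_span(doc, "Twitter")]

# default="unmodified" keeps any entities already on the doc that do not overlap.
doc.set_ents(spans, default="unmodified")

labeled = [(ent.text, ent.label_) for ent in doc.ents]
print(labeled)
```

With the `en_core_web_sm` pipeline used in the rest of this notebook, the same calls would leave the predicted `$45 billion | MONEY` entity in place alongside the two custom ORG spans.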


In [14]:
help(Span)

Help on class Span in module spacy.tokens.span:

class Span(builtins.object)
 |  A slice from a Doc object.
 |
 |  DOCS: https://spacy.io/api/span
 |
 |  Methods defined here:
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getitem__(...)
 |      Get a `Token` or a `Span` object
 |
 |      i (int or tuple): The index of the token within the span, or slice of
 |          the span to get.
 |      RETURNS (Token or Span): The token at `span[i]`.
 |
 |      DOCS: https://spacy.io/api/span#getitem
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __hash__(self, /)
 |      Return hash(self).
 |
 |  __iter__(...)
 |      Iterate over `Token` objects.
 |
 |      YIELDS (Token): A `Token` object.
 |
 |      DOCS: https://spacy.io/api/span#iter
 |
 |  __le__(self, value, /)
 |      Return self<=value.
 |
 |  __len__(...)
 |      Get the number of tokens in the span.
 |
 |      RETURNS (int): The number of 