<a href="https://colab.research.google.com/github/asifahsaan/NLP-Natural-Language-Processing-/blob/main/Named_Entity_Recognition_(NER)_Spacy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **NAMED ENTITY RECOGNITION (NER) with spaCy**

In [3]:
import spacy

textModel  = spacy.load("en_core_web_sm")

In [4]:
textModel.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [6]:
doc = textModel("NextBridge Inc in the new trending company in Pakistan that has more than 600 employees and earning $10 Million")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

NextBridge Inc | ORG | Companies, agencies, institutions, etc.
Pakistan | GPE | Countries, cities, states
more than 600 | CARDINAL | Numerals that do not fall under another type
$10 Million | MONEY | Monetary values, including unit


### **Visual representation of NER**

In [None]:
from spacy import displacy
displacy.render(doc, style="ent")

<div class="entities" style="line-height: 2.5; direction: ltr">
<mark class="entity" style="background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">
    NextBridge Inc
    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">ORG</span>
</mark>
 in the new trending company in
<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">
    Pakistan
    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">GPE</span>
</mark>
 that has
<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">
    more than 600
    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">CARDINAL</span>
</mark>
 employees and earning
<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">
    $10 Million
    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">MONEY</span>
</mark>
</div>
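
Outside a notebook, the same markup can be captured as a string instead of being displayed inline. The sketch below assumes displaCy's `manual=True` input format (a dict with `text`, `ents`, and `title`), so it needs no trained model. The `entity` helper and the example sentence are illustrative only, and the labels are the ones the model predicts for that sentence later in this notebook.

```python
from spacy import displacy

text = "Michael Bloomberg founded Bloomberg Inc in 1982"


def entity(text, phrase, label):
    """Hypothetical helper: locate `phrase` in `text` and return a displaCy entity dict."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


# Entities must be listed in order of their start offset.
data = {
    "text": text,
    "ents": [
        entity(text, "Michael Bloomberg", "PERSON"),
        entity(text, "Bloomberg Inc", "ORG"),
        entity(text, "1982", "DATE"),
    ],
    "title": None,
}

# jupyter=False makes render() return the HTML string instead of displaying it.
html = displacy.render(data, style="ent", manual=True, jupyter=False)

# Save the markup so it can be opened in a browser.
with open("entities.html", "w", encoding="utf-8") as f:
    f.write(html)
```

The file name `entities.html` is arbitrary. Passing a real `doc` without `manual=True` works the same way when a trained pipeline is available.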

### **NER Labels Supported by spaCy**

*Check which entity labels the NER component of the loaded model can predict.*

In [8]:
textModel.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']
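
Some of these abbreviations are not self-explanatory. As a small sketch, `spacy.explain` (already used in the print loops above) can be called on each label to get a short description. The list below is a hand-copied subset of the output above, so no model needs to be loaded.

```python
import spacy

# A subset of the labels listed above, copied by hand for illustration.
labels = ["CARDINAL", "GPE", "MONEY", "ORG", "PERSON", "DATE"]

# Map each label to its human-readable description.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:10} {description}")
```

With the loaded model, `textModel.pipe_labels['ner']` could be used in place of the hand-copied list.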

### ***Issue in spaCy NER***

In [11]:
doc = textModel("Michael Bloomberg founded Bloomberg in 1982")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
Bloomberg | PERSON | People, including fictional
1982 | DATE | Absolute or relative dates or periods


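The small model does not get this one right: it labels the second Bloomberg as PERSON, although here it refers to a company. spaCy's NER component is a statistical model that relies on cues in the surrounding text rather than on hand-written rules, so changing the context can change the prediction. Let's try adding Inc after the company name.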

### ***A Small Workaround: Adding a Company Suffix***

In [12]:
doc = textModel("Michael Bloomberg founded Bloomberg Inc in 1982")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
Bloomberg Inc | ORG | Companies, agencies, institutions, etc.
1982 | DATE | Absolute or relative dates or periods


The model now labels Bloomberg Inc as ORG. Capitalization matters in a similar way: when the company name does not start with a capital letter, the model does not recognize it as an entity at all. Let's see.

In [14]:
doc = textModel("Michael Bloomberg founded bloomberg in 1982")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
1982 | DATE | Absolute or relative dates or periods


So in these examples, the Inc suffix is the cue that gets the name recognized as a company, whether or not the name starts with a capital letter. See below.

In [15]:
doc = textModel("Michael Bloomberg founded bloomberg Inc in 1982")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
bloomberg Inc | ORG | Companies, agencies, institutions, etc.
1982 | DATE | Absolute or relative dates or periods
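
When editing the text is not an option, explicit rules are another route. The following is a minimal sketch using spaCy's `entity_ruler` component on a blank English pipeline, so it runs without a trained model. The two patterns are assumptions made for this example, not something the original notebook defines.

```python
import spacy

# A blank English pipeline: tokenizer only, no statistical components.
nlp = spacy.blank("en")

# The entity ruler assigns labels from explicit token patterns.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Two-token pattern for the person's full name (case-insensitive).
    {"label": "PERSON", "pattern": [{"LOWER": "michael"}, {"LOWER": "bloomberg"}]},
    # Single-token pattern for the company, with or without a capital letter.
    {"label": "ORG", "pattern": [{"LOWER": "bloomberg"}]},
])

doc = nlp("Michael Bloomberg founded bloomberg in 1982")

# When matches overlap, the ruler keeps the longer one, so the surname
# inside "Michael Bloomberg" is not also labeled ORG.
rule_ents = [(ent.text, ent.label_) for ent in doc.ents]
print(rule_ents)
```

This blank pipeline has no statistical NER, so 1982 is not labeled. To combine rules with the loaded model, the ruler could presumably be added with `textModel.add_pipe("entity_ruler", before="ner")`. That combination is not tested here.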


## **Correcting Entities Manually**

In [16]:
doc = textModel("Tesla is going to acquire Twitter for $45 Billion")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Tesla | ORG | Companies, agencies, institutions, etc.
Twitter | PRODUCT | Objects, vehicles, foods, etc. (not services)
$45 Billion | MONEY | Monetary values, including unit


In [32]:
doc = textModel("Asif is going to acquire neflogix for $5 Billion")

for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

neflogix | CARDINAL | Numerals that do not fall under another type
$5 Billion | MONEY | Monetary values, including unit


As we can see, Asif is not recognized at all and neflogix is mislabeled as CARDINAL. It is a company, but the model does not know that, and we don't want to rely on the Inc suffix.

***To fix this, we will build Span objects with the correct labels and set them as the entities of the doc.***

In [24]:
type(doc[2:5])

spacy.tokens.span.Span
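
The numbers in a slice such as `doc[2:5]` are token positions, not character positions. As a sketch, the code below lists every token with its index, then shows `Doc.char_span` as an alternative that builds a span from character offsets. It uses a blank English pipeline because only the tokenizer is needed. The variable names are illustrative.

```python
import spacy

# Tokenizer only; enough to inspect token positions.
nlp = spacy.blank("en")
doc = nlp("Asif is going to acquire neflogix for $5 Billion")

# Print each token with its index to find the boundaries for a Span.
for token in doc:
    print(token.i, token.text)

# Alternative: derive the span from character offsets instead of counting tokens.
start = doc.text.index("neflogix")
org_span = doc.char_span(start, start + len("neflogix"), label="ORG")

# char_span returns None when the offsets do not align with token boundaries.
if org_span is not None:
    print(org_span.text, org_span.start, org_span.end, org_span.label_)
```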

In [34]:
from spacy.tokens import Span

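# Build spans by token index: token 0 is "Asif", token 5 is "neflogix".
t1 = Span(doc, 0, 1, label="PERSON")
t2 = Span(doc, 5, 6, label="ORG")

# set_ents writes these entities onto the doc; default="unmodified" keeps the other predicted entities.
doc.set_ents([t1, t2], default="unmodified")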

Now print the entities of the same doc again, after setting the corrected spans.

In [35]:
for ent in doc.ents:
  print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Asif | PERSON | People, including fictional
neflogix | ORG | Companies, agencies, institutions, etc.
$5 Billion | MONEY | Monetary values, including unit


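The doc now carries the corrected entities. Note that `set_ents` only overrides the annotations on this one doc. The model itself has not been retrained, so running `textModel` on the same sentence again would produce the original predictions.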
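
Corrections made with `set_ents` live only on a single doc. To change what the model predicts, the same corrections would have to be supplied as training data. The sketch below only shows how one such annotated example could be built with `spacy.training.Example`. The `char_offsets` helper is hypothetical, and a real training run would need many more examples plus a training loop or a `spacy train` config, which this notebook does not cover.

```python
import spacy
from spacy.training import Example

# Tokenizer only; no trained components are needed to build an Example.
nlp = spacy.blank("en")
text = "Asif is going to acquire neflogix for $5 Billion"


def char_offsets(text, phrase, label):
    """Hypothetical helper: return (start_char, end_char, label) for `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


# Gold-standard entities expressed as character offsets into the text.
annotations = {
    "entities": [
        char_offsets(text, "Asif", "PERSON"),
        char_offsets(text, "neflogix", "ORG"),
    ]
}

# Pair the unannotated doc with its reference annotations.
example = Example.from_dict(nlp.make_doc(text), annotations)

gold_ents = [(ent.text, ent.label_) for ent in example.reference.ents]
print(gold_ents)
```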