# <div class='alert alert-success'> Named Entity Recognition</div>

**Named entity recognition (NER) is one of the most common data preprocessing tasks. It involves identifying key information in the text and classifying it into a set of predefined categories. An entity is basically a thing that is consistently talked about or referred to in the text.**<br><br>

**NER is a subtask of natural language processing (NLP).**

In [1]:
import spacy

In [2]:
nlp=spacy.load('en_core_web_sm')

In [63]:
import warnings

In [64]:
warnings.filterwarnings('ignore')

# <font color='blue'>The main Function</font>

In [4]:
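def show_entities(d):
    """Print the text, label and label description of every entity in the Doc."""
    if d.ents:
        for ent in d.ents:
            #Pad text and label to 15 characters so the columns line up
            print(f"{ent.text:{15}}{ent.label_:{15}}{spacy.explain(ent.label_)}")
    else:
        print("No entities found ")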
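
The format specifier `{ent.text:{15}}` used in the function pads each value to a minimum width of 15 characters, which is what lines the output up in columns. Below is a minimal sketch of the same formatting on plain strings. The rows are copied from the first example's output further down, and `format_row` is a hypothetical helper for illustration, not part of spaCy.

```python
# Rows copied from the notebook's first example: (text, label, description)
rows = [
    ("GFG", "ORG", "Companies, agencies, institutions, etc."),
    ("Indian", "NORP", "Nationalities or religious or political groups"),
]

def format_row(text, label, description, width=15):
    """Left-align text and label in fixed-width columns, as show_entities does."""
    # {text:{width}} is a nested format spec: pad `text` to at least `width` characters
    return f"{text:{width}}{label:{width}}{description}"

for row in rows:
    print(format_row(*row))
```

Values longer than the width are not truncated, so a long entity text simply pushes the next column to the right.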

**Let's try it now with different texts**

**1)**

In [5]:
d=nlp(u"GFG is an Indian company which provides one of the finest education")

In [7]:
show_entities(d)

GFG            ORG            Companies, agencies, institutions, etc.
Indian         NORP           Nationalities or religious or political groups
one            CARDINAL       Numerals that do not fall under another type


**2)**

In [8]:
d=nlp(u"I am not feeling well today")

In [9]:
show_entities(d)

today          DATE           Absolute or relative dates or periods


**3)**

In [10]:
d=nlp(u"Lets go to play")

In [11]:
show_entities(d)

No entities found 


**4)**

In [12]:
d=nlp(u"We will be going to Thailand on 7th August")

In [13]:
show_entities(d)

Thailand       GPE            Countries, cities, states
7th            ORDINAL        "first", "second", etc.
August         DATE           Absolute or relative dates or periods


### How would we add a new entity?

In [31]:
d=nlp(u"Tesla is earning money at an extensive rate")

In [32]:
show_entities(d)

No entities found 


**But here `Tesla` should have been categorised as ORG**

## <font color='purple'> So now let's add it </font>

In [33]:
from spacy.tokens import Span as ss

In [34]:
ORG=d.vocab.strings[u"ORG"]

In [38]:
new_entity=ss(d,0,1,label=ORG)
#0 and 1 are the start and end (exclusive) token indices of Tesla in the text

In [39]:
d.ents=list(d.ents)+[new_entity]
#Here we are adding it to our entities list
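
The two numbers passed to `Span` are token offsets rather than character offsets, and they behave like a Python slice. The sketch below shows how such offsets could be looked up instead of hard-coded. `find_token_span` is a hypothetical helper, and splitting on whitespace is only an approximation of spaCy's tokenizer that happens to agree for this sentence.

```python
def find_token_span(tokens, phrase_tokens):
    """Return (start, end) token offsets of the first occurrence, or None."""
    n = len(phrase_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase_tokens:
            return start, start + n  # end is exclusive, like a slice
    return None

# Whitespace split stands in for the tokens of the spaCy Doc (an assumption)
tokens = "Tesla is earning money at an extensive rate".split()

start, end = find_token_span(tokens, ["Tesla"])
print(start, end, tokens[start:end])
```

With a real `Doc`, the same `start` and `end` values are what would be handed to `ss(d, start, end, label=ORG)`.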

**Now let's check again**

In [40]:
d

Tesla is earning money at an extensive rate

In [41]:
show_entities(d)

Tesla          ORG            Companies, agencies, institutions, etc.


**Added  :)**

### Let's see how to add multiple words as entities

We are doing this just to show how to add several entities at once, even though these words would not normally be treated as entities.

In [42]:
d=nlp(u"Playing cricket and football are both good for health")

In [43]:
show_entities(d)

No entities found 


**Now we will add cricket and football**

### <font color='purple'> Let's see now how we add them </font>

In [44]:
from spacy.matcher import PhraseMatcher

In [45]:
m=PhraseMatcher(nlp.vocab)

In [46]:
phrase=['football','cricket']

In [49]:
patterns=[nlp(text) for text in phrase]
#This is a list comprehension

In [52]:
m.add('sports',None,*patterns)
#This is the spaCy v2 signature; in spaCy v3 the call is m.add('sports', patterns)
#Press Shift Tab to check the parameters

In [56]:
found=m(d)

In [57]:
found

[(13020240661013469444, 1, 2), (13020240661013469444, 3, 4)]

This shows that cricket and football were both found. Each tuple holds the match ID followed by the start and end token indices of the match.
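
To turn the matcher output back into words, each tuple can be unpacked and used to slice the tokens. This is a minimal sketch on plain Python data: the tuples are copied from the output above, and a whitespace split stands in for spaCy's tokens (an approximation that agrees for this sentence).

```python
# Match tuples copied from the output above: (match_id, start, end)
found = [(13020240661013469444, 1, 2), (13020240661013469444, 3, 4)]

# Whitespace split stands in for the tokens of the spaCy Doc (an assumption)
tokens = "Playing cricket and football are both good for health".split()

matched = []
for match_id, start, end in found:
    # start and end are token offsets; end is exclusive
    matched.append(" ".join(tokens[start:end]))

print(matched)
```

With the real `Doc`, `d[start:end].text` gives the same strings, and `nlp.vocab.strings[match_id]` recovers the pattern name `sports`.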

In [53]:
from spacy.tokens import Span as ss

In [54]:
d

Playing cricket and football are both good for health

In [58]:
sport=d.vocab.strings[u"Sports"]

In [59]:
new_ents=[ss(d,match[1],match[2],label=sport) for match in found]

In [60]:
d.ents=list(d.ents)+new_ents

### Now let's try

In [65]:
d

Playing cricket and football are both good for health

In [66]:
show_entities(d)

cricket        Sports         None
football       Sports         None


**<font color='blue'>So we have made our own entities and given them our desired name as well</font>**
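
The description column shows `None` for the new label because `spacy.explain` only knows spaCy's built-in labels and returns `None` for anything else. One possible way to handle that is a fallback lookup, sketched below. `CUSTOM_DESCRIPTIONS` and `describe` are hypothetical names, the description text is made up for illustration, and a small dictionary stands in for `spacy.explain` so the sketch runs on its own.

```python
# Hypothetical descriptions for labels that spaCy's glossary does not cover
CUSTOM_DESCRIPTIONS = {"Sports": "Sports and games (custom label)"}

def describe(label, builtin_lookup):
    """Return the built-in description if there is one, else a custom or placeholder one."""
    description = builtin_lookup(label)
    if description is None:
        description = CUSTOM_DESCRIPTIONS.get(label, "(no description)")
    return description

# Stand-in for spacy.explain: returns None for unknown labels, as spacy.explain does
builtin = {"ORG": "Companies, agencies, institutions, etc."}.get

for label in ["ORG", "Sports", "XYZ"]:
    print(f"{label:{15}}{describe(label, builtin)}")
```

In the notebook itself, `describe(ent.label_, spacy.explain)` could take the place of the direct `spacy.explain(ent.label_)` call inside `show_entities`.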

## How to check which entities of a specific type are present and how many times they occur

In [67]:
d=nlp(u"GFG has wonderful Data Science course for students")

In [68]:
show_entities(d)

GFG            ORG            Companies, agencies, institutions, etc.
Data Science   ORG            Companies, agencies, institutions, etc.


**<font color='blue'>Let's see now how it's done</font>**

In [70]:
[ent for ent in d.ents if ent.label_=='ORG']

[GFG, Data Science]

**To check how many of them are present**

In [71]:
len([ent for ent in d.ents if ent.label_=='ORG'])

2
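
When counts are needed for every label at once rather than for one label at a time, `collections.Counter` is a convenient option. The sketch below uses (text, label) pairs copied from the output above as stand-in data, so it runs without spaCy.

```python
from collections import Counter

# (text, label) pairs copied from the output above; with a real Doc these
# would come from [(ent.text, ent.label_) for ent in d.ents]
entities = [("GFG", "ORG"), ("Data Science", "ORG")]

# Count how many entities carry each label
label_counts = Counter(label for _, label in entities)

print(label_counts)
print(label_counts["ORG"])
print(label_counts["GPE"])  # labels that never occur count as 0
```

A `Counter` returns 0 for labels that are absent, so no extra check is needed before looking up a label.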