### Named Entity Recognition

Named entity recognition (<b>NER</b>) identifies key pieces of information in a text and classifies them into a set of predefined categories.
An entity is the thing that is consistently talked about or referred to in the text, such as a person, an organization, a place, or a date. <b>NER</b> is a subtask of natural language processing (<b>NLP</b>).

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')



## 1. Defining an entity function

In [2]:
doc = nlp('GFG is an Indian company which provides one of the finest education.')
doc

GFG is an Indian company which provides one of the finest education.

In [3]:
doc.ents

(GFG, Indian)

In [4]:
def show_entities(doc):
    """Print each entity with its label and spaCy's description of that label."""
    if doc.ents:
        for ent in doc.ents:
            print(ent, '|', ent.label_, '|', spacy.explain(ent.label_))
    else:
        print('No entities found')

show_entities(doc)

GFG | ORG | Companies, agencies, institutions, etc.
Indian | NORP | Nationalities or religious or political groups


In [5]:
show_entities(nlp("I'm not feeling well today."))
show_entities(nlp("I'm not feeling well."))

today | DATE | Absolute or relative dates or periods
No entities found


## 2. Adding a new entity - one at a time

In [14]:
from spacy.tokens import Span

doc = nlp('Tesla is one of the biggest giants in the field of electric vehicles')

# Entities that already cover token 0 (spans are half-open: start <= i < end)
existing_entities = [ent for ent in doc.ents if ent.start <= 0 < ent.end]

if existing_entities:
    print('Token at position 0 is already part of an entity span.')
else:
    # Label token 0 ('Tesla') as ORG and append it to the existing entities
    new_entity = Span(doc, 0, 1, label=doc.vocab.strings['ORG'])
    doc.ents = list(doc.ents) + [new_entity]

show_entities(doc)

Token at position 0 is already part of an entity span.
Tesla | ORG | Companies, agencies, institutions, etc.
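
The check `ent.start <= 0 < ent.end` works because a spaCy span covers a half-open range of token positions: `start` is included and `end` is excluded. The sketch below illustrates that logic without spaCy. `TokenRange`, `covers`, and `overlaps` are hypothetical names introduced only for this illustration; they are not part of the spaCy API.

```python
from typing import NamedTuple


class TokenRange(NamedTuple):
    """Hypothetical stand-in for a spaCy Span: covers tokens start..end-1."""
    start: int
    end: int


def covers(span: TokenRange, index: int) -> bool:
    # Half-open interval: start is included, end is excluded.
    return span.start <= index < span.end


def overlaps(a: TokenRange, b: TokenRange) -> bool:
    # Two half-open ranges overlap when each one starts before the other ends.
    return a.start < b.end and b.start < a.end


tesla = TokenRange(0, 1)  # a one-token entity at position 0

print(covers(tesla, 0))                    # True
print(covers(tesla, 1))                    # False
print(overlaps(tesla, TokenRange(0, 2)))   # True
print(overlaps(tesla, TokenRange(1, 3)))   # False
```

The same `overlaps` test generalizes the position-0 check to a new span of any length, which matters because assigning overlapping spans to `doc.ents` is not allowed.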


## 3. Adding multiple entities at a time

In [24]:
from spacy.matcher import PhraseMatcher

doc = nlp('Playing cricket and football is good for health')

matcher = PhraseMatcher(nlp.vocab)
phrases = ['cricket', 'football']
patterns = [nlp(text) for text in phrases]

# spaCy v3 takes the patterns as a list; v2 used matcher.add('Sports', None, *patterns)
matcher.add('Sports', patterns)

show_entities(doc)

No entities found


In [25]:
matcher(doc)

[(9611670226552988807, 1, 2), (9611670226552988807, 3, 4)]
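
The matcher only reports where the phrases occur as `(match_id, start, end)` tuples; `doc.ents` itself is still unchanged at this point. One possible way to finish the step is to turn each match into a `Span` and append it to `doc.ents`. The sketch below is a minimal illustration, not necessarily the approach the notebook goes on to take. It assumes spaCy v3, and it uses a blank English pipeline under the hypothetical name `blank_nlp` so that it runs without a downloaded model and does not overwrite the notebook's `nlp`.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span

# Assumption: a blank pipeline (tokenizer only) stands in for en_core_web_sm.
# It predicts no entities of its own, so doc.ents starts out empty.
blank_nlp = spacy.blank('en')
doc = blank_nlp('Playing cricket and football is good for health')

matcher = PhraseMatcher(blank_nlp.vocab)
matcher.add('Sports', [blank_nlp.make_doc(text) for text in ['cricket', 'football']])

# match_id is the hash of the string 'Sports', so it can be used as the label.
new_ents = [Span(doc, start, end, label=match_id)
            for match_id, start, end in matcher(doc)]

# doc.ents rejects overlapping spans, so this assumes the matches overlap
# neither each other nor any entity that is already present.
doc.ents = list(doc.ents) + new_ents

for ent in doc.ents:
    print(ent.text, '|', ent.label_)
```

Because `Sports` is a custom label, `spacy.explain` has no description for it, so a helper such as `show_entities` would print `None` in its last column for these entities.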