# Named Entity Recognition (NER)
spaCy's **'ner'** pipeline component identifies token spans that match a predetermined set of named-entity types. The recognized entities are available through the `ents` property of a `Doc` object.
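
As a minimal sketch of what `ents` holds, the snippet below assigns one entity by hand and reads it back. It assumes a blank English pipeline (`spacy.blank`), which only tokenizes and has no trained 'ner' component, so the `PERSON` span is set manually for illustration. The names `nlp_blank`, `doc_demo`, and `ent_rows` are hypothetical and not part of the notebook.

```python
import spacy
from spacy.tokens import Span

# A blank pipeline only tokenizes; no statistical model is loaded,
# so the entity below is assigned by hand (illustration only).
nlp_blank = spacy.blank("en")
doc_demo = nlp_blank("My name is Dipin Singh.")

# Token indices: My(0) name(1) is(2) Dipin(3) Singh(4) .(5)
doc_demo.ents = [Span(doc_demo, 3, 5, label="PERSON")]

# doc.ents is a tuple of Span objects; each carries its text, label and token offsets
ent_rows = [(ent.text, ent.label_, ent.start, ent.end) for ent in doc_demo.ents]
print(ent_rows)
```

With a trained pipeline such as `en_core_web_sm`, the same attributes are filled in by the 'ner' component instead of by hand.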

In [2]:
import spacy

In [3]:
nlp = spacy.load('en_core_web_sm')

In [4]:
doc = nlp(u'My name is Dipin Singh. I am currently doing NLP course')

In [5]:
for x in doc:
    print(f'{x} POS is    {x.pos_}       {x.tag_}')

My POS is    ADJ       PRP$
name POS is    NOUN       NN
is POS is    VERB       VBZ
Dipin POS is    PROPN       NNP
Singh POS is    PROPN       NNP
. POS is    PUNCT       .
I POS is    PRON       PRP
am POS is    VERB       VBP
currently POS is    ADV       RB
doing POS is    VERB       VBG
NLP POS is    PROPN       NNP
course POS is    NOUN       NN


In [6]:
## counting the POS tags

In [7]:
pos_counts=doc.count_by(spacy.attrs.POS)
pos_counts

{96: 1, 83: 1, 99: 3, 85: 1, 91: 2, 94: 1, 95: 3}

In [8]:
for key,value in pos_counts.items():
    print(f'{key} is {doc.vocab[key].text:{10}} and {value}')

96 is PUNCT      and 1
83 is ADJ        and 1
99 is VERB       and 3
85 is ADV        and 1
91 is NOUN       and 2
94 is PRON       and 1
95 is PROPN      and 3


In [9]:
from spacy import displacy

In [10]:
displacy.render(docs=doc,jupyter=True)
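
The call above draws displaCy's default view, the dependency parse. For entity work, the `style='ent'` view is often more useful. The sketch below is an assumption-laden illustration: it uses a blank pipeline with a hand-assigned entity (hypothetical names `nlp_blank`, `doc_demo`) so that it runs without a trained model, and passes `jupyter=False` so the markup is returned as a string rather than displayed inline.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Blank pipeline plus a manually assigned entity, for illustration only
nlp_blank = spacy.blank("en")
doc_demo = nlp_blank("My name is Dipin Singh.")
doc_demo.ents = [Span(doc_demo, 3, 5, label="PERSON")]

# style='ent' highlights entity spans; jupyter=False returns the HTML as a string
html = displacy.render(doc_demo, style="ent", jupyter=False)
print(len(html) > 0)
```

Inside a notebook, passing `jupyter=True` with a `Doc` produced by a trained pipeline should render the highlighted entities inline.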

In [11]:
## NER on a custom example

In [23]:
doc3= nlp(u'My name is Dipin Singh. he is data scientist. He stays in Karanataka. He is from chd.')

In [24]:
def check_new(doc):
    """Print each entity's text, label, and label description, or a notice if none exist."""
    if doc.ents:
        for ent in doc.ents:
            print(ent.text, ent.label_, spacy.explain(ent.label_))
    else:
        print('No NER')

In [25]:
check_new(doc3)

Dipin Singh PERSON People, including fictional
Karanataka PERSON People, including fictional


In [26]:
## Here 'data scientist' and 'chd' are not marked as entities. Creating custom entities to add them.

In [31]:
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
matcher_my=PhraseMatcher(nlp.vocab)

In [32]:
phrase1=['data scientist','chd']
phrase_to_add=[nlp(text) for text in phrase1]

In [33]:
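# Older call style (None is the on_match callback); spaCy v3 prefers matcher_my.add('adding_new', phrase_to_add)
matcher_my.add('adding_new', None, *phrase_to_add)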

In [34]:
matches=matcher_my(doc3)
matches

[(8299211061703320173, 8, 10), (8299211061703320173, 19, 20)]

In [35]:
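# Each match is (match_id, start, end); both phrases get LOC here for simplicity, though 'data scientist' is not a location
new_ent = [Span(doc3, start, end, label=doc3.vocab.strings[u'LOC']) for match_id, start, end in matches]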

In [36]:
doc3.ents = list(doc3.ents) + new_ent

In [37]:
check_new(doc3)

Dipin Singh PERSON People, including fictional
data scientist LOC Non-GPE locations, mountain ranges, bodies of water
Karanataka PERSON People, including fictional
chd LOC Non-GPE locations, mountain ranges, bodies of water
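
In the output above both phrases receive `LOC`, which does not really fit 'data scientist'. One possible variation, sketched below under stated assumptions, gives each phrase its own label by using the label as the match key. `JOB_TITLE` is a made-up label, the `phrase_labels` mapping and the names `nlp_blank` and `doc_demo` are hypothetical, and the list form of `PhraseMatcher.add` assumes spaCy v2.2.2 or later. A blank pipeline is used so the sketch runs without a trained model. `spacy.util.filter_spans` is included because assigning overlapping spans to `doc.ents` raises an error.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
from spacy.util import filter_spans

nlp_blank = spacy.blank("en")  # tokenizer only, no trained components
doc_demo = nlp_blank("My name is Dipin Singh. he is data scientist. He is from chd.")

# Hypothetical phrase-to-label mapping; 'JOB_TITLE' is an invented label
phrase_labels = {"data scientist": "JOB_TITLE", "chd": "GPE"}

matcher = PhraseMatcher(nlp_blank.vocab)
for phrase, label in phrase_labels.items():
    # Use the label as the match key so it can be recovered from match_id
    matcher.add(label, [nlp_blank.make_doc(phrase)])

new_ents = [
    Span(doc_demo, start, end, label=nlp_blank.vocab.strings[match_id])
    for match_id, start, end in matcher(doc_demo)
]

# filter_spans removes overlapping spans (preferring longer ones) before assignment
doc_demo.ents = filter_spans(list(doc_demo.ents) + new_ents)
result = [(ent.text, ent.label_) for ent in doc_demo.ents]
print(result)
```

Note that `spacy.explain` only knows spaCy's built-in labels, so it would return `None` for an invented label such as `JOB_TITLE`.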
