# Named Entity Recognition / Detection

Named Entity Recognition (NER) is a standard NLP task: spot named entities (people, places, organizations, etc.) in a chunk of text and classify them into a predefined set of categories. Typical examples:

- Quickly retrieving geographical locations mentioned in Twitter posts
- Extracting people's names
- Extracting organization names


Use cases:

- Classifying or categorizing content by relevant tags
- Information extraction
- Content recommendation

In [2]:
import spacy

In [3]:
nlp=spacy.load("en_core_web_sm")

In [5]:
text = nlp("Dhoni was born in 7 July 1987 at Ranchi and Now his income is in millions")

In [6]:
text

Dhoni was born in 7 July 1987 at Ranchi and Now his income is in millions

In [7]:
for word in text.ents:
    print(word.text)

Dhoni
7 July 1987
Ranchi
millions


In [9]:
for word in text.ents:
    print(word.text,word.label_)

Dhoni PERSON
7 July 1987 DATE
Ranchi GPE
millions CARDINAL


In [10]:
spacy.explain("GPE")

'Countries, cities, states'

In [11]:
spacy.explain("CARDINAL")

'Numerals that do not fall under another type'
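
The `spacy.explain` calls above can be looped over every label seen so far. The following is a minimal sketch; the label `NOT_A_REAL_LABEL` is made up for illustration, to show that `spacy.explain` returns `None` (and may emit a warning in recent spaCy versions) for a term that is not in its glossary.

```python
import spacy

# Labels seen in the output above, plus one made-up label for contrast.
labels = ["PERSON", "DATE", "GPE", "CARDINAL", "NOT_A_REAL_LABEL"]

descriptions = {}
for label in labels:
    # spacy.explain returns None when the term is not in spaCy's glossary.
    descriptions[label] = spacy.explain(label)

for label, description in descriptions.items():
    print(f"{label:<18}{description}")
```

No trained model is needed for this lookup, since the glossary ships with the spaCy package itself.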

In [13]:
# helper: print the text and label of every entity in a doc
def entity_find(doc):
    for word in doc.ents:
        print(word.text,word.label_)

In [14]:
entity_find(text)

Dhoni PERSON
7 July 1987 DATE
Ranchi GPE
millions CARDINAL


### ADD NEW ENTITY

In [15]:
# Assume TGT is an organization that the model does not recognize on its own
document = nlp("as per in TGT news price of Tesla cybertruck  is expected to be Rs. 50.7 Lack in India .")

In [16]:
entity_find(document)

Tesla NORP
50.7 CARDINAL
India GPE


In [17]:
from spacy.tokens import Span

In [19]:
new_ent = Span(document, 3, 4, label="ORG")  # Span(doc, start token, end token (exclusive), label)

In [20]:
document.ents = list(document.ents)+[new_ent]

In [21]:
entity_find(document)

TGT ORG
Tesla NORP
50.7 CARDINAL
India GPE
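
Counting token positions by hand (the `3, 4` above) is easy to get wrong. As a hedged alternative, the sketch below finds the span by character offsets with `Doc.char_span` and passes the result through `spacy.util.filter_spans`, which drops overlapping spans before they are assigned to `doc.ents` (assigning overlapping entities raises a `ValueError`). The names `nlp_blank`, `doc_blank`, and `tgt_span` are illustrative. A blank pipeline is used so the example runs without `en_core_web_sm`, which also means it starts with no predicted entities.

```python
import spacy
from spacy.util import filter_spans

# A blank pipeline only tokenizes, so no trained model is required.
nlp_blank = spacy.blank("en")
sentence = "as per in TGT news price of Tesla cybertruck is expected to be Rs. 50.7 Lack in India ."
doc_blank = nlp_blank(sentence)

# Locate "TGT" by character offsets instead of counting tokens by hand.
start_char = sentence.index("TGT")
tgt_span = doc_blank.char_span(start_char, start_char + len("TGT"), label="ORG")

# char_span returns None if the offsets do not line up with token boundaries.
if tgt_span is not None:
    # filter_spans removes overlapping spans (preferring longer ones), so the
    # assignment below cannot fail because of overlapping entities.
    doc_blank.ents = filter_spans(list(doc_blank.ents) + [tgt_span])

found = [(ent.text, ent.label_, ent.start, ent.end) for ent in doc_blank.ents]
print(found)
```

With a trained pipeline, the same pattern keeps the model's predictions and only adds the new span where it does not collide with an existing, longer entity.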


#### DISPLAY ENTITIES

In [22]:
from spacy import displacy

In [23]:
displacy.render(document,style='ent',jupyter=True)
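
Outside a notebook, `displacy.render` can return the markup as a string instead of drawing it inline. The sketch below assumes a blank pipeline with one manually assigned entity (so it runs without a trained model); `nlp_blank`, `doc_blank`, and `html` are illustrative names.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp_blank = spacy.blank("en")
doc_blank = nlp_blank("as per in TGT news price of Tesla cybertruck")

# Mark token 3 ("TGT") as an organization, as in the cells above.
doc_blank.ents = [Span(doc_blank, 3, 4, label="ORG")]

# jupyter=False makes displacy return the HTML as a string;
# page=True wraps it in a complete HTML page.
html = displacy.render(doc_blank, style="ent", jupyter=False, page=True)

print(len(html), "characters of HTML generated")
```

The returned string can then be written to a file and opened in a browser.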

# ASSIGN MULTIPLE ENTITIES in NER

In [3]:
import spacy

In [4]:
nlp = spacy.load("en_core_web_sm")

In [5]:
doc = nlp("Camera and micro phones are importent part of recording, there are various type of micro-phones available in market.")

In [6]:
doc

Camera and micro phones are importent part of recording, there are various type of micro-phones available in market.

In [7]:
def entity_find(doc):
    for word in doc.ents:
        print(word.text,word.label_)

In [8]:
entity_find(doc)  # prints nothing: the model finds no entities in this sentence

In [9]:
from spacy.matcher import PhraseMatcher

In [10]:
matcher = PhraseMatcher(nlp.vocab)

In [11]:
phrase_to_find = ["micro phones","micro-phones"]

In [12]:
pattern = []

In [13]:
for text in phrase_to_find:
    pattern.append(nlp(text))

In [15]:
pattern #pattern created

[micro phones, micro-phones]

In [16]:
#adding patterns to matcher

In [17]:
matcher.add("NewPattern", pattern)  # spaCy v3 signature; spaCy v2 used matcher.add("NewPattern", None, *pattern)

In [18]:
total_matches = matcher(doc)

In [19]:
total_matches

[(17108370319406582427, 2, 4), (17108370319406582427, 15, 18)]
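
Each match is a `(match_id, start, end)` tuple, where `match_id` is the hash of the rule name and `start`/`end` are token indices. The sketch below shows one way to turn those tuples into something readable. It assumes spaCy v3 (the list form of `matcher.add`) and a blank pipeline; `attr="LOWER"` is an optional extra that makes matching case-insensitive, and all variable names are illustrative.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp_blank = spacy.blank("en")
doc_blank = nlp_blank(
    "Camera and micro phones are importent part of recording, "
    "there are various type of micro-phones available in market."
)

# attr="LOWER" compares lowercased token text, so "Micro Phones" would match too.
matcher_blank = PhraseMatcher(nlp_blank.vocab, attr="LOWER")
phrases = ["micro phones", "micro-phones"]
# make_doc only tokenizes, which is all a phrase pattern needs.
matcher_blank.add("NewPattern", [nlp_blank.make_doc(p) for p in phrases])

readable = []
for match_id, start, end in matcher_blank(doc_blank):
    # The string store maps the hash back to the rule name.
    rule_name = nlp_blank.vocab.strings[match_id]
    readable.append((rule_name, start, end, doc_blank[start:end].text))

# Sort by start token so the order is predictable.
readable.sort(key=lambda item: item[1])
print(readable)
```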

In [22]:
PROD = doc.vocab.strings["PRODUCT"]  # hash ID of the PRODUCT label in the string store

In [24]:
from spacy.tokens import Span

In [25]:
new_ents = [Span(doc,match[1],match[2],label=PROD) for match in total_matches]

In [26]:
doc.ents = list(doc.ents)+new_ents

In [27]:
entity_find(doc)

micro phones PRODUCT
micro-phones PRODUCT
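
The matcher-then-`Span` steps above can also be expressed as a pipeline component. The following is a sketch of that alternative, not the approach used in this notebook: it assumes spaCy v3, where `nlp.add_pipe("entity_ruler")` adds an `EntityRuler` whose string patterns are matched as exact phrases. A blank pipeline is used so no trained model is needed; `nlp_rules`, `doc_rules`, and `ruled` are illustrative names.

```python
import spacy

nlp_rules = spacy.blank("en")

# The entity ruler assigns labels to matched phrases whenever the pipeline runs.
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PRODUCT", "pattern": "micro phones"},
    {"label": "PRODUCT", "pattern": "micro-phones"},
])

doc_rules = nlp_rules(
    "Camera and micro phones are importent part of recording, "
    "there are various type of micro-phones available in market."
)

ruled = [(ent.text, ent.label_) for ent in doc_rules.ents]
print(ruled)
```

The design trade-off: the ruler applies the patterns to every document processed by the pipeline, while the `PhraseMatcher` approach above modifies one `doc` at a time.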
