### Named Entity Recognition:
#### We will find the named entities in a given doc using spaCy.
#### spaCy also lets us add our own spans to a doc's entity list.
#### Let's check.


### Important Points:


### Some of the NER Tags
Tags are accessible through the `.label_` property of an entity.
<table>
<tr><th>TYPE</th><th>DESCRIPTION</th><th>EXAMPLE</th></tr>
<tr><td>`PERSON`</td><td>People, including fictional.</td><td>*Fred Flintstone*</td></tr>
<tr><td>`NORP`</td><td>Nationalities or religious or political groups.</td><td>*The Republican Party*</td></tr>
<tr><td>`FAC`</td><td>Buildings, airports, highways, bridges, etc.</td><td>*Logan International Airport, The Golden Gate*</td></tr>
<tr><td>`ORG`</td><td>Companies, agencies, institutions, etc.</td><td>*Microsoft, FBI, MIT*</td></tr>
<tr><td>`GPE`</td><td>Countries, cities, states.</td><td>*France, UAR, Chicago, Idaho*</td></tr>
<tr><td>`LOC`</td><td>Non-GPE locations, mountain ranges, bodies of water.</td><td>*Europe, Nile River, Midwest*</td></tr>
<tr><td>`PRODUCT`</td><td>Objects, vehicles, foods, etc. (Not services.)</td><td>*Formula 1*</td></tr>
<tr><td>`EVENT`</td><td>Named hurricanes, battles, wars, sports events, etc.</td><td>*Olympic Games*</td></tr>
<tr><td>`WORK_OF_ART`</td><td>Titles of books, songs, etc.</td><td>*The Mona Lisa*</td></tr>
<tr><td>`LAW`</td><td>Named documents made into laws.</td><td>*Roe v. Wade*</td></tr>
<tr><td>`LANGUAGE`</td><td>Any named language.</td><td>*English*</td></tr>
<tr><td>`DATE`</td><td>Absolute or relative dates or periods.</td><td>*20 July 1969*</td></tr>
<tr><td>`TIME`</td><td>Times smaller than a day.</td><td>*Four hours*</td></tr>
<tr><td>`PERCENT`</td><td>Percentage, including "%".</td><td>*Eighty percent*</td></tr>
<tr><td>`MONEY`</td><td>Monetary values, including unit.</td><td>*Twenty Cents*</td></tr>
<tr><td>`QUANTITY`</td><td>Measurements, as of weight or distance.</td><td>*Several kilometers, 55kg*</td></tr>
<tr><td>`ORDINAL`</td><td>"first", "second", etc.</td><td>*9th, Ninth*</td></tr>
<tr><td>`CARDINAL`</td><td>Numerals that do not fall under another type.</td><td>*2, Two, Fifty-two*</td></tr>
</table>
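The descriptions in this table come from spaCy's built-in glossary, so `spacy.explain` can look one up from a label string. A minimal sketch, assuming only that spaCy is installed (no trained pipeline is needed for this lookup):

```python
import spacy

# The labels listed in the table above
NER_LABELS = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
]

# spacy.explain looks each label up in spaCy's glossary
# (it returns None for a label the glossary does not know)
descriptions = {label: spacy.explain(label) for label in NER_LABELS}

for label, description in descriptions.items():
    print(f"{label:12} {description}")
```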

In [2]:
import spacy
# Small English pipeline (install it once with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

In [2]:
### Let's take an example doc
doc = nlp(u"Mukesh ambani is the CEO of Reliance")

In [3]:
doc.text

'Mukesh ambani is the CEO of Reliance'

#### The entities spaCy recognizes are already stored in `doc.ents`, so we only have to read them out.
#### `doc.ents` is a tuple of `Span` objects, so we can loop over it.
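To see what `doc.ents` holds without depending on a trained model's predictions, the sketch below assumes a blank English pipeline (`spacy.blank("en")`, tokenizer only) and sets the entities by hand; the labels are chosen for illustration, not predicted.

```python
import spacy
from spacy.tokens import Span

# Blank English pipeline: it tokenizes but has no NER, so entities are set manually
nlp = spacy.blank("en")
doc = nlp("Mukesh Ambani is the CEO of Reliance")

# Token indices: Mukesh=0, Ambani=1, is=2, the=3, CEO=4, of=5, Reliance=6
doc.ents = [Span(doc, 0, 2, label="PERSON"), Span(doc, 6, 7, label="ORG")]

print(type(doc.ents).__name__)  # tuple
for ent in doc.ents:
    # Each item is a Span, with .text and .label_ like a predicted entity
    print(type(ent).__name__, ent.text, ent.label_)
```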


In [10]:
for ent in doc.ents:
    print(f'{ent.text}: {ent.label_} - {spacy.explain(ent.label_)}')

Reliance: ORG - Companies, agencies, institutions, etc.


### So in the doc above, spaCy recognized only Reliance (as ORG); it did not tag Mukesh ambani as a PERSON.

In [18]:
doc = nlp(u"Google sold its product google-glass for very cheaper price which is 200 dollars")

In [19]:
doc.text

'Google sold its product google-glass for very cheaper price which is 200 dollars'

In [21]:
for ent in doc.ents:
    print(f'{ent.text}: {ent.label_} - {spacy.explain(ent.label_)}')

Google: ORG - Companies, agencies, institutions, etc.
200 dollars: MONEY - Monetary values, including unit


#### We can also get the start and end positions of each entity.
### Entity annotations
`Doc.ents` are token spans with their own set of annotations.
<table>
<tr><td>`ent.text`</td><td>The original entity text</td></tr>
<tr><td>`ent.label`</td><td>The entity type's hash value</td></tr>
<tr><td>`ent.label_`</td><td>The entity type's string description</td></tr>
<tr><td>`ent.start`</td><td>The token span's *start* index position in the Doc</td></tr>
<tr><td>`ent.end`</td><td>The token span's *stop* index position in the Doc</td></tr>
<tr><td>`ent.start_char`</td><td>The entity text's *start* character offset in the Doc</td></tr>
<tr><td>`ent.end_char`</td><td>The entity text's *stop* character offset in the Doc</td></tr>
</table>
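The two kinds of offsets are easy to mix up: `start`/`end` count tokens, while `start_char`/`end_char` count characters, and in both the end position is exclusive. A sketch that checks both against the original text, assuming a blank English pipeline with the entities from the Google example set by hand rather than predicted:

```python
import spacy
from spacy.tokens import Span

# Blank English pipeline: tokenizer only, so the entities are set by hand below
nlp = spacy.blank("en")
doc = nlp("Google sold its product google-glass for very cheaper price which is 200 dollars")
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 13, 15, label="MONEY")]

for ent in doc.ents:
    # start/end are token indices; start_char/end_char are character offsets
    print(ent.text, ent.label_, ent.start, ent.end, ent.start_char, ent.end_char)
    # Slicing the Doc by token indices gives the same span back
    assert doc[ent.start:ent.end].text == ent.text
    # Slicing the raw text by character offsets gives the entity text back
    assert doc.text[ent.start_char:ent.end_char] == ent.text
    # ent.label is the integer ID of ent.label_ in the shared StringStore
    assert doc.vocab.strings[ent.label] == ent.label_
```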




In [22]:
for ent in doc.ents:
    print(f'{ent.text}: {ent.label_} - {spacy.explain(ent.label_)}')
    print(f'{ent.start} {ent.end}')

Google: ORG - Companies, agencies, institutions, etc.
0 1
200 dollars: MONEY - Monetary values, including unit
13 15


In [32]:
doc[4:7]

google-glass

### As we found, spaCy could not recognize google-glass as a product.
#### Now let's add it to the entity list ourselves.
#### Let's add google-glass with the PRODUCT label.
#### This can be done with the `Span` class from `spacy.tokens`.

In [23]:
from spacy.tokens import Span

In [24]:
### Now let's get the hash value of the PRODUCT label
prod = doc.vocab.strings[u'PRODUCT']
prod

386
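The `StringStore` works in both directions: a string gives its integer ID, and the ID gives the string back. That is how `ent.label` and `ent.label_` stay in sync. A sketch assuming a blank English pipeline; the exact integer is an internal detail that may differ between spaCy versions, so the code does not hard-code it, and `GADGET` is a made-up label used only for illustration.

```python
import spacy

nlp = spacy.blank("en")
strings = nlp.vocab.strings

# string -> integer ID (what a Span stores in .label)
product_id = strings["PRODUCT"]
# integer ID -> string (what .label_ reports)
print(product_id, strings[product_id])

# A new label (hypothetical "GADGET") is added first; add() returns its hash
custom_id = strings.add("GADGET")
print(custom_id, strings[custom_id])
```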

In [34]:
### Using Span, we label tokens 4 to 7 (google-glass) as PRODUCT
new_ent = Span(doc, 4, 7, label=prod)

In [36]:
new_ent.label_

'PRODUCT'

In [37]:
### Now we will add this entity to the doc's existing entities
doc.ents = list(doc.ents) + [new_ent]

In [38]:
### Now let's see the updated entities
doc.ents

(Google, google-glass, 200 dollars)

In [40]:
for ent in doc.ents:
    print(f'{ent.text}: {ent.label_} - {spacy.explain(ent.label_)}')
    print(f'{ent.start} {ent.end}')

Google: ORG - Companies, agencies, institutions, etc.
0 1
google-glass: PRODUCT - Objects, vehicles, foods, etc. (not services)
4 7
200 dollars: MONEY - Monetary values, including unit
13 15
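Counting token positions by hand (4 to 7 here) is fragile. `Doc.char_span` builds the same span from character offsets and returns `None` when the offsets do not fall on token boundaries, and `spacy.util.filter_spans` keeps only non-overlapping spans, since assigning overlapping entities to `doc.ents` raises a `ValueError`. A sketch of a helper built on those two calls; the name `add_entity` is hypothetical, and a blank English pipeline with hand-set entities stands in for the trained model.

```python
import spacy
from spacy.tokens import Doc, Span
from spacy.util import filter_spans


def add_entity(doc: Doc, start_char: int, end_char: int, label: str) -> bool:
    """Hypothetical helper: add a labelled entity by character offsets.

    Returns False if the offsets do not fall on token boundaries.
    If the new span overlaps an existing entity, filter_spans keeps
    the longer one, so a shorter overlapping span is silently dropped.
    """
    span = doc.char_span(start_char, end_char, label=label)
    if span is None:
        return False
    doc.ents = filter_spans(list(doc.ents) + [span])
    return True


nlp = spacy.blank("en")
doc = nlp("Google sold its product google-glass for very cheaper price which is 200 dollars")
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 13, 15, label="MONEY")]

start = doc.text.index("google-glass")
print(add_entity(doc, start, start + len("google-glass"), "PRODUCT"))
print(add_entity(doc, start, start + 3, "PRODUCT"))  # "goo" is not a whole token
print([(ent.text, ent.label_) for ent in doc.ents])
```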


In [52]:
doc2 = nlp(u'Originally priced at $29.50,\nthe sweater was marked down to five dollars.')

doc2.ents

(29.50, five dollars)

In [53]:
for sentence in doc2.sents:
    print(sentence)

Originally priced at $29.50,
the sweater was marked down to five dollars.



In [59]:
for chunk in doc.noun_chunks:
    print(chunk)
    # print(chunk.root.text)  # the chunk's head word

Google
its product google-glass
very cheaper price
200 dollars


In [3]:
from spacy.tokens import Token

def fruit_getter(token):
    # Computed on access: True if the token text is one of these fruit names
    return token.text in ("apple", "pear", "banana")

# force=True overwrites an existing extension, so re-running this cell does not raise E090
Token.set_extension("is_fruit", getter=fruit_getter, force=True)
doc = nlp("I have an apple")
assert doc[3]._.is_fruit

In [16]:
doc = nlp("i am going to apple company")
doc[3]._.is_fruit  # doc[3] is "to"; doc[4] ("apple") would return True

False
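Getters are not limited to tokens: `Doc` and `Span` accept extensions the same way, so a document-level flag can be computed from the token-level one. A sketch assuming a blank English pipeline; the `has_fruit` attribute name is made up for illustration, and `force=True` keeps re-runs from raising E090. Like the original getter, the check is purely on the token text, so it does not know whether "apple" means the fruit or the company.

```python
import spacy
from spacy.tokens import Doc, Token

FRUITS = {"apple", "pear", "banana"}

# Token-level flag, computed from the token text whenever it is accessed
Token.set_extension("is_fruit", getter=lambda token: token.text in FRUITS, force=True)
# Doc-level flag (hypothetical name) built on top of the token-level one
Doc.set_extension("has_fruit", getter=lambda doc: any(t._.is_fruit for t in doc), force=True)

nlp = spacy.blank("en")
for text in ("I have an apple", "I have a car"):
    doc = nlp(text)
    fruits = [t.text for t in doc if t._.is_fruit]
    print(text, doc._.has_fruit, fruits)
```

Swapping `token.text` for `token.lower_` in the token getter would make the check case-insensitive, so "Apple" would match too.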