# Named Entity Recognition (NER) in the spaCy Library
### What is a Named Entity?
A named entity is a proper noun that refers to a specific entity such as a location, person, or organization. For example, in the sentence “Joe Biden is the president of the United States”, Joe Biden and the United States are named entities.

Here are some more examples of named entities:
- Organization, e.g. World Bank, Samsung
- Person, e.g. Donald Trump, Nelson Mandela
- Money, e.g. 5 million dollars, INR 4 Crore
- GPE (geopolitical entity: countries, cities, states), e.g. Australia, Kenya, Nairobi
- Location (non-GPE locations such as continents, regions, and bodies of water), e.g. Africa, South East Asia, Lake Victoria
- Date, e.g. 12th April 1998, 7 AUG
- Time, e.g. 9:30 P.M., Four-thirty am

### What is Named Entity Recognition (NER)?
In NLP, named entity recognition (NER) is the process of identifying named entities in text and assigning each one a category. NER is useful in areas such as information retrieval, content classification, and question answering systems.

Named entity recognition is commonly described as a two-step process:
1. First, POS (part-of-speech) tagging is performed.
2. Based on the POS tags, the named entities are extracted from the text.

### Named Entity Recognition (NER) in spaCy
Performing named entity recognition in spaCy is fast and easy. The entity types that spaCy can recognize include companies, locations, organizations, and products. The spaCy models are pre-trained to recognize these entities. However, we can also add our own arbitrary classes to the entity recognition system and update the model with new examples.
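
Training the statistical model on new examples is beyond the scope of this section, but a lighter-weight way to add custom classes is spaCy's rule-based entity ruler. The following is a minimal sketch, assuming spaCy v3; it uses a blank English pipeline so no trained model is needed, and the "SPACECRAFT" label and the sample sentence are made up purely for illustration.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# The entity ruler matches patterns and writes the matches to doc.ents.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Exact string match.
    {"label": "ORG", "pattern": "SpaceX"},
    # Token pattern; "SPACECRAFT" is a hypothetical custom label.
    {"label": "SPACECRAFT", "pattern": [{"LOWER": "lunar"}, {"LOWER": "lander"}]},
])

doc = nlp("SpaceX will build the lunar lander.")
custom_ents = [(ent.text, ent.label_) for ent in doc.ents]
print(custom_ents)
```

Rules like these do not generalize the way a trained model does, but they are predictable and need no training data.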

#### Example 1
In the example below, we load a pre-trained spaCy pipeline, pass the sample text to it, and assign the resulting Doc object to the doc variable. The named entities can then be extracted by iterating over doc.ents. In each iteration, the entity text is printed using ent.text and the entity label using ent.label_.

In [3]:
import spacy 
nlp = spacy.load("en_core_web_sm")

doc = nlp("President Biden has announced that the US government through NASA has awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander.")
for ent in doc.ents:
    print(ent.text,  ent.label_)

Biden PERSON
US GPE
NASA ORG
Elon Musk’s PERSON
$2.9 billion MONEY


#### Example 2
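This example follows the same steps as the previous one, with a different sample text. Note that the predictions of the small model are not perfect: in the output, 2021 is labeled CARDINAL rather than DATE.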

In [4]:
import spacy 
nlp = spacy.load("en_core_web_sm")

doc = nlp("Donald John Trump is an American politician, media personality, and businessman who served as the 45th president of the United States from 2017 to 2021. Trump graduated from the Wharton School of the University of Pennsylvania with a bachelor's degree in 1968")
for ent in doc.ents:
    print(ent.text,  ent.label_)

Donald John Trump PERSON
American NORP
45th ORDINAL
the United States GPE
2017 DATE
2021 CARDINAL
the Wharton School of the University of Pennsylvania ORG
1968 DATE
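
Once the entities are extracted, it is often convenient to group them by label. The sketch below uses plain Python on the (text, label) pairs copied from the output above; `group_by_label` is a hypothetical helper name, not part of spaCy.

```python
from collections import defaultdict

# (text, label) pairs copied from the Example 2 output.
entities = [
    ("Donald John Trump", "PERSON"),
    ("American", "NORP"),
    ("45th", "ORDINAL"),
    ("the United States", "GPE"),
    ("2017", "DATE"),
    ("2021", "CARDINAL"),
    ("the Wharton School of the University of Pennsylvania", "ORG"),
    ("1968", "DATE"),
]

def group_by_label(pairs):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)

by_label = group_by_label(entities)
print(by_label["DATE"])    # ['2017', '1968']
print(by_label["PERSON"])  # ['Donald John Trump']
```

With a real Doc object, the same helper could be called as `group_by_label((ent.text, ent.label_) for ent in doc.ents)`.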


### spaCy NER Labels
We can get the list of entity labels that the NER component of a pipeline can predict by using nlp.pipe_labels['ner'].

In [5]:
import spacy

nlp = spacy.load("en_core_web_sm")
ner_lst = nlp.pipe_labels['ner']

print(len(ner_lst))
print(ner_lst)

18
['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART']
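
Some of these labels are not self-explanatory. As a small sketch, `spacy.explain` can be used to look up a short description of each one; the label list here is copied from the output above, so no model needs to be loaded.

```python
import spacy

# Labels copied from the nlp.pipe_labels['ner'] output.
ner_labels = [
    'CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC',
    'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT',
    'QUANTITY', 'TIME', 'WORK_OF_ART',
]

# spacy.explain returns a human-readable description for a known label.
descriptions = {label: spacy.explain(label) for label in ner_labels}

for label, description in descriptions.items():
    print(f"{label:12} {description}")
```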


### Accessing Entity Annotations and Labels
The standard way to access entity annotations in spaCy is doc.ents, which returns a tuple of Span objects, one per entity in the doc. The entity type can be accessed as a hash value with ent.label or as a string with ent.label_. Each entity in doc.ents exposes further information, such as

- Entity text by using ent.text,
- Starting and ending character offsets of an entity by using ent.start_char and ent.end_char,
- Index of the entity's first token by using ent.start,
- Entity ID by using ent.ent_id_ (an empty string here, because no ID has been assigned),
- Entity type's integer ID by using ent.label,
- L2 norm of the entity's vector by using ent.vector_norm.

In [7]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Microsoft is looking at acquiring Activision Blizzard Inc for $68.7 billion")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_, ent.start, ent.ent_id_, ent.label, ent.vector_norm)

Microsoft 0 9 ORG 0  383 9.452225
Activision Blizzard Inc 34 57 ORG 5  383 7.3945355
$68.7 billion 62 75 MONEY 9  394 8.186131
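
The character offsets and token indices describe the same span in two coordinate systems. The following sketch, assuming spaCy v3, converts the character offsets printed above back into a span with `doc.char_span`. It uses a blank pipeline because only the tokenizer is involved.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only; no trained NER is needed here
doc = nlp("Microsoft is looking at acquiring Activision Blizzard Inc for $68.7 billion")

# Character offsets 34 and 57 are taken from the output above.
span = doc.char_span(34, 57, label="ORG")
print(span.text, span.start, span.end, span.label_)

# Slicing the raw text with the character offsets gives the entity text.
print(doc.text[span.start_char:span.end_char] == span.text)

# Offsets that do not fall on token boundaries yield None by default.
misaligned = doc.char_span(34, 40)
print(misaligned)
```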


We can also access entity annotations at the token level through the token.ent_iob_ and token.ent_type_ attributes (the variants without the trailing underscore, token.ent_iob and token.ent_type, return integer codes instead of strings). token.ent_iob_ returns one of three tags: ‘B’ means the token begins an entity, ‘I’ means it is inside an entity, and ‘O’ means it is outside any entity. For a token outside any entity, token.ent_type_ returns an empty string “”.

In [9]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Nairobi is the most populous city in Kenya")

for token in doc:
    print(token.text, token.ent_iob_, token.ent_type_)

Nairobi B GPE
is O 
the O 
most O 
populous O 
city O 
in O 
Kenya B GPE
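
To make the IOB scheme concrete, here is a plain-Python sketch that groups per-token tags back into entity spans. `iob_to_spans` is a hypothetical helper written for illustration; the first input is copied from the output above, and the second is a made-up multi-token example.

```python
def iob_to_spans(tokens, iob_tags, types):
    """Group IOB-tagged tokens into (text, label, start, end) tuples.

    start and end are token indices, with end exclusive.
    """
    spans = []
    start = None  # index of the first token of the entity being built
    for i, tag in enumerate(iob_tags):
        # Any tag other than "I" closes the entity that is currently open.
        if tag != "I" and start is not None:
            spans.append((" ".join(tokens[start:i]), types[start], start, i))
            start = None
        if tag == "B":
            start = i
    # Close an entity that runs to the end of the text.
    if start is not None:
        spans.append((" ".join(tokens[start:]), types[start], start, len(tokens)))
    return spans


tokens = ["Nairobi", "is", "the", "most", "populous", "city", "in", "Kenya"]
iob_tags = ["B", "O", "O", "O", "O", "O", "O", "B"]
types = ["GPE", "", "", "", "", "", "", "GPE"]
single_token_spans = iob_to_spans(tokens, iob_tags, types)
print(single_token_spans)
# [('Nairobi', 'GPE', 0, 1), ('Kenya', 'GPE', 7, 8)]

multi_token_spans = iob_to_spans(
    ["Lake", "Victoria", "is", "large"],
    ["B", "I", "O", "O"],
    ["LOC", "LOC", "", ""],
)
print(multi_token_spans)
# [('Lake Victoria', 'LOC', 0, 2)]
```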


### Adding New Named Entities in spaCy
spaCy lets us set entity annotations at the document level. We cannot write directly to the token.ent_iob or token.ent_type attributes, so entities are set through the doc instead, using one of the methods listed below.

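#### Method 1:
We create the new entity as a Span and add it to the doc's entities with the doc.set_ents method. Passing default="unmodified" keeps the entities that are already set. Keep in mind that the new span must not overlap with another entity span; otherwise spaCy raises an error such as “Trying to set conflicting doc.ents”.

In the example below, the default spaCy model does not recognize “facebook” as an entity. We create a new span for it and add it to the doc, after which it appears in doc.ents. This changes only the annotations of this doc, not the model itself. Note that the “Before” list shows character offsets, while the “After” list shows token indices.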

In [15]:
import spacy
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")
doc = nlp("facebook was founded by Mark Zuckerberg and his fellow roommates at Harvard College")
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print('Before : ', ents)


Before :  [('Mark Zuckerberg', 24, 39, 'PERSON'), ('Harvard College', 68, 83, 'ORG')]


In [16]:
# The model didn't recognize 'facebook' as an entity
# Creating a span for the new entity
facebook_ent = Span(doc, 0, 1, label="ORG")
doc.set_ents([facebook_ent], default="unmodified")

# Printing the new entity list
ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print('After : ', ents)

After :  [('facebook', 0, 1, 'ORG'), ('Mark Zuckerberg', 4, 6, 'PERSON'), ('Harvard College', 11, 13, 'ORG')]
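
The overlap restriction can be handled before calling doc.set_ents. The sketch below, assuming spaCy v3, first triggers the conflict on purpose and then uses `spacy.util.filter_spans`, which keeps the longest span among overlapping candidates. It uses a blank pipeline and a shortened version of the sentence; the overlapping "Zuckerberg" span is contrived for illustration.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

nlp = spacy.blank("en")  # tokenizer only, so doc.ents starts out empty
doc = nlp("facebook was founded by Mark Zuckerberg")

org = Span(doc, 0, 1, label="ORG")         # "facebook"
person = Span(doc, 4, 6, label="PERSON")   # "Mark Zuckerberg"
surname = Span(doc, 5, 6, label="PERSON")  # "Zuckerberg", overlaps `person`

# Overlapping spans cannot both be entities.
try:
    doc.set_ents([person, surname])
except ValueError as err:
    print("Conflict:", type(err).__name__)

# filter_spans drops overlaps, preferring the longest span.
resolved = filter_spans([org, person, surname])
doc.set_ents(resolved)

result = [(e.text, e.label_) for e in doc.ents]
print(result)
```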


#### Method 2:
We create a list containing the new entity span, concatenate it with the existing entities from doc.ents, and assign the result back to doc.ents.

In [18]:
import spacy
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")
doc = nlp("facebook was founded by Mark Zuckerberg and his fellow roommates at Harvard College")
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print('Before : ', ents)


Before :  [('Mark Zuckerberg', 24, 39, 'PERSON'), ('Harvard College', 68, 83, 'ORG')]


In [19]:
# The model didn't recognize 'facebook' as an entity
# Creating a span for the new entity
facebook_ent = Span(doc, 0, 1, label="ORG")

orig_ents = list(doc.ents)
doc.ents = orig_ents + [facebook_ent] 

# Printing the new entity list
ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print('After : ', ents)

After :  [('facebook', 0, 1, 'ORG'), ('Mark Zuckerberg', 4, 6, 'PERSON'), ('Harvard College', 11, 13, 'ORG')]


#### Method 3:
We create a NumPy array of zeros with shape (len(doc), 2) to hold the entity IOB code and entity type of each token, fill in the rows of the new entities, and load the array into the doc with doc.from_array. In this example, we are assigning “Nairobi” as “GPE” and “Africa” as “LOC”.

In [20]:
import numpy
import spacy
from spacy.attrs import ENT_IOB, ENT_TYPE

nlp = spacy.load("en_core_web_sm")
# make_doc only tokenizes the text, so no entities are predicted
doc = nlp.make_doc("Nairobi is a big city in Africa")
ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print('Before :', ents) # []

header = [ENT_IOB, ENT_TYPE]
attr_array = numpy.zeros((len(doc), len(header)), dtype="uint64")

# Token 0 ("Nairobi"): IOB code 3 means B (begins an entity)
attr_array[0, 0] = 3
attr_array[0, 1] = doc.vocab.strings["GPE"]

# Token 6 ("Africa"), the last token of the doc
attr_array[6, 0] = 3
attr_array[6, 1] = doc.vocab.strings["LOC"]
doc.from_array(header, attr_array)

ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print('After :', ents)

Before : []
After : [('Nairobi', 0, 1, 'GPE'), ('Africa', 6, 7, 'LOC')]
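
The same array approach extends to entities that span several tokens: the first token gets IOB code 3 (B) and each following token gets code 1 (I). This is a minimal sketch, assuming spaCy v3 and a blank pipeline; the sentence is made up for illustration.

```python
import numpy
import spacy
from spacy.attrs import ENT_IOB, ENT_TYPE

nlp = spacy.blank("en")
doc = nlp.make_doc("Lake Victoria borders Kenya")

header = [ENT_IOB, ENT_TYPE]
attr_array = numpy.zeros((len(doc), len(header)), dtype="uint64")

loc = doc.vocab.strings["LOC"]
gpe = doc.vocab.strings["GPE"]

# "Lake Victoria": B on the first token, I on the second.
attr_array[0, 0] = 3  # B
attr_array[0, 1] = loc
attr_array[1, 0] = 1  # I
attr_array[1, 1] = loc

# "Kenya": a single-token entity only needs B.
attr_array[3, 0] = 3  # B
attr_array[3, 1] = gpe

doc.from_array(header, attr_array)

multi_ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print(multi_ents)
```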


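### Visualizing Named Entities in spaCy
We can use the displacy visualizer that ships with spaCy to display a nice visualization of the entities of a Doc object. displacy.serve starts a local web server and shows the visualization in the browser; inside a Jupyter notebook, displacy.render(doc, style="ent") displays it inline instead.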

In [24]:
import spacy
from spacy import displacy

text = "The Expat City Ranking 2021 also showed that Nairobi ranks first in Africa for both the general friendliness of its local residents and friendliness towards foreign residents in particular."

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.serve(doc, style="ent")
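
When the markup itself is needed, for example to save it to a file or embed it in a page, `displacy.render` can return it as a string. The sketch below assumes spaCy v3; it sets the entities by hand on a blank pipeline and a shortened sentence so that it runs without a trained model, and the color values are arbitrary choices.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Nairobi ranks first in Africa")

# Entities are set manually here; with a trained pipeline they would be predicted.
doc.set_ents([Span(doc, 0, 1, label="GPE"), Span(doc, 4, 5, label="LOC")])

# Optional per-label colors (arbitrary hex values for illustration).
options = {"colors": {"GPE": "#aa9cfc", "LOC": "#feca74"}}

# jupyter=False makes render return the HTML instead of displaying it.
html = displacy.render(doc, style="ent", options=options, jupyter=False)
print(html[:80])
```

The returned string can then be written to an .html file and opened in any browser.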