# Import Libraries

In [1]:
import spacy

In [2]:
# Load spaCy's small English pipeline (includes the NER component used below)
m = spacy.load('en_core_web_sm')

# Examples

In [3]:
example1 = "Allen is here in Bangalore, India. One of the major uses cases of named entity recognition involves automating the recommendation process. Recommendation systems dominate how we discover new content and ideas in today’s world. The example of Netflix shows that developing an effective recommendation system can work wonders for the fortunes of a media company by making their platforms more engaging and event addictive. For news publishers, using Named Entity Recognition to recommend similar articles is a proven approach. The below example from BBC news shows how recommendations for similar articles are implemented in real life. This can be done by extracting entities from a particular article and recommending the other articles which have the most similar entities mentioned in them. This is an approach that we have effectively used to develop content recommendations for a media industry client."

In [9]:
example2 = "To install additional data tables for lemmatization in spaCy v2.2+ you can run pip install spacy[lookups] or install spacy-lookups-data separately. The lookups package is needed to create blank models with lemmatization data, and to lemmatize in languages that don’t yet come with pretrained models and aren’t powered by third-party libraries."

In [11]:
example3 = "Now, if you pass it through the Named Entity Recognition API, it pulls out the entities Bandra (location) and Fitbit (Product). This can be then used to categorize the complaint and assign it to the relevant department within the organization that should be handling this."

# Labels

In [7]:
doc = m(example1)
for ent in doc.ents:
    print(ent.text, "-", ent.label_)

Allen - PERSON
Bangalore - GPE
India - GPE
One - CARDINAL
today - DATE
Netflix - PERSON
Named Entity Recognition - ORG
BBC news - ORG


In [10]:
doc = m(example2)
for ent in doc.ents:
    print(ent.text, "-", ent.label_)

third - ORDINAL


In [12]:
doc = m(example3)
for ent in doc.ents:
    print(ent.text, "-", ent.label_)

the Named Entity Recognition - ORG
Bandra - PERSON
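
The flat listings above make mislabels (for example Netflix and Bandra tagged as PERSON) easy to miss. As a minimal sketch, the entities can be grouped by label for a quicker review. The helper name `group_by_label` is hypothetical, and the pairs below are copied from the example1 output rather than produced by a live model run.

```python
from collections import defaultdict

def group_by_label(pairs):
    """Map each label to the entity texts carrying it, in order of appearance."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)

# (text, label) pairs copied from the example1 output above.
# With a live Doc these would be [(ent.text, ent.label_) for ent in doc.ents].
example1_pairs = [
    ("Allen", "PERSON"),
    ("Bangalore", "GPE"),
    ("India", "GPE"),
    ("One", "CARDINAL"),
    ("today", "DATE"),
    ("Netflix", "PERSON"),
    ("Named Entity Recognition", "ORG"),
    ("BBC news", "ORG"),
]

for label, texts in group_by_label(example1_pairs).items():
    print(label, "-", texts)
```

Grouping this way puts Allen and Netflix side by side under PERSON, which makes the second one stand out as a likely error.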


# Work of Art & FAC

In [16]:
woa = m("Smooth Criminal")
for ent in woa.ents:
    print(ent.text, "-", ent.label_)
# The recorded run printed nothing here: no entity (WORK_OF_ART or otherwise) was detected.

In [26]:
fac = m("Tibetan Plateau and Asia")
for ent in fac.ents:
    print(ent.text, "-", ent.label_)

Tibetan - NORP


# Querying

In [29]:
# (entity, relation, location) triples
locs = [
    ('Omnicom', 'IN', 'New York'),
    ('DDB Needham', 'IN', 'New York'),
    ('Kaplan Thaler Group', 'IN', 'New York'),
    ('BBDO South', 'IN', 'Atlanta'),
    ('Georgia-Pacific', 'IN', 'Atlanta'),
]

In [32]:
query = [e1 for (e1, rel, e2) in locs if e2=='Atlanta']
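
The comprehension above can be wrapped in a small reusable function that also checks the relation and displays the result. This is only a sketch: the function name `entities_in` and its `relation` parameter are hypothetical additions, and the triples are repeated so the block runs on its own.

```python
def entities_in(triples, place, relation="IN"):
    """Return the first element of every (e1, rel, e2) triple matching relation and place."""
    return [e1 for (e1, rel, e2) in triples if rel == relation and e2 == place]

# Same triples as in the cell above, repeated so this block is self-contained.
locs = [
    ("Omnicom", "IN", "New York"),
    ("DDB Needham", "IN", "New York"),
    ("Kaplan Thaler Group", "IN", "New York"),
    ("BBDO South", "IN", "Atlanta"),
    ("Georgia-Pacific", "IN", "Atlanta"),
]

print(entities_in(locs, "Atlanta"))   # ['BBDO South', 'Georgia-Pacific']
print(entities_in(locs, "New York"))  # ['Omnicom', 'DDB Needham', 'Kaplan Thaler Group']
```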

# Domain Specific Jargon

In [31]:
c = "The ubiquitin-proteasome system is the major pathway for the maintenance of protein homeostasis. Its inhibition causes accumulation of ubiquitinated proteins; this accumulation has been associated with several of the most common neurodegenerative diseases. Several genetic factors have been identified for most neurodegenerative diseases, however, most cases are considered idiopathic, thus making the study of the mechanisms of protein accumulation a relevant field of research. It is often mentioned that the biggest risk factor for neurodegenerative diseases is aging, and several groups have reported an age-related alteration of the expression of some of the 26S proteasome subunits and a reduction of its activity. Proteasome subunits interact with proteins that are known to accumulate in neurodegenerative diseases such as α-synuclein in Parkinson's, tau in Alzheimer's, and huntingtin in Huntington's diseases"

In [33]:
doc = m(c)
for ent in doc.ents:
    print(ent.text, "-", ent.label_)

26S - CARDINAL
Huntington - GPE
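
The general-purpose model misses the biomedical terms entirely and mislabels Huntington as a GPE. One simple fallback for domain jargon is a term-list (gazetteer) lookup. The sketch below uses plain regular expressions rather than spaCy (spaCy also offers a rule-based entity ruler component for this purpose, which is not shown here). The term list, the PROTEIN and DISEASE labels, and the function name `tag_terms` are all assumptions made for illustration.

```python
import re

# Hypothetical term list and labels, picked by hand from the abstract above.
GAZETTEER = {
    "α-synuclein": "PROTEIN",
    "tau": "PROTEIN",
    "huntingtin": "PROTEIN",
    "Parkinson's": "DISEASE",
    "Alzheimer's": "DISEASE",
    "Huntington's": "DISEASE",
}

def tag_terms(text, gazetteer=GAZETTEER):
    """Return (term, label) pairs for every whole-word gazetteer hit, in text order."""
    # Try longer terms first so a long term is preferred over a shorter one
    # that starts at the same position.
    terms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    )
    return [(match.group(0), gazetteer[match.group(0)]) for match in pattern.finditer(text)]

# Final clause of the abstract, repeated here so the block is self-contained.
snippet = (
    "α-synuclein in Parkinson's, tau in Alzheimer's, "
    "and huntingtin in Huntington's diseases"
)

for term, label in tag_terms(snippet):
    print(term, "-", label)
```

The lookup is case-sensitive and matches exact strings only, so it will not generalize to unseen terms the way a model trained on biomedical text would.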
