- https://stackabuse.com/python-for-nlp-parts-of-speech-tagging-and-named-entity-recognition/

# Parts of Speech (POS) Tagging

In [1]:
import spacy
sp = spacy.load('en_core_web_sm')

In [2]:
sen = sp(u"I like to play football. I hated it in my childhood though")

In [3]:
type(sen)

spacy.tokens.doc.Doc

In [4]:
print(sen.text)

I like to play football. I hated it in my childhood though


In [5]:
print(type(sen.text))

<class 'str'>


In [6]:
print(sen[7], sen[7].pos_)

hated VERB


In [7]:
print(spacy.explain(sen[7].tag_))

verb, past tense


In [8]:
for word in sen:
    print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

I            PRON       PRP      pronoun, personal
like         VERB       VBP      verb, non-3rd person singular present
to           PART       TO       infinitival "to"
play         VERB       VB       verb, base form
football     NOUN       NN       noun, singular or mass
.            PUNCT      .        punctuation mark, sentence closer
I            PRON       PRP      pronoun, personal
hated        VERB       VBD      verb, past tense
it           PRON       PRP      pronoun, personal
in           ADP        IN       conjunction, subordinating or preposition
my           DET        PRP$     pronoun, possessive
childhood    NOUN       NN       noun, singular or mass
though       SCONJ      IN       conjunction, subordinating or preposition
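
The `{word.text:{12}}` fields in the loop above use a nested replacement field to supply the column width, so `{12}` simply evaluates to the width `12`. The sketch below isolates that formatting step without spaCy. The `rows` list and the `format_row` helper are hypothetical names introduced for illustration, and the values are copied from the output above.

```python
# Rows copied from the tagging output above; no spaCy needed for this sketch.
rows = [
    ("I", "PRON", "PRP"),
    ("football", "NOUN", "NN"),
    ("hated", "VERB", "VBD"),
]


def format_row(text, pos, tag, widths=(12, 10, 8)):
    """Left-align each string in a fixed-width column, like the loop above."""
    w_text, w_pos, w_tag = widths
    # {w_text} inside the format spec is a nested field that supplies the width.
    return f"{text:{w_text}} {pos:{w_pos}} {tag:{w_tag}}"


for row in rows:
    print(format_row(*row))
```

Strings are left-aligned by default, which is why the columns line up without an explicit `<` in the format spec.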


In [9]:
sen = sp(u'Can you google it?')
word = sen[2]

print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

google       VERB       VB       verb, base form


In [10]:
sen = sp(u'You can google it on Google')
word = sen[2]

print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

google       VERB       VB       verb, base form


In [11]:
sen = sp(u"I like to play football. I hated it in my childhood though")

num_pos = sen.count_by(spacy.attrs.POS)
num_pos

{95: 3, 100: 3, 94: 1, 92: 2, 97: 1, 85: 1, 90: 1, 98: 1}

In [12]:
# count_by returns {POS ID: count}; sen.vocab[k].text maps each ID back to its tag name.
for k, v in sorted(num_pos.items()):
    print(f'{k}. {sen.vocab[k].text:{8}}: {v}')

85. ADP     : 1
90. DET     : 1
92. NOUN    : 2
94. PART    : 1
95. PRON    : 3
97. PUNCT   : 1
98. SCONJ   : 1
100. VERB    : 3
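
As a cross-check on the counts above, the same tally can be reproduced with `collections.Counter` over the coarse tag names. This is a minimal sketch that hard-codes the `pos_` values from the tagging table shown earlier. With a live `Doc`, `Counter(token.pos_ for token in sen)` should give the same result keyed by name rather than by integer ID.

```python
from collections import Counter

# Coarse POS tags in token order, copied from the tagging table above.
pos_tags = [
    "PRON", "VERB", "PART", "VERB", "NOUN", "PUNCT", "PRON",
    "VERB", "PRON", "ADP", "DET", "NOUN", "SCONJ",
]

pos_counts = Counter(pos_tags)

# Print in alphabetical order of tag name, padded to 8 characters.
for name, count in sorted(pos_counts.items()):
    print(f"{name:{8}}: {count}")
```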


In [13]:
from spacy import displacy

sen = sp(u"I like to play football. I hated it in my childhood though")
displacy.render(sen, style='dep', jupyter=True, options={'distance': 75})

In [14]:
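# Outside a notebook, displacy.serve starts a local web server instead (it blocks until interrupted).
# The log below comes from a run with this line uncommented.
# displacy.serve(sen, style='dep', options={'distance': 120})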

Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...



127.0.0.1 - - [14/Sep/2020 09:29:44] "GET / HTTP/1.1" 200 9473
127.0.0.1 - - [14/Sep/2020 09:29:45] "GET /favicon.ico HTTP/1.1" 200 9473


Shutting down server on port 5000.


# Named Entity Recognition

In [27]:
import spacy
sp = spacy.load('en_core_web_sm')

sen = sp(u'Manchester United is looking to sign Harry Kane for $90 million')

In [28]:
print(sen.ents)

(Manchester United, Harry Kane, $90 million)


In [29]:
for entity in sen.ents:
    print(f'{entity.text} - {entity.label_} - {spacy.explain(entity.label_)}')

Manchester United - PERSON - People, including fictional
Harry Kane - PERSON - People, including fictional
$90 million - MONEY - Monetary values, including unit


In [37]:
sen = sp(u'Nesfruita is setting up a new company in India and Timor Leste')
for entity in sen.ents:
    print(f'{entity.text} - {entity.label_} - {spacy.explain(entity.label_)}')

Nesfruita - ORG - Companies, agencies, institutions, etc.
India - GPE - Countries, cities, states


In [38]:
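from spacy.tokens import Span

# The model missed "Timor Leste", so add it by hand as a GPE entity.
GPE = sen.vocab.strings[u'GPE']            # hash value for the 'GPE' label
new_entity = Span(sen, 10, 12, label=GPE)  # tokens 10 and 11; the end index is exclusive
sen.ents = list(sen.ents) + [new_entity]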

for entity in sen.ents:
    print(f'{entity.text} - {entity.label_} - {spacy.explain(entity.label_)}')

Nesfruita - ORG - Companies, agencies, institutions, etc.
India - GPE - Countries, cities, states
Timor Leste - GPE - Countries, cities, states
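
The `10, 12` arguments to `Span` are token offsets, not character offsets, and the end offset is exclusive. The sketch below shows where those numbers come from. It assumes that splitting on whitespace gives the same token boundaries as spaCy for this particular sentence, which holds here only because the sentence has no punctuation or contractions.

```python
text = "Nesfruita is setting up a new company in India and Timor Leste"

# Assumption: whitespace splitting matches the token boundaries for this sentence.
tokens = text.split()

# List every token with its index to find the span boundaries.
for i, tok in enumerate(tokens):
    print(i, tok)

start, end = 10, 12  # end is exclusive, as with Python slices
span_text = " ".join(tokens[start:end])
print(span_text)
```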


In [39]:
sen = sp(u'Manchester United is looking to sign Harry Kane for $90 million. David demand 100 Million Dollars')
for entity in sen.ents:
    print(f'{entity.text} - {entity.label_} - {spacy.explain(entity.label_)}')

Manchester United - PERSON - People, including fictional
Harry Kane - PERSON - People, including fictional
$90 million - MONEY - Monetary values, including unit
David - PERSON - People, including fictional
100 Million Dollars - MONEY - Monetary values, including unit


In [40]:
len([ent for ent in sen.ents if ent.label_=='PERSON'])

3
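
The count of 3 includes `Manchester United`, which this model run labelled `PERSON` in the output above. Grouping the entity texts by label makes that kind of surprise easier to spot than a bare count. The sketch below hard-codes the `(text, label)` pairs from that output. With a live `Doc` they could be built from `(ent.text, ent.label_) for ent in sen.ents`.

```python
from collections import defaultdict

# (text, label) pairs copied from the entity output above.
entities = [
    ("Manchester United", "PERSON"),
    ("Harry Kane", "PERSON"),
    ("$90 million", "MONEY"),
    ("David", "PERSON"),
    ("100 Million Dollars", "MONEY"),
]

# Collect the entity texts under each label, preserving document order.
by_label = defaultdict(list)
for text, label in entities:
    by_label[label].append(text)

for label, texts in by_label.items():
    print(f"{label}: {len(texts)} -> {texts}")
```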

In [41]:
from spacy import displacy

sen = sp(u'Manchester United is looking to sign Harry Kane for $90 million. David demand 100 Million Dollars')
displacy.render(sen, style='ent', jupyter=True)

In [42]:
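# Render only ORG entities. The dict is not named `filter`, to avoid shadowing the built-in.
ent_options = {'ents': ['ORG']}
displacy.render(sen, style='ent', jupyter=True, options=ent_options)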

In [43]:
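# Outside a notebook, displacy.serve starts a local web server instead (it blocks until interrupted).
# The log below comes from a run with this line uncommented.
# displacy.serve(sen, style='ent')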

Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...



127.0.0.1 - - [14/Sep/2020 09:41:10] "GET / HTTP/1.1" 200 2155
127.0.0.1 - - [14/Sep/2020 09:41:11] "GET /favicon.ico HTTP/1.1" 200 2155


Shutting down server on port 5000.
