In [2]:
import spacy
sp = spacy.load('en_core_web_sm')

# Parts of Speech (POS) Tagging
Part-of-speech (POS) tagging assigns a part of speech to each individual word in a sentence. Unlike phrase matching, which operates at the sentence or multi-word level, POS tagging is performed at the token level.

Let's take a very simple example of parts of speech tagging.

In [4]:
sen = sp(u"spend")
for word in sen:
    print(f'{word.text:{12}} {word.lemma_:{10}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

spend        spend      VERB       VB       verb, base form


In [2]:
wor = [
        "how honest are my customer",
        "how loyal are my customer",
        "how sincere are my customer",
        "how faithful are my customer",
        "customer trust",
        "customer reliability",
        "customer devotion",
        "customer trustiness",
        "customer faith",
        "customer integrity",
        ]

h = []
for a in wor:
    doc = sp(a)
    # Join the lemma of every token back into one space-separated string
    h.append(" ".join(word.lemma_ for word in doc))

h

['how honest be -PRON- customer',
 'how loyal be -PRON- customer',
 'how sincere be -PRON- customer',
 'how faithful be -PRON- customer',
 'customer trust',
 'customer reliability',
 'customer devotion',
 'customer trustiness',
 'customer faith',
 'customer integrity']

In [20]:
sens = sp(u"how loyal be my customers")
sen_pos = {}
for word in sens:
    sen_pos[word.lemma_] = f'{word.pos_} {word.tag_} {spacy.explain(word.tag_)} {word.lemma_}'
sen_pos

{'how': 'ADV WRB wh-adverb how',
 'loyal': 'ADJ JJ adjective loyal',
 'be': 'AUX VB verb, base form be',
 '-PRON-': 'DET PRP$ pronoun, possessive -PRON-',
 'customer': 'NOUN NNS noun, plural customer'}

In [18]:
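# `hey` is assumed to be a dict defined in an earlier cell that is not shown here.
# Merging it into sen_pos leaves sen_pos unchanged only if every
# key/value pair of `hey` is already present in sen_pos.
dict(sen_pos, **hey) == sen_pos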

True
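
The comparison above tests whether every key/value pair of one dictionary already appears in another: merging the second dictionary into the first leaves it unchanged only in that case. A minimal sketch of the same check on plain dictionaries follows. The names `is_subdict`, `full`, and `part` are hypothetical and used only for illustration; the sample values are copied from the `sen_pos` output above.

```python
def is_subdict(part: dict, full: dict) -> bool:
    """Return True if every key/value pair in `part` also occurs in `full`."""
    # Items views support set-style comparison, so no merged copy is built
    # and, unlike dict(full, **part), the keys do not have to be strings.
    return part.items() <= full.items()


full = {
    "how": "ADV WRB wh-adverb how",
    "loyal": "ADJ JJ adjective loyal",
}
part = {"loyal": "ADJ JJ adjective loyal"}

print(is_subdict(part, full))               # True
print(dict(full, **part) == full)           # True, same check as in the cell above
print(is_subdict({"loyal": "NOUN"}, full))  # False, the value differs
```

The items-view form is one possible alternative rather than a required change; both expressions give the same answer when the keys are strings.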

In [5]:
from spacy import displacy

sent = sp(u"fine, good, sad, joyful, excellent, greate, okay, ok, cool, well, better, happy, excited, good health")
for word in sent:
    if word.tag_ == 'JJ':  # exact match; ('JJ') without a trailing comma is just the string 'JJ'
        print(word.tag_, word)
#     print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')
    
# displacy.render(sent, style='dep', jupyter=True, options={'distance': 85})
# displacy.render(sen, style='ent', jupyter=True)

JJ fine
JJ good
JJ sad
JJ joyful
JJ excellent
JJ greate
JJ cool
JJ happy
JJ excited
JJ good
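
One detail worth noting when filtering by tag: in Python, `('JJ')` is just the string `'JJ'`, not a tuple, so `tag in ('JJ')` performs a substring test. The sketch below uses plain strings rather than spaCy tokens to show the difference. `WANTED_TAGS` is a hypothetical name, and `JJR` and `JJS` are the Penn Treebank tags for comparative and superlative adjectives.

```python
# ('JJ') is a parenthesised string; ('JJ',) is a one-element tuple.
print(type(('JJ')).__name__)   # str
print(type(('JJ',)).__name__)  # tuple

# Substring test: any piece of 'JJ' matches, including the empty string.
print('J' in ('JJ'))  # True
print('' in ('JJ'))   # True

# Membership test against a real tuple: only exact tags match.
WANTED_TAGS = ('JJ', 'JJR', 'JJS')
tags = ['JJ', 'JJR', 'NN', '']
matched = [t for t in tags if t in WANTED_TAGS]
print(matched)  # ['JJ', 'JJR']
```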


# Why Is POS Tagging Useful?
POS tagging is particularly useful when a word or token can take more than one POS tag. For instance, the word "google" can be used as both a noun and a verb, depending on the context, and it is important to identify this difference when processing natural language. Fortunately, the spaCy library comes with pre-trained machine learning models that use the context (the surrounding words) to return the appropriate POS tag for a word.

Let's see this in action. Execute the following script:

In [6]:
sen = sp(u'can you google it')
word = sen[2]
print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

google       VERB       VB       verb, base form


In [7]:
sen = sp(u'can you search it on google?')
word = sen[-2]

print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

google       PROPN      NNP      noun, proper singular


# Finding the Number of POS Tags
You can find the number of occurrences of each POS tag by calling the count_by method on the spaCy document object. The method takes spacy.attrs.POS as its argument.

In [8]:
sen = sp(u"I like to play football. I hated it in my childhood though")
num_pos = sen.count_by(spacy.attrs.POS)
num_pos

{95: 3, 100: 3, 94: 1, 92: 2, 97: 1, 85: 1, 90: 1, 98: 1}

In the output, you can see the IDs of the POS tags along with their frequencies of occurrence. The text of each POS tag can be displayed by looking up its ID in the vocabulary of the spaCy document (sen.vocab).

In [9]:
for k,v in sorted(num_pos.items()):
    print(f'{k}. {sen.vocab[k].text:{8}}: {v}')

85. ADP     : 1
90. DET     : 1
92. NOUN    : 2
94. PART    : 1
95. PRON    : 3
97. PUNCT   : 1
98. SCONJ   : 1
100. VERB    : 3
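
The ID lookup can be avoided by counting the string form of each tag directly. The following is a minimal sketch using `collections.Counter`. `count_pos_names` is a hypothetical helper, and the stub tokens are hand-written to mirror the counts shown above so that the block runs without a loaded model. With spaCy, the `sen` document could be passed in place of `stub_tokens`, since its tokens expose the same `pos_` attribute.

```python
from collections import Counter
from types import SimpleNamespace


def count_pos_names(tokens):
    """Count tokens by the string name of their coarse-grained POS tag."""
    return Counter(token.pos_ for token in tokens)


# Hand-written stand-ins for spaCy tokens (an assumption for illustration).
tagged = [
    ("I", "PRON"), ("like", "VERB"), ("to", "PART"), ("play", "VERB"),
    ("football", "NOUN"), (".", "PUNCT"), ("I", "PRON"), ("hated", "VERB"),
    ("it", "PRON"), ("in", "ADP"), ("my", "DET"), ("childhood", "NOUN"),
    ("though", "SCONJ"),
]
stub_tokens = [SimpleNamespace(text=t, pos_=p) for t, p in tagged]

counts = count_pos_names(stub_tokens)
for name, n in sorted(counts.items()):
    print(f"{name:{8}}: {n}")
```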


# Visualizing Parts of Speech Tags
Visualizing POS tags graphically is straightforward with the displacy module from the spaCy library. To display the visualization inside a Jupyter notebook, call the render function of the displacy module and pass it the spaCy document, the style of the visualization, and jupyter=True, as shown below. The 'dep' style draws the dependency parse with the POS tag shown under each token:

In [10]:
from spacy import displacy

sen = sp(u"I like to play football. I hated it in my childhood though")
displacy.render(sen, style='dep', jupyter=True, options={'distance': 85})

# Named Entity Recognition
Named entity recognition refers to identifying words or phrases in a sentence as entities, e.g. the name of a person, place, or organization. Let's see how the spaCy library performs named entity recognition. Look at the following script:

In [15]:
import spacy
sp = spacy.load('en_core_web_sm')

sen = sp('what time is it in cameroon')
print(sen.ents)

()


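The output is an empty tuple, so no named entities were identified in this sentence, possibly because "cameroon" is written in lowercase. When entities are found, you can see the details of each one using its text and label_ attributes together with the spacy.explain function, which takes the entity label as a parameter.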

In [16]:
for entity in sen.ents:
    print(entity.text + ' - ' + entity.label_ + ' - ' + str(spacy.explain(entity.label_)))
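
Because `sen.ents` is an empty tuple here, the loop above prints nothing. A minimal sketch of a hypothetical helper, `describe_entities`, that makes the empty case explicit is shown below. It relies only on the `text` and `label_` attributes used above; the stub entity is hand-written for illustration and is not model output.

```python
from types import SimpleNamespace


def describe_entities(ents):
    """Return one 'text - label' line per entity, or a notice if there are none."""
    lines = [f"{ent.text} - {ent.label_}" for ent in ents]
    return lines if lines else ["(no named entities found)"]


# An empty tuple, like the one printed for the sentence above.
print(describe_entities(()))

# Hand-written stub standing in for a spaCy entity span (an assumption).
stub = SimpleNamespace(text="Cameroon", label_="GPE")
print(describe_entities([stub]))
```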

## Visualizing Named Entities
Like the POS tags, we can also view named entities inside the Jupyter notebook as well as in the browser.

To do so, we will again use the displacy module. Look at the following example:

In [22]:
from spacy import displacy

sen = sp(u'What time is it in Google')
displacy.render(sen, style='ent', jupyter=True)

In [16]:
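# Show only PERSON entities; the name `ent_options` avoids shadowing the built-in filter()
ent_options = {'ents': ['PERSON']}
displacy.render(sen, style='ent', jupyter=True, options=ent_options)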