# **Part-of-Speech Tagging**

## **1- Introduction**
In this section, we demonstrate how to implement part-of-speech (POS)
tagging.

POS tagging helps resolve syntactic ambiguity: it adds grammatical word categories and word functions to the words of a given text [[1]](#scrollTo=op-j6UywUt5i).

In the sentence “Our dogs bark all day,” the word “bark” appears as a verb
(word category) taking the function of the predicate (word function). 

In “The bark of the
old oak tree was wet,” the word “bark” is a noun (word category) in the function of the
subject (word function). This example illustrates that context plays an important role in
POS tagging [[1]](#scrollTo=op-j6UywUt5i).
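To make the role of context concrete, here is a deliberately simplified sketch. It is a hypothetical hand-written rule, not how spaCy tags words (spaCy predicts tags with a trained statistical model): "bark" is labeled a noun when it follows a determiner and a verb otherwise. The names "tag_bark" and "DETERMINERS" and the small determiner list are assumptions chosen to cover the two example sentences.

```python
# Hypothetical, hand-written rule for illustration only; spaCy does not work
# this way (it predicts tags with a trained statistical model).
DETERMINERS = {"the", "a", "an", "this", "our"}  # assumed, minimal list


def tag_bark(sentence: str) -> str:
    """Return "NOUN" or "VERB" for the first "bark" in the sentence.

    Raises ValueError if the sentence does not contain "bark".
    """
    words = [w.strip('.,"').lower() for w in sentence.split()]
    i = words.index("bark")
    previous = words[i - 1] if i > 0 else ""
    # "The bark ...": after a determiner, "bark" starts a noun phrase.
    if previous in DETERMINERS:
        return "NOUN"
    # "Our dogs bark ...": after the subject, treat "bark" as the predicate verb.
    return "VERB"


print(tag_bark("Our dogs bark all day."))                 # VERB
print(tag_bark("The bark of the old oak tree was wet."))  # NOUN
```

A single look-back rule like this breaks quickly on real text (for example "Dogs that bark loudly"), which is why statistical taggers consider a much wider context.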


### **Content**
This notebook shows some basic examples of the following topic:
* Part-of-speech (POS) tagging using spaCy


## **2- Part-of-Speech (POS) Tagging Using spaCy**

spaCy is one of the most widely used frameworks for NLP. It can be used to implement tasks such as sentiment analysis, chatbots, text summarization, and intent and entity extraction, among others [[1]](#scrollTo=op-j6UywUt5i).

For more information about spaCy, please refer to [[2]](#scrollTo=op-j6UywUt5i).


### **Code Examples**

For POS tagging, we follow these steps:
* Import the spaCy library
* Load the language model (English)
* Create a spaCy document
* Access the POS tags by iterating over the document object
* Print the POS tags
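These steps can also be wrapped in a small reusable helper. This is a minimal sketch, assuming "nlp" is a pipeline already loaded with "spacy.load" (for example "en_core_web_sm"); the function name "pos_tags" is hypothetical. Importing spaCy and loading the model happen outside the function, so the function itself only needs an object that can be called on a text and yields tokens with "text" and "pos_" attributes.

```python
def pos_tags(nlp, text):
    """Return a list of (word, POS tag) pairs for text.

    Assumes nlp is a loaded spaCy pipeline, e.g. spacy.load("en_core_web_sm").
    """
    # Create a spaCy document (the pipeline tokenizes and tags the text).
    doc = nlp(text)
    # Access the POS tags by iterating over the tokens of the document.
    return [(token.text, token.pos_) for token in doc]
```

With the model loaded as "sp" in the first code cell, "pos_tags(sp, 'Our dogs bark all day')" returns the (word, tag) pairs, which can then be printed or processed further.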

**1.** Import the spaCy library and load the English language model

```python
# Import the spaCy library to process the text
## spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.
## spaCy is designed specifically for production use and helps you build applications that process and “understand” large volumes of text.
## It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning [3].
import spacy

# Load the "en_core_web_sm" English pipeline with spaCy
## It is a small English pipeline trained on written web text (blogs, news, comments) that includes vocabulary, syntax and entities [4].
## It is optimized for CPU; its components are tok2vec, tagger, parser, senter, ner, attribute_ruler and lemmatizer [5].
## The model must be installed first, e.g. with: python -m spacy download en_core_web_sm
sp = spacy.load('en_core_web_sm')
```

**2.** Now, we will create a spaCy document and perform POS tagging.

**NOTE:** When we apply the loaded pipeline to a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several steps by the remaining pipeline components (tagger, parser, ner, etc.). This sequence of steps is referred to as the processing pipeline.
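To illustrate the idea of a processing pipeline, here is a toy sketch in plain Python. The "ToyDoc" class, the "LEXICON" lookup and the component functions are hypothetical stand-ins, not spaCy classes: the tokenizer creates the document, and each later component is a callable that receives the document, adds annotations, and returns it, which mirrors how spaCy passes a Doc from component to component. For the real pipeline, "sp.pipe_names" lists the component names in order.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a processed document (not spaCy's Doc)."""
    words: list[str]
    tags: list[str] = field(default_factory=list)
    extras: dict[str, int] = field(default_factory=dict)


def tokenizer(text: str) -> ToyDoc:
    # First step: turn raw text into a document (whitespace split only).
    return ToyDoc(words=text.split())


# Hypothetical lookup table standing in for a trained tagging model.
LEXICON = {"this": "DET", "book": "NOUN", "by": "ADP", "weekend": "NOUN"}


def tagger(doc: ToyDoc) -> ToyDoc:
    # Annotate every word; unknown words get the placeholder tag "X".
    doc.tags = [LEXICON.get(w.lower(), "X") for w in doc.words]
    return doc


def noun_counter(doc: ToyDoc) -> ToyDoc:
    # A later component can use annotations written by an earlier one.
    doc.extras["n_nouns"] = doc.tags.count("NOUN")
    return doc


PIPELINE = [tagger, noun_counter]  # components run in this order


def process(text: str) -> ToyDoc:
    doc = tokenizer(text)
    for component in PIPELINE:
        doc = component(doc)
    return doc


doc = process("I am going to complete this book by this weekend")
print(list(zip(doc.words, doc.tags)))
print(doc.extras["n_nouns"])  # 2
```

Because "noun_counter" reads the tags, it only works if it runs after "tagger"; in the same way, the order of components matters in spaCy's pipeline.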

You can see the processing pipeline in the following picture based on [[6]](#scrollTo=op-j6UywUt5i).

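For POS tagging, we simply use the "pos_" attribute of each token, which holds its coarse-grained Universal POS tag. In pipelines that include a "Morphologizer" component, this attribute is assigned by the morphologizer [[7]](#scrollTo=op-j6UywUt5i); the "en_core_web_sm" pipeline, whose components are listed in the code above, has no morphologizer, and there the coarse tags are derived from the tagger's fine-grained tags by the "attribute_ruler".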

![spaCy](https://spacy.io/pipeline-fde48da9b43661abcdf62ab70a546d71.svg)

```python
# Create a sample document
## When the pipeline is applied to the text, spaCy automatically performs POS tagging for it.
doc_POS = sp("I am going to complete this book by this weekend")
```

**3.** Print the POS tags

```python
# Print each word with its POS tag
## Each token's "pos_" attribute holds the coarse-grained POS tag predicted by the pipeline [7].
for word in doc_POS:
    print(word.text + '-->' + word.pos_)
```

```
I-->PRON
am-->AUX
going-->VERB
to-->PART
complete-->VERB
this-->DET
book-->NOUN
by-->ADP
this-->DET
weekend-->NOUN
```


**4.** Optionally, we can also print an explanation of each token's fine-grained tag ("tag_") using "spacy.explain".

The output below is easier to read than that of step 3 because it is formatted in columns.

We format the columns with f-strings: the number in curly brackets after each colon sets the minimum width of the column in characters, and shorter values are padded with spaces [[8]](#scrollTo=op-j6UywUt5i).
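The padding behavior of the f-string widths can be checked in isolation. This small sketch uses a hypothetical helper, "format_row", with the same widths as the code below (12 and 10); "repr" makes the trailing spaces visible.

```python
def format_row(text: str, pos: str, text_width: int = 12, pos_width: int = 10) -> str:
    # {text:{text_width}} is a nested replacement field: the inner braces supply
    # the minimum width, and strings are left-aligned and padded with spaces.
    return f"{text:{text_width}} {pos:{pos_width}}"


for text, pos in [("I", "PRON"), ("going", "VERB"), ("weekend", "NOUN")]:
    print(repr(format_row(text, pos)))
```

A value longer than its width is not truncated; it pushes the following columns to the right, so each width should exceed the longest expected value.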

```python
# Print each word with its POS tag and an explanation of its fine-grained tag
for word in doc_POS:
    print(f'{word.text:{12}} {word.pos_:{10}} {spacy.explain(word.tag_)}')
```

```
I            PRON       pronoun, personal
am           AUX        verb, non-3rd person singular present
going        VERB       verb, gerund or present participle
to           PART       infinitival "to"
complete     VERB       verb, base form
this         DET        determiner
book         NOUN       noun, singular or mass
by           ADP        conjunction, subordinating or preposition
this         DET        determiner
weekend      NOUN       noun, singular or mass
```
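In the output above, the "pos_" column contains coarse-grained Universal POS tags, while the explanations describe the fine-grained Penn Treebank tags stored in "tag_" (for example, "by" has the fine tag "IN", explained as "conjunction, subordinating or preposition", and the coarse tag "ADP"). The sketch below illustrates mapping fine tags to coarse tags with a lookup table. It is a simplification and an assumption, not spaCy's actual table: "PTB_TO_UPOS" and "coarse_tag" are hypothetical names, the table covers only the fine tags behind the explanations above, and it leaves out "VBP" because spaCy tags "am" as "AUX", a decision that depends on the word itself and not only on its tag.

```python
# Partial, hypothetical lookup table from Penn Treebank tags (tag_) to
# Universal POS tags (pos_); it covers only tags from the example sentence.
PTB_TO_UPOS = {
    "PRP": "PRON",  # pronoun, personal
    "VBG": "VERB",  # verb, gerund or present participle
    "TO": "PART",   # infinitival "to"
    "VB": "VERB",   # verb, base form
    "DT": "DET",    # determiner
    "NN": "NOUN",   # noun, singular or mass
    "IN": "ADP",    # conjunction, subordinating or preposition
}


def coarse_tag(fine_tag: str) -> str:
    # Unknown fine tags fall back to the Universal POS tag "X" ("other").
    return PTB_TO_UPOS.get(fine_tag, "X")


# Fine tags of: I, going, to, complete, this, book, by, this, weekend ("am" skipped)
fine_tags = ["PRP", "VBG", "TO", "VB", "DT", "NN", "IN", "DT", "NN"]
print([coarse_tag(t) for t in fine_tags])
```

For these words the result matches the "pos_" column above. A real mapping also has to handle cases such as "IN" marking a subordinating conjunction, which Universal POS labels "SCONJ" rather than "ADP".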


## **3- References**

- [1] NLP and Computer Vision_DLMAINLPCV01 Lecture Book
- [2] https://spacy.io/
- [3] https://spacy.io/usage/spacy-101
- [4] https://spacy.io/models
- [5] https://spacy.io/models/en
- [6] https://spacy.io/usage/processing-pipelines
- [7] https://spacy.io/api/morphologizer#section-assigned-attributes
- [8] https://stackabuse.com/python-for-nlp-parts-of-speech-tagging-and-named-entity-recognition/

Copyright © 2021 IU International University of Applied Sciences