# Part-of-Speech (POS) Tagging in the spaCy Library
#### What is POS Tagging?
Part-of-speech (POS) tagging is the process of assigning each word in a text a part of speech based on both its definition and its context. Put simply, POS tagging identifies each word as a noun, pronoun, verb, adjective, and so on.

#### Why POS Tagging Is Used
Some words can function in more than one way depending on how they are used. POS tagging plays a crucial role in determining which role a word plays in a given sentence. It is useful in sentence parsing, information retrieval, sentiment analysis, and similar tasks.
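
To make the ambiguity concrete, here is a minimal sketch of a word whose part of speech depends on context. The `toy_tag` function and its word lists are hypothetical and written only for this illustration; a hand-written rule like this is not how spaCy's statistical tagger works.

```python
# Toy illustration only: a hand-written rule, not spaCy's statistical tagger.
def toy_tag(words, index):
    """Guess a coarse POS tag for the ambiguous word 'book' from the previous word."""
    previous = words[index - 1].lower() if index > 0 else ""
    if previous in {"a", "an", "the", "this", "that", "my", "your"}:
        return "NOUN"   # determiner or possessive before it: "the book"
    if previous in {"to", "will", "can", "please"}:
        return "VERB"   # infinitive marker or modal before it: "to book"
    return "X"          # the rule cannot decide

noun_sentence = "I read the book yesterday".split()
verb_sentence = "I want to book a flight".split()

print(toy_tag(noun_sentence, noun_sentence.index("book")))  # NOUN
print(toy_tag(verb_sentence, verb_sentence.index("book")))  # VERB
```

The same surface form gets two different tags, which is exactly the distinction a POS tagger has to make for every token.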

#### POS Tagging in the spaCy Library
spaCy provides a set of POS tags such as NOUN (noun), PUNCT (punctuation), ADJ (adjective), and ADV (adverb). Its trained pipelines contain statistical models that predict which tag or label applies to a token in its context. For example, a word following “the” in English is most likely a noun.

| POS   | DESCRIPTION               | EXAMPLES                                        |
|:------|:--------------------------|:------------------------------------------------|
| ADJ   | adjective                 | *big, old, green, incomprehensible, first*      |
| ADP   | adposition                | *in, to, during*                                |
| ADV   | adverb                    | *very, tomorrow, down, where, there*            |
| AUX   | auxiliary                 | *is, has (done), will (do), should (do)*        |
| CONJ  | conjunction               | *and, or, but*                                  |
| CCONJ | coordinating conjunction  | *and, or, but*                                  |
| DET   | determiner                | *a, an, the*                                    |
| INTJ  | interjection              | *psst, ouch, bravo, hello*                      |
| NOUN  | noun                      | *girl, cat, tree, air, beauty*                  |
| NUM   | numeral                   | *1, 2017, one, seventy-seven, IV, MMXIV*        |
| PART  | particle                  | *’s, not*                                       |
| PRON  | pronoun                   | *I, you, he, she, myself, themselves, somebody* |
| PROPN | proper noun               | *Mary, John, London, NATO, HBO*                 |
| PUNCT | punctuation               | *., (, ), ?*                                    |
| SCONJ | subordinating conjunction | *if, while, that*                               |
| SYM   | symbol                    | *$, %, §, ©, +, −, ×, ÷, =, :), 😝*             |
| VERB  | verb                      | *run, runs, running, eat, ate, eating*          |
| X     | other                     | *sfpksdpsxmsa*                                  |
| SPACE | space                     |                                                 |





#### spaCy POS Tagging Example
POS tagging in spaCy is straightforward, as the example below shows. We load a trained pipeline and call it on a text to get a Doc object. We then iterate over the Doc and print each token's coarse-grained tag (pos_), fine-grained tag (tag_), and dependency label (dep_). spaCy also provides a short description of any tag through the spacy.explain() function, which we print in the same loop alongside the tags.

In [3]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Don't judge a book by its cover.")

print(f"{'text':{8}} {'POS':{6}} {'TAG':{6}} {'Dep':{6}} {'POS explained':{20}} {'tag explained'} ")

for token in doc:
    print(f'{token.text:{8}} {token.pos_:{6}} {token.tag_:{6}} {token.dep_:{6}} {spacy.explain(token.pos_):{20}} {spacy.explain(token.tag_)}')

text     POS    TAG    Dep    POS explained        tag explained 
Do       AUX    VBP    aux    auxiliary            verb, non-3rd person singular present
n't      PART   RB     neg    particle             adverb
judge    VERB   VB     ROOT   verb                 verb, base form
a        DET    DT     det    determiner           determiner
book     NOUN   NN     dobj   noun                 noun, singular or mass
by       ADP    IN     prep   adposition           conjunction, subordinating or preposition
its      PRON   PRP$   poss   pronoun              pronoun, possessive
cover    NOUN   NN     pobj   noun                 noun, singular or mass
.        PUNCT  .      punct  punctuation          punctuation mark, sentence closer


#### Fine-Grained POS Tags
spaCy also provides a fine-grained tag that places a token in a more specific sub-category. For example, an adjective can be tagged as JJ (adjective), JJR (comparative adjective), JJS (superlative adjective), or AFX (affix). We can get the list of fine-grained tags in spaCy with nlp.pipe_labels['tagger'], as shown in the example below.

In [4]:
import spacy

nlp = spacy.load("en_core_web_sm")
tag_lst = nlp.pipe_labels['tagger']

print(len(tag_lst))
print(tag_lst)

50
['$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', 'XX', '_SP', '``']
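
To show how the fine-grained tags relate to the coarse-grained ones, the sketch below groups a hand-written subset of them under their usual coarse tag. The `FINE_TO_COARSE` table and the `group_by_coarse` helper are assumptions made for this illustration and are not read from spaCy. spaCy assigns the coarse tag per token, so individual tokens can differ: in the earlier output, "n't" is tagged RB but has the POS PART, and "Do" is tagged VBP but has the POS AUX.

```python
# Hand-written subset of typical fine-grained -> coarse-grained correspondences.
# Assumption: illustrative only; spaCy decides the coarse tag for each token.
FINE_TO_COARSE = {
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "NN": "NOUN", "NNS": "NOUN",
    "NNP": "PROPN", "NNPS": "PROPN",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
}

def group_by_coarse(mapping):
    """Invert the mapping: coarse tag -> sorted list of fine-grained tags."""
    groups = {}
    for fine, coarse in mapping.items():
        groups.setdefault(coarse, []).append(fine)
    return {coarse: sorted(fines) for coarse, fines in groups.items()}

for coarse, fines in sorted(group_by_coarse(FINE_TO_COARSE).items()):
    print(f"{coarse:6} {', '.join(fines)}")
```

With a real pipeline, the pair (token.tag_, token.pos_) of each token gives the actual correspondence for that token.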


## Morphology
In linguistics, morphology is the study of words: how they are formed, how they relate to other words in the same language, and how they are built from parts such as stems, root words, prefixes, and suffixes. Morphology also looks at parts of speech, intonation, and stress, and at the ways context can change a word’s pronunciation and meaning.

spaCy uses the token text and fine-grained part-of-speech tags to produce morphological features.

In spaCy, morphological features are stored in a MorphAnalysis object available as Token.morph, which gives access to the individual features. In the example below, we iterate over the tokens of the Doc and print all morphological features of each token through its token.morph attribute. We can also retrieve one particular feature with the morph.get() method, and token.morph.to_dict() returns all features as a dictionary.

In [6]:
import spacy

nlp = spacy.load("en_core_web_sm")
print("Pipeline:", nlp.pipe_names)
doc = nlp("I was going to the grocery store.")
for token in doc:
    print(token.text)
    print(token.morph)                # All morphological features of the token.
    print(token.morph.get("Number"))  # One feature, here Number (singular or plural).
    print(token.morph.to_dict())      # All morphological features as a dictionary.
    print('\n\n')

Pipeline: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
I
Case=Nom|Number=Sing|Person=1|PronType=Prs
['Sing']
{'Case': 'Nom', 'Number': 'Sing', 'Person': '1', 'PronType': 'Prs'}



was
Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin
['Sing']
{'Mood': 'Ind', 'Number': 'Sing', 'Person': '3', 'Tense': 'Past', 'VerbForm': 'Fin'}



going
Aspect=Prog|Tense=Pres|VerbForm=Part
[]
{'Aspect': 'Prog', 'Tense': 'Pres', 'VerbForm': 'Part'}



to

[]
{}



the
Definite=Def|PronType=Art
[]
{'Definite': 'Def', 'PronType': 'Art'}



grocery
Number=Sing
['Sing']
{'Number': 'Sing'}



store
Number=Sing
['Sing']
{'Number': 'Sing'}



.
PunctType=Peri
[]
{'PunctType': 'Peri'}
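
The printed form of token.morph is a string of `Key=Value` pairs joined by `|`. As a rough sketch of how that notation maps to the dictionary form, the hand-rolled parser below splits such a string. It is not spaCy's implementation, and `parse_feats` is a hypothetical helper name used only here.

```python
# Minimal sketch: a hand-rolled parser for the "Key=Value|Key=Value" notation.
# It is not spaCy's implementation; parse_feats is a hypothetical helper name.
def parse_feats(feats):
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    if not feats:
        return {}   # tokens such as "to" in the output above have no features
    return dict(pair.split("=", 1) for pair in feats.split("|"))

features = parse_feats("Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin")
print(features["Tense"])       # Past
print(features.get("Number"))  # Sing
print(parse_feats(""))         # {}
```

Note one difference from the real API: in the output above, token.morph.get("Number") returns a list such as ['Sing'], whereas a plain dictionary lookup returns the bare string.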





### Counting POS Tags in spaCy
In the example below, we pass the POS token attribute to the Doc.count_by() method, which returns a frequency dictionary whose keys are the integer IDs of the POS values and whose values are their counts. A for loop then prints each ID, the POS tag it stands for (looked up through doc.vocab), and its count.

In [7]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"The grass is always greener on the other side of the fence")

# Counting the frequencies of different POS tags:
POS_counts = doc.count_by(spacy.attrs.POS)
print(POS_counts)

for k,v in sorted(POS_counts.items()):
    print(f'{k:{4}}. {doc.vocab[k].text:{5}}: {v}')

{90: 3, 92: 3, 87: 1, 86: 1, 84: 2, 85: 2}
  84. ADJ  : 2
  85. ADP  : 2
  86. ADV  : 1
  87. AUX  : 1
  90. DET  : 3
  92. NOUN : 3
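
When the integer IDs are not needed, the same tally can be kept with string labels. The sketch below uses only the standard library and the (text, POS) pairs copied from the "Don't judge a book by its cover." output earlier in this tutorial. With a real Doc, the pairs would come from (token.text, token.pos_). The variable names are chosen for this illustration.

```python
from collections import Counter

# (text, coarse POS) pairs copied from the earlier example output;
# with a real Doc they would come from (token.text, token.pos_).
tagged = [
    ("Do", "AUX"), ("n't", "PART"), ("judge", "VERB"), ("a", "DET"),
    ("book", "NOUN"), ("by", "ADP"), ("its", "PRON"), ("cover", "NOUN"),
    (".", "PUNCT"),
]

pos_counts = Counter(pos for _, pos in tagged)

# most_common() orders by frequency, highest first.
for pos, count in pos_counts.most_common():
    print(f"{pos:6} {count}")
```

The result is keyed by readable tag names, so no lookup through doc.vocab is needed when printing.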


### Counting fine-grained tags
In the example below, we pass the TAG token attribute to Doc.count_by(), which returns a frequency dictionary whose keys are the integer IDs of the TAG values and whose values are their counts. A for loop then prints each ID, the fine-grained tag it stands for, and its count.

In [9]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"An apple a day keeps the doctor away.")

# Counting the frequencies of different fine-grained tags:
TAG_counts = doc.count_by(spacy.attrs.TAG)

print(TAG_counts)
for k,v in sorted(TAG_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{4}}: {v}')

{15267657372422890137: 3, 15308085513773655218: 3, 13927759927860985106: 1, 164681854541413346: 1, 12646065887601541794: 1}
164681854541413346. RB  : 1
12646065887601541794. .   : 1
13927759927860985106. VBZ : 1
15267657372422890137. DT  : 3
15308085513773655218. NN  : 3
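
Sorting the items of the dictionary orders them by integer ID, which says nothing about frequency. The sketch below reorders the counts printed above by frequency instead. The IDs, counts, and labels are copied from that output, and the variable names are chosen for this illustration. With a real Doc, the label for each key comes from doc.vocab[k].text.

```python
# IDs and counts copied from the output above; the labels are what
# doc.vocab[k].text printed for each ID.
tag_counts = {
    15267657372422890137: 3,
    15308085513773655218: 3,
    13927759927860985106: 1,
    164681854541413346: 1,
    12646065887601541794: 1,
}
labels = {
    15267657372422890137: "DT",
    15308085513773655218: "NN",
    13927759927860985106: "VBZ",
    164681854541413346: "RB",
    12646065887601541794: ".",
}

# Sort by count (descending), then by label so ties come out in a stable order.
by_frequency = sorted(
    ((labels[key], count) for key, count in tag_counts.items()),
    key=lambda item: (-item[1], item[0]),
)

for label, count in by_frequency:
    print(f"{label:4} {count}")
```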


### Visualizing the POS Tags in spaCy
In spaCy we can visualize part-of-speech tags and syntactic dependencies with the displacy.serve() function, which takes a single Doc or a list of Doc objects, starts a local web server, and renders the visualization in the browser.

In [10]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tomorrow belongs to people who prepare for it today.")
displacy.serve(doc, style="dep")




Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


We can also customize the visualization by passing a dictionary of options to the display function. The options used in the next example are explained below.

Parameters
- <b>distance</b> : Distance between words, in pixels.
- <b>compact</b> : Compact mode with square arrows that takes up less space (True or False).
- <b>color</b> : Color of the text.
- <b>bg</b> : Background color of the visualization.
- <b>font</b> : Font name or font family of the text in the visualization.

In [11]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I will be heading to Amsterdam")
# 'compact' is a boolean flag, so pass True rather than the string 'True'.
options = {'distance': 110, 'compact': True, 'color': 'yellow', 'bg': '#09a3d5', 'font': 'Times'}

displacy.serve(doc, style="dep", options=options)


Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


### Visualizing POS Tags in Long Texts in spaCy
Long texts can become difficult to read when displayed in one row, so it’s often better to visualize them sentence by sentence instead. displaCy supports rendering both Doc and Span objects, as well as lists of Docs or Spans. Instead of passing the full Doc to displacy.serve(), we can pass list(doc.sents), which creates one visualization for each sentence.

In [12]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
text = "Life is a beautiful journey that is meant to be embraced to the fullest every day. However, that doesn’t mean you always wake up ready to seize the day, and sometimes need a reminder that life is a great gift."
doc = nlp(text)
sentence_spans = list(doc.sents)
displacy.serve(sentence_spans, style="dep")


Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.
