# Part-of-Speech (POS) Tagging

**Tagging** is a kind of classification: the automatic assignment of a descriptor to each token.

The descriptor is called a tag. A tag may represent a part of speech, semantic information, and so on.

**Part-of-Speech tagging** is the process of assigning one of the parts of speech to a given word.

In simple terms, POS tagging is the task of labelling each word in a sentence with its appropriate part of speech.

Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions, and their sub-categories.

**e.g.** Word: Paper, Tag: Noun
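
To make the idea of "assigning a tag to each token" concrete, here is a minimal sketch of a toy lookup tagger. The lexicon `TOY_LEXICON` and the helper `lookup_tag` are hypothetical names invented for this illustration. Real taggers, such as the one in spaCy used below, are statistical models that take context into account, so this is only a baseline picture of the task.

```python
# Hypothetical hand-written lexicon: lowercase word -> coarse POS tag.
TOY_LEXICON = {
    "paper": "NOUN",
    "the": "DET",
    "is": "AUX",
    "white": "ADJ",
}


def lookup_tag(tokens, lexicon=TOY_LEXICON, default="X"):
    """Return (token, tag) pairs; words missing from the lexicon get `default`."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]


tagged = lookup_tag("The paper is white".split())
for word, tag in tagged:
    print(word.ljust(8), tag)
```

A lookup table cannot tell apart words whose tag depends on context (for example, "paper" used as a verb), which is one reason practical taggers are trained models rather than dictionaries.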

### Applications
- Named Entity Recognition (NER)
- Text Parsing and Syntax Analysis
- Sentiment Analysis
- Machine Translation
- Speech Recognition and Synthesis
- Information Retrieval and Extraction
- Coreference Resolution
- Text Summarization
- Question Answering Systems
- Dependency Parsing

In [None]:
# official documentation
# https://spacy.io/usage/linguistic-features/#pos-tagging

In [None]:
# Import spaCy
import spacy

In [None]:
# load the English language library
nlp = spacy.load(name="en_core_web_sm")

In [None]:
# create a document object
document = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [None]:
# print entire document text
print(document.text)

Apple is looking at buying U.K. startup for $1 billion


In [None]:
# we can grab tokens by their index positions
print(document[2])

looking


In [None]:
# Grab POS tag
print(document[2].pos_)

VERB


In [None]:
# Fine-grained POS tag
print(document[2].tag_)

# spaCy documentation: https://spacy.io/usage/linguistic-features/

VBG


In [None]:
# Print a table of token attributes using a for loop

for token in document:
    print(token.text.ljust(10), token.lemma_.ljust(8), token.pos_.ljust(8), token.tag_.ljust(6), spacy.explain(token.tag_))

# ljust() pads each string with trailing spaces to a fixed width, which aligns the columns

Apple      Apple    PROPN    NNP    noun, proper singular
is         be       AUX      VBZ    verb, 3rd person singular present
looking    look     VERB     VBG    verb, gerund or present participle
at         at       ADP      IN     conjunction, subordinating or preposition
buying     buy      VERB     VBG    verb, gerund or present participle
U.K.       U.K.     PROPN    NNP    noun, proper singular
startup    startup  NOUN     NN     noun, singular or mass
for        for      ADP      IN     conjunction, subordinating or preposition
$          $        SYM      $      symbol, currency
1          1        NUM      CD     cardinal number
billion    billion  NUM      CD     cardinal number
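
The table above pairs each token with a coarse-grained tag (`pos_`) and a fine-grained tag (`tag_`). As a small sketch of how such output might be used, the snippet below groups the words by their coarse tag. The rows are copied by hand from the table so the example runs without spaCy; `rows` and `words_by_pos` are names chosen for this illustration.

```python
from collections import defaultdict

# (text, coarse POS, fine-grained tag), copied from the table above.
rows = [
    ("Apple", "PROPN", "NNP"),
    ("is", "AUX", "VBZ"),
    ("looking", "VERB", "VBG"),
    ("at", "ADP", "IN"),
    ("buying", "VERB", "VBG"),
    ("U.K.", "PROPN", "NNP"),
    ("startup", "NOUN", "NN"),
    ("for", "ADP", "IN"),
    ("$", "SYM", "$"),
    ("1", "NUM", "CD"),
    ("billion", "NUM", "CD"),
]

# Collect the words that share each coarse-grained tag, in sentence order.
words_by_pos = defaultdict(list)
for text, pos, tag in rows:
    words_by_pos[pos].append(text)

for pos, words in words_by_pos.items():
    print(pos.ljust(8), ", ".join(words))
```

With a live spaCy document, the same grouping could be built by looping over the tokens and reading `token.text` and `token.pos_` instead of the hard-coded rows.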


# 2) Counting POS Tags

The **Doc.count_by() method** accepts a token attribute ID as its argument and returns a dictionary that maps each value of that attribute (as an integer ID) to the number of times it occurs in the document.

In [None]:
# document two
documentTwo = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [None]:
# Count the frequencies of the different coarse-grained POS tags
PoSCounts = documentTwo.count_by(spacy.attrs.POS)  # spacy.attrs.POS is the ID of the coarse-grained POS attribute

In [None]:
# output is a dictionary
print(PoSCounts)

{96: 2, 87: 1, 100: 2, 85: 2, 92: 1, 99: 1, 93: 2}


In [None]:
# The dictionary keys are integer IDs that stand for POS tags
# Look up an ID in the vocabulary to decode it into its string name
documentTwo.vocab[96].text

'PROPN'
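
Decoding one ID at a time becomes tedious, so the whole dictionary can be converted in one step. The following is a minimal sketch: `decode_counts` is a hypothetical helper, and `ID_TO_NAME` is a hard-coded stand-in for the vocabulary lookup so the example runs without spaCy. Only `96 -> 'PROPN'` is confirmed by the output above; the other entries are assumptions inferred by matching the order and counts of the dictionary against the token table earlier in the notebook.

```python
# Counts copied from the count_by() output above.
pos_counts = {96: 2, 87: 1, 100: 2, 85: 2, 92: 1, 99: 1, 93: 2}

# ASSUMPTION: stand-in for the vocabulary lookup; only 96 -> "PROPN" is
# shown in the notebook, the remaining IDs are inferred from the token table.
ID_TO_NAME = {
    96: "PROPN",
    87: "AUX",
    100: "VERB",
    85: "ADP",
    92: "NOUN",
    99: "SYM",
    93: "NUM",
}


def decode_counts(counts, decode):
    """Return a new dict whose keys are decode(id) instead of integer IDs."""
    return {decode(tag_id): n for tag_id, n in counts.items()}


readable = decode_counts(pos_counts, ID_TO_NAME.get)
for name, n in sorted(readable.items()):
    print(name.ljust(6), n)
```

With a real document, the lookup could be passed in as `lambda i: documentTwo.vocab[i].text`, which is the same decoding step used in the cell above.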

In [None]:
# Check the POS tag of an individual token
documentTwo[0].pos_

'PROPN'

# 3) Visualizing the Parts of Speech

In [None]:
# Import the displaCy library
from spacy import displacy

In [None]:
# Render the dependency parse; each word is labelled with its POS tag
displacy.render(documentTwo, style="dep", jupyter=True)

# By default, the spacing depends on the length of the sentence and the relative positions of words and dependency arcs.
# spaCy calculates the positions automatically so that the dependency tree fits the display area and stays readable.

In [None]:
# The spacing can be controlled with the options parameter
# Render the dependency parse with a smaller distance between words
displacy.render(documentTwo, style="dep", jupyter=True, options={"distance": 100})