# 1) POS Tagging Basics

**Tagging** is a kind of classification: the automatic assignment of a descriptor to each token.

The descriptor is called a tag. It can represent a part of speech, semantic information, and so on.

**Part-of-Speech tagging** is the process of assigning one of the parts of speech to a given word.

Put simply, POS tagging is the task of labelling each word in a sentence with its appropriate part of speech.

Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions, and their sub-categories.

**e.g.** Word: Paper, Tag: Noun

POS tagging has applications in Named Entity Recognition (NER), sentiment analysis, question answering, etc.
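
To make the idea of tagging as per-token classification concrete, here is a minimal sketch of a lookup tagger. It is only an illustration: `TOY_LEXICON` and `lookup_tag` are hypothetical names invented for this example and are not part of spaCy, and the lexicon is a hand-written assumption.

```python
# Toy lookup tagger: a sketch of tagging as per-token classification.
# TOY_LEXICON and lookup_tag are hypothetical names, not part of spaCy.
TOY_LEXICON = {
    "paper": "NOUN",
    "is": "AUX",
    "thin": "ADJ",
    "the": "DET",
}

def lookup_tag(tokens, lexicon=TOY_LEXICON, default="X"):
    """Return (token, tag) pairs; words missing from the lexicon get `default`."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]

tagged = lookup_tag("The paper is thin".split())
print(tagged)
# [('The', 'DET'), ('paper', 'NOUN'), ('is', 'AUX'), ('thin', 'ADJ')]
```

A lookup like this ignores context, so it cannot tell that "paper" may also be a verb. Statistical taggers such as the one in spaCy use the surrounding words to decide.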

In [0]:
# Import spaCy
import spacy

In [0]:
# load the English language library
nlp = spacy.load(name='en_core_web_sm')

In [0]:
# create a doc object
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [4]:
# print entire document text
print(doc.text)

Apple is looking at buying U.K. startup for $1 billion


In [5]:
# we can grab tokens by their index positions
print(doc[2])

looking


In [6]:
# Grab POS tag
print(doc[2].pos_)

VERB


In [7]:
# Fine-grained POS tag
print(doc[2].tag_)

# spaCy documentation: https://spacy.io/usage/linguistic-features/

VBG


In [8]:
# Print a table of token attributes using a for loop

for token in doc:
    print(f'{token.text:{10}} {token.lemma_:{8}} {token.pos_:{8}} {token.tag_:{6}} {spacy.explain(token.tag_)}')

# The numbers in braces set the minimum width of each column

Apple      Apple    PROPN    NNP    noun, proper singular
is         be       AUX      VBZ    verb, 3rd person singular present
looking    look     VERB     VBG    verb, gerund or present participle
at         at       ADP      IN     conjunction, subordinating or preposition
buying     buy      VERB     VBG    verb, gerund or present participle
U.K.       U.K.     PROPN    NNP    noun, proper singular
startup    startup  NOUN     NN     noun, singular or mass
for        for      ADP      IN     conjunction, subordinating or preposition
$          $        SYM      $      symbol, currency
1          1        NUM      CD     cardinal number
billion    billion  NUM      CD     cardinal number
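
The column alignment in the table above comes from the width given inside the f-string braces. The following sketch shows the same padding on plain strings, without spaCy. The `rows` list is copied by hand from the first two rows of the table, and `lines` is a name introduced only for this example.

```python
# Each row mirrors (text, lemma, coarse tag) from the first two table rows.
rows = [("Apple", "Apple", "PROPN"), ("is", "be", "AUX")]

# {text:{10}} pads text to at least 10 characters; it is equivalent to {text:10}.
lines = [f"{text:{10}} {lemma:{8}} {pos}" for text, lemma, pos in rows]

for line in lines:
    print(line)
```

Strings are left-aligned by default, so shorter values are padded with spaces on the right. A value longer than the width is not truncated, it simply pushes the later columns over.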



# 2) Counting POS Tags

The **doc.count_by() method** accepts a token attribute ID as its argument and returns a dictionary that maps each value of that attribute to its frequency in the doc.

In [0]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [0]:
# Count the frequencies of different coarse-grained POS tags:
POS_counts = doc.count_by(spacy.attrs.POS) # attrs for attributes

In [11]:
print(POS_counts)
# output is a dictionary

{96: 2, 87: 1, 100: 2, 85: 2, 92: 1, 99: 1, 93: 2}


In [0]:
# The dictionary keys are integer IDs that stand for the POS tags

In [12]:
# Decode a POS ID into its tag name
doc.vocab[96].text

'PROPN'

In [13]:
# Check the POS tag of an individual token
doc[0].pos_

'PROPN'
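
The integer keys in the dictionary above are hard to read at a glance. As a rough sketch of the same frequency count keyed by tag name, the snippet below uses `collections.Counter` on the coarse-grained tags copied by hand from the table in section 1. It is a plain-Python stand-in rather than a spaCy call, and `coarse_tags` and `tag_counts` are names introduced only for this example.

```python
from collections import Counter

# Coarse-grained tags copied from the table printed in section 1,
# one entry per token of the example sentence.
coarse_tags = ["PROPN", "AUX", "VERB", "ADP", "VERB", "PROPN",
               "NOUN", "ADP", "SYM", "NUM", "NUM"]

# Counter builds the tag -> frequency mapping in one pass.
tag_counts = Counter(coarse_tags)

for tag, count in tag_counts.most_common():
    print(f"{tag:{6}} {count}")
```

With a real `doc`, the same readable keys can presumably be obtained by applying the `doc.vocab[key].text` lookup shown above to every key of `POS_counts`.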

# 3) Visualizing the Parts of Speech

In [0]:
# Import spaCy
import spacy
# load the English language library
nlp = spacy.load(name='en_core_web_sm')
# Import the displaCy library
from spacy import displacy

In [0]:
# Create a simple Doc object
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [16]:
# Render the dependency parse (the POS tag is shown under each token)
displacy.render(doc, style='dep', jupyter=True, options={'distance': 80})