# 5. Part-of-Speech Tagging


Part-of-speech (POS) tagging is a fundamental task in natural language processing (NLP): it assigns each word in a sentence a label indicating its grammatical role, such as noun, verb, or adjective. POS tagging matters because more complex tasks, such as syntactic parsing, named entity recognition, and machine translation, build on it.

In [2]:
!pip install nltk





In [17]:
import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('tagsets')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package tagsets to /root/nltk_data...
[nltk_data]   Package tagsets is already up-to-date!


True

In [18]:
# Import required libraries

import nltk
from nltk.tokenize import word_tokenize


In [19]:
# Tokenize the sentence

sentence = "The quick brown fox jumps over the lazy dog."
tokens = word_tokenize(sentence)


In [20]:
# Perform Part-of-Speech Tagging

tagged_tokens = nltk.pos_tag(tokens)

# nltk.pos_tag(tokens) returns a list of (word, tag) tuples, for example:
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ...]


In [22]:
# Understanding POS Tags


# Print the entire Penn Treebank Tagset
nltk.help.upenn_tagset()

$: dollar
    $ -$ --$ A$ C$ HK$ M$ NZ$ S$ U.S.$ US$
'': closing quotation mark
    ' ''
(: opening parenthesis
    ( [ {
): closing parenthesis
    ) ] }
,: comma
    ,
--: dash
    --
.: sentence terminator
    . ! ?
:: colon or ellipsis
    : ; ...
CC: conjunction, coordinating
    & 'n and both but either et for less minus neither nor or plus so
    therefore times v. versus vs. whether yet
CD: numeral, cardinal
    mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-
    seven 1987 twenty '79 zero two 78-degrees eighty-four IX '60s .025
    fifteen 271,124 dozen quintillion DM2,000 ...
DT: determiner
    all an another any both del each either every half la many much nary
    neither no some such that the them these this those
EX: existential there
    there
FW: foreign word
    gemeinschaft hund ich jeux habeas Haementeria Herr K'ang-si vous
    lutihaw alai je jour objets salutaris fille quibusdam pas trop Monte
    terram fiche oui corporis ...
IN: preposition or
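
Many Penn Treebank tags share a prefix with related tags: the noun tags start with `NN`, the verb tags with `VB`, the adjective tags with `JJ`, and the adverb tags with `RB`. The sketch below uses that convention to collapse fine-grained tags into coarse word classes. The names `COARSE_BY_PREFIX` and `coarse_category` are hypothetical helpers written for this illustration and are not part of NLTK.

```python
# Coarse grouping of Penn Treebank tags by prefix.
# This mapping is an illustrative simplification, not an NLTK feature.
COARSE_BY_PREFIX = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS
}


def coarse_category(tag):
    """Return a coarse word class for a Penn Treebank tag, or 'other'."""
    for prefix, category in COARSE_BY_PREFIX.items():
        if tag.startswith(prefix):
            return category
    return "other"


for tag in ["NN", "NNS", "VBZ", "JJ", "DT"]:
    print(tag, "->", coarse_category(tag))
```

Anything outside the four listed prefixes, such as `DT` or `IN`, falls into `other` here; a fuller mapping would need more entries.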

In [23]:
# Visualize or Print the Result

for word, tag in tagged_tokens:
    print(f"{word}: {tag}")


The: DT
quick: JJ
brown: NN
fox: NN
jumps: VBZ
over: IN
the: DT
lazy: JJ
dog: NN
.: .
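
Once a sentence is tagged, the list of tuples can be summarized with ordinary Python. The sketch below counts how often each tag occurs and groups the words by tag. To keep it runnable without NLTK, it assumes the tagger output printed above and copies it in as a literal list.

```python
from collections import Counter, defaultdict

# Tagger output copied from the printed result above.
tagged_tokens = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), (".", "."),
]

# Count how many tokens received each tag.
tag_counts = Counter(tag for _, tag in tagged_tokens)

# Collect the words that received each tag, in sentence order.
words_by_tag = defaultdict(list)
for word, tag in tagged_tokens:
    words_by_tag[tag].append(word)

print(tag_counts)
print(words_by_tag["NN"])
```

Grouping by tag also makes tagging mistakes easy to spot: here "brown" appears among the `NN` words even though it modifies "fox" in this sentence.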


## Applications of POS Tagging

  - Syntactic Parsing: Parsers use POS tags as input when analyzing the grammatical structure of a sentence.

  - Named Entity Recognition: POS tags help identify proper nouns, which are often entities such as people or organizations.

  - Text-to-Speech Systems: POS tags help produce the correct pronunciation and intonation by indicating the grammatical role of each word. For example, "record" is stressed differently as a noun than as a verb.

  - Information Extraction: POS tags help extract relevant information, such as dates, names, and places, from text data.
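
As a rough illustration of the named entity recognition use case, the sketch below joins runs of consecutive proper-noun tags (`NNP`, `NNPS`) into candidate entity strings. The input sentence and its tags are written by hand for this example and are not tagger output, and `proper_noun_runs` is a hypothetical helper. Real NER systems rely on much more than POS tags.

```python
from itertools import groupby

# Hand-tagged illustrative input (hypothetical, not produced by a tagger).
tagged = [
    ("Ada", "NNP"), ("Lovelace", "NNP"), ("worked", "VBD"), ("with", "IN"),
    ("Charles", "NNP"), ("Babbage", "NNP"), ("in", "IN"), ("London", "NNP"),
    (".", "."),
]

PROPER_NOUN_TAGS = ("NNP", "NNPS")


def proper_noun_runs(tagged_tokens):
    """Join each run of consecutive proper-noun tokens into one string."""
    candidates = []
    # groupby splits the sequence into alternating proper / non-proper runs.
    for is_proper, group in groupby(
        tagged_tokens, key=lambda pair: pair[1] in PROPER_NOUN_TAGS
    ):
        if is_proper:
            candidates.append(" ".join(word for word, _ in group))
    return candidates


print(proper_noun_runs(tagged))
```

The result is only a list of candidates: deciding whether each one is a person, an organization, or a place would require a further classification step.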