# Part-of-Speech Tagging with `nltk.pos_tag`

## What is `pos_tag`?

`nltk.pos_tag()` is NLTK's function for **part-of-speech (POS) tagging**: it assigns a **grammatical category** (such as noun, verb, or adjective) to each token in a sentence.

## How It Works

- It takes a **list of word tokens** (not a raw string) and returns a list of `(word, POS_tag)` tuples, one per token and in the same order.
- For English it uses the **Penn Treebank tagset** by default (e.g., `NN` = singular noun, `VB` = base-form verb, `JJ` = adjective).
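
To make the return shape concrete, the sketch below works directly on a list of `(word, POS_tag)` tuples and groups the fine-grained Penn Treebank tags into coarse categories by their two-letter prefix. The `tagged` list is copied from the first sentence of the Example 2 output later in this notebook; `COARSE_PREFIXES` and `coarse_category` are hypothetical helper names introduced for illustration, not part of NLTK.

```python
# Coarse categories keyed by Penn Treebank tag prefix (illustrative, not exhaustive).
COARSE_PREFIXES = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS
}


def coarse_category(tag):
    """Map a Penn Treebank tag to a coarse category, or 'other'."""
    return COARSE_PREFIXES.get(tag[:2], "other")


# (word, POS_tag) tuples in the shape returned by nltk.pos_tag
tagged = [('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'),
          ('fascinating', 'JJ'), ('field', 'NN'), ('artificial', 'JJ'),
          ('intelligence', 'NN'), ('.', '.')]

# Keep only the words whose tag falls in the noun family.
nouns = [word for word, tag in tagged if coarse_category(tag) == "noun"]
print(nouns)  # ['Language', 'Processing', 'field', 'intelligence']
```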

## Why POS Tagging Matters

- Exposes the grammatical role of each word, which is the basis for syntactic analysis.
- Feeds downstream steps such as lemmatization, named entity recognition, and information extraction.
- Can improve accuracy in text classification and semantic analysis when tags are used as features.
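
As one illustration of the lemmatization use case, a lemmatizer usually needs a coarser part of speech than the Penn Treebank tag. The sketch below converts Penn tags into the single-letter codes that WordNet uses (`'n'`, `'v'`, `'a'`, `'r'`). The function name `penn_to_wordnet` is a hypothetical helper, the fallback to noun is an assumption chosen to match the lemmatizer's default, and the sample tuples are taken from the last sentence of the Example 2 output.

```python
def penn_to_wordnet(tag):
    """Return the WordNet POS letter for a Penn Treebank tag (noun if unknown)."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # noun, also used as the fallback


tagged = [('NLP', 'NNP'), ('helps', 'VBZ'), ('extracting', 'VBG'),
          ('useful', 'JJ'), ('information', 'NN')]

# Pair each word with the POS letter a WordNet-based lemmatizer expects.
pairs = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
print(pairs)
# [('NLP', 'n'), ('helps', 'v'), ('extracting', 'v'), ('useful', 'a'), ('information', 'n')]
```

Each pair could then be passed to `nltk.stem.WordNetLemmatizer().lemmatize(word, pos=pos)`, which requires the `wordnet` corpus to be downloaded first.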

In [1]:
paragraph = """Natural Language Processing is a fascinating field of artificial intelligence. 
It allows computers to understand, interpret, and generate human language. 
Many applications like chatbots, language translation, and sentiment analysis rely heavily on NLP techniques. 
With the growth of digital content, the ability to analyze large volumes of text has become essential. 
NLP helps in extracting useful information, automating tasks, and enhancing user experiences across different domains."""

In [2]:
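import nltk
# One-time downloads: Punkt tokenizer models and the perceptron POS tagger.
# NLTK 3.9+ loads 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')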

[nltk_data] Downloading package punkt to
[nltk_data]     /home/u5c2dbc0bf2849dd5288e3311262c709/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/u5c2dbc0bf2849dd5288e3311262c709/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [9]:
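# Example 1: tag a single sentence

sample = "Hey Spidey! You look amazing in the black suit:)"
# str.split() only splits on whitespace, so punctuation stays attached
# to the neighbouring word ('Spidey!', 'suit:)') and is tagged with it.
# The extra [...] wraps the tagger's result in an outer list.
words = [nltk.pos_tag(sample.split())]
print(words)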

[[('Hey', 'NNP'), ('Spidey!', 'NNP'), ('You', 'PRP'), ('look', 'VBP'), ('amazing', 'VBG'), ('in', 'IN'), ('the', 'DT'), ('black', 'JJ'), ('suit:)', 'NN')]]
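
The tokens `'Spidey!'` and `'suit:)'` in the output above come from whitespace splitting, not from the tagger. The sketch below contrasts `str.split()` with a simple regular expression that separates runs of word characters from runs of punctuation. The regex is only a standard-library stand-in for a real tokenizer; `nltk.word_tokenize`, used in Example 2, is the more usual choice and its output may differ in detail.

```python
import re

sample = "Hey Spidey! You look amazing in the black suit:)"

# Whitespace splitting keeps punctuation glued to the preceding word.
print(sample.split())
# ['Hey', 'Spidey!', 'You', 'look', 'amazing', 'in', 'the', 'black', 'suit:)']

# Runs of word characters and runs of punctuation become separate tokens.
tokens = re.findall(r"\w+|[^\w\s]+", sample)
print(tokens)
# ['Hey', 'Spidey', '!', 'You', 'look', 'amazing', 'in', 'the', 'black', 'suit', ':)']
```

Passing a list like `tokens` to `nltk.pos_tag` lets the tagger treat words and punctuation as separate items.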


In [13]:
# Example 2: tag every sentence of the paragraph after removing stopwords
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # Stopword lists for several languages
nltk.download('punkt')  # Sentence and word tokenizer models
nltk.download('averaged_perceptron_tagger')  # POS tagger model

# Build the stopword set once; the list is lowercase, so capitalized
# tokens such as 'It' and 'With' are not filtered out.
stop_words = set(stopwords.words('english'))

sentences = nltk.sent_tokenize(paragraph)
for sentence in sentences:
    words = nltk.word_tokenize(sentence)
    words = [word for word in words if word not in stop_words]
    # Stopwords are already gone here, so the tagger sees less context.
    tagged = nltk.pos_tag(words)
    print(tagged)

[('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'), ('fascinating', 'JJ'), ('field', 'NN'), ('artificial', 'JJ'), ('intelligence', 'NN'), ('.', '.')]
[('It', 'PRP'), ('allows', 'VBZ'), ('computers', 'NNS'), ('understand', 'VBP'), (',', ','), ('interpret', 'JJ'), (',', ','), ('generate', 'JJ'), ('human', 'JJ'), ('language', 'NN'), ('.', '.')]
[('Many', 'JJ'), ('applications', 'NNS'), ('like', 'IN'), ('chatbots', 'NNS'), (',', ','), ('language', 'NN'), ('translation', 'NN'), (',', ','), ('sentiment', 'NN'), ('analysis', 'NN'), ('rely', 'RB'), ('heavily', 'RB'), ('NLP', 'NNP'), ('techniques', 'NNS'), ('.', '.')]
[('With', 'IN'), ('growth', 'NN'), ('digital', 'JJ'), ('content', 'NN'), (',', ','), ('ability', 'NN'), ('analyze', 'RB'), ('large', 'JJ'), ('volumes', 'NNS'), ('text', 'JJ'), ('become', 'JJ'), ('essential', 'JJ'), ('.', '.')]
[('NLP', 'NNP'), ('helps', 'VBZ'), ('extracting', 'VBG'), ('useful', 'JJ'), ('information', 'NN'), (',', ','), ('automating', 'VBG'), ('tasks', 
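
In the output above, `interpret` and `generate` are tagged `JJ` and `rely` is tagged `RB`, most likely because the function words around them were removed before tagging. A common alternative is to tag the full sentence first and filter stopwords afterwards. The sketch below shows that ordering; `tag_then_filter` and `dummy_tagger` are hypothetical names, and the dummy tagger exists only so the block runs without any NLTK data.

```python
def tag_then_filter(tokens, tagger, stop_words):
    """Tag the full token list first, then drop stopwords from the result."""
    tagged = tagger(tokens)  # the tagger sees every token, stopwords included
    return [(word, tag) for word, tag in tagged
            if word.lower() not in stop_words]  # case-insensitive match


# Stand-in tagger for demonstration only: labels every token 'X'.
def dummy_tagger(tokens):
    return [(token, "X") for token in tokens]


result = tag_then_filter(["It", "allows", "computers", "to", "understand"],
                         dummy_tagger, {"it", "to"})
print(result)
# [('allows', 'X'), ('computers', 'X'), ('understand', 'X')]

# With NLTK this might be called as (not executed here):
#   tag_then_filter(nltk.word_tokenize(sentence), nltk.pos_tag,
#                   set(stopwords.words('english')))
```

Lowercasing each word before the lookup also removes capitalized stopwords such as `It` and `With`, which the case-sensitive filter in Example 2 leaves in place.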

