
Part-of-Speech (POS) tagging is a core task in Natural Language Processing (NLP): each word (token) in a sentence is assigned a grammatical category, such as noun, verb, or adjective. These labels expose the grammatical structure of a sentence, which many downstream tasks rely on, including syntactic parsing, machine translation, and sentiment analysis.

### What is Part-of-Speech Tagging?
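POS tagging assigns each token in a sentence a tag from a fixed tag set. The codes in parentheses below come from the Penn Treebank tag set, which NLTK's default English tagger and spaCy's fine-grained `tag_` attribute both use. Common categories include: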
- **Noun (NN)**: Represents things, people, places, or ideas.
- **Verb (VB)**: Represents actions or states of being.
- **Adjective (JJ)**: Describes or modifies nouns.
- **Adverb (RB)**: Describes or modifies verbs, adjectives, or other adverbs.
- **Pronoun (PRP)**: Replaces nouns in a sentence.
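- **Preposition (IN)**: Relates a noun or pronoun to another word (e.g., *in*, *on*). Penn Treebank also uses IN for subordinating conjunctions such as *because*.
- **Coordinating conjunction (CC)**: Joins words, phrases, or clauses of equal rank (e.g., *and*, *or*, *but*).
- **Interjection (UH)**: Expresses emotion or a reaction (e.g., *oh*, *wow*).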
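Each code above names a family of Penn Treebank tags: `NN` has the variants `NNS` (plural), `NNP` (proper, singular), and `NNPS` (proper, plural); `VB` has `VBD`, `VBG`, `VBN`, `VBP`, and `VBZ`; `JJ` and `RB` add comparative (`JJR`, `RBR`) and superlative (`JJS`, `RBS`) forms. A quick way to recover the broad category from a fine-grained tag is therefore to match on its prefix. The sketch below assumes that prefix rule; the helper `coarse_category` and the table `PREFIX_TO_CATEGORY` are hypothetical names, and any tag outside the listed families (for example `DT`, or wh-words such as `WP` and `WRB`) falls through to `"other"`.

```python
# Hypothetical helper: map a fine-grained Penn Treebank tag to one of the
# broad categories listed above by checking its prefix. This prefix rule is
# an assumption; wh-words (WP, WRB) and other tags fall through to "other".
PREFIX_TO_CATEGORY = [
    ("NN", "noun"),         # NN, NNS, NNP, NNPS
    ("VB", "verb"),         # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "adjective"),    # JJ, JJR, JJS
    ("RB", "adverb"),       # RB, RBR, RBS
    ("PRP", "pronoun"),     # PRP, PRP$
    ("IN", "preposition"),  # also covers subordinating conjunctions
    ("CC", "conjunction"),  # coordinating conjunctions only
    ("UH", "interjection"),
]

def coarse_category(tag: str) -> str:
    """Return the broad category for a Penn Treebank tag, or 'other'."""
    for prefix, category in PREFIX_TO_CATEGORY:
        if tag.startswith(prefix):
            return category
    return "other"

for tag in ["NNP", "VBG", "JJS", "RBR", "PRP$", "IN", "DT"]:
    print(tag, "->", coarse_category(tag))
```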


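```python
import nltk

# Tokenizer and tagger models used by word_tokenize and pos_tag.
# On NLTK 3.9+, these resources are named 'punkt_tab' and
# 'averaged_perceptron_tagger_eng'; download those as well if tagging
# raises a LookupError.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
```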

```
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
True
```

```python
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Split the sentence into word tokens, then tag each token
# with a Penn Treebank POS tag
sentence = "I love programming in Python."
tokens = word_tokenize(sentence)
tagged_tokens = pos_tag(tokens)
print(tagged_tokens)
```

```
[('I', 'PRP'), ('love', 'VBP'), ('programming', 'VBG'), ('in', 'IN'), ('Python', 'NNP'), ('.', '.')]
```
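NLTK tags `programming` as `VBG` (gerund or present participle) here, while the spaCy model used later in this notebook labels it `NN`; taggers can disagree on ambiguous words such as gerunds. Because `pos_tag` returns a plain list of `(word, tag)` tuples, filtering by tag is ordinary list processing. The sketch below copies the output above into a literal so it runs without NLTK or its models, and keeps the words whose tag starts with a given prefix; the helper name `words_with_tag_prefix` is hypothetical, and the prefix rule assumes the Penn Treebank tag families (`NN*` for nouns, `VB*` for verbs).

```python
# Output of pos_tag(tokens) from the cell above, copied as a literal so this
# sketch runs without NLTK or its downloaded models.
tagged_tokens = [('I', 'PRP'), ('love', 'VBP'), ('programming', 'VBG'),
                 ('in', 'IN'), ('Python', 'NNP'), ('.', '.')]

def words_with_tag_prefix(tagged, prefix):
    """Return the words whose Penn Treebank tag starts with `prefix`."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

nouns = words_with_tag_prefix(tagged_tokens, "NN")  # NN, NNS, NNP, NNPS
verbs = words_with_tag_prefix(tagged_tokens, "VB")  # VB, VBD, VBG, ...
print("Nouns:", nouns)  # Nouns: ['Python']
print("Verbs:", verbs)  # Verbs: ['love', 'programming']
```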


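```python
import spacy

# Load the small pre-trained English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Process the sentence with spaCy
text = "I love programming in Python."
doc = nlp(text)

# Print each token, its coarse POS tag (pos_), and a description of its
# fine-grained tag (tag_)
for token in doc:
    print(f"{token.text} | {token.pos_} | {spacy.explain(token.tag_)}")
```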


```
I | PRON | pronoun, personal
love | VERB | verb, non-3rd person singular present
programming | NOUN | noun, singular or mass
in | ADP | conjunction, subordinating or preposition
Python | PROPN | noun, proper singular
. | PUNCT | punctuation mark, sentence closer
```


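```python
# Show both tag levels side by side: coarse pos_ and fine-grained tag_,
# each followed by its spacy.explain() description
for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_), " | ", token.tag_, " | ", spacy.explain(token.tag_))
```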

```
I  |  PRON  |  pronoun  |  PRP  |  pronoun, personal
love  |  VERB  |  verb  |  VBP  |  verb, non-3rd person singular present
programming  |  NOUN  |  noun  |  NN  |  noun, singular or mass
in  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
Python  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
.  |  PUNCT  |  punctuation  |  .  |  punctuation mark, sentence closer
```
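The output above shows that spaCy keeps two labels per token: `token.pos_` is a coarse tag from the Universal POS tag set (`PRON`, `VERB`, `NOUN`, ...), while `token.tag_` is the fine-grained Penn Treebank-style tag (`PRP`, `VBP`, `NN`, ...), the same scheme NLTK's `pos_tag` produced earlier. As a rough illustration of how the two levels relate, the sketch below builds a lookup from only the `(tag_, pos_)` pairs observed in this output and applies it to NLTK's tags from the earlier cell. The table `PTB_TO_UPOS` is hypothetical and deliberately partial; spaCy assigns `pos_` with its own pipeline components, so this is not spaCy's actual mapping.

```python
# Hypothetical, partial lookup from Penn Treebank tags (tag_) to Universal
# POS tags (pos_), built only from the pairs in the spaCy output above.
PTB_TO_UPOS = {
    "PRP": "PRON",
    "VBP": "VERB",
    "NN": "NOUN",
    "IN": "ADP",
    "NNP": "PROPN",
    ".": "PUNCT",
}

# NLTK's pos_tag output from the earlier cell, copied as a literal.
nltk_tagged = [('I', 'PRP'), ('love', 'VBP'), ('programming', 'VBG'),
               ('in', 'IN'), ('Python', 'NNP'), ('.', '.')]

# Tags missing from the partial table (here VBG) fall back to "UNKNOWN".
upos = [PTB_TO_UPOS.get(tag, "UNKNOWN") for _, tag in nltk_tagged]
for (word, tag), coarse in zip(nltk_tagged, upos):
    print(f"{word} | {tag} | {coarse}")
```

The `UNKNOWN` for `programming` shows that a table built from one sentence is incomplete. NLTK can also emit coarse tags directly with `pos_tag(tokens, tagset='universal')` after downloading the `universal_tagset` resource, but that is a smaller 12-tag scheme (it has no `PROPN`, for instance), so its labels will not match spaCy's `pos_` exactly.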


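```python
# Keep only tokens that are not whitespace, punctuation, or "other" (X)
filtered_tokens = []

for token in doc:
    if token.pos_ not in ["SPACE", "PUNCT", "X"]:
        filtered_tokens.append(token)
        print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_), " | ", token.tag_, " | ", spacy.explain(token.tag_))
```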

```
I  |  PRON  |  pronoun  |  PRP  |  pronoun, personal
love  |  VERB  |  verb  |  VBP  |  verb, non-3rd person singular present
programming  |  NOUN  |  noun  |  NN  |  noun, singular or mass
in  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
Python  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
```
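Filtering on `pos_` like this is a common preprocessing step, for example to keep only content words before further analysis. Once the surviving tokens are collected in a list, counting coarse tags is straightforward with `collections.Counter`. The sketch below stands in for spaCy by copying the `(text, pos_)` pairs from the filtered output above, then additionally keeps only nouns, proper nouns, and verbs; that set of "content" tags (`CONTENT_POS`) is an assumption chosen for illustration, not a spaCy convention.

```python
from collections import Counter

# (token.text, token.pos_) pairs from the filtered output above, copied as a
# literal so this sketch runs without spaCy or the en_core_web_sm model.
filtered = [("I", "PRON"), ("love", "VERB"), ("programming", "NOUN"),
            ("in", "ADP"), ("Python", "PROPN")]

# Assumed definition of "content words" for this illustration.
CONTENT_POS = {"NOUN", "PROPN", "VERB"}

content_words = [text for text, pos in filtered if pos in CONTENT_POS]
pos_counts = Counter(pos for _, pos in filtered)

print(content_words)  # ['love', 'programming', 'Python']
print(pos_counts)
```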
