
In [6]:
# POS Tagging using NLTK, spaCy, and Regex

import nltk
import spacy
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tag import RegexpTagger

nltk.download('punkt')
nltk.download('punkt_tab')  # needed by sent_tokenize/word_tokenize in NLTK 3.9+
nltk.download('averaged_perceptron_tagger_eng')

# Sample text
text = "Natural Language Processing is an interesting field. Machine learning is transforming the world of artificial intelligence."

# --- Day 1: NLTK Basic POS Tagging ---
print("\n--- Day 1: POS Tagging using NLTK ---")
tokens_nltk = word_tokenize(text)
pos_tags_nltk = nltk.pos_tag(tokens_nltk, lang='eng')
print(pos_tags_nltk)

# --- Day 2: Tokenization and POS Tagging with NLTK ---
print("\n--- Day 2: Tokenization into sentences and words ---")
sentences = sent_tokenize(text)
for sentence in sentences:
    tokens = word_tokenize(sentence)
    pos_tags = nltk.pos_tag(tokens, lang='eng')
    print(pos_tags)

# --- Day 3: POS Tagging using spaCy ---
print("\n--- Day 3: POS Tagging using spaCy ---")
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
for token in doc:
    print(f"{token.text} : {token.pos_} ({token.tag_})")

# --- Day 4: Regex-Based POS Tagging ---
print("\n--- Day 4: Rule-Based POS Tagging using Regex ---")
patterns = [
    (r'.*ing$', 'VBG'),  # Verb, gerund/present participle
    (r'.*ed$', 'VBD'),   # Verb, past tense
    (r'.*ly$', 'RB'),    # Adverb
    (r'^[Tt]he$', 'DT'), # Determiner, matching "The" or "the"
    (r'.*', 'NN')        # Noun (default)
]
regex_tagger = RegexpTagger(patterns)
tokens_nltk = word_tokenize(text)
regex_tags = regex_tagger.tag(tokens_nltk)
print(regex_tags)

# Analysis: This script covers POS tagging using NLTK's statistical model, spaCy's NLP pipeline, and NLTK's rule-based approach using regular expressions.

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.



--- Day 1: POS Tagging using NLTK ---
[('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'), ('is', 'VBZ'), ('an', 'DT'), ('interesting', 'JJ'), ('field', 'NN'), ('.', '.'), ('Machine', 'NNP'), ('learning', 'NN'), ('is', 'VBZ'), ('transforming', 'VBG'), ('the', 'DT'), ('world', 'NN'), ('of', 'IN'), ('artificial', 'JJ'), ('intelligence', 'NN'), ('.', '.')]

--- Day 2: Tokenization into sentences and words ---
[('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'), ('is', 'VBZ'), ('an', 'DT'), ('interesting', 'JJ'), ('field', 'NN'), ('.', '.')]
[('Machine', 'NN'), ('learning', 'NN'), ('is', 'VBZ'), ('transforming', 'VBG'), ('the', 'DT'), ('world', 'NN'), ('of', 'IN'), ('artificial', 'JJ'), ('intelligence', 'NN'), ('.', '.')]

--- Day 3: POS Tagging using spaCy ---
Natural : PROPN (NNP)
Language : PROPN (NNP)
Processing : PROPN (NNP)
is : AUX (VBZ)
an : DET (DT)
interesting : ADJ (JJ)
field : NOUN (NN)
. : PUNCT (.)
Machine : NOUN (NN)
learning : NOUN (NN)
is : AUX (VBZ)

Implement Part-of-Speech (POS) Tagging

Objective: Understand and apply POS tagging on text data.

Tasks:
•    Tokenize the given text.
•    Apply POS tagging using NLTK’s POS tagger.
•    Analyze the tagged output for correctness and patterns.
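
One way to approach the third task is to count how often each tag occurs. The sketch below hard-codes the tagged pairs printed by the notebook's Day 1 run instead of calling NLTK, so it runs without any model download; the variable names are illustrative only.

```python
from collections import Counter

# Tagged pairs copied from the Day 1 output of nltk.pos_tag() above.
tagged = [
    ('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'),
    ('is', 'VBZ'), ('an', 'DT'), ('interesting', 'JJ'), ('field', 'NN'),
    ('.', '.'), ('Machine', 'NNP'), ('learning', 'NN'), ('is', 'VBZ'),
    ('transforming', 'VBG'), ('the', 'DT'), ('world', 'NN'), ('of', 'IN'),
    ('artificial', 'JJ'), ('intelligence', 'NN'), ('.', '.'),
]

# Count how many tokens received each tag.
tag_counts = Counter(tag for _, tag in tagged)

# Print the tags from most to least frequent.
for tag, count in tag_counts.most_common():
    print(f"{tag:4} {count}")
```

A count like this makes it easier to spot tags that deserve a manual check, for example the three NNP tags assigned to capitalized words.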

Tools/Packages: Python, NLTK, spaCy.

Expected Outcome: Correctly tagged sentences with part-of-speech labels.

----------------------------------------------------------------------

Monday-1
Introduction to POS Tagging
Objective: Learn what POS tagging is and why it is important.
Requirements: Use the nltk library for basic POS tagging.
Explanation:
•    POS tagging assigns grammatical categories (like noun, verb, adjective) to words in a sentence.
•    We use the nltk.pos_tag() function, which applies a pre-trained statistical model (the averaged perceptron tagger, downloaded as averaged_perceptron_tagger_eng) to predict the POS tags.
Code Breakdown:
1.    Tokenize the sentence.
2.    Use nltk.pos_tag() to get the POS tags.
3.    Print the output.
Example Output:
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
Here, DT → Determiner, JJ → Adjective, NN → Noun, VBZ → Verb (3rd person singular present).
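
To make the tag abbreviations easier to read, the example output can be paired with short descriptions. This is a minimal sketch: the dictionary is a hand-written subset of the Penn Treebank tag set covering only the tags in this example, and the names TAG_DESCRIPTIONS and describe are hypothetical helpers, not part of NLTK.

```python
# Short descriptions for the Penn Treebank tags seen in the example output.
# Hand-written subset for illustration, not the full tag set.
TAG_DESCRIPTIONS = {
    'DT': 'determiner',
    'JJ': 'adjective',
    'NN': 'noun, singular or mass',
    'VBZ': 'verb, 3rd person singular present',
    'IN': 'preposition or subordinating conjunction',
}

# The example output shown above, copied as data.
example_output = [
    ('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
    ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
    ('dog', 'NN'),
]

def describe(tag):
    """Return a readable description, or a placeholder for unknown tags."""
    return TAG_DESCRIPTIONS.get(tag, 'unknown tag')

for word, tag in example_output:
    print(f"{word:6} {tag:4} {describe(tag)}")
```

NLTK also offers a tag reference through nltk.help.upenn_tagset(), which needs an additional data download.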
Deliverables:
•    Source code [use filename as: <last 4 digits of Hall ticket No>_<BatchNo>_Lab07.ipynb]
•    Upload the Google Colab file and its link to Canvas.

----------------------------------------------------------------------

Tuesday-2
Tokenization and POS Tagging with NLTK
Objective: Learn how to break text into sentences and words. Apply POS tagging to a full paragraph.
Explanation:
•    Tokenization breaks down text into sentences (sent_tokenize()) and words (word_tokenize()).
•    After tokenization, POS tagging is applied to each word.
Example Output:
[('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'), ('is', 'VBZ'), ('an', 'DT'), ('interesting', 'JJ'), ('field', 'NN'), ('.', '.')]
Here, NNP → Proper noun, VBZ → Verb (3rd person singular present), . → Punctuation.
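
The two-level structure (text into sentences, then sentences into words) can be illustrated without NLTK. The sketch below is a simplified regex stand-in, not sent_tokenize() or word_tokenize(): it would split wrongly on abbreviations such as "Dr." and is meant only to show the shape of the result. The function names are hypothetical.

```python
import re

text = ("Natural Language Processing is an interesting field. "
        "Machine learning is transforming the world of artificial intelligence.")

def split_sentences(text):
    """Naive splitter: break at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def split_words(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# Sentence level first, then word level inside each sentence.
for sentence in split_sentences(text):
    print(split_words(sentence))
```

Each inner list corresponds to the token list that would be passed to nltk.pos_tag() for one sentence.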
Deliverables:
•    Source code [use filename as: <last 4 digits of Hall ticket No>_<BatchNo>_Lab07.ipynb]
•    Upload the Google Colab file and its link to Canvas.

----------------------------------------------------------------------

Wednesday-3
Implementing POS Tagging Using spaCy
Objective: Use spaCy for POS tagging. Compare spaCy's tags with NLTK's.
Explanation:
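•    spaCy provides an optimized NLP pipeline that is generally faster than NLTK.
•    Its small English model, en_core_web_sm, is a separate package that must be installed (for example with python -m spacy download en_core_web_sm) before spacy.load() can use it; the model's tagger assigns a POS tag to each token.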
Example Output:
Machine PROPN
learning NOUN
is AUX
transforming VERB
the DET
world NOUN
of ADP
artificial ADJ
intelligence NOUN
. PUNCT
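Here, PROPN → Proper noun, AUX → Auxiliary verb, ADP → Adposition (preposition). These are spaCy's coarse-grained Universal POS labels (token.pos_); the fine-grained Penn Treebank tag is available as token.tag_.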
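
For the comparison with NLTK, the two taggers' fine-grained tags can be lined up token by token. The sketch below copies the pairs from the notebook output above instead of running either library, and it covers only the first 11 tokens because the spaCy listing is cut off at that point. Pairing by position assumes both tokenizers produced the same tokens, which holds for this text.

```python
# (word, tag) pairs copied from the notebook output above.
# NLTK: Day 1 output; spaCy: the token.tag_ value shown in parentheses.
nltk_tags = [
    ('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'),
    ('is', 'VBZ'), ('an', 'DT'), ('interesting', 'JJ'), ('field', 'NN'),
    ('.', '.'), ('Machine', 'NNP'), ('learning', 'NN'), ('is', 'VBZ'),
]
spacy_tags = [
    ('Natural', 'NNP'), ('Language', 'NNP'), ('Processing', 'NNP'),
    ('is', 'VBZ'), ('an', 'DT'), ('interesting', 'JJ'), ('field', 'NN'),
    ('.', '.'), ('Machine', 'NN'), ('learning', 'NN'), ('is', 'VBZ'),
]

# Pair tokens by position and keep those where the two taggers disagree.
disagreements = [
    (word, n_tag, s_tag)
    for (word, n_tag), (_, s_tag) in zip(nltk_tags, spacy_tags)
    if n_tag != s_tag
]

for word, n_tag, s_tag in disagreements:
    print(f"{word}: NLTK={n_tag}, spaCy={s_tag}")
```

On these tokens the taggers disagree only on "Natural" and "Machine", both capitalized words at the start of a sentence.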

Deliverables:
•    Source code [use filename as: <last 4 digits of Hall ticket No>_<BatchNo>_Lab07.ipynb]
•    Upload the Google Colab file and its link to Canvas.

----------------------------------------------------------------------

Thursday-4
Customizing POS Tagging with Regex and Rule-Based Approaches
Objective:
•    Implement rule-based POS tagging using RegexpTagger.
Explanation:
•    A rule-based tagger assigns POS tags using predefined patterns.
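•    Example rules, tried in order; the first matching pattern wins, and a word that matches none of them defaults to Noun (NN):
        Words ending in -ing → Verb, gerund/present participle (VBG)
        Words ending in -ed → Past tense verb (VBD)
        Words ending in -ly → Adverb (RB)
        "The" → Determiner (DT)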
Example Output:
[('The', 'DT'), ('cat', 'NN'), ('jumps', 'NN'), ('quickly', 'RB'), ('over', 'NN'), ('the', 'DT'), ('lazy', 'NN'), ('dog', 'NN'), ('.', 'NN')]
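
The first-match behaviour behind this output can be sketched in plain Python. This is an illustration of the idea, not NLTK's RegexpTagger itself; the names RULES and regex_tag are hypothetical, and the determiner pattern is assumed to accept both "The" and "the" so that the lowercase token also receives DT, as in the example output.

```python
import re

# Ordered (pattern, tag) rules; the first pattern that matches wins.
RULES = [
    (r'.*ing$', 'VBG'),    # gerund / present participle
    (r'.*ed$', 'VBD'),     # past tense verb
    (r'.*ly$', 'RB'),      # adverb
    (r'^[Tt]he$', 'DT'),   # determiner, either capitalization (assumption)
    (r'.*', 'NN'),         # default: noun
]
COMPILED = [(re.compile(pattern), tag) for pattern, tag in RULES]

def regex_tag(tokens):
    """Tag each token with the tag of the first rule whose pattern matches."""
    tagged = []
    for token in tokens:
        for pattern, tag in COMPILED:
            if pattern.match(token):
                tagged.append((token, tag))
                break
    return tagged

tokens = ['The', 'cat', 'jumps', 'quickly', 'over', 'the', 'lazy', 'dog', '.']
print(regex_tag(tokens))
```

Because the default rule matches everything, verbs such as "jumps", prepositions such as "over", and even the final period all end up as NN, which shows the limits of suffix rules alone.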
