<a href="https://colab.research.google.com/github/AsraniSanjana/All_Codes/blob/main/All_Semester_Codes/NLP_sem7/ColabFiles/NLP_05_POS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**NAME:** SANJANA ASRANI

**DIV:** D17B

**ROLL NO.**: 01

**NLP LAB-05:** POS TAG



# POS tagging, or Part-of-Speech tagging:
 a natural language processing (NLP) technique used to assign grammatical categories (or parts of speech) to words in a text based on their definition and their context in a sentence. The primary parts of speech include nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, and interjections.

Here's a brief overview of some common parts of speech and their roles:

1. **Noun (NN)**: A word that represents a person, place, thing, or idea. Examples: "cat," "house," "book."

2. **Verb (VB)**: A word that describes an action, occurrence, or state of being. Examples: "run," "eat," "is."

3. **Adjective (JJ)**: A word that modifies or describes a noun. Examples: "red," "happy," "tall."

4. **Adverb (RB)**: A word that modifies or describes a verb, adjective, or another adverb, typically providing information about how, when, where, or to what extent something is done. Examples: "quickly," "very," "often."

5. **Pronoun (PRP)**: A word that can replace a noun to avoid repetition. Examples: "he," "she," "it."

6. **Preposition (IN)**: A word that shows the relationship between a noun or pronoun and other words in a sentence. Examples: "in," "on," "under."

7. **Conjunction (CC)**: A word that connects words, phrases, or clauses within a sentence. Examples: "and," "but," "or."

8. **Interjection (UH)**: A word or phrase used to express strong emotion or surprise. Examples: "oh," "wow," "ouch."

POS tagging is a fundamental step in various NLP tasks, such as text parsing, sentiment analysis, information retrieval, and machine translation. By tagging words with their respective parts of speech, NLP algorithms can better understand the grammatical structure of sentences and extract meaningful information from text data.

**Types of POS taggers:**

1. **Rule-Based POS Taggers**: Rule-based taggers assign POS tags to words based on predefined grammatical rules and patterns. These taggers rely on handcrafted rules and dictionaries. They can be simple and fast but may not perform well on ambiguous or irregular words.

Working:

Context-pattern rules or as Regular expression compiled into finite-state automata, intersected with lexically ambiguous sentence representation.

Dictionary Lookup: These taggers typically start by looking up each word in a pre-built dictionary or lexicon. The dictionary associates words with their most likely POS tags based on known patterns.
Rule Application: If a word is not found in the dictionary or if there is ambiguity, rule-based taggers apply grammatical rules or heuristics to determine the POS tag based on the word's context, its ending, nearby words, and other linguistic cues.

First stage − In the first stage, it uses a dictionary to assign each word a list of potential parts-of-speech.

Second stage − In the second stage, it uses large lists of hand-written disambiguation rules to sort down the list to a single part-of-speech for each word.

2. **Statistical POS Taggers**: Statistical taggers use probabilistic models to assign POS tags to words based on the likelihood of a word being a certain part of speech given its context. Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs) are commonly used statistical models for POS tagging. These taggers are trained on large annotated corpora and are known for their accuracy.

Working:

Training: Statistical taggers are trained on large annotated corpora where each word in the text is tagged with its correct POS label. The tagger learns statistical patterns and associations between words and their POS tags from this training data.
Inference: During inference (tagging of unseen text), the tagger calculates the probability of each word belonging to various POS categories based on its context (surrounding words). The tagger assigns the most probable POS tag to each word based on these probabilities.

The simplest stochastic tagger applies the following approaches for POS tagging −

Word Frequency Approach
In this approach, the stochastic taggers disambiguate the words based on the probability that a word occurs with a particular tag. We can also say that the tag encountered most frequently with the word in the training set is the one assigned to an ambiguous instance of that word. The main issue with this approach is that it may yield inadmissible sequence of tags.

Tag Sequence Probabilities
It is another approach of stochastic tagging, where the tagger calculates the probability of a given sequence of tags occurring. It is also called n-gram approach. It is called so because the best tag for a given word is determined by the probability at which it occurs with the n previous tags.

3. **Transformation-based Tagging: Brill Tagging: Rule-Based + Statistical POS Taggers**: Some taggers combine both rule-based and statistical approaches to improve accuracy. They may use rules for common cases and statistical models for less predictable cases.

Working:

Lookup Table: These taggers have a lookup table or dictionary that maps words to their corresponding POS tags. During tagging, they simply look up each word in the table and assign the associated tag.

4. **Hidden Markov Model (HMM) POS Tagging**

In [None]:
# !pip install nltk
# !pip install pattern

import nltk
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
nltk.download('omq-1.4')

from pattern.en import lemma
from pattern.en import pluralize, singularize
from pattern.en import tenses
from pattern.en import tag
from pattern.en import comparative,superlative

import pandas as pd
from nltk.tokenize import word_tokenize
nltk.download('punkt')


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Error loading omq-1.4: Package 'omq-1.4' not found in
[nltk_data]     index
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [None]:
pos=PorterStemmer()
df=pd.DataFrame(columns=["Word","Root","Singular/Plural","Tense","POS"])
sentence=input()
# The quick brown fox jumps over the lazy dog
words=word_tokenize(sentence)
df["Word"]=words
df

The quick brown fox jumps over the lazy dog


Unnamed: 0,Word,Root,Singular/Plural,Tense,POS
0,The,,,,
1,quick,,,,
2,brown,,,,
3,fox,,,,
4,jumps,,,,
5,over,,,,
6,the,,,,
7,lazy,,,,
8,dog,,,,


In [None]:
import spacy

nlp = spacy.load("en_core_web_sm")

# Define a function to lemmatize a word
def lemma(word):
    doc = nlp(word)
    return doc[0].lemma_

# Apply the lemmatization function to your DataFrame
df["Root"] = df["Word"].apply(lemma)
df

Unnamed: 0,Word,Root,Singular/Plural,Tense,POS
0,The,the,,,
1,quick,quick,,,
2,brown,brown,,,
3,fox,fox,,,
4,jumps,jump,,,
5,over,over,,,
6,the,the,,,
7,lazy,lazy,,,
8,dog,dog,,,


In [None]:
# Get Singluar or plural form
def pluralize(w):
  if w[-1]=="s": return "Plural"
  else: return "Singular"

df["Singular/Plural"]=df["Word"].apply(pluralize)
df


Unnamed: 0,Word,Root,Singular/Plural,Tense,POS
0,The,the,Singular,,
1,quick,quick,Singular,,
2,brown,brown,Singular,,
3,fox,fox,Singular,,
4,jumps,jump,Plural,,
5,over,over,Singular,,
6,the,the,Singular,,
7,lazy,lazy,Singular,,
8,dog,dog,Singular,,


In [None]:
def tense_word(w):
  return tenses(w)[0][0]

df["Tense"]=df["Word"].apply(tense_word)
df


Unnamed: 0,Word,Root,Singular/Plural,Tense,POS
0,The,the,Singular,infinitive,
1,quick,quick,Singular,infinitive,
2,brown,brown,Singular,infinitive,
3,fox,fox,Singular,infinitive,
4,jumps,jump,Plural,present,
5,over,over,Singular,infinitive,
6,the,the,Singular,infinitive,
7,lazy,lazy,Singular,infinitive,
8,dog,dog,Singular,infinitive,


In [None]:
# Define a function to tag a word
def tag_word(word):
    doc = nlp(word)
    return doc[0].pos_

# Apply the tagging function to your DataFrame
df["POS"] = df["Word"].apply(tag_word)
df

Unnamed: 0,Word,Root,Singular/Plural,Tense,POS
0,The,the,Singular,infinitive,PRON
1,quick,quick,Singular,infinitive,ADJ
2,brown,brown,Singular,infinitive,PROPN
3,fox,fox,Singular,infinitive,PROPN
4,jumps,jump,Plural,present,VERB
5,over,over,Singular,infinitive,ADP
6,the,the,Singular,infinitive,PRON
7,lazy,lazy,Singular,infinitive,ADJ
8,dog,dog,Singular,infinitive,NOUN


# Word Generation

In [None]:
df1 = pd.DataFrame(columns=["Word", "Root", "Singular", "Plural", "Comparative", "Superlative"])

# Insert Base Word
df1["Word"] = ["play", "good", "child", "wolf","wives","sheep"]

# Now, the "Word" column will have the correct values
print(df1)


    Word Root Singular Plural Comparative Superlative
0   play  NaN      NaN    NaN         NaN         NaN
1   good  NaN      NaN    NaN         NaN         NaN
2  child  NaN      NaN    NaN         NaN         NaN
3   wolf  NaN      NaN    NaN         NaN         NaN
4  wives  NaN      NaN    NaN         NaN         NaN
5  sheep  NaN      NaN    NaN         NaN         NaN


In [None]:
# find singular & plural
def singular(x):
  return singularize(x)

def plural(x):
  return pluralize(x)

df1["Singluar"]=df1["Word"].apply(singular)
df1["Plural"]=df1["Word"].apply(plural)
df1

Unnamed: 0,Word,Root,Singular,Plural,Comparative,Superlative,Singluar
0,play,,,Singular,,,play
1,good,,,Singular,,,good
2,child,,,Singular,,,child
3,wolf,,,Singular,,,wolf
4,wives,,,Plural,,,wife
5,sheep,,,Singular,,,sheep


In [None]:
# find comparative and superlatives of a given word

def compare_word(x):
  return comparative(x)

def super_word(x):
  return superlative(x)

df1['Comparative']=df1["Word"].apply(compare_word)
df1["Superlative"]=df1["Word"].apply(super_word)
df1

Unnamed: 0,Word,Root,Singular,Plural,Comparative,Superlative,Singluar
0,play,,,Singular,player,playest,play
1,good,,,Singular,better,best,good
2,child,,,Singular,childer,childest,child
3,wolf,,,Singular,wolfer,wolfest,wolf
4,wives,,,Plural,more wives,most wives,wife
5,sheep,,,Singular,sheeper,sheepest,sheep


In [None]:

def identify_plural(word):
    doc = nlp(word)

    # Check if the word is a noun and not a proper noun
    if len(doc) == 1 and doc[0].pos_ == "NOUN" and doc[0].tag_ != "NNP":
        # Check if the word is plural (e.g., cats, dogs) or singular (e.g., cat, dog)
        if doc[0].morph.get("Number") == "pl":
            return "Plural"
        else:
            return "Singular"

df1["Singular/Plural"]=df1["Word"].apply(identify_plural)
df1

Unnamed: 0,Word,Root,Singular,Plural,Comparative,Superlative,Singluar,Singular/Plural
0,play,,,Singular,player,playest,play,
1,good,,,Singular,better,best,good,
2,child,,,Singular,childer,childest,child,Singular
3,wolf,,,Singular,wolfer,wolfest,wolf,Singular
4,wives,,,Plural,more wives,most wives,wife,Singular
5,sheep,,,Singular,sheeper,sheepest,sheep,Singular


# **Applications**

1. **Text Analysis and Information Retrieval**: POS tagging helps identify the grammatical structure of text, enabling better text analysis, indexing, and retrieval. It aids in search engines and information retrieval systems by improving the relevance of search results.

2. **Machine Translation**: In machine translation systems, understanding the POS of words in the source language helps produce more accurate translations, as it provides insight into word order and sentence structure.

3. **Grammar Checking and Language Processing**: POS tagging is essential for grammar checking tools, aiding in the detection of grammatical errors and suggesting corrections. It also supports natural language understanding and generation.

4. **Named Entity Recognition (NER)**: POS tagging assists in NER by identifying proper nouns, such as names of people, places, and organizations, within text. This is crucial in information extraction and text classification tasks.

5. **Sentiment Analysis**: Sentiment analysis algorithms use POS tags to better understand the sentiment expressed in a piece of text. For example, identifying adjectives can help determine whether a statement is positive or negative.

6. **Text Summarization**: POS tagging can be used in text summarization to identify and extract key information from a text while maintaining sentence structure and coherence.

7. **Speech Recognition**: In speech recognition systems, POS tagging aids in converting spoken language into text by helping identify word boundaries and word forms.

8. **Question Answering Systems**: POS tagging plays a role in question answering systems, helping to identify parts of a question and matching them to relevant information in a knowledge base.

9. **Syntactic Parsing**: POS tagging is a crucial step in syntactic parsing, which involves analyzing the grammatical structure of sentences. It aids in understanding sentence syntax and relationships between words.

10. **Text-to-Speech (TTS) Systems**: In TTS systems, POS tagging assists in generating natural-sounding speech by providing information about word pronunciation and intonation.

11. **Language Modeling**: POS tagging is used in language modeling tasks, such as predicting the next word in a sentence, which is important for applications like autocomplete and predictive text input.