# Part-of-Speech Tagging
Part-of-speech (POS) tagging is a natural language processing task that assigns each word in a text its grammatical category, such as noun, verb, or adjective.<br>
POS tagging can be rule-based or statistical. In statistical approaches, machine learning models are trained on annotated corpora to predict the most likely POS tag for each word based on its context.<br>
Because it exposes a language’s syntactic structure, POS tagging is a common preprocessing step for applications such as named entity recognition, information retrieval, and machine translation.
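
To make the rule-based side of that contrast concrete, here is a minimal sketch of a tagger driven by hand-written suffix patterns. The rules and the `rule_based_tag` helper are illustrative assumptions, not part of NLTK or of this notebook; a practical rule set would be far larger.

```python
import re

# Hypothetical hand-written rules: (regex pattern, Penn Treebank-style tag).
# The first matching pattern wins; the last rule is a catch-all default.
RULES = [
    (re.compile(r"^\d+(\.\d+)?$"), "CD"),  # numbers
    (re.compile(r".*ing$"), "VBG"),        # gerunds / present participles
    (re.compile(r".*ed$"), "VBD"),         # simple past
    (re.compile(r".*ly$"), "RB"),          # adverbs
    (re.compile(r".*s$"), "NNS"),          # plural nouns
    (re.compile(r".*"), "NN"),             # default: singular noun
]

def rule_based_tag(tokens):
    """Tag each token with the tag of the first rule whose pattern matches."""
    tagged = []
    for token in tokens:
        for pattern, tag in RULES:
            if pattern.match(token):
                tagged.append((token, tag))
                break
    return tagged

print(rule_based_tag(["dogs", "barked", "loudly", "at", "3", "running", "cars"]))
```

Rules like these ignore context entirely ("at" falls through to the noun default), which is the gap statistical taggers aim to close.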

In [None]:
# Importing the NLTK library
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.tokenize import word_tokenize
from nltk import pos_tag


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [None]:
# Sample text
text = "I watch your program. I love your watch"
# Lowercase the text and tokenize it into words
# (lowercasing turns "I" into "i", which the tagger mislabels in the output below)
tokens = word_tokenize(text.lower())

In [None]:
# Performing PoS tagging
pos_tags = pos_tag(tokens)

In [None]:
# Displaying the PoS tagged result in separate lines
print("Original Text:")
print(text)

print("\nPoS Tagging Result:")
for word, tag in pos_tags:  # "tag" avoids shadowing the imported pos_tag function
    print(f"{word}: {tag}")

Original Text:
I watch your program. I love your watch

PoS Tagging Result:
i: JJ
watch: VBP
your: PRP$
program: NN
.: .
i: VB
love: VBP
your: PRP$
watch: NN


# Implementing POS Tagging with Hidden Markov Models (HMMs)
<ul><li>HMMs as a generative probabilistic model.
<ul><li>Transition probabilities (probability of one tag following another).
<li>Emission probabilities (probability of a word being emitted given a specific tag).</ul>
<li>The Viterbi algorithm for finding the most probable tag sequence.</ul>
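
As a rough illustration of how these pieces fit together, the sketch below runs Viterbi decoding over a tiny hand-built HMM. All probabilities are made-up numbers chosen for the example, and the `viterbi` function is a hypothetical helper rather than NLTK's implementation. It multiplies raw probabilities for readability; longer inputs would normally use log probabilities to avoid underflow.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return (best_tag_sequence, probability) for `words` under the HMM."""
    # best[t][tag]: probability of the best tag path ending in `tag` at position t
    best = [{tag: start_p[tag] * emit_p[tag].get(words[0], 0.0) for tag in tags}]
    back = [{}]  # back[t][tag]: previous tag on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda p: best[t - 1][p] * trans_p[p][tag])
            best[t][tag] = (best[t - 1][prev] * trans_p[prev][tag]
                            * emit_p[tag].get(words[t], 0.0))
            back[t][tag] = prev
    # Follow the back-pointers from the best final tag
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model (illustrative values only)
tags = ["PRP", "VBP", "PRP$", "NN"]
start_p = {"PRP": 0.6, "VBP": 0.1, "PRP$": 0.2, "NN": 0.1}
trans_p = {
    "PRP":  {"PRP": 0.05, "VBP": 0.8, "PRP$": 0.05, "NN": 0.1},
    "VBP":  {"PRP": 0.05, "VBP": 0.05, "PRP$": 0.6, "NN": 0.3},
    "PRP$": {"PRP": 0.025, "VBP": 0.05, "PRP$": 0.025, "NN": 0.9},
    "NN":   {"PRP": 0.2, "VBP": 0.4, "PRP$": 0.1, "NN": 0.3},
}
emit_p = {
    "PRP": {"i": 1.0},
    "VBP": {"watch": 0.5, "love": 0.5},
    "PRP$": {"your": 1.0},
    "NN": {"watch": 0.5, "program": 0.5},
}

path, prob = viterbi(["i", "watch", "your", "watch"], tags, start_p, trans_p, emit_p)
print(path, round(prob, 4))  # → ['PRP', 'VBP', 'PRP$', 'NN'] 0.0648
```

The two occurrences of "watch" get different tags because the transition probabilities favour a verb after a pronoun and a noun after a possessive.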

#### Step-by-Step Implementation:

<ul><li>Introduce the concept of training and testing data (e.g., a POS-annotated corpus).
<li>Demonstrate the process of training an HMM to predict POS tags.
<li>Implement a basic HMM POS tagger in Python using NLTK.</ul>

In [None]:
import nltk
from nltk.corpus import treebank
from nltk.corpus import brown
from nltk.tag import hmm


In [None]:
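# Load the POS-tagged corpus (the Penn Treebank sample in this case)
nltk.download('treebank')
nltk.download('brown')
# Note: NLTK's Treebank sample has only about 3,900 sentences, so [:10000] selects all of
# them and overlaps with the test slice below. For a held-out evaluation, train on [:3000].
train_data = treebank.tagged_sents()[:10000]  # Training data
test_data = treebank.tagged_sents()[3000:]  # Test data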

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!


#### Training the HMM: Using NLTK’s Treebank corpus to train the HMM.

In [None]:
# Train an HMM POS tagger
trainer = hmm.HiddenMarkovModelTrainer()
hmm_tagger = trainer.train(train_data)
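
With tagged sentences, HMM parameters can be estimated simply by counting. The sketch below shows unsmoothed maximum-likelihood estimation on a made-up two-sentence corpus; `estimate_hmm` and the `<s>` start marker are assumptions introduced for illustration and are not NLTK's code.

```python
from collections import Counter, defaultdict

# Made-up toy corpus in the same (word, tag) format as treebank.tagged_sents()
toy_corpus = [
    [("i", "PRP"), ("watch", "VBP"), ("your", "PRP$"), ("program", "NN")],
    [("i", "PRP"), ("love", "VBP"), ("your", "PRP$"), ("watch", "NN")],
]

def estimate_hmm(tagged_sents):
    """Estimate transition and emission probabilities by relative frequency."""
    transition_counts = defaultdict(Counter)  # previous tag -> next tag counts
    emission_counts = defaultdict(Counter)    # tag -> word counts
    for sent in tagged_sents:
        prev_tag = "<s>"  # hypothetical sentence-start marker
        for word, tag in sent:
            transition_counts[prev_tag][tag] += 1
            emission_counts[tag][word] += 1
            prev_tag = tag

    def normalize(counts):
        # Turn each row of counts into a probability distribution
        return {ctx: {k: n / sum(c.values()) for k, n in c.items()}
                for ctx, c in counts.items()}

    return normalize(transition_counts), normalize(emission_counts)

trans_p, emit_p = estimate_hmm(toy_corpus)
print(trans_p["PRP"])  # → {'VBP': 1.0}
print(emit_p["NN"])    # → {'program': 0.5, 'watch': 0.5}
```

Without smoothing, any word or tag pair missing from the training data gets probability zero, which is one reason unseen words are hard for a plain HMM tagger.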


#### Tagging a Sentence: Using the trained model to tag new sentences.

In [None]:
# Test the model on a sample sentence
sample_sentence = "Natural language processing is interesting".split()
pos_tags = hmm_tagger.tag(sample_sentence)
print("POS Tags:", pos_tags)

POS Tags: [('Natural', 'NNP'), ('language', 'NN'), ('processing', 'NN'), ('is', 'VBZ'), ('interesting', 'JJ')]


#### Model Evaluation: Evaluate the tagger’s accuracy on the test data.

In [None]:

# Evaluate the model on test data (accuracy() replaces the deprecated evaluate())
accuracy = hmm_tagger.accuracy(test_data)
print("HMM Tagger Accuracy:", accuracy)


HMM Tagger Accuracy: 0.9838981221670624
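
Tagger accuracy is usually reported at the token level: the fraction of tokens whose predicted tag matches the gold tag. The sketch below spells that out with a hypothetical `tagging_accuracy` helper and made-up data; it is meant to show the metric, not to reproduce NLTK's code. A score like this is informative only when the evaluation sentences were held out from training.

```python
def tagging_accuracy(gold_sents, predicted_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for gold, pred in zip(gold_sents, predicted_sents):
        for (_, gold_tag), (_, pred_tag) in zip(gold, pred):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0

# Made-up example: one of four tags is wrong
gold = [[("i", "PRP"), ("love", "VBP"), ("your", "PRP$"), ("watch", "NN")]]
pred = [[("i", "PRP"), ("love", "VBP"), ("your", "PRP$"), ("watch", "VB")]]
print(tagging_accuracy(gold, pred))  # → 0.75
```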


#### ACTIVITY:

<ul><li>Explore different datasets or sentences to tag.
<li>Modify the sample corpus to see how well the HMM tagger performs with unseen words or rare tag combinations.
<li>Compare the performance of the different tagging methods.</ul>
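
For the comparison activity, one possible reference point is a most-frequent-tag baseline: each known word gets the tag it carried most often in training, and unseen words fall back to a default. The sketch below uses a made-up corpus and hypothetical helper names (`train_baseline`, `baseline_tag`); the `NN` default is an assumption, not a recommendation from this notebook.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Map each lowercased word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def baseline_tag(tokens, lexicon, default_tag="NN"):
    """Tag known words from the lexicon; unseen words get `default_tag`."""
    return [(tok, lexicon.get(tok.lower(), default_tag)) for tok in tokens]

# Made-up training data: "watch" is a noun twice and a verb once
toy_corpus = [
    [("i", "PRP"), ("watch", "VBP"), ("your", "PRP$"), ("program", "NN")],
    [("i", "PRP"), ("love", "VBP"), ("your", "PRP$"), ("watch", "NN")],
    [("the", "DT"), ("watch", "NN"), ("stopped", "VBD")],
]

lexicon = train_baseline(toy_corpus)
print(baseline_tag(["I", "watch", "television"], lexicon))
# → [('I', 'PRP'), ('watch', 'NN'), ('television', 'NN')]
```

Because it ignores context, the baseline tags "watch" as a noun even after "I", which is exactly the kind of case where a sequence model such as an HMM can do better.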
