<a href="https://colab.research.google.com/github/Sagaust/DH-Computational-Methodologies/blob/main/POS_Tagging_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part-of-Speech (POS) Tagging

---

**Definition:**  
Part-of-Speech (POS) tagging is the process of marking each word in a text (corpus) with its corresponding part of speech. Each word is labeled as a noun, verb, adjective, adverb, and so on, based on both its definition and its context within the sentence.

---

## 📌 **Why is POS Tagging Important?**

1. **Grammar Analysis**: Helps in understanding the grammatical structure of sentences.
2. **Context Understanding**: Some words can play multiple roles depending on their usage. POS tagging identifies which role a word plays in a given sentence.
3. **Linguistic Analysis**: Essential for researchers studying language patterns.
4. **Pre-processing Step**: Often an early step in NLP pipelines, feeding tasks such as Named Entity Recognition and dependency parsing.

---

## 🛠 **How Does POS Tagging Work?**

While there are rule-based approaches, modern POS tagging primarily uses statistical algorithms and machine learning models. These models are trained on annotated corpora and then used to predict POS tags for unannotated texts.
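As a minimal sketch of the "train on annotated data, then predict" idea, the example below builds a most-frequent-tag baseline in plain Python. The tiny corpus `TRAIN`, the function names, and the `NN` default for unseen words are all illustrative assumptions, not part of any library; real taggers are trained on far larger corpora and also use context.

```python
from collections import Counter, defaultdict

# Tiny hand-annotated corpus (illustrative data, Penn Treebank-style tags).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("dogs", "NNS"), ("bark", "VBP")],
    [("the", "DT"), ("bark", "NN"), ("is", "VBZ"), ("rough", "JJ")],
    [("the", "DT"), ("bark", "NN"), ("peels", "VBZ")],
]


def train_baseline(tagged_sentences):
    """Count how often each word carries each tag and keep the most frequent one."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default="NN"):
    """Tag each token with its most frequent training tag, or `default` if unseen."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]


model = train_baseline(TRAIN)
print(tag(["The", "cat", "barks"], model))
```

Because this baseline ignores context, it always tags "bark" as a noun here, even in "dogs bark". That limitation is what sequence models are meant to address.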

---

## 🌐 **Approaches to POS Tagging**:

- **Rule-Based Tagging**: Uses hand-written rules. For instance, words ending with "ing" might be labeled as verbs.
- **Probabilistic Tagging**: Uses models like Hidden Markov Models (HMMs) to find the most probable tag sequence for the observed words.
- **Machine Learning-Based Tagging**: Uses algorithms like Decision Trees, Conditional Random Fields (CRFs), and neural networks trained on annotated corpora.
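To make the probabilistic approach concrete, here is a small sketch of Viterbi decoding over a toy HMM. The tag set and every probability in `START`, `TRANS`, and `EMIT` are hand-picked for illustration (a real model would estimate them from an annotated corpus), and `UNSEEN` is an assumed smoothing constant for words a tag never emitted.

```python
import math

TAGS = ["DT", "PRP", "NN", "VB"]

# Hand-set toy probabilities (assumptions for illustration only).
START = {"DT": 0.5, "PRP": 0.4, "NN": 0.05, "VB": 0.05}
TRANS = {
    "DT":  {"DT": 0.025, "PRP": 0.025, "NN": 0.9, "VB": 0.05},
    "PRP": {"DT": 0.05,  "PRP": 0.05,  "NN": 0.1, "VB": 0.8},
    "NN":  {"DT": 0.1,   "PRP": 0.1,   "NN": 0.3, "VB": 0.5},
    "VB":  {"DT": 0.5,   "PRP": 0.2,   "NN": 0.2, "VB": 0.1},
}
EMIT = {
    "DT":  {"the": 1.0},
    "PRP": {"they": 1.0},
    "NN":  {"lead": 0.5, "dog": 0.5},
    "VB":  {"lead": 0.5, "run": 0.5},
}
UNSEEN = 1e-6  # small probability for words a tag never emitted


def viterbi(tokens):
    """Return the most probable tag sequence for `tokens` under the toy HMM."""
    if not tokens:
        return []
    # best[i][tag] = (log-probability of the best path ending in `tag` at i, previous tag)
    best = [{}]
    for tag in TAGS:
        emit = EMIT[tag].get(tokens[0], UNSEEN)
        best[0][tag] = (math.log(START[tag]) + math.log(emit), None)
    for i in range(1, len(tokens)):
        best.append({})
        for tag in TAGS:
            emit = math.log(EMIT[tag].get(tokens[i], UNSEEN))
            best[i][tag] = max(
                (best[i - 1][prev][0] + math.log(TRANS[prev][tag]) + emit, prev)
                for prev in TAGS
            )
    # Backtrack from the best final tag to recover the full path.
    last = max(TAGS, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return list(zip(tokens, path))


print(viterbi(["the", "lead"]))
print(viterbi(["they", "lead"]))
```

With these numbers, "lead" is decoded as a noun after "the" and as a verb after "they", because the transition probabilities favor `DT → NN` and `PRP → VB`.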

---

## 📚 **Applications of POS Tagging**:

1. **Text-to-Speech Systems**: Helps in correct pronunciation of words based on their usage.
2. **Information Retrieval**: Enhances search results by considering the grammatical role of words.
3. **Sentiment Analysis**: Adjectives play a crucial role; POS tagging helps identify them.
4. **Machine Translation**: Helps in structuring translated sentences.
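As a rough illustration of the text-to-speech case, the sketch below picks a pronunciation for a homograph based on its tag. The `PRONUNCIATIONS` table, its respelling notation, and the helper names are hypothetical; the tagged input is written by hand rather than produced by a tagger.

```python
# Hypothetical pronunciation lexicon keyed by (word, coarse part of speech).
PRONUNCIATIONS = {
    ("record", "NOUN"): "REH-kord",
    ("record", "VERB"): "rih-KORD",
    ("present", "NOUN"): "PREH-zent",
    ("present", "VERB"): "prih-ZENT",
}


def coarse(tag):
    """Collapse Penn Treebank-style tags into NOUN / VERB / OTHER."""
    if tag.startswith("NN"):
        return "NOUN"
    if tag.startswith("VB"):
        return "VERB"
    return "OTHER"


def pronounce(tagged_tokens):
    """Look up each (word, tag) pair; fall back to the plain word if not listed."""
    return [
        PRONUNCIATIONS.get((word.lower(), coarse(tag)), word)
        for word, tag in tagged_tokens
    ]


# Hand-tagged input (illustrative).
tagged = [("They", "PRP"), ("record", "VBP"), ("a", "DT"), ("record", "NN")]
print(pronounce(tagged))
```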

---

## 💡 **Insights from POS Tagging**:

1. **Language Patterns**: Different languages have different grammatical structures. POS tagging helps in highlighting these patterns.
2. **Ambiguity Resolution**: A word like "lead" can be a verb or a noun. POS tagging distinguishes between these usages based on context.
3. **Text Complexity**: The distribution of parts of speech can give insights into the complexity or style of a text.
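The distribution of tags mentioned above can be computed with a simple counter. This is a sketch over a hand-tagged sentence (the tags are illustrative, not actual tagger output), and `tag_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Hand-tagged example sentence (illustrative Penn Treebank-style tags).
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), (".", "."),
]


def tag_distribution(tagged_tokens):
    """Return each tag's share of the tokens, ignoring punctuation tags."""
    counts = Counter(tag for _, tag in tagged_tokens if tag[0].isalpha())
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


dist = tag_distribution(tagged)
for tag, share in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{tag:4s} {share:.2f}")
```

Comparing such distributions across texts (for example, the share of adjectives or the noun-to-verb ratio) is one simple way to characterize style.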

---

## 🛑 **Challenges with POS Tagging**:

1. **Ambiguity**: Many words can be tagged with multiple parts of speech.
2. **Out-of-Vocabulary Words**: Handling words that were not present in the training data.
3. **Language Variations**: Slang, regional dialects, and evolving language usage can pose challenges.
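One common way to cope with out-of-vocabulary words is to fall back on surface cues such as suffixes. The sketch below assumes a toy `LEXICON` and a hand-written `SUFFIX_RULES` list; both are illustrative, and the heuristics are deliberately crude (a word like "ring" or "bed" would be mis-tagged if it were missing from the lexicon).

```python
# Toy lexicon standing in for the words seen during training (illustrative).
LEXICON = {"the": "DT", "a": "DT", "dog": "NN", "runs": "VBZ", "quickly": "RB"}

# Suffix heuristics, checked in order; the first match wins.
SUFFIX_RULES = [
    ("ing", "VBG"),
    ("ed", "VBD"),
    ("ly", "RB"),
    ("ous", "JJ"),
    ("ful", "JJ"),
    ("able", "JJ"),
    ("s", "NNS"),
]


def guess_tag(word):
    """Guess a tag for an unknown word from its surface form."""
    if word.replace(".", "", 1).isdigit():
        return "CD"  # numbers such as "42" or "3.14"
    for suffix, tag in SUFFIX_RULES:
        if word.lower().endswith(suffix):
            return tag
    return "NN"  # default: unknown words are most often nouns


def tag_with_fallback(tokens):
    """Use the lexicon when possible, otherwise fall back to `guess_tag`."""
    return [(tok, LEXICON.get(tok.lower()) or guess_tag(tok)) for tok in tokens]


print(tag_with_fallback(["the", "blorfing", "dog", "zorped", "glabrous", "42"]))
```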

---

## 🧪 **POS Tagging in Python**:

Python's Natural Language Toolkit (NLTK) provides easy-to-use tools for POS tagging:

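```python
import nltk

# Download the tokenizer and tagger models. Resource names differ across
# NLTK versions (newer releases use the "punkt_tab" and "..._eng" names),
# so both spellings are requested here.
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

text = "The quick brown fox jumps over the lazy dog."
tokens = nltk.word_tokenize(text)  # split the sentence into word tokens
pos_tags = nltk.pos_tag(tokens)    # list of (token, Penn Treebank tag) pairs

print(pos_tags)
```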