
# **Part of Speech (POS) Tagging**

---

## **1. Theory**

* **Definition**: POS tagging assigns **grammatical categories** (noun, verb, adjective, etc.) to each word in a text.
* Example:
  Sentence → *“The quick brown fox jumps over the lazy dog.”*
  Tags → `[The/DET, quick/ADJ, brown/ADJ, fox/NOUN, jumps/VERB, over/ADP, the/DET, lazy/ADJ, dog/NOUN]`

---

### **Why is POS Tagging important?**

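* **Syntactic structure**: Tags identify each word's grammatical role, which parsers build on.
* **Feature engineering**: POS tags can serve as input features for tasks such as text classification and NER.
* **Disambiguation**: Separates uses of a word that share a spelling (e.g., *“play”* as a noun vs. a verb), which supports word sense disambiguation.
* **Pre-processing**: Guides lemmatization, since the lemma of a word can depend on whether it is used as a noun or a verb.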
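
The lemmatization point can be made concrete. Lemmatizers such as NLTK's `WordNetLemmatizer` accept a `pos` argument that uses WordNet's single-letter codes, so Penn Treebank tags are usually collapsed to those codes first. A minimal sketch follows; the helper name `penn_to_wordnet` is hypothetical, and falling back to noun for unknown tags is an assumption (it mirrors the lemmatizer's default):

```python
def penn_to_wordnet(tag: str) -> str:
    """Collapse a Penn Treebank tag to a WordNet POS code (hypothetical helper)."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # everything else falls back to noun (assumption)


tagged = [("fox", "NN"), ("jumps", "VBZ"), ("lazy", "JJ")]
for word, tag in tagged:
    print(word, tag, "->", penn_to_wordnet(tag))
```

The resulting code would then be passed as `pos=` to the lemmatizer, so that verb forms are reduced with verb rules rather than noun rules.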

---

## **2. Practical Examples**

---

### **POS Tagging with NLTK**

```python
import nltk
from nltk.tokenize import word_tokenize

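# Download resources once (they are cached locally afterwards).
# Newer NLTK releases may instead ask for "punkt_tab" and
# "averaged_perceptron_tagger_eng"; download those if a LookupError names them.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")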

sentence = "The quick brown fox jumps over the lazy dog."
tokens = word_tokenize(sentence)
pos_tags = nltk.pos_tag(tokens)

print(pos_tags)
```

**Output:**

```
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
 ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
 ('dog', 'NN'), ('.', '.')]
```

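👉 These are Penn Treebank tags: `"DT"` = determiner, `"JJ"` = adjective, `"NN"` = singular noun, `"VBZ"` = verb (3rd person singular present), `"IN"` = preposition or subordinating conjunction.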
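
The Penn Treebank tags in this output relate to the coarse tags used in the theory section through a many-to-one mapping. The sketch below uses a small hand-written dictionary (`PENN_TO_COARSE`, a hypothetical name) that covers only the tags seen here; it illustrates the idea and is not a complete tagset mapping.

```python
# Partial, hand-written mapping (assumption): only the tags in the output above.
PENN_TO_COARSE = {
    "DT": "DET",
    "JJ": "ADJ",
    "NN": "NOUN",
    "VBZ": "VERB",
    "IN": "ADP",
    ".": "PUNCT",
}

pos_tags = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
            ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
            ("dog", "NN"), (".", ".")]

# Tags missing from the mapping fall back to "X" (the "other" category).
coarse = [(word, PENN_TO_COARSE.get(tag, "X")) for word, tag in pos_tags]
print(coarse)
```

NLTK can also produce coarse tags directly with `nltk.pos_tag(tokens, tagset="universal")`, which needs an additional tagset resource to be downloaded.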

---

### **POS Tagging with spaCy + Visualization**

```python
import spacy
from spacy import displacy

# Load small English model
nlp = spacy.load("en_core_web_sm")

doc = nlp("The quick brown fox jumps over the lazy dog.")

for token in doc:
    print(f"{token.text:<10} {token.pos_:<10} {token.tag_:<10} {token.dep_:<10}")

# Visualization
displacy.render(doc, style="dep", jupyter=True)
```

**Sample Output Table:**

```
The        DET        DT         det
quick      ADJ        JJ         amod
brown      ADJ        JJ         amod
fox        NOUN       NN         nsubj
jumps      VERB       VBZ        ROOT
over       ADP        IN         prep
the        DET        DT         det
lazy       ADJ        JJ         amod
dog        NOUN       NN         pobj
.          PUNCT      .          punct
```

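👉 `displacy.render` draws a **dependency tree visualization**: labeled arrows showing the **grammatical dependencies** between words. With `jupyter=True` it displays inline in a notebook; outside a notebook, `displacy.render` returns the markup as a string, and `displacy.serve` can host it in a browser.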
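
The token attributes printed above can also feed simple features. The sketch below counts coarse tags and turns them into relative frequencies, one common way to use POS tags as classifier inputs. The `rows` list is copied by hand from the sample output; with a live `doc` the same pairs would come from `[(t.text, t.pos_) for t in doc]`. The variable names are illustrative.

```python
from collections import Counter

# (text, pos_) pairs copied from the sample output above.
rows = [("The", "DET"), ("quick", "ADJ"), ("brown", "ADJ"), ("fox", "NOUN"),
        ("jumps", "VERB"), ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"),
        ("dog", "NOUN"), (".", "PUNCT")]

# Count how often each coarse tag occurs.
pos_counts = Counter(pos for _, pos in rows)

# Relative frequencies can serve as simple per-document features.
total = sum(pos_counts.values())
pos_features = {pos: count / total for pos, count in pos_counts.items()}

print(pos_counts)
print(pos_features)
```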

---

## **3. Interview-Style Q&A**

### **Basic**

**Q1. What is POS tagging?**
*A: POS tagging is the process of assigning grammatical categories like noun, verb, adjective, etc., to words in a text.*

**Q2. How is POS tagging useful in NLP?**
*A: It helps in syntactic parsing, lemmatization, disambiguation, and is a foundational step in many downstream NLP tasks such as NER and sentiment analysis.*

---

### **Intermediate**

**Q3. How does NLTK perform POS tagging?**
*A: NLTK's default `pos_tag` uses a pre-trained statistical model, the averaged perceptron tagger, trained on Penn Treebank-style annotated text; it outputs Penn Treebank tags such as NN and VBZ.*

**Q4. What is the difference between `pos_` and `tag_` in spaCy?**
*A: `pos_` is the **universal POS tag** (coarse-grained and shared across languages, e.g., VERB), while `tag_` is the **fine-grained POS tag** (specific to the language and tagset, e.g., VBZ for a 3rd person singular present verb in English).*

---

### **Advanced**

**Q5. How do modern Transformer models handle POS tagging?**
*A: Transformers like BERT treat POS tagging as a **token classification** task: a classification layer on top of the model's contextual embeddings predicts one tag per token, replacing rule-based or standalone statistical taggers.*
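
One practical detail of the token classification setup is that Transformer tokenizers split words into subwords, so word-level tags have to be aligned to subword positions. Below is a minimal sketch in plain Python. It assumes a `word_ids` list of the kind Hugging Face fast tokenizers return (`None` for special tokens, otherwise the index of the source word). The subword split, the helper name `align_labels`, the label ids, and the choice to label only the first subword of each word are illustrative assumptions; `-100` is the index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def align_labels(word_ids, word_labels):
    """Give each word's label to its first subword; mask everything else."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token such as [CLS]
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation subword
        previous = word_id
    return aligned


# Hypothetical split: [CLS] the fox jump ##s [SEP]
word_ids = [None, 0, 1, 2, 2, None]
word_labels = [0, 1, 2]  # illustrative ids: 0=DET, 1=NOUN, 2=VERB

print(align_labels(word_ids, word_labels))  # [-100, 0, 1, 2, -100, -100]
```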

**Q6. What are some challenges in POS tagging?**
*A: Ambiguity (e.g., “book a flight” vs. “read a book”), domain adaptation (general models failing on medical/legal text), and multilingual complexities (morphologically rich languages).*
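
The ambiguity problem is easy to reproduce with a context-free baseline. The sketch below tags each word with its most frequent tag from a tiny training set; the data is invented for illustration, and `baseline_tag` is a hypothetical helper, not a library function.

```python
from collections import Counter, defaultdict

# Toy training data (invented for illustration).
train = [
    [("read", "VERB"), ("a", "DET"), ("book", "NOUN")],
    [("buy", "VERB"), ("the", "DET"), ("book", "NOUN")],
    [("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
]

# Count tags per word, then keep the most frequent tag for each word.
tag_counts = defaultdict(Counter)
for sentence in train:
    for word, tag in sentence:
        tag_counts[word][tag] += 1
most_frequent = {word: counts.most_common(1)[0][0]
                 for word, counts in tag_counts.items()}


def baseline_tag(tokens):
    """Tag each token with its most frequent training tag, ignoring context."""
    return [(tok, most_frequent.get(tok, "NOUN")) for tok in tokens]


print(baseline_tag(["book", "a", "flight"]))
# [('book', 'NOUN'), ('a', 'DET'), ('flight', 'NOUN')]
```

Because *book* is seen more often as a noun, the baseline tags it as NOUN even where it is a verb. Taggers that look at surrounding words, whether statistical or Transformer-based, exist to resolve exactly this case.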

---

## **4. Visualization Snapshot** (spaCy’s `displacy`)

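When rendered, displaCy draws a labeled arc from each head word to its dependents. Written out as a tree, with each token placed under its head, the parse looks something like:

```
jumps (ROOT)
├── fox (nsubj)
│   ├── The (det)
│   ├── quick (amod)
│   └── brown (amod)
├── over (prep)
│   └── dog (pobj)
│       ├── the (det)
│       └── lazy (amod)
└── . (punct)
```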

This shows **dependencies**:

* *fox* → nominal subject (`nsubj`) of *jumps*
* *dog* → object (`pobj`) of the preposition *over*
* *quick/brown/lazy* → adjectival modifiers (`amod`) of the nouns they precede (*fox* and *dog*).
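
These relations can be read off programmatically as well. The sketch below works on hand-written `(token, dependency, head)` triples. The labels come from the sample output; the head column is an assumption consistent with those labels (spaCy exposes it as `token.head`, and the root is its own head).

```python
from collections import defaultdict

# (token, dependency label, head) triples; heads are assumed, see above.
triples = [
    ("The", "det", "fox"), ("quick", "amod", "fox"), ("brown", "amod", "fox"),
    ("fox", "nsubj", "jumps"), ("jumps", "ROOT", "jumps"),
    ("over", "prep", "jumps"), ("the", "det", "dog"), ("lazy", "amod", "dog"),
    ("dog", "pobj", "over"), (".", "punct", "jumps"),
]

# Collect adjectival modifiers under the noun they attach to.
modifiers = defaultdict(list)
for token, dep, head in triples:
    if dep == "amod":
        modifiers[head].append(token)

# Find the subject of the verb and the object of the preposition.
subject = next(tok for tok, dep, _ in triples if dep == "nsubj")
prep_object = next(tok for tok, dep, _ in triples if dep == "pobj")

print(dict(modifiers))       # {'fox': ['quick', 'brown'], 'dog': ['lazy']}
print(subject, prep_object)  # fox dog
```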

---
