### 📘 1. Lemmatization: Theoretical Overview
Definition:
Lemmatization reduces a word to its dictionary base form (lemma), using the word's part of speech (POS tag) and morphological analysis to pick the right base form.

Difference from Stemming:
Unlike stemming, which strips suffixes by rule and can produce non-words, lemmatization looks words up in a lexical knowledge base (such as WordNet) and returns a valid dictionary word.
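
The contrast can be sketched with a toy example that needs no external data. Both functions are deliberately simplified illustrations: `toy_stem`, `toy_lemmatize`, `SUFFIXES`, and `LEXICON` are hypothetical names, and neither function reproduces NLTK's actual stemming or lemmatization algorithms.

```python
# Toy illustration only: neither function reproduces NLTK's algorithms.

# Hypothetical, deliberately crude suffix list for the toy stemmer.
SUFFIXES = ("ies", "ing", "es", "s")

# Hypothetical mini-lexicon standing in for a resource such as WordNet.
# Keys are (word, pos) pairs; values are lemmas.
LEXICON = {
    ("studies", "n"): "study",
    ("studies", "v"): "study",
    ("feet", "n"): "foot",
    ("ran", "v"): "run",
    ("running", "v"): "run",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix without checking the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Look up (word, pos) in the lexicon; return the word unchanged if absent."""
    return LEXICON.get((word, pos), word)


for w, pos in [("studies", "n"), ("feet", "n"), ("ran", "v"), ("running", "v")]:
    print(f"{w:8} stem={toy_stem(w):8} lemma={toy_lemmatize(w, pos)}")
```

In this sketch the stemmer turns *studies* into the non-word *stud* and leaves *feet* untouched, while the lookup-based lemmatizer returns *study* and *foot*.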

### 🧰 2. Lemmatizers Available in NLTK
| Lemmatizer          | Description                           | Backend Resource         |
| ------------------- | ------------------------------------- | ------------------------ |
| `WordNetLemmatizer` | Most commonly used lemmatizer in NLTK | WordNet lexical database |

*Other lemmatizers, such as those in spaCy and TextBlob, are external and not part of NLTK.*


🔹 NLTK does not provide multiple lemmatizers; `WordNetLemmatizer` is its primary implementation. For more advanced lemmatization, external libraries such as spaCy or Stanza (formerly StanfordNLP) are commonly used.



### 🧪 3. Example Code: NLTK Lemmatization

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

words = ['running', 'ran', 'runner', 'better', 'worst', 'flies', 'studies', 'feet']

# Without POS (default = noun)
print("Without POS:")
for word in words:
    print(f"{word} → {lemmatizer.lemmatize(word)}")

# With an explicit POS: every word is lemmatized as a verb (pos='v'); no POS tagger is run
print("\nWith POS tagging:")
for word in words:
    print(f"{word} → {lemmatizer.lemmatize(word, pos='v')}")  # 'v' = verb

# pos is the part of speech; it tells the lemmatizer which WordNet entries to search.
# WordNetLemmatizer accepts only the WordNet POS codes:
# - 'n' for noun (the default)
# - 'v' for verb
# - 'a' for adjective
# - 'r' for adverb
# - 's' for satellite adjective
# Note: the lemmatizer needs the WordNet data; run nltk.download('wordnet') once beforehand.

Without POS:
running → running
ran → ran
runner → runner
better → better
worst → worst
flies → fly
studies → study
feet → foot

With POS tagging:
running → run
ran → run
runner → runner
better → better
worst → worst
flies → fly
studies → study
feet → feet
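
The run above forces `pos='v'` on every word, which is why *feet* stays *feet*. In practice each word usually gets its own tag. A minimal sketch of one common approach, assuming the tags come from a Penn Treebank tagger such as `nltk.pos_tag`: the helper name `penn_to_wordnet` is hypothetical, and the tagged sample is written by hand for illustration rather than produced by a tagger.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to one of the WordNet POS codes 'n', 'v', 'a', 'r'."""
    if tag.startswith("J"):   # JJ, JJR, JJS
        return "a"            # adjective
    if tag.startswith("V"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return "v"            # verb
    if tag.startswith("RB"):  # RB, RBR, RBS (not RP, which is a particle)
        return "r"            # adverb
    return "n"                # everything else falls back to noun, the lemmatizer's default


# Hand-written (word, tag) pairs in the shape that nltk.pos_tag returns.
tagged = [("The", "DT"), ("children", "NNS"), ("were", "VBD"),
          ("running", "VBG"), ("faster", "RBR")]

for word, tag in tagged:
    print(f"{word:10} {tag:4} -> pos='{penn_to_wordnet(tag)}'")
```

With NLTK and its tagger and WordNet data installed, the returned code can be passed straight through, for example `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))` for each pair from `nltk.pos_tag(tokens)`.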


### ✅ 4. Advantages of Lemmatization
| Advantage                     | Description                                                    |
| ----------------------------- | -------------------------------------------------------------- |
| ✅ **Semantically Accurate**   | Uses the word's part of speech to return the correct base form |
| ✅ **Valid Words**             | Output is a dictionary word when the input is found in WordNet (unknown words are returned unchanged), improving interpretability |
| ✅ **Handles Irregular Forms** | Given the right POS, resolves forms like *went → go* (verb) and *better → good* (adjective) |
| ✅ **Useful for ML Pipelines** | Reduces vocabulary size while maintaining semantics            |
| ✅ **POS Support**             | Allows more granular control using parts-of-speech tagging     |

### ⚠️ 5. Disadvantages of Lemmatization
| Disadvantage                        | Description                                                       |
| ----------------------------------- | ----------------------------------------------------------------- |
| ❌ **Slower than Stemming**          | Requires dictionary lookup and morphological analysis             |
| ❌ **POS Tag Required for Accuracy** | Without POS, defaults to noun (can lead to incorrect results)     |
| ❌ **Limited to English in NLTK**    | `WordNetLemmatizer` relies on the English WordNet, so it lemmatizes English only |
| ❌ **Less Aggressive**               | May retain inflections when stemming would normalize aggressively |
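
One way to soften the speed cost is to cache results, since real text repeats the same words many times. The sketch below uses only the standard library. `slow_lemmatize` is a hypothetical stand-in for an expensive lookup (for instance a call to `WordNetLemmatizer.lemmatize`), and the cache size is an arbitrary assumption.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive function actually runs


def slow_lemmatize(word: str, pos: str = "n") -> str:
    """Hypothetical stand-in for an expensive dictionary lookup."""
    global calls
    calls += 1
    return {"feet": "foot", "flies": "fly"}.get(word, word)


@lru_cache(maxsize=50_000)  # arbitrary size chosen for illustration
def cached_lemmatize(word: str, pos: str = "n") -> str:
    """Memoize results so each distinct (word, pos) pair is looked up once."""
    return slow_lemmatize(word, pos)


tokens = ["feet", "flies", "feet", "feet", "flies"]
lemmas = [cached_lemmatize(t) for t in tokens]

print(lemmas)
print("expensive lookups:", calls)
print(cached_lemmatize.cache_info())
```

Five tokens trigger only two expensive lookups here, because three of the calls are served from the cache.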

### 📊 6. Summary: When to Use Lemmatization
| Use Case                                                          | Recommendation                       |
| ----------------------------------------------------------------- | ------------------------------------ |
| **Semantic NLP tasks** (e.g., topic modeling, sentiment analysis) | ✅ Use Lemmatization                  |
| **Search engines / IR systems**                                   | ✅ Lemmatization can improve precision; stemming may give higher recall |
| **Real-time/low-latency apps**                                    | ❌ Consider stemming for speed        |
| **Multi-lingual NLP**                                             | ❌ Use spaCy or another library with multilingual models |
