## WordNet Lemmatizer
Lemmatization is similar to stemming, but its output, called a lemma, is a root word rather than a root stem. Where a stem may not be a real word, a lemma is always a valid dictionary word with the same core meaning as the original.

NLTK provides the `WordNetLemmatizer` class, a thin wrapper around the WordNet corpus. It uses the `morphy()` function of the WordNet `CorpusReader` class to find a lemma. Let us understand it with an example:


In [1]:
# Typical use cases: Q&A, chatbots, text summarization
from nltk.stem import WordNetLemmatizer

In [6]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /Users/pro/nltk_data...


True

In [2]:
lemmatizer = WordNetLemmatizer()

In [7]:
# POS values accepted by lemmatize():
#   'n' - noun (the default)
#   'v' - verb
#   'a' - adjective
#   'r' - adverb
lemmatizer.lemmatize("going", pos='v')

'go'

In [8]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

In [9]:
for word in words:
    print(word+"---->"+lemmatizer.lemmatize(word,pos='v'))

eating---->eat
eats---->eat
eaten---->eat
writing---->write
writes---->write
programming---->program
programs---->program
history---->history
finally---->finally
finalized---->finalize


In [10]:
lemmatizer.lemmatize("goes",pos='v')

'go'

In [11]:
lemmatizer.lemmatize("fairly",pos='v'),lemmatizer.lemmatize("sportingly")

('fairly', 'sportingly')

## 🔚 Conclusion: Lemmatization in Text Preprocessing

In this section, we explored **Lemmatization**, an essential technique in Natural Language Processing (NLP) used for converting words to their *dictionary root form*, or **lemma**. It improves upon stemming by ensuring that the output is a **valid and meaningful word**.

---

### ✅ Key Concepts

- **Lemmatization** returns the base or dictionary form of a word, called the **lemma**.
  - E.g., `eating`, `eats`, `eaten` → `eat`
  - Unlike stemming, it does not just chop off suffixes: candidates are checked against a dictionary (WordNet), so the result is a real word.
  
- **Stemming vs Lemmatization**:
  - **Stemming** is rule-based and fast, but can produce incorrect or non-existent words.
  - **Lemmatization** is accurate and produces real words, but is **slower** due to dictionary lookups via **WordNet**.

---

### 🔧 Technique Used

- **WordNet Lemmatizer** (from `nltk.stem`)
  - Relies on **WordNet corpus** to find the correct lemma.
  - Accepts **POS (Part-of-Speech) tags** to improve accuracy:
    - `'n'` for noun
    - `'v'` for verb
    - `'a'` for adjective
    - `'r'` for adverb
  - Example:  
    `lemmatizer.lemmatize("going", pos="v")` → `go`
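
To make the "dictionary plus POS tag" idea concrete, here is a toy sketch in plain Python. It is not NLTK's implementation: `TOY_LEXICON`, `SUFFIX_RULES`, `EXCEPTIONS`, and `toy_lemmatize` are hypothetical names, and the word lists are made up for illustration. It only loosely mirrors the general approach of checking irregular forms, then trying POS-specific suffix rules and keeping a candidate only if it is a known word.

```python
# Toy stand-in for a WordNet-style lookup (hypothetical data, not NLTK's).
TOY_LEXICON = {
    "n": {"program", "history", "going"},
    "v": {"go", "eat", "write", "program", "finalize"},
}

# (suffix, replacement) pairs tried in order, per part of speech.
SUFFIX_RULES = {
    "n": [("s", ""), ("ies", "y")],
    "v": [("s", ""), ("es", ""), ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "r": [],  # this sketch has no rules for adverbs
}

# Irregular forms that no suffix rule can recover.
EXCEPTIONS = {"v": {"eaten": "eat", "went": "go"}}


def toy_lemmatize(word, pos="n"):
    """Return a lemma for `word`, or `word` itself if nothing matches."""
    known = TOY_LEXICON.get(pos, set())
    if word in EXCEPTIONS.get(pos, {}):
        return EXCEPTIONS[pos][word]
    if word in known:
        return word
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in known:  # accept only real (known) words
                return candidate
    return word  # unknown form: return the input unchanged


print(toy_lemmatize("going", pos="v"))  # verb rules apply
print(toy_lemmatize("going"))           # default noun POS: already a known noun
print(toy_lemmatize("fairly", pos="r")) # no adverb rules: unchanged
```

The design point is the final dictionary check: a stemmer would keep any stripped form, while this sketch falls back to the original word when no candidate is known.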

---

### ⚠️ Important Observations

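- Without the correct POS tag, **the word may come back unchanged**: `lemmatize()` defaults to `pos='n'`, so `"going"` stays `"going"` unless `pos='v'` is passed.
- With `pos='v'`, inflected verb forms are reduced correctly, e.g. `"goes"` → `"go"`.
- Adverbs are not reduced to a related adjective or noun. In the run above, both came back unchanged:
  - `"fairly"` → `"fairly"`
  - `"sportingly"` → `"sportingly"`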
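
Because the result depends on the POS argument, a common pattern is to derive it from a tagger's output. NLTK's `pos_tag` returns Penn Treebank tags by default, while `lemmatize()` expects one of `'n'`, `'v'`, `'a'`, `'r'`. The helper below is a minimal sketch of that mapping. The name `penn_to_wordnet_pos` is hypothetical, the tagged pairs are written by hand rather than produced by a tagger, and falling back to `'n'` is an assumption that matches the lemmatizer's default.

```python
def penn_to_wordnet_pos(tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("JJ"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # nouns and everything else (assumed fallback)


# Hand-written (word, tag) pairs standing in for a tagger's output.
tagged = [("she", "PRP"), ("goes", "VBZ"), ("running", "VBG"), ("fairly", "RB")]
pos_args = [(word, penn_to_wordnet_pos(tag)) for word, tag in tagged]
print(pos_args)
```

Each resulting pair can then be passed on as `lemmatizer.lemmatize(word, pos=pos)`.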

---

### 🧠 Performance Consideration

- **Slower than stemming** due to dictionary lookups and morphological analysis.
- Best suited when **accuracy and meaning preservation** are critical, e.g.:
  - 🔹 Chatbots
  - 🔹 Text summarization
  - 🔹 Question answering (Q&A) systems
  - 🔹 Semantic search engines

---

### 🆚 Final Comparison

| Feature             | Stemming              | Lemmatization            |
|---------------------|------------------------|----------------------------|
| Output              | Word stem (may not be real) | Dictionary root (valid word) |
| Accuracy            | Lower                  | Higher                     |
| Speed               | Faster                 | Slower                     |
| Use POS             | No                     | Yes                        |
| Example             | `history` → `histori`  | `history` → `history`      |

---

📌 **Conclusion**: Lemmatization is a more sophisticated and accurate technique than stemming, especially when the grammatical integrity of text matters. It is ideal for real-world NLP applications where precision is more valuable than speed.

---

🎯 **Next Step**: Use both stemming and lemmatization on real datasets to observe their impact on model accuracy and vocabulary size.
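
As a starting point for the vocabulary-size part of that experiment, here is a small sketch that counts distinct tokens before and after normalization. `vocabulary_size`, `TOY_LEMMAS`, and `toy_normalize` are hypothetical names, and the lookup table is a stand-in so the snippet runs without any corpus download. On a real dataset you would pass a stemmer's or lemmatizer's method as `normalize` instead.

```python
def vocabulary_size(tokens, normalize=None):
    """Count distinct tokens after lowercasing and optional normalization."""
    if normalize is None:
        normalize = lambda token: token  # identity: no normalization
    return len({normalize(token.lower()) for token in tokens})


# Stand-in for a real lemmatizer (illustrative data only).
TOY_LEMMAS = {"eats": "eat", "eating": "eat", "eaten": "eat"}


def toy_normalize(token):
    return TOY_LEMMAS.get(token, token)


tokens = ["He", "eats", "while", "she", "is", "eating", "and", "he", "has", "eaten"]
raw_size = vocabulary_size(tokens)
lemma_size = vocabulary_size(tokens, toy_normalize)
print(raw_size, lemma_size)  # prints: 9 7
```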

---
**Great job reaching this point! Keep practicing and exploring NLP. 🚀**
