# Lemmatization in NLP

## What is Lemmatization?

**Lemmatization** is the process of reducing a word to its **base or dictionary form** (called a **lemma**) using **vocabulary and morphological analysis**.

- Unlike stemming, lemmatization returns real words.
- Example:  
  - "running" → "run"  
  - "better" → "good"  
  - "was" → "be"

## Why Use Lemmatization?

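- Usually more accurate than stemming, because each word is mapped to a dictionary entry instead of having its suffix stripped by rule.
- Preserves the **meaning** of words, since the result is a real dictionary form.
- Useful for tasks like **text classification**, **question answering**, and **semantic analysis**.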

## Difference Between Stemming and Lemmatization

| Feature            | **Stemming**                           | **Lemmatization**                           |
|--------------------|----------------------------------------|--------------------------------------------|
| **Definition**      | Reduces words to their root form, often a non-word. | Reduces words to their dictionary form (real word). |
| **Accuracy**        | Less accurate, as it follows heuristic rules. | More accurate, uses vocabulary and context. |
| **Speed**           | Faster, since it only applies simple rules. | Slower, as it involves dictionary lookups and context analysis. |
| **Use Case**        | Useful for quick, simple tasks where precision is not critical. | Essential for tasks requiring high precision and linguistic meaning (e.g., text classification, sentiment analysis). |
| **Example**         | "studies" → "studi" (Porter stemmer) | "studies" → "study" |
| **Complexity**      | Simpler to implement (uses rule-based algorithms). | More complex, uses lexical databases like WordNet. |
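
The contrast in the table can be sketched with a deliberately tiny example. This is a toy illustration, not how NLTK implements either technique: `toy_stem`, `toy_lemmatize`, `SUFFIXES`, and `LEMMA_DICT` are hypothetical names, and the rule list and dictionary are made up for the demonstration.

```python
# Toy illustration only: real stemmers and lemmatizers use far richer
# rules and dictionaries than the hypothetical ones below.

SUFFIXES = ("ing", "ies", "es", "ed", "s")  # hypothetical rules, checked in order

# Hypothetical mini-dictionary mapping (word, pos) -> lemma
LEMMA_DICT = {
    ("running", "v"): "run",
    ("better", "a"): "good",
    ("was", "v"): "be",
    ("studies", "n"): "study",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix; with no dictionary, the result may be a non-word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Look the word up together with its part of speech; fall back to the word itself."""
    return LEMMA_DICT.get((word, pos), word)


for word, pos in [("running", "v"), ("better", "a"), ("was", "v"), ("studies", "n")]:
    print(f"{word:8} stem={toy_stem(word):8} lemma={toy_lemmatize(word, pos)}")
```

The rule-based function never consults a vocabulary, so it turns "running" into the non-word "runn" and cannot relate "better" to "good". The lookup-based function handles both, but only for entries its dictionary contains and only when given the right part of speech.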



### Bonus Tip:
✅ Use Lemmatization when precision and correctness are important.

✅ Prefer it over stemming for linguistic tasks and text understanding.

In [1]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

In [6]:
# Download the required corpora once; later calls report "already up-to-date"
import nltk
nltk.download('wordnet')  # WordNet, the English lexical database the lemmatizer looks words up in
nltk.download('omw-1.4')  # Open Multilingual WordNet, which adds multilingual data to WordNet

[nltk_data] Downloading package wordnet to
[nltk_data]     /home/u5c2dbc0bf2849dd5288e3311262c709/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /home/u5c2dbc0bf2849dd5288e3311262c709/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

#### Specify the part of speech (`pos`) for best results: 'n' for noun, 'v' for verb, 'a' for adjective, 'r' for adverb. The default is 'n' (noun), so verbs and adjectives passed without `pos` are often not reduced to their lemma.

In [9]:
print(lemmatizer.lemmatize("running", pos="v"))  # verb
print(lemmatizer.lemmatize("better", pos="a"))   # adjective
print(lemmatizer.lemmatize("was", pos="v"))      # verb
print(lemmatizer.lemmatize("fairly", pos="r"))   # adverb

run
good
be
fairly
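
In practice the part of speech usually comes from a tagger rather than being typed by hand. A minimal sketch of one common approach follows, assuming the tagger emits Penn Treebank tags; `penn_to_wordnet_pos` is a hypothetical helper name, and the tagged pairs are written by hand to stand in for tagger output.

```python
def penn_to_wordnet_pos(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBD') to a WordNet pos letter.

    Simplified on purpose: only the first letter of the tag is inspected.
    """
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, matching the lemmatizer's default


# Hand-written (token, tag) pairs standing in for a tagger's output
tagged = [("was", "VBD"), ("running", "VBG"), ("better", "JJR"),
          ("fairly", "RB"), ("dogs", "NNS")]

for token, tag in tagged:
    print(token, tag, penn_to_wordnet_pos(tag))
```

The returned letter can then be passed as the `pos` argument, for example `lemmatizer.lemmatize(token, pos=penn_to_wordnet_pos(tag))`.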
