# **WordNet Lemmatizer in NLP**

## **What is WordNet Lemmatizer?**
`WordNetLemmatizer` (from NLTK) reduces words to their **base or dictionary form (the lemma)** by looking them up in **WordNet** and applying **morphological rules**, unlike stemming, which simply strips suffixes.

## **Word Stem vs. Root Word**
- **Word Stem (Stemming Result)**: A truncated version of a word after removing suffixes (may not be a real word).
  - Example: `"studies"` → `"studi"` ❌ (not a valid word).
- **Root Word (Lemmatization Result)**: The actual **dictionary word** (lemma).
  - Example: `"studies"` → `"study"` ✅ (valid word).

## **How Lemmatization Works**
- **Uses the WordNet dictionary** to find the correct base form of a word.
- **Considers part of speech (POS)** (`'v'` for verbs, `'n'` for nouns, etc.); if no POS is given, the word is treated as a noun.
- **More accurate than stemming**, because the result is always a valid dictionary word.

## **Example Comparison**
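| Word    | Stemming (Porter) | Lemma (WordNet) | POS needed for this lemma |
|---------|-------------------|-----------------|---------------------------|
| Running | run               | run             | `'v'` (verb)              |
| Studies | studi             | study           | `'n'` (noun, the default) |
| Better  | better            | good            | `'a'` (adjective)         |
| Caring  | care              | care            | `'v'` (verb)              |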

## **Why Use It?**
✔ **Produces real words instead of chopped forms.**  
✔ **Takes the word's part of speech (POS) into account.**  
✔ **Better for NLP tasks like chatbots, search engines, and text analysis.**  

## **Limitations**
❌ **Slower than stemming** (requires dictionary lookup).  
❌ **POS tagging is needed for best accuracy** (without a `pos` argument, every word is treated as a noun).  

**Use `WordNetLemmatizer` when you need correct root words instead of just truncating text!** 🚀


## **WordNetLemmatizer class**
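`WordNetLemmatizer` is a class in **NLTK** that uses the **WordNet corpus** to find the **base (lemma) form** of words. It performs **lemmatization** by looking words up through the **WordNet corpus reader** with its **morphy** function, rather than just removing suffixes as stemming does. If no lemma is found, the input word is returned unchanged. The WordNet corpus must be downloaded once (`nltk.download('wordnet')`) before the class can be used.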
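To make the lookup idea concrete, here is a toy sketch of an exception list combined with suffix rules that are checked against a dictionary. It is not NLTK's implementation: `TOY_VERBS`, `TOY_EXCEPTIONS`, `TOY_RULES`, and `toy_lemmatize` are hypothetical names, and the tiny word lists stand in for the full WordNet data.

```python
# Toy stand-in for the verb entries of a dictionary such as WordNet.
TOY_VERBS = {"write", "run", "study", "play", "sing"}

# Irregular forms that no suffix rule can recover.
TOY_EXCEPTIONS = {"wrote": "write", "written": "write", "ran": "run", "sang": "sing"}

# (suffix, replacement) pairs, tried in order.
TOY_RULES = [
    ("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
    ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", ""),
]


def toy_lemmatize(word):
    """Return a base form for `word`, or `word` itself if nothing matches."""
    # 1. Irregular forms come straight from the exception list.
    if word in TOY_EXCEPTIONS:
        return TOY_EXCEPTIONS[word]
    # 2. A word that is already a dictionary entry is its own lemma.
    if word in TOY_VERBS:
        return word
    # 3. Try each suffix rule and accept the first candidate in the dictionary.
    for suffix, replacement in TOY_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in TOY_VERBS:
                return candidate
    # 4. No match: return the input unchanged.
    return word


for w in ["writing", "wrote", "studies", "played", "writer"]:
    print(f"{w}-->{toy_lemmatize(w)}")
```

The key difference from stemming is step 3: a candidate is only accepted if it is a real dictionary entry, which is why `"writer"` is left alone instead of being cut down to a non-word.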

In [1]:
from nltk.stem import WordNetLemmatizer

In [2]:
words = [
    "writing", "writes", "written", "writer", "wrote", "rewrite", "rewriting", "rewritten",
    "running", "runs", "ran", "runner", "runners",
    "studying", "studies", "studied", "student", "students",
    "playing", "plays", "played", "player", "players",
    "singing", "sings", "sang", "singer", "singers"]


In [4]:
lemmatizer = WordNetLemmatizer()

In [6]:
for word in words:
    print(f"{word}-->{lemmatizer.lemmatize(word, pos='v')}")

writing-->write
writes-->write
written-->write
writer-->writer
wrote-->write
rewrite-->rewrite
rewriting-->rewrite
rewritten-->rewrite
running-->run
runs-->run
ran-->run
runner-->runner
runners-->runners
studying-->study
studies-->study
studied-->study
student-->student
students-->students
playing-->play
plays-->play
played-->play
player-->player
players-->players
singing-->sing
sings-->sing
sang-->sing
singer-->singer
singers-->singers


## **Parameters for `lemmatizer.lemmatize()`**
- **`word`** → The word to be lemmatized.
- **`pos`** *(optional)* → The **Part of Speech (POS)** (default: `'n'` for noun).

### **POS Tag Options**
| POS Tag | Description |
|---------|------------|
| `'n'`   | Noun       |
| `'v'`   | Verb       |
| `'a'`   | Adjective  |
| `'r'`   | Adverb     |
| `'s'`   | Satellite adjective |
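POS taggers usually emit Penn Treebank tags (for example `VBG` or `NNS`) rather than the single letters in the table, so a small conversion step is typically needed. The function below is one possible sketch of that mapping; `penn_to_wordnet` is a hypothetical helper name, and falling back to `'n'` is an assumption chosen to mirror the lemmatizer's default.

```python
def penn_to_wordnet(tag, default="n"):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("JJ"):  # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if tag.startswith("NN"):  # NN, NNS, NNP, NNPS: nouns
        return "n"
    if tag.startswith("RB"):  # RB, RBR, RBS: adverbs
        return "r"
    return default  # anything else falls back to the noun default


# Hand-written (word, tag) pairs standing in for a tagger's output.
tagged = [("studies", "NNS"), ("running", "VBG"), ("better", "JJR"), ("quickly", "RB")]

for word, tag in tagged:
    print(f"{word:<8} {tag:<4} -> {penn_to_wordnet(tag)}")
```

In practice the pairs would come from a tagger such as `nltk.pos_tag`, and each word would then be passed on as `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.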


# **Use Cases of Lemmatization**

- **Search Engines** – Matches queries and documents across inflected forms, so a search for "run" can also find "running" and "ran".  
- **Chatbots & Virtual Assistants** – Maps the different forms of a word in user input to one canonical form.  
- **Text Normalization** – Reduces vocabulary size by collapsing inflected forms into a single lemma.  
- **Sentiment Analysis** – Lets variants such as "loved" and "loving" count as the same feature.  
- **Machine Translation** – Can reduce data sparsity by treating inflected forms as one lexical item.  
- **Named Entity Recognition (NER)** – Can help match inflected mentions, such as plurals, to the same entry.  
- **Text Summarization** – Groups inflected forms so that repeated concepts are counted together.  
- **Plagiarism Detection** – Helps match passages that differ only in word forms.  
- **Speech Recognition** – Normalizes transcribed words for downstream language understanding.  
- **Spell Checking & Grammar Correction** – Provides the base form against which inflected forms can be checked.  

✔ **Lemmatization supports more consistent NLP results by converting words to their dictionary (lemma) form.** 🚀
