# Lemmatization
- Lemmatization is a process in Natural Language Processing (NLP) that reduces words to their base or dictionary form, known as the "lemma."
- Unlike stemming, which simply cuts off prefixes or suffixes by rule, lemmatization uses a word's part of speech (and, where available, its surrounding context) to map it to a meaningful base form.

### Key Differences Between Lemmatization and Stemming
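**Stemming:** Reduces words to a root form by stripping suffixes with fixed rules, which can produce partial or non-existent words and cannot handle irregular forms.
- Example (Porter stemmer): "running" → "run", "studies" → "studi", "better" → "better" (unchanged)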
  
**Lemmatization:** Reduces words to their dictionary form by considering the word’s meaning and context.
- Example: "running" → "run", "better" → "good"
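
To make the contrast concrete, here is a toy sketch in plain Python. Both `toy_stem` and `toy_lemmatize` are hypothetical simplifications, not NLTK's Porter stemmer or WordNet lemmatizer: the stemmer only strips the first matching suffix, and the lemmatizer only consults a small hand-written `(word, pos)` table.

```python
# Hypothetical toy stemmer: strips the first matching suffix, with no
# dictionary and no part-of-speech information.
SUFFIXES = ["ing", "ies", "es", "s", "ed"]


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters of stem to avoid stripping short words.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical (word, pos) -> lemma table standing in for a real lexicon.
LEMMAS = {
    ("running", "v"): "run",
    ("better", "a"): "good",
    ("studies", "v"): "study",
}


def toy_lemmatize(word: str, pos: str) -> str:
    # Dictionary lookup; unknown words are returned unchanged.
    return LEMMAS.get((word, pos), word)


for word, pos in [("running", "v"), ("better", "a"), ("studies", "v")]:
    print(f"{word}: stem={toy_stem(word)}, lemma={toy_lemmatize(word, pos)}")
```

The toy stemmer turns "running" into "runn" and "studies" into "stud", and leaves "better" alone, while the lookup returns "run", "study", and "good". Real stemmers such as Porter include an extra rule that undoes consonant doubling, which is why they produce "run" rather than "runn".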

### Why Lemmatization?
- Lemmatization is generally more accurate than stemming because it uses the word's part of speech and a dictionary, so it returns an actual word of the language rather than a truncated stem.
- For example, a stemmer such as Porter leaves "better" unchanged because none of its suffix rules apply, whereas a lemmatizer that knows "better" is an adjective correctly returns the lemma "good."

### How Lemmatization Works
- Lemmatization relies on a dictionary (or a similar linguistic resource) to look up the correct lemma of a word.
- It typically involves two steps:

1. **Part-of-Speech Tagging:** Determining the grammatical category (noun, verb, adjective, etc.) of a word so that the appropriate lemma can be chosen.
2. **Mapping to the Base Form:** Using linguistic rules and a dictionary to map the word to its lemma.
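
The two steps above can be sketched in plain Python. This is a toy model loosely following the approach of WordNet's `morphy` function: check an exception list for irregular forms, then apply suffix-detachment rules and accept a result only if it is a known base form. It assumes step 1 has already produced a POS code, and the tables `EXCEPTIONS`, `RULES`, and `BASE_FORMS` are tiny hypothetical stand-ins for a real lexicon.

```python
# Toy dictionary-based lemmatizer. Assumes step 1 (POS tagging) is done and
# implements step 2. All tables are small hypothetical examples.

# Step 2a: irregular forms, looked up directly.
EXCEPTIONS = {
    "n": {"geese": "goose", "mice": "mouse"},
    "v": {"ran": "run", "running": "run", "was": "be"},
    "a": {"better": "good", "worse": "bad"},
}

# Step 2b: (suffix, replacement) rules per POS, tried in order.
RULES = {
    "n": [("ies", "y"), ("es", ""), ("s", "")],
    "v": [("ies", "y"), ("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"), ("s", "")],
    "a": [("est", ""), ("er", "")],
}

# Known base forms per POS; a rule's output is accepted only if listed here.
BASE_FORMS = {
    "n": {"goose", "mouse", "city", "box", "dog"},
    "v": {"run", "be", "make", "walk", "study"},
    "a": {"good", "bad", "tall"},
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma of `word` given its POS code ("n", "v" or "a")."""
    word = word.lower()
    if word in EXCEPTIONS.get(pos, {}):
        return EXCEPTIONS[pos][word]
    if word in BASE_FORMS.get(pos, set()):
        return word  # already a base form
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in BASE_FORMS.get(pos, set()):
                return candidate
    return word  # unknown word: leave it unchanged


print(lemmatize("cities", "n"))  # city
print(lemmatize("making", "v"))  # make
print(lemmatize("better", "a"))  # good
print(lemmatize("better", "n"))  # better (wrong POS, no rule applies)
```

The dictionary check is what separates this from stemming: "making" first yields the non-word "mak", which is rejected, and only the next rule's output "make" is accepted.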

## Types of Lemmatizers with Examples

#### WordNet Lemmatizer (in NLTK):
- One of the most commonly used lemmatizers, based on the WordNet lexical database.

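```python
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

# WordNetLemmatizer needs the WordNet corpus; some NLTK versions also need omw-1.4.
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

lemmatizer = WordNetLemmatizer()
# The pos argument matters: it defaults to noun ("n") when omitted.
print(lemmatizer.lemmatize("running", pos=wordnet.VERB))  # Output: run
print(lemmatizer.lemmatize("better", pos=wordnet.ADJ))    # Output: good
print(lemmatizer.lemmatize("geese", pos=wordnet.NOUN))    # Output: goose
```

```
run
good
goose
```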
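
Because `WordNetLemmatizer` expects single-letter POS codes while most taggers emit Penn Treebank tags, a small conversion step is usually needed. The helper below, `treebank_to_wordnet`, is a hypothetical sketch of that mapping; the hand-written `tagged` list stands in for tagger output so the snippet runs without NLTK.

```python
# Hypothetical helper: convert a Penn Treebank tag (the tag set produced by
# taggers such as nltk.pos_tag) into the POS codes WordNetLemmatizer accepts.
def treebank_to_wordnet(tag: str) -> str:
    if tag.startswith("J"):
        return "a"  # JJ, JJR, JJS: adjective
    if tag.startswith("V"):
        return "v"  # VB, VBD, VBG, VBN, VBP, VBZ: verb
    if tag.startswith("R"):
        return "r"  # RB, RBR, RBS (and RP): adverb
    return "n"      # nouns and all other tags fall back to noun


# Example tags written by hand so the snippet runs without NLTK.
tagged = [("geese", "NNS"), ("were", "VBD"), ("running", "VBG"), ("faster", "RBR")]
print([(word, treebank_to_wordnet(tag)) for word, tag in tagged])
# [('geese', 'n'), ('were', 'v'), ('running', 'v'), ('faster', 'r')]
```

With NLTK installed, the mapped code is passed as the `pos` argument, e.g. `lemmatizer.lemmatize(word, pos=treebank_to_wordnet(tag))` for each `(word, tag)` pair returned by `nltk.pos_tag(tokens)` (which needs its tagger model downloaded). Without a POS, `lemmatize` uses its noun default, so "running" comes back unchanged.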


#### Custom Lemmatizer:
- Sometimes, a custom lemmatizer is created for specific applications, using predefined rules and dictionaries.

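```python
def custom_lemmatizer(word, pos):
    # Irregular forms keyed by (word, part of speech); the entries are examples.
    lemma_dict = {
        ("children", "n"): "child",
        ("mice", "n"): "mouse",
        ("feet", "n"): "foot",
    }
    # Normalize case, then fall back to the word itself if there is no entry.
    return lemma_dict.get((word.lower(), pos), word)

print(custom_lemmatizer("children", pos="n"))  # Output: child
print(custom_lemmatizer("mice", pos="n"))  # Output: mouse
```

```
child
mouse
```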


## Use Cases and Considerations

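**Information Retrieval:** Lemmatization can improve search recall by mapping different inflected forms of a word (e.g., "ran", "runs", "running") to the same base form, so a query using one form matches documents that contain the others.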
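
As a rough illustration of the information-retrieval case, the sketch below builds an inverted index keyed by lemma instead of surface form. The `LEMMA` table and the helper names are hypothetical; a real system would call an actual lemmatizer (such as the WordNet one above) and use a proper tokenizer instead of `str.split`.

```python
from collections import defaultdict

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMA = {"ran": "run", "running": "run", "runs": "run", "mice": "mouse"}


def lemma_of(token: str) -> str:
    return LEMMA.get(token, token)


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index mapping each lemma to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[lemma_of(token)].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents that contain every lemma in the query."""
    lemmas = [lemma_of(t) for t in query.lower().split()]
    if not lemmas:
        return set()
    result = set(index.get(lemmas[0], set()))
    for lemma in lemmas[1:]:
        result &= index.get(lemma, set())
    return result


docs = {1: "The dog runs home", 2: "She ran a marathon", 3: "Mice ate the cheese"}
index = build_index(docs)
print(sorted(search(index, "running")))  # [1, 2]
print(sorted(search(index, "mouse")))    # [3]
```

The query "running" matches documents containing "runs" and "ran" because both sides are reduced to "run" before comparison; an index keyed on raw tokens would return nothing.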

**Text Analysis:** By collapsing inflected forms into one token, lemmatization produces a more compact and consistent representation of the text, which is useful for tasks such as sentiment analysis, text classification, and machine translation.
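
For a feature-based task such as text classification, the effect shows up in the bag-of-words vocabulary. The sketch below compares token counts with and without a lemmatization step; the `LEMMA` table and `bag_of_words` helper are hypothetical placeholders for a real lemmatizer and feature extractor.

```python
from collections import Counter

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMA = {"loved": "love", "loves": "love", "loving": "love", "movies": "movie"}


def bag_of_words(text: str, lemmatize: bool) -> Counter:
    """Count tokens, optionally replacing each one with its lemma first."""
    tokens = text.lower().split()
    if lemmatize:
        tokens = [LEMMA.get(t, t) for t in tokens]
    return Counter(tokens)


review = "loved the movie loves movies loving it"
raw = bag_of_words(review, lemmatize=False)
lem = bag_of_words(review, lemmatize=True)
print(len(raw), len(lem))  # 7 4
print(lem["love"])         # 3
```

Three separate features ("loved", "loves", "loving") become a single "love" feature with count 3, so a classifier sees one stronger signal instead of three sparse ones.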


#### Limitations: 
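- Lemmatization is computationally more expensive than stemming, because it usually requires part-of-speech tagging and a dictionary lookup rather than simple string rules.
- Its output is only as good as the POS tags and dictionary it relies on: a wrong or missing tag (NLTK's `WordNetLemmatizer` assumes a noun by default) or an out-of-vocabulary word can yield the wrong lemma or leave the word unchanged.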