#### Lemmatization

**Lemmatization** is a text preprocessing technique in Natural Language Processing (NLP) that reduces a word to its base or root form (known as a lemma) while ensuring that the resulting word is a valid word in the language.  

**Difference Between Lemmatization and Stemming:**  
- Stemming: Strips suffixes (and sometimes prefixes) using heuristic rules to produce a stem, which may not be a valid word (e.g., the Porter stemmer maps "studies" → "studi").
- Lemmatization: Uses linguistic rules and a vocabulary to find the dictionary form, so the output is a valid, meaningful word (e.g., "better" → "good").
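
As a rough illustration of the contrast above, the sketch below compares a crude suffix stripper with a dictionary lookup. The functions `toy_stem` and `toy_lemmatize`, the suffix list, and the small lemma table are hypothetical stand-ins written for this example. They do not reproduce the Porter stemmer or the WordNet lemmatizer.

```python
# Toy illustration only: neither function reproduces NLTK's real algorithms.

# Hypothetical suffix list for a crude stemmer.
SUFFIXES = ("ing", "es", "ed", "s")

# Hypothetical hand-written vocabulary (a stand-in for a resource such as WordNet).
LEMMAS = {"studies": "study", "better": "good", "children": "child", "ran": "run"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix without checking that the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the vocabulary; fall back to the word itself."""
    return LEMMAS.get(word, word)


for w in ["studies", "better", "children", "ran"]:
    print(f"{w:10} stem={toy_stem(w):10} lemma={toy_lemmatize(w)}")
```

The stemmer turns "studies" into the non-word "studi" and leaves irregular forms such as "better" and "children" untouched, while the lookup returns valid dictionary words for all four inputs.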
  
**Purpose of Lemmatization:**  
- Normalize words for analysis (e.g., run, running, ran → run).
- Keep the normalized text readable and meaningful, since every lemma is a valid dictionary word.

**With POS Tags for Better Accuracy:**  
Lemmatization is more accurate when it is given the correct part of speech (POS), because the same word can have different lemmas depending on its role in a sentence.  

**For example:**  

- Verb: "running" → "run"
- Noun: "running" → "running" (if interpreted as a noun)
  
**POS Tags** (common values for the `pos` argument of `WordNetLemmatizer.lemmatize`, which defaults to "n"):  

- "n": Noun
- "v": Verb
- "a": Adjective
- "r": Adverb
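
POS taggers usually do not emit these one-letter codes directly. NLTK's default English tagger (`nltk.pos_tag`), for instance, returns Penn Treebank tags such as `VBG` or `NNS`. A minimal sketch of a conversion helper follows. The name `penn_to_wordnet` is hypothetical, the tagged sample is written by hand rather than produced by a tagger, and falling back to "n" for unmapped tags is an assumption that mirrors the lemmatizer's default.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("JJ"):   # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if tag.startswith("RB"):   # RB, RBR, RBS: adverbs
        return "r"
    return "n"                 # nouns and every other tag (assumed fallback)


# Hand-written (word, Penn Treebank tag) pairs, not real tagger output.
tagged = [("children", "NNS"), ("were", "VBD"), ("running", "VBG"), ("quickly", "RB")]

for word, tag in tagged:
    print(word, tag, penn_to_wordnet(tag))
```

The returned letter can then be passed as the `pos` argument, for example `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.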

In [1]:
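# Import the WordNet-based lemmatizer and NLTK itself (needed for the data downloads)
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet  # not used below; exposes POS constants such as wordnet.VERB ("v")
import nltk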

In [2]:
# Download WordNet data
nltk.download('wordnet')
nltk.download('omw-1.4')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Dell\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\Dell\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

In [3]:
# Create a lemmatizer
lemmatizer = WordNetLemmatizer()

In [4]:
# Lemmatize words without specifying POS
# (lemmatize() defaults to pos="n", so every word is treated as a noun)
words = ["running", "ran", "easily", "fairness", "better", "children"]
lemmatized = [lemmatizer.lemmatize(word) for word in words]

In [5]:
print("Without POS:", lemmatized)

Without POS: ['running', 'ran', 'easily', 'fairness', 'better', 'child']


In [6]:
# Lemmatize words with POS tags
print("With POS:")

print(lemmatizer.lemmatize("running", pos="v"))  # Verb -> 'run'
print(lemmatizer.lemmatize("better", pos="a"))  # Adjective -> 'good'
print(lemmatizer.lemmatize("children", pos="n"))  # Noun -> 'child'

With POS:
run
good
child

