# $$Step\ 7 : Lemmatization$$

__________________

# **Text Preprocessing: Lemmatization vs. Stemming in NLP**

## **1️⃣ What is Lemmatization?**
Lemmatization is a text preprocessing technique used in **Natural Language Processing (NLP)** to reduce words to their base or **dictionary form** (lemma) while preserving their meaning.  

Unlike **Stemming**, which strips word endings using fixed rules, **Lemmatization** looks each word up in a **predefined dictionary** (WordNet, in the NLTK examples below) so that the resulting root form is a real word.

### **🔹 Example Comparison**
| Word       | Stemming  | Lemmatization |
|------------|-----------|---------------|
| running    | run       | run           |
| connected  | connect   | connect       |
| better     | better    | good          |
| studies    | studi     | study         |
| worse      | wors      | bad           |

📌 **Key Difference:**  
- **Stemming:** Strips suffixes with fixed rules and may produce non-words (e.g., "studi").  
- **Lemmatization:** Returns the **dictionary base form**, so the result is a real word, provided the word's part of speech is known.
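
To make the contrast concrete, here is a minimal sketch in plain Python. Both `toy_stem` and `toy_lemmatize` are hypothetical helpers written only for this illustration (they are not NLTK functions, and the suffix list and lemma table are made up); real stemmers and lemmatizers use far richer rules and dictionaries.

```python
# Toy suffix stripper (illustrative only): removes the first matching ending
# without checking whether the result is a real word.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Toy lemma dictionary (illustrative only): inflected form -> dictionary entry
LEMMAS = {
    "running": "run",
    "connected": "connect",
    "better": "good",
    "studies": "study",
}


def toy_lemmatize(word):
    # Unknown words are returned unchanged
    return LEMMAS.get(word, word)


for w in ["running", "connected", "better", "studies"]:
    print(f"{w:10} stem={toy_stem(w):8} lemma={toy_lemmatize(w)}")
```

The rule-based function returns fragments such as "runn" and "studi", while the lookup returns real words but only for forms its table happens to contain.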

---

## **2️⃣ Why Use Lemmatization?**
✔ **Better Accuracy:** Produces real dictionary words instead of truncated stems such as "studi".  
✔ **Reduces Vocabulary Size:** Groups different forms of a word (e.g., "studying" → "study").  
✔ **Important for Sentiment Analysis:** Maps inflected forms of a word to a single lemma, so they are counted as the same feature.

⚠ **Limitation:** Lemmatization is **slower** than stemming because it uses a dictionary lookup, and it needs each word's part of speech to give the best result.
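
The vocabulary-size point can be illustrated with a small count. This is only a sketch: the token list and the `LEMMA_OF` table are made-up stand-ins for a real corpus and a real lemmatizer.

```python
from collections import Counter

# Hypothetical token stream containing several forms of the same words
tokens = ["study", "studies", "studying", "studied", "dog", "dogs"]

# Hypothetical lemma table standing in for a real lemmatizer
LEMMA_OF = {
    "studies": "study",
    "studying": "study",
    "studied": "study",
    "dogs": "dog",
}

raw_vocab = Counter(tokens)
lemma_vocab = Counter(LEMMA_OF.get(t, t) for t in tokens)

print(len(raw_vocab), "distinct tokens before lemmatization")
print(len(lemma_vocab), "distinct lemmas after lemmatization")
print(dict(lemma_vocab))
```

Six surface forms collapse into two lemmas here, which is the effect a bag-of-words model benefits from.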

---

## **3️⃣ Implementing Stemming and Lemmatization in Python**
We will compare both techniques using the **NLTK** library.


_____________________

### Example:

In [1]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Admin\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [8]:
from nltk.stem import WordNetLemmatizer

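# Create a WordNet lemmatizer
lem = WordNetLemmatizer()

# Sample words to process
words = ["running", "connected", "better", "studies", "dogs", "happily"]

# lemmatize() treats each word as a noun (pos="n") unless told otherwise,
# so verbs and adjectives such as "running" or "better" come back unchanged
for x in words:
    print(x, " : ", lem.lemmatize(x))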

running  :  running
connected  :  connected
better  :  better
studies  :  study
dogs  :  dog
happily  :  happily


In [9]:
from nltk.stem import PorterStemmer

# Create a Porter stemmer
ps = PorterStemmer()

# Sample words to process
words = ["running", "connected", "better", "studies", "dogs", "happily"]

for x in words:
    print(x, " : ", ps.stem(x))

running  :  run
connected  :  connect
better  :  better
studies  :  studi
dogs  :  dog
happily  :  happili


### Example: Lemmatization in Sentiment Analysis

In [10]:
reviews = [
    "The food was amazing, and I enjoyed it!",
    "I loved the experience of dining here.",
    "The service was the worst I have ever seen!",
    "The atmosphere was relaxing and pleasant."
]

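# Tokenize and lemmatize words in reviews
# (word_tokenize needs NLTK's Punkt tokenizer data, e.g. nltk.download('punkt'))
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Create the lemmatizer used below
lemmatizer = WordNetLemmatizer()

lemmatized_reviews = []
for review in reviews:
    words = word_tokenize(review)  # Split the review into word and punctuation tokens
    # No pos argument is passed, so every token is lemmatized as a noun
    lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
    lemmatized_reviews.append(" ".join(lemmatized_words))  # Rejoin tokens with spaces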

# Print results
for original, lemmatized in zip(reviews, lemmatized_reviews):
    print(f"Original: {original}\nLemmatized: {lemmatized}\n")

Original: The food was amazing, and I enjoyed it!
Lemmatized: The food wa amazing , and I enjoyed it !

Original: I loved the experience of dining here.
Lemmatized: I loved the experience of dining here .

Original: The service was the worst I have ever seen!
Lemmatized: The service wa the worst I have ever seen !

Original: The atmosphere was relaxing and pleasant.
Lemmatized: The atmosphere wa relaxing and pleasant .
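
In the output above, "was" becomes "wa" and "loved" is left unchanged because `lemmatize()` treats every token as a noun unless a `pos` argument is given. One possible remedy is to tag each token first and pass the matching WordNet code. The sketch below shows only that mapping step. `penn_to_wordnet`, `lemmatize_tagged`, and the `TinyLemmatizer` stand-in (with its two-entry table) are hypothetical names introduced here so the example runs without NLTK data.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to the one-letter POS code WordNet expects."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"  # noun, which is also the lemmatizer's default


def lemmatize_tagged(tagged_tokens, lemmatizer):
    """Lemmatize (token, Penn tag) pairs with any object that
    exposes lemmatize(word, pos=...)."""
    return [
        lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag))
        for token, tag in tagged_tokens
    ]


class TinyLemmatizer:
    """Hypothetical stand-in for a real lemmatizer (assumption for this demo)."""

    TABLE = {("was", "v"): "be", ("enjoyed", "v"): "enjoy"}

    def lemmatize(self, word, pos="n"):
        # Fall back to the unchanged word when the pair is not in the table
        return self.TABLE.get((word.lower(), pos), word)


# Hand-written tags for the demo; a real pipeline would get them from a tagger
tagged = [("The", "DT"), ("food", "NN"), ("was", "VBD"), ("amazing", "JJ")]
print(lemmatize_tagged(tagged, TinyLemmatizer()))
```

With NLTK, the tagged tokens would typically come from `nltk.pos_tag(word_tokenize(review))` (which needs its own tagger data download), and `WordNetLemmatizer()` would take the place of `TinyLemmatizer()`.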

