# 🌟 [**Lemmatization**](https://www.analyticsvidhya.com/blog/2022/06/stemming-vs-lemmatization-in-nlp-must-know-differences/) 🚀

## 📌 **1. What is Lemmatization?**
Lemmatization is a process in **Natural Language Processing (NLP)** that finds the **base or dictionary form** of a word, called a **lemma**, by analyzing its **context** and **part of speech (POS)**.

✅ **Why is Lemmatization Important?**  
- It ensures words are in their correct base form.  
- It improves text analysis by maintaining proper meaning.  
- It is widely used in **search engines, AI chatbots, and sentiment analysis**.

---

## 🏷 **2. What is Part of Speech (POS)?**
**Part of Speech (POS)** refers to the **category** a word belongs to based on its grammatical function in a sentence.

🔹 Words are classified into different parts of speech:

| 🏷 **POS**      | 📖 **Definition**                | 🔍 **Example**            |
|--------------|------------------------------|------------------------|
| **Noun**     | A person, place, or thing.    | **dog, city, book**    |
| **Verb**     | An action or state of being.  | **run, eat, is**       |
| **Adjective**| Describes a noun.            | **big, happy, red**    |
| **Adverb**   | Describes a verb or adjective. | **quickly, very, well** |
| **Pronoun**  | Replaces a noun.              | **he, she, they**      |
| **Preposition** | Shows relationships.      | **on, under, at**      |
| **Conjunction** | Connects words/phrases.   | **and, but, because**  |
| **Interjection** | Expresses emotion.       | **Wow!, Oh!, Oops!**   |

---

## 🛠 **3. How Does Lemmatization Work?**
Lemmatization follows a **structured** approach:

1️⃣ **Tokenization** → Break text into words.  
2️⃣ **POS Tagging** → Identify each word's role.  
3️⃣ **Apply Lemmatization** → Convert words to base form using a **linguistic database** (like WordNet).  
4️⃣ **Output** → Get meaningful and standardized words.  

📌 **Example:**

| 🔤 **Word**  | 📝 **Lemmatized Form** |
|-------------|------------------|
| **Running** | **Run** |
| **Studies** | **Study** |
| **Happily** | **Happily** (adverbs usually stay unchanged) |
| **Better**  | **Good** |
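
The four steps above can be sketched in a few lines. This is a toy illustration only: `TOY_POS` and `TOY_LEMMAS` are hypothetical hand-written tables that stand in for a trained POS tagger and a linguistic database such as WordNet.

```python
import re

# Hypothetical mini-resources for illustration; a real system would use
# a trained tagger and a dictionary such as WordNet.
TOY_POS = {"running": "VERB", "studies": "NOUN", "better": "ADJ", "is": "VERB"}
TOY_LEMMAS = {
    ("running", "VERB"): "run",
    ("studies", "NOUN"): "study",
    ("better", "ADJ"): "good",
    ("is", "VERB"): "be",
}

def tokenize(text):
    """Step 1: split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def pos_tag(tokens):
    """Step 2: attach a POS tag to each token (NOUN if the word is unknown)."""
    return [(tok, TOY_POS.get(tok, "NOUN")) for tok in tokens]

def lemmatize(tagged):
    """Step 3: look up (word, POS); words missing from the table stay as-is."""
    return [TOY_LEMMAS.get((tok, pos), tok) for tok, pos in tagged]

def lemmatize_text(text):
    """Step 4: run the whole pipeline and return the standardized tokens."""
    return lemmatize(pos_tag(tokenize(text)))

print(lemmatize_text("Running is better"))  # ['run', 'be', 'good']
```

Note that the lookup key includes the POS tag: the same surface word can map to different lemmas depending on its role in the sentence.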

---

## 🎯 **4. Lemmatization vs. Stemming: Key Differences**
Both techniques reduce words to their root forms, but **lemmatization is more precise!**  

| ⚡ **Feature**      | 🔍 **Lemmatization** | 🔧 **Stemming** |
|----------------|-----------------|-------------|
| **Definition**  | Converts words to their dictionary form, considering **context** and **POS**. | Removes **suffixes** without considering meaning. |
| **Accuracy**    | ✅ **High** – produces valid words. | ❌ **Lower** – may produce gibberish. |
| **Speed**       | ❌ **Slower** (needs POS tagging and dictionary lookup). | ✅ **Faster** (simple suffix-stripping rules). |
| **Use Case**    | **Chatbots, text understanding, AI.** | **Search engines, keyword matching.** |
| **Example**     | "better" → **"good"**, "running" → **"run"** | "better" → **"better"** (unchanged by the Porter stemmer), "running" → **"run"** |

📌 **Comparison Example:**

| 🔤 **Word**  | 📝 **Lemmatization** | 🔧 **Stemming** |
|-------------|------------------|-------------|
| **Running** | **Run** | **Run** |
| **Studies** | **Study** | **Studi** |
| **Happily** | **Happily** | **Happili** |
| **Better**  | **Good** (as an adjective) | **Better** |

✅ **Lemmatization = More accurate, preserves meaning.**  
❌ **Stemming = Faster, but may distort words.**
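
To make the contrast concrete, here is a minimal sketch that puts a rule-based suffix stripper next to a dictionary lookup. Both are assumptions made for illustration: `toy_stem` uses a handful of made-up rules (it is not the Porter algorithm), and `TOY_LEMMAS` is a tiny hand-written table.

```python
def toy_stem(word):
    """Strip a few common suffixes with no dictionary check (toy rules)."""
    word = word.lower()
    for suffix, replacement in (("ies", "i"), ("ing", ""), ("ly", ""), ("s", "")):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

# Hypothetical lemma table; a real lemmatizer would consult a full dictionary.
TOY_LEMMAS = {"studies": "study", "running": "run", "better": "good"}

def toy_lemma(word):
    """Return the dictionary form if known, otherwise the word itself."""
    return TOY_LEMMAS.get(word.lower(), word.lower())

for w in ["Studies", "Running", "Better"]:
    print(f"{w:8} stem={toy_stem(w):7} lemma={toy_lemma(w)}")
```

The stemmer never checks whether its output is a real word, so it produces forms such as `studi` and `runn`, and it cannot relate `better` to `good` because no suffix rule connects them.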

---

## 🧐 **5. Should I Use Both Stemming and Lemmatization?**
Follow these **5 simple steps** to decide which method fits your NLP task.  

### **Step 1: Define Your Needs**
🔹 **What is your goal?**  
- Need to group **inflected forms of the same word** (e.g., run, runs, running)?  
- Need to **preserve exact meaning** for **AI embeddings**?  

🔹 **What matters more—Speed or Accuracy?**  
- **Large dataset?** → **Speed is crucial.**  
- **Precise meaning?** → **Accuracy is more important.**  

---

### **Step 2: Consider the Trade-Offs**
| ⚡ **Feature**      | 🔧 **Stemming** | 📝 **Lemmatization** |
|----------------|-------------|----------------|
| **Speed**     | ✅ **Fast** | ❌ **Slower** |
| **Accuracy**  | ❌ May distort words | ✅ Preserves meaning |
| **Output**    | ❌ Can create non-words | ✅ Produces real words |
| **Best Use**  | **Search engines, large datasets** | **AI, chatbots, sentiment analysis** |

---

### **Step 3: Choose Based on Your Needs**
✅ **Use Stemming if:**  
- You need **fast processing**.  
- You can tolerate **some loss of meaning**.  

✅ **Use Lemmatization if:**  
- **Accuracy** is essential.  
- You need grammatically correct base words.  

---

### **Step 4: Experiment with Both**  
If unsure, try both on a **small dataset** and compare:  
- **Does stemming distort words too much?**  
- **Does lemmatization slow down processing significantly?**  
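
One way to run that comparison is a small harness that times a normalizer and reports how much it shrinks the vocabulary. This is a sketch under stated assumptions: `evaluate` is a hypothetical helper, and `crude_stem` and `dict_lemma` are toy stand-ins for whichever stemmer and lemmatizer you actually test.

```python
import time

def evaluate(normalize, tokens):
    """Apply `normalize` to every token; report time, vocabulary size, changes."""
    start = time.perf_counter()
    outputs = [normalize(tok) for tok in tokens]
    elapsed = time.perf_counter() - start
    return {
        "seconds": elapsed,
        "vocab_size": len(set(outputs)),
        "changed": sorted({(t, o) for t, o in zip(tokens, outputs) if t != o}),
    }

# Toy stand-ins (assumptions); swap in your real stemmer and lemmatizer.
def crude_stem(token):
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

LEMMAS = {"studies": "study", "mice": "mouse"}

def dict_lemma(token):
    return LEMMAS.get(token, token)

sample = ["studies", "study", "mice", "mouse", "bus"]
for name, fn in [("stem", crude_stem), ("lemma", dict_lemma)]:
    report = evaluate(fn, sample)
    print(name, report["vocab_size"], report["changed"])
```

Inspecting the `changed` pairs by hand on a small sample is usually the quickest way to see whether a normalizer distorts words that matter for your task.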

---

### **Step 5: Refine Your Approach**  
- **If speed is critical**, use **stemming**.  
- **If accuracy is more important**, use **lemmatization**.  
- **Hybrid approach?** Try **lemmatization first**, then **stemming** on the lemmas for extra reduction. Stemming first produces non-words that a dictionary-based lemmatizer cannot look up.  

💡 **Final Tip:** The best choice depends on your specific **NLP task**! 🚀  


## **📌 Real-World Use Cases**
## **1. Sentiment Analysis (Product Reviews, Social Media Monitoring)**

### ✅ **Best Choice: Lemmatization**  

- **Why?** Lemmatization helps **normalize words** while maintaining their correct meaning.  
- **Example:**  
  - If a user writes: **"This product is better than I expected!"**  
  - Lemmatization converts **"better" → "good"**, helping the sentiment model recognize it as **positive sentiment**.  

---

### 💡 **Example Sentiment Analysis:**

| 📝 **Raw Text**                     | 🎯 **Lemmatized Text**         | 😃 **Sentiment**  |
|--------------------------------------|--------------------------------|------------------|
| "I loved the movies"                 | "I love the movie"            | **Positive** 👍  |
| "This phone is worse than before"    | "This phone be bad than before" | **Negative** 👎  |
| "Running is exhausting"              | "Run be exhaust"              | **Neutral** 😐  |

---

### **Why Use Lemmatization for Sentiment Analysis?**
✔ **Maintains correct word meaning** (e.g., "better" → "good")  
✔ **Reduces variations of words** for better text processing  
✔ **More accurate sentiment classification**  

---

## **2. 📂 Search Engine for a Company’s Internal Reports**  

**Scenario:** A company with thousands of internal reports (e.g., project updates, meeting notes) needs a search tool to help employees find documents quickly. Speed is critical because employees need results in seconds, and the database is large.

### ✅ **Best Choice: Stemming**  

### **🚀 Why Stemming?**  
- **Speed is critical** – Employees need results **instantly**.  
- **Large database** – Processing must be **fast and efficient**.  
- **Employees use different word forms** – Stemming helps match them quickly.  

---

### **💡 Example: Searching for "meeting"**
| **Search Query**     | **Stemming Applied**  | **Matches Found**        |
|----------------------|----------------------|--------------------------|
| "meeting notes"     | "meet"               | ✅ "meeting notes"       |
| "meetings summary"  | "meet"               | ✅ "meetings summary"    |
| "met with team"     | "met" (unchanged)    | ❌ Not matched: irregular forms like "met" need lemmatization |

📌 **Stemming helps match related words, making searches broader and faster!**  
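
A minimal sketch of how such a search could work: index every document under the stems of its words, then stem the query the same way. The `toy_stem` rules, the `build_index` and `search` helpers, and the three sample documents are all assumptions made for illustration, not a production search engine.

```python
from collections import defaultdict

def toy_stem(word):
    """Very small suffix stripper (illustrative; not the Porter algorithm)."""
    word = word.lower()
    for suffix in ("ings", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[toy_stem(word)].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every stemmed query term."""
    term_sets = [index.get(toy_stem(w), set()) for w in query.split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {1: "meeting notes", 2: "meetings summary", 3: "met with team"}
index = build_index(docs)
print(sorted(search(index, "meeting")))  # [1, 2]
```

Because stemming happens once at indexing time, each query only costs a few dictionary lookups. With these toy rules the irregular form "met" keeps its own index entry, so a query for "meeting" does not return document 3.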

---

### **❌ Why Not Lemmatization?**  
- **Slower processing** → Lemmatization requires **linguistic analysis**, increasing search time.  
- **Overkill for simple searches** → Employees need **fast** keyword-based results, not deep semantic understanding.  

---

### **🔹 Final Decision: Use Stemming**  
✔ **Fast & efficient for large internal databases**  
✔ **Broadens search results by matching word variations**  
✔ **Ideal when speed matters more than perfect accuracy**  


### [Lemmatization using NLTK](https://gaurav5430.medium.com/using-nltk-for-lemmatizing-sentences-c1bfff963258)

In [5]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [None]:
from nltk.stem import WordNetLemmatizer

# Initialize lemmatizer
lemmatizer = WordNetLemmatizer()

# Basic lemmatization; pos defaults to "n" (noun), so pass the POS explicitly
print(lemmatizer.lemmatize("running", pos="v"))  # treat as a verb
print(lemmatizer.lemmatize("better", pos="a"))   # treat as an adjective

run
good


### [Lemmatization using Spacy](https://spacy.io/api/lemmatizer)

In [None]:
import spacy

# Load English NLP model
nlp = spacy.load("en_core_web_sm")

# Process text
doc = nlp("She is running better than her friend.")

# Print lemmatized words
for token in doc:
    print(token.text, "→", token.lemma_)

She → she
is → be
running → run
better → well
than → than
her → her
friend → friend
. → .


# **📌 Expanding Contractions in NLP**  

## **🚀 What Are Contractions?**  
**Contractions** are **shortened word forms** made by joining two words and replacing the **dropped letters** with an **apostrophe (`'`)**.  

🔹 **Example:**  
- `"I'm"` → `"I am"`  
- `"Don't"` → `"Do not"`  
- `"It's"` → `"It is"`  

---

## **📌 Why Expand Contractions in NLP?** 🤖  
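Contractions give one meaning several surface forms: to many **Natural Language Processing (NLP) models**, `"don't"` and `"do not"` are different tokens, and tokenizers do not always split them the same way.  
Expanding them gives a **more consistent vocabulary** for later preprocessing steps. Some contractions are ambiguous (`"it's"` can mean "it is" or "it has"), so dictionary-based expansion is a heuristic.  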


---

## **📌 Common Contractions & Their Expansions** 🔄  

| **Contraction** | **Expanded Form**  |  
|---------------|----------------|  
| I'm          | I am           |  
| You're       | You are        |  
| It's         | It is          |  
| He's         | He is          |  
| She's        | She is         |  
| We're        | We are         |  
| They're      | They are       |  
| Isn't        | Is not         |  
| Aren't       | Are not        |  
| Can't        | Cannot        |  
| Won't        | Will not       |  


---



**Install the Library**

In [None]:
%%capture
!pip install contractions

In [None]:
import contractions

text = "I'm learning NLP, but I won't give up!"
expanded_text = contractions.fix(text)

print(f"Before: {text}")
print(f"After : {expanded_text}")


Before: I'm learning NLP, but I won't give up!
After : I am learning NLP, but I will not give up!


# **Remove Punctuation**

In [None]:
import string
# string.punctuation is a constant in Python's string module that contains all ASCII punctuation characters.
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

[string_maketrans](https://www.w3schools.com/python/ref_string_maketrans.asp)

In [None]:
# str.maketrans() returns a mapping table that str.translate() uses to replace specified characters.
txt = "Hello Sam!"
mytable = str.maketrans("S", "P")
print(mytable)

{83: 80}


In [None]:
# Use a mapping table to replace many characters
txt = "Hi Sam!"
x = "mSa"
y = "eJo"
mytable = str.maketrans(x, y)
print(txt.translate(mytable))

Hi Joe!


In [None]:
# The third parameter in the mapping table describes characters that you want to remove from the string
txt = "Good night Sam!"
x = "mSa"
y = "eJo"
z = "odnght"
mytable = str.maketrans(x, y, z)
print(txt.translate(mytable))

G i Joe!


In [None]:
import string

text = "Hello, World! How's everything?"
translator = str.maketrans('', '', string.punctuation)
clean_text = text.translate(translator)

print(f'Original: {text}')
print(f'Cleaned : {clean_text}')


Original: Hello, World! How's everything?
Cleaned : Hello World Hows everything
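
Stripping every punctuation character also removes apostrophes, which is why "How's" becomes "Hows" above. If that matters for your task, one option is to exclude selected characters from the deletion set. The sketch below assumes a hypothetical helper named `strip_punctuation`; another option is to expand contractions before removing punctuation.

```python
import string

def strip_punctuation(text, keep="'"):
    """Remove ASCII punctuation from `text`, except the characters in `keep`."""
    to_remove = "".join(ch for ch in string.punctuation if ch not in keep)
    return text.translate(str.maketrans("", "", to_remove))

text = "Hello, World! How's everything?"
print(strip_punctuation(text))           # Hello World How's everything
print(strip_punctuation(text, keep=""))  # Hello World Hows everything
```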
