## Stemming
Stemming is the process of reducing a word to its stem: the base form that remains after affixes such as prefixes and suffixes are removed. Unlike a lemma, a stem does not have to be a valid dictionary word. Stemming is a common preprocessing step in natural language understanding (NLU) and natural language processing (NLP).

In [1]:
## Classification problem
## Decide whether a product review is positive or negative
## Reviews contain variants such as eating, eat, eaten ---> eat and going, gone, goes ---> go

words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

### PorterStemmer

In [2]:
from nltk.stem import PorterStemmer

In [3]:
stemming=PorterStemmer()

In [4]:
for word in words:
    print(word+"---->"+stemming.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [5]:
stemming.stem('congratulations')

'congratul'

In [7]:
stemming.stem("sitting")

'sit'

### RegexpStemmer class
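NLTK provides a `RegexpStemmer` class for building simple stemmers from regular expressions. It takes a single regular expression and removes every substring that matches it, so the anchors in the pattern decide whether it strips prefixes (`^`) or suffixes (`$`). The `min` argument sets the minimum word length: shorter words are returned unchanged. Let us see an example.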

In [8]:
from nltk.stem import RegexpStemmer

In [22]:
reg_stemmer=RegexpStemmer('ing$|s$|e$|able$', min=4)

In [23]:
reg_stemmer.stem('eating')

'eat'

In [24]:
reg_stemmer.stem('ingeating')

'ingeat'
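
The `$` anchors are what restrict this stemmer to suffixes, which is why the leading `ing` in `ingeating` survives. The following is a minimal sketch of the same idea using only Python's standard `re` module. `strip_matches` is a hypothetical helper written for this illustration, not NLTK's implementation. It assumes the stemmer deletes every match of the pattern and leaves words shorter than `min_len` untouched.

```python
import re

def strip_matches(word, pattern, min_len=0):
    """Hypothetical helper: delete every match of `pattern` from `word`.

    Words shorter than `min_len` are returned unchanged.
    """
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)

anchored = r"ing$|s$|e$|able$"  # each alternative only matches at the end of the word
unanchored = r"ing"             # matches anywhere in the word

print(strip_matches("ingeating", anchored, min_len=4))    # ingeat
print(strip_matches("ingeating", unanchored, min_len=4))  # eat
print(strip_matches("bus", anchored, min_len=4))          # bus (too short to stem)
```

Dropping the anchor removes `ing` wherever it occurs, which is one easy way to over-stem by accident.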

### Snowball Stemmer
The Snowball stemmer, also known as the Porter2 stemmer, is a revised version of the Porter stemmer that fixes several of its known issues.

In [1]:
from nltk.stem import SnowballStemmer

In [26]:
snowballsstemmer=SnowballStemmer('english')

In [27]:
for word in words:
    print(word+"---->"+snowballsstemmer.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [28]:
# Porter stemmer (the `stemming` object created earlier), for comparison
stemming.stem("fairly"),stemming.stem("sportingly")

('fairli', 'sportingli')

In [31]:
# Snowball stemmer on the same two words
snowballsstemmer.stem("fairly"),snowballsstemmer.stem("sportingly")

('fair', 'sport')

In [33]:
snowballsstemmer.stem('goes')

'goe'

In [34]:
stemming.stem('goes')

'goe'
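
To see where two stemmers disagree without scanning the outputs by eye, a small comparison helper can be useful. The sketch below is illustrative only: `compare_stemmers` is a hypothetical function, and the two lookup tables simply repeat outputs already shown in the cells above so that the example runs without NLTK. In a live notebook you could pass `stemming.stem` and `snowballsstemmer.stem` as the callables instead.

```python
def compare_stemmers(words, stem_fns):
    """Return {word: {name: stem}} for the words on which the stemmers disagree."""
    differences = {}
    for word in words:
        stems = {name: fn(word) for name, fn in stem_fns.items()}
        if len(set(stems.values())) > 1:
            differences[word] = stems
    return differences

# Outputs copied from the cells above, used here as stand-ins for the real stemmers.
porter_seen = {"eating": "eat", "history": "histori", "goes": "goe",
               "fairly": "fairli", "sportingly": "sportingli"}
snowball_seen = {"eating": "eat", "history": "histori", "goes": "goe",
                 "fairly": "fair", "sportingly": "sport"}

diff = compare_stemmers(
    porter_seen,  # iterating a dict yields its keys, i.e. the words
    {"porter": porter_seen.get, "snowball": snowball_seen.get},
)
for word, stems in diff.items():
    print(word, stems)
```

On these words the two stemmers differ only on `fairly` and `sportingly`.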

## 🔚 Conclusion: Stemming in Text Preprocessing

In this notebook, we explored **Stemming**, a fundamental technique in Natural Language Processing (NLP) used during text preprocessing. Here's a summary of what we covered:

---

### ✅ Key Concepts

- **Stemming** is the process of reducing words to their *word stem*, base, or root form, e.g., `eating`, `eats` → `eat`. Irregular forms can be missed: both the Porter and Snowball stemmers above leave `eaten` unchanged.
- The purpose of stemming is to **reduce vocabulary size** while keeping most of the semantic meaning, which simplifies downstream tasks like classification or sentiment analysis.
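
The vocabulary reduction can be measured directly by counting distinct tokens before and after stemming. This is a minimal sketch under stated assumptions: `toy_stem` is a deliberately crude, hypothetical stand-in for a real stemmer (it strips a single `ing` or `s` suffix), and `vocabulary` is a hypothetical helper. Any callable that maps a word to its stem, such as `PorterStemmer().stem`, could be passed in its place.

```python
import re

def toy_stem(word):
    """Toy stand-in for a real stemmer: strip one trailing 'ing' or 's'."""
    return re.sub(r"(ing|s)$", "", word)

def vocabulary(tokens, stem_fn=None):
    """Return the set of distinct tokens, optionally after stemming each one."""
    if stem_fn is None:
        return set(tokens)
    return {stem_fn(token) for token in tokens}

tokens = ["eating", "eats", "eat", "walking", "walks", "walk"]
raw_vocab = vocabulary(tokens)
stemmed_vocab = vocabulary(tokens, toy_stem)

print(len(raw_vocab), len(stemmed_vocab))  # 6 2
```

Six surface forms collapse to two stems here. On a real corpus the reduction depends on the stemmer and on how much inflection the text contains.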

---

### 🔧 Techniques Covered

1. **Porter Stemmer**
   - One of the most widely used stemmers for English.
   - Available in NLTK as `PorterStemmer`.
   - **Limitation:** May produce stemmed forms that aren't real words (e.g., `history` → `histori`).

2. **Regex-based Stemmer**
   - Customizable via regular expressions.
   - Effective when you want control over specific suffixes or patterns.
   - Requires careful handling to avoid over-stemming.

3. **Snowball Stemmer**
   - An improvement over Porter.
   - Supports multiple languages.
   - More consistent and accurate in many cases (e.g., `fairly` → `fair`, `sportingly` → `sport`).

---

### ⚠️ Limitations of Stemming

- **Not context-aware**: It simply chops suffixes without understanding the word's part of speech or meaning.
- Can produce **non-words** (e.g., `goes` → `goe`) and can miss irregular forms (e.g., `eaten` stays `eaten`).
- Poorly suited to applications that need exact word forms, such as chatbots, translation, or grammar correction.

---

### 💡 When to Use Stemming?

- Tasks like **spam detection**, **sentiment analysis**, or **topic modeling**, where **approximate word meaning is sufficient**.
- When speed and simplicity are more important than perfect grammatical correctness.

---

### 🆚 What’s Next?

To overcome stemming's limitations, especially in meaning preservation, we turn to **Lemmatization** — a technique that uses vocabulary and morphology to return the proper base form of a word.

📌 *We'll cover Lemmatization in the next section!*

---

👨‍💻 **Practice Tip:** Try applying different stemming techniques to various real-world datasets and observe how the outputs affect your model's performance.

---
**Happy NLP-ing! 🚀**
