---

## **Notes on the Code**

### **Concept of Stemming**
Stemming reduces a word to its stem by stripping affixes, most often suffixes, with rule-based heuristics. The resulting stem is not always a dictionary word; for example, "history" becomes "histori". Stemming is commonly used in Natural Language Processing (NLP) to normalize text for tasks such as text classification, search, and sentiment analysis.
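
As a rough illustration of the idea, the sketch below strips a few hard-coded suffixes. The function name `naive_stem`, its suffix list, and the minimum stem length are assumptions made for this example only; real stemmers such as Porter apply many more rules and conditions.

```python
def naive_stem(word, suffixes=("ing", "es", "ed", "s"), min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied, return the word unchanged


for w in ["eating", "eats", "history"]:
    print(w, "->", naive_stem(w))
```

Even this toy version maps "eating" and "eats" to the same stem, which is the normalization effect that the stemmers in these notes provide with far more careful rules.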

Different stemming algorithms have different rules for how they reduce words to their base form. This code demonstrates three popular stemming techniques: **Porter Stemmer**, **Regexp Stemmer**, and **Snowball Stemmer**.

---

### **Step-by-Step Explanation**

#### **1. Input Word List**
```python
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]
```
This list contains words in various forms, including:
- **Present participle**: "eating," "writing," "programming"
- **Third-person singular or plural**: "eats," "writes," "programs"
- **Past participle**: "eaten," "finalized"
- **Other forms**: "history" (noun), "finally" (adverb)

The goal is to reduce related words to a common stem using different stemmers.

---

#### **2. Porter Stemmer**
```python
from nltk.stem import PorterStemmer
ps = PorterStemmer()
for word in words:
    print(word + "---->" + ps.stem(word))
```
- **Porter Stemmer**: Published by Martin Porter in 1980, this is one of the oldest and most widely used stemming algorithms for English. It applies a fixed sequence of heuristic suffix-stripping rules to reduce words to their stems.
- Example transformations:
  - "eating" → "eat"
  - "writing" → "write"
  - "programming" → "program"
  - "finalized" → "final"
- Additional examples:
  - `ps.stem('congratulations')` → "congratul"
  - `ps.stem('sitting')` → "sit"
- **Limitations**: The stems are not always dictionary words ("congratulations" → "congratul", "history" → "histori"), and irregular forms are missed: "eaten" is left unchanged instead of being reduced to "eat".
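
To make these limitations visible, the following sketch groups the input words by the stem that `PorterStemmer` assigns to them. The grouping dictionary `groups` is a helper introduced here for illustration and is not part of the original code.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]

ps = PorterStemmer()

# Map each stem to the input words that produced it.
groups = defaultdict(list)
for word in words:
    groups[ps.stem(word)].append(word)

for stem, members in groups.items():
    print(f"{stem:10} <- {members}")
```

"eating" and "eats" share the stem "eat", while "eaten" ends up in a group of its own, which is a case of under-stemming.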

---

#### **3. Regexp Stemmer**
```python
from nltk.stem import RegexpStemmer
reg_stem = RegexpStemmer('ing|s$|e$|able$', min=4)
```
- **Regexp Stemmer**: Allows custom stemming by defining a regular expression to remove specific patterns from words.
- Parameters:
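  - `'ing|s$|e$|able$'`: Removes a trailing "s", "e", or "able" (anchored by `$`) and every occurrence of "ing". Because `ing` has no `$` anchor, it matches anywhere in the word, not only at the end.
  - `min=4`: The minimum length of a word to stem. Words shorter than 4 characters are returned unchanged.
- Example transformations:
  - `reg_stem.stem('eating')` → "eat"
  - `reg_stem.stem('ingeating')` → "eat" (both occurrences of "ing" are removed)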
- **Use Case**: This stemmer is useful when you need precise control over the stemming process using custom rules.
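
The effect of the missing `$` anchor and of `min` can be checked with a short experiment. The second stemmer, `anchored`, uses a modified pattern (`ing$`) that is an assumption of this sketch and does not appear in the original code.

```python
from nltk.stem import RegexpStemmer

# Pattern from the notes: "ing" is unanchored, so it matches anywhere in the word.
unanchored = RegexpStemmer('ing|s$|e$|able$', min=4)
# Variant with "ing$", so that only a trailing "ing" is removed.
anchored = RegexpStemmer('ing$|s$|e$|able$', min=4)

for word in ["eating", "ingeating", "bus"]:
    print(word, "->", unanchored.stem(word), "|", anchored.stem(word))
```

"bus" has fewer than 4 characters, so both stemmers return it unchanged even though it ends in "s".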

---

#### **4. Snowball Stemmer**
```python
from nltk.stem import SnowballStemmer
snow_stem = SnowballStemmer('english')
for word in words:
    print(word + "----->" + snow_stem.stem(word))
```
- **Snowball Stemmer**: A family of stemmers written in Martin Porter's Snowball framework. The English stemmer is a revised version of the Porter algorithm (often called Porter2), and stemmers for other languages are selected by name (e.g., "english," "french," etc.).
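- Example transformations:
  - "eating" → "eat"
  - "writes" → "write"
  - "programming" → "program"
  - "history" → "histori" (the stem is not a dictionary word)
- **Advantages**:
  - Revised rules handle some suffixes that the Porter Stemmer leaves behind, such as "-ly": "fairly" becomes "fair" with Snowball but "fairli" with Porter.
  - Supports languages other than English.
- **Note**: On the word list above, both stemmers produce identical output, including leaving "eaten" unchanged.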
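
Because the two stemmers agree on the word list used in these notes, a difference only shows up on other inputs. The sketch below runs both on a few extra words; "fairly" and "sportingly" are examples chosen for this illustration and are not part of the original word list.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

for word in ["fairly", "sportingly", "history"]:
    print(f"{word:12} porter={porter.stem(word):12} snowball={snowball.stem(word)}")

# Languages accepted by the SnowballStemmer constructor in this NLTK installation.
print(SnowballStemmer.languages)
```

The `languages` attribute lists the names that can be passed to `SnowballStemmer(...)`.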

---

### **Comparison of Stemmers**
| **Stemmer** | **Description** | **Use Case** |
|---|---|---|
| **Porter Stemmer** | Fixed sequence of heuristic suffix-stripping rules for English. | Quick and basic stemming. |
| **Regexp Stemmer** | Removes whatever a user-supplied regular expression matches. | Fine-tuned stemming for specific patterns. |
| **Snowball Stemmer** | Revised Porter rules for English, plus stemmers for other languages. | Language-aware stemming with revised rules. |
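
The comparison can also be run directly. The following sketch applies all three stemmers to a subset of the word list and prints the results side by side; the dictionary names `stemmers` and `rows` are introduced here for illustration.

```python
from nltk.stem import PorterStemmer, RegexpStemmer, SnowballStemmer

words = ["eating", "writes", "programs", "history", "finalized"]

stemmers = {
    "porter": PorterStemmer(),
    "regexp": RegexpStemmer('ing|s$|e$|able$', min=4),
    "snowball": SnowballStemmer("english"),
}

# For each word, collect the stem produced by every stemmer.
rows = {word: {name: s.stem(word) for name, s in stemmers.items()} for word in words}

print(f"{'word':12}" + "".join(f"{name:12}" for name in stemmers))
for word, stems in rows.items():
    print(f"{word:12}" + "".join(f"{stems[name]:12}" for name in stemmers))
```

The regular-expression stemmer only changes words that match its pattern, so "history" and "finalized" pass through it unchanged.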

---

### **Summary**
This code demonstrates how to use three different stemming algorithms to reduce words to their base form. Depending on the context and requirements, you can choose:
- **Porter Stemmer** for quick and basic stemming.
- **Regexp Stemmer** for precise control over suffix removal.
- **Snowball Stemmer** for more accurate and language-specific stemming.

Normalizing words this way lets NLP tasks such as text mining, classification, and search indexing treat inflected variants like "writes" and "writing" as the same term.
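
As one possible illustration of the search-indexing use, the sketch below builds a tiny inverted index keyed by stem. The toy corpus `docs` and the helper `search` are hypothetical and exist only for this example.

```python
from collections import defaultdict

from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")

# Hypothetical toy corpus: document id -> text.
docs = {
    1: "she writes programs",
    2: "he is eating",
    3: "history of programming",
}

# Inverted index keyed by stem rather than by surface form.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        index[stemmer.stem(token)].add(doc_id)


def search(query):
    """Return the ids of documents that share a stem with the query word."""
    return sorted(index.get(stemmer.stem(query.lower()), set()))


print(search("writing"))   # finds the document containing "writes"
print(search("programs"))  # finds documents containing "programs" or "programming"
```

Because both the documents and the query are stemmed, a query for "writing" matches a document that only contains "writes".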

In [None]:
words = ["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

In [None]:
## PorterStemmer
from nltk.stem import PorterStemmer
ps = PorterStemmer()

In [None]:
for word in words:
  print(word+"---->"+ ps.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [None]:
ps.stem('congratulations')

'congratul'

In [None]:
ps.stem('sitting')

'sit'

In [None]:
##RegexpStemmer
from nltk.stem import RegexpStemmer
reg_stem = RegexpStemmer('ing|s$|e$|able$', min=4)

In [None]:
reg_stem.stem('eating')

'eat'

In [None]:
reg_stem.stem('ingeating')

'eat'

In [None]:
## Snowball Stemmer
from nltk.stem import SnowballStemmer
snow_stem = SnowballStemmer('english')

In [None]:
for word in words:
  print(word+"----->"+snow_stem.stem(word))

eating----->eat
eats----->eat
eaten----->eaten
writing----->write
writes----->write
programming----->program
programs----->program
history----->histori
finally----->final
finalized----->final
