## **Stemming in NLP**
Stemming is a text normalization technique that reduces words to a common base form, the stem, usually by **removing suffixes**. The stem does not have to be a dictionary word. Stemming is commonly used in **text processing, search engines, and other NLP applications**.

### **Examples:**
- `writing` → `write`
- `happily` → `happili`
- `running` → `run`

In [8]:
words = [
    "writing", "writes", "written", "writer", "wrote", "rewrite", "rewriting", "rewritten",
    "running", "runs", "ran", "runner", "runners",
    "studying", "studies", "studied", "student", "students",
    "playing", "plays", "played", "player", "players",
    "singing", "sings", "sang", "singer", "singers"]


## **Porter Stemmer in NLP**
The **Porter Stemmer** is a widely used stemming algorithm for English that reduces words to a stem by applying a fixed sequence of **suffix-stripping rules**.

### **Examples:**
- `running` → `run`
- `happily` → `happili`
- `studies` → `studi`
- `writing` → `write`

✅ **Efficient and fast**  
✅ **Good for text normalization**  
❌ **Sometimes over-stems words (e.g., "happiness" → "happi")**


In [9]:
from nltk.stem import PorterStemmer

In [10]:
stemming = PorterStemmer()

In [11]:
for word in words:
    print(word + "-->" + stemming.stem(word))

writing-->write
writes-->write
written-->written
writer-->writer
wrote-->wrote
rewrite-->rewrit
rewriting-->rewrit
rewritten-->rewritten
running-->run
runs-->run
ran-->ran
runner-->runner
runners-->runner
studying-->studi
studies-->studi
studied-->studi
student-->student
students-->student
playing-->play
plays-->play
played-->play
player-->player
players-->player
singing-->sing
sings-->sing
sang-->sang
singer-->singer
singers-->singer


In [12]:
stemming.stem('congratulations')

'congratul'

In [13]:
stemming.stem('history')

'histori'
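
One way to see what this normalization buys you is to group words by the stem they share, which is roughly what a search index does when it treats `runs` and `running` as the same term. The snippet below is a minimal sketch, not part of the original notebook: `group_by_stem` and `sample` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(tokens, stemmer):
    """Map each stem to the list of original tokens that reduce to it."""
    groups = defaultdict(list)
    for token in tokens:
        groups[stemmer.stem(token)].append(token)
    return dict(groups)


# A small subset of the word list used above (illustrative only).
sample = ["running", "runs", "ran", "runner", "playing", "plays", "played"]
groups = group_by_stem(sample, PorterStemmer())

for stem, members in groups.items():
    print(stem, "<-", members)
```

Words that land in the same group are treated as equivalent after stemming; words such as `ran` and `runner` that keep their own group are not.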

## **RegexpStemmer in NLP**
`RegexpStemmer` is a rule-based stemming technique in **NLTK** that removes or replaces parts of words using **regular expressions**. It is useful for customizing stemming rules when other stemmers are too aggressive or not precise.

**Example:** Removing the suffixes `ing`, `s`, `e`, `ed`, and `able` from words that are at least four characters long (`min=4`).


In [14]:
from nltk.stem import RegexpStemmer

In [19]:
regex_stemmer = RegexpStemmer('ing$|s$|e$|ed$|able$', min=4)

In [20]:
for word in words:
    print(word + "-->" + regex_stemmer.stem(word))

writing-->writ
writes-->write
written-->written
writer-->writer
wrote-->wrot
rewrite-->rewrit
rewriting-->rewrit
rewritten-->rewritten
running-->runn
runs-->run
ran-->ran
runner-->runner
runners-->runner
studying-->study
studies-->studie
studied-->studi
student-->student
students-->student
playing-->play
plays-->play
played-->play
player-->player
players-->player
singing-->sing
sings-->sing
sang-->sang
singer-->singer
singers-->singer
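
The results above, such as `running-->runn` and `studies-->studie`, are easier to understand once you see that the stemmer does nothing more than delete whatever the pattern matches. The following is a plain-`re` sketch that reproduces that behaviour; `regexp_stem` is a hypothetical helper written for illustration, not an NLTK function, and it assumes the stemmer simply skips words shorter than the minimum length.

```python
import re


def regexp_stem(word, pattern, min_length=0):
    """Delete every match of `pattern` from `word`.

    Words shorter than `min_length` are returned unchanged.
    """
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


# Same pattern and minimum length as the RegexpStemmer cell above.
pattern = r"ing$|s$|e$|ed$|able$"

for word in ["running", "studies", "ran", "wrote"]:
    print(word + "-->" + regexp_stem(word, pattern, min_length=4))
```

Because the pattern has no rule for doubled consonants or for `ies`, `running` keeps its second `n` and `studies` only loses the final `s`. Handling those cases means adding more alternatives to the pattern.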


## **Snowball Stemmer in NLP**
The **Snowball Stemmer** is a refinement of the **Porter Stemmer** (its English variant is also known as Porter2) that revises several of the original rules and supports multiple languages. It is useful for text processing tasks where **stemming precision** is important.

### **Examples:**
- `running` → `run`
- `happily` → `happili`
- `studying` → `studi`

**Supports languages like:** English, French, Spanish, German, and more.
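
To check which languages are available in your NLTK installation, you can inspect the `languages` attribute of the class. The sketch below also builds a German stemmer; the choice of `"german"` and the sample word `laufen` are arbitrary illustrations, not taken from the notebook.

```python
from nltk.stem import SnowballStemmer

# `languages` is a class attribute listing the supported language names.
print(SnowballStemmer.languages)

# The language name is passed to the constructor.
german_stemmer = SnowballStemmer("german")
print(german_stemmer.stem("laufen"))
```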


In [22]:
from nltk.stem import SnowballStemmer

In [24]:
snowball_stemmer = SnowballStemmer('english')

In [25]:
for word in words:
    print(word + "-->" + snowball_stemmer.stem(word))

writing-->write
writes-->write
written-->written
writer-->writer
wrote-->wrote
rewrite-->rewrit
rewriting-->rewrit
rewritten-->rewritten
running-->run
runs-->run
ran-->ran
runner-->runner
runners-->runner
studying-->studi
studies-->studi
studied-->studi
student-->student
students-->student
playing-->play
plays-->play
played-->play
player-->player
players-->player
singing-->sing
sings-->sing
sang-->sang
singer-->singer
singers-->singer


## **Comparison between PorterStemmer and SnowballStemmer**

In [27]:
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [28]:
snowball_stemmer.stem("fairly"), snowball_stemmer.stem("sportingly")

('fair', 'sport')
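
The two calls above show that Snowball strips the `ly` ending where Porter only rewrites the final `y`. To look for further differences, a small side-by-side loop can help. This is a sketch under the assumption that you want to compare the English variants only; the names `porter`, `snowball`, and `candidates` are introduced here for illustration.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

# Words to compare; extend this list with your own vocabulary.
candidates = ["fairly", "sportingly", "running", "studies"]

differences = []
for word in candidates:
    p, s = porter.stem(word), snowball.stem(word)
    if p != s:
        differences.append(word)
    marker = "  <-- differs" if p != s else ""
    print(f"{word:12} porter={p:12} snowball={s}{marker}")
```

After the loop, `differences` holds the words on which the two stemmers disagree.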

## **Disadvantages of Stemming**
Because stemming relies on suffix rules rather than on word meaning, it has several disadvantages:

- **Over-Stemming**: Too much of the word is removed, which distorts the root and can merge unrelated words into one stem.
  - Example: `"happiness"` → `"happi"`
- **Under-Stemming**: Related word forms are not reduced to the same stem.
  - Example: `"ran"` and `"running"` are stemmed to `"ran"` and `"run"`.
- **Not Context-Aware**: It does not consider the meaning of words.
  - Example: `"better"` is stemmed to `"better"`, although its dictionary form is `"good"`.
- **Not Always a Real Word**: Stemming produces stems that may not exist in the dictionary.
  - Example: `"studies"` → `"studi"`, which is not a real word.
- **Not Suitable for Chatbots**: Since stemming does not consider word meanings, it is **not ideal for conversational AI**.
  - Example: `"went"` (past tense of `"go"`) stays `"went"` and is never linked to `"go"`.
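
Several of these points can be checked directly with the Porter stemmer. The snippet below is a minimal sketch that reuses the examples from the list; the variable name `stemmer` is chosen here and is not part of the notebook.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Under-stemming: two forms of the same verb end up with different stems.
print(stemmer.stem("ran"), stemmer.stem("running"))

# Not context-aware: irregular forms are not mapped to their base word.
print(stemmer.stem("better"), stemmer.stem("went"))

# Not always a real word: the stems below are not dictionary entries.
print(stemmer.stem("studies"), stemmer.stem("happiness"))
```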

### **Use Lemmatization Instead?**
Lemmatization addresses most of these problems by converting words to their **actual dictionary form** (the lemma) using a vocabulary and morphological analysis. It is usually slower than stemming but more **accurate and context-aware**, making it a better fit for applications like **chatbots and text analysis**.
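
To make the "vocabulary" part concrete, here is a toy lookup that maps a few irregular forms to their dictionary form. It is only an illustration of the idea: `IRREGULAR_LEMMAS` and `toy_lemmatize` are hypothetical names with a hand-written table, not a real lemmatizer. For real use, NLTK provides `WordNetLemmatizer`, which requires the WordNet corpus to be downloaded.

```python
# Hypothetical, hand-written table: a real lemmatizer uses a full
# vocabulary plus morphological rules, not four entries.
IRREGULAR_LEMMAS = {
    "went": "go",
    "ran": "run",
    "better": "good",
    "wrote": "write",
}


def toy_lemmatize(word):
    """Return the dictionary form if the word is in the table, else the word unchanged."""
    return IRREGULAR_LEMMAS.get(word.lower(), word)


for word in ["went", "ran", "better", "wrote", "table"]:
    print(word + "-->" + toy_lemmatize(word))
```

These are exactly the forms that the stemmers above leave untouched, because no suffix rule can turn `went` into `go`.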
