## Stemming

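Stemming is the process of reducing a word to its word stem by stripping affixes (suffixes and prefixes), so that related forms such as eating and eats map to the same stem. It is related to, but cruder than, lemmatization, which maps a word to its dictionary root form, known as a lemma. Stemming is important in natural language understanding (NLU) and natural language processing (NLP).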


##### Stemming is not used much in chatbots; **lemmatization** is usually preferred instead

In [17]:
## Example use case: a classification problem
## (is a product review positive or negative?)
## Reviews contain word variants that we want to map to one base form,
## e.g. eating, eats, eaten ---> eat and going, gone, goes ---> go

words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

### What is PorterStemmer?

The **Porter Stemmer** is an algorithm used in **Natural Language Processing (NLP)** to reduce words to their root or base form (called the **stem**). It was developed by Martin Porter in 1980 and is one of the most widely used stemming techniques. The goal is to simplify words by stripping suffixes; for example, "running" and "runs" are both reduced to "run". Because it works by stripping suffixes, an irregular form such as "ran" is left unchanged rather than mapped to "run".

#### How it Works:

Porter Stemmer applies a series of rule-based steps to remove common word endings or suffixes. It operates in phases, each of which attempts to remove various suffixes and produce the stem form of the word.

For example:
- **running** → **run**
- **happiness** → **happi**
- **connection** → **connect**
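As a concrete instance of one of these rule-based steps, the sketch below implements only step 1a of Porter's algorithm (plural endings) in plain Python. It is not a full stemmer, and the function name `porter_step_1a` is hypothetical; NLTK's `PorterStemmer` runs this step together with all the others.

```python
def porter_step_1a(word: str) -> str:
    """Apply Porter's step 1a (plural suffixes) to a lowercase word.

    Rules, checked longest suffix first:
        SSES -> SS   (caresses -> caress)
        IES  -> I    (ponies   -> poni)
        SS   -> SS   (caress   -> caress)
        S    ->      (cats     -> cat)
    """
    if word.endswith("sses"):
        return word[:-2]  # drop the final "es"
    if word.endswith("ies"):
        return word[:-2]  # "ies" -> "i"
    if word.endswith("ss"):
        return word       # a double "s" is kept
    if word.endswith("s"):
        return word[:-1]  # drop a single plural "s"
    return word


for w in ["caresses", "ponies", "caress", "cats", "run"]:
    print(w, "->", porter_step_1a(w))
```

The remaining steps follow the same pattern but attach conditions to each rule, such as requiring the remaining stem to contain a vowel.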

---

#### Advantages of Porter Stemmer:

- **Reduces Dimensionality in NLP Tasks**:
   - Reducing words to their stems decreases the size of the vocabulary, simplifying models and computations in tasks like classification or clustering.

- **Widely Adopted**:
   - The algorithm is well known and implemented in many NLP and search libraries (for example, NLTK and Apache Lucene), making it easy to integrate into projects.
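To make the dimensionality point concrete, the sketch below counts distinct words before and after stemming. It deliberately uses a hypothetical toy stemmer (`toy_stem`, which is NOT the Porter algorithm) so that it runs with the standard library only; the effect on vocabulary size is the same idea a real stemmer delivers.

```python
def toy_stem(word: str) -> str:
    """Hypothetical toy stemmer (NOT Porter): strip the first matching suffix
    from the list, provided at least three characters would remain."""
    for suffix in ("ing", "ion", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = [
    "connect", "connects", "connected", "connecting", "connection",
    "play", "plays", "played", "playing",
]

vocabulary = set(tokens)                              # distinct surface forms
stemmed_vocabulary = {toy_stem(t) for t in tokens}    # distinct stems

print(len(vocabulary), "distinct words before stemming")
print(len(stemmed_vocabulary), "distinct stems after stemming:", sorted(stemmed_vocabulary))
```

Nine distinct words collapse to two stems, so a bag-of-words model built on the stems needs far fewer features.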


**Importing the PorterStemmer class from nltk.stem** 

In [18]:
from nltk.stem import PorterStemmer 

In [19]:
# Create an instance of the PorterStemmer class
stemming=PorterStemmer()

In [None]:
# stem() returns the stem of the given word,
# e.g. eat is the stem of eating

for word in words:
    print(word+"---->"+stemming.stem(word))

You can observe that PorterStemmer converts words into their root forms, but it has a ***disadvantage***: it can produce stems that are not real words, such as
history ----> histori, ***which is not a valid English word***.
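The histori result comes from a single rule. Step 1c of Porter's original algorithm rewrites a final y as i whenever the rest of the word contains a vowel, so that forms such as happy and happiness share the stem happi. Below is a minimal sketch of that one rule in plain Python; the function names are hypothetical, and Porter's special definition of a vowel (where y can count as one) is reproduced in `is_consonant`.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a consonant is any letter other than a, e, i, o, u,
    and other than a 'y' that is preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # 'y' at the start of the word, or after a vowel, counts as a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def contains_vowel(stem: str) -> bool:
    """True if any position in `stem` is a vowel under Porter's definition."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def porter_step_1c(word: str) -> str:
    """Porter's step 1c, (*v*) Y -> I: turn a final 'y' into 'i'
    when the rest of the word contains a vowel."""
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word


for w in ["history", "happy", "sky"]:
    print(w, "->", porter_step_1c(w))
```

The rule is what lets happy and happiness meet at happi, and the same rule is what turns history into the non-word histori.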

### RegexpStemmer class
NLTK's RegexpStemmer class makes it easy to implement a regular-expression stemmer. It takes a single regular expression and removes every part of the word that matches it; anchoring a pattern with `$` (end of word) or `^` (start of word) restricts it to suffixes or prefixes. Let us see an example:

In [21]:
# Import the RegexpStemmer class from nltk.stem

from nltk.stem import RegexpStemmer

In [22]:
# RegexpStemmer takes two arguments: regexp and min.
# regexp: the pattern for the affixes to strip, e.g. ing, s, es, able;
#         a trailing $ means the pattern only matches at the end of the word.
# min: words shorter than this many characters are returned unchanged.

reg_expression=RegexpStemmer('ing|s$|able$',min=3)


In [None]:
reg_expression.stem('eating')

In [None]:
# 'ing' is not anchored with $, so every occurrence is stripped, not just the final one
reg_expression.stem('ingeating')

In [None]:
reg_expression.stem('noteable')
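To see why the three calls above return what they do, here is a minimal standard-library re-creation of the same logic. It assumes that NLTK's `RegexpStemmer.stem` returns words shorter than `min` unchanged and otherwise deletes every match of the pattern with `re.sub` (this matches NLTK's source as far as we know). The class name `SimpleRegexpStemmer` and the parameter name `min_len` (standing in for NLTK's `min`) are hypothetical.

```python
import re


class SimpleRegexpStemmer:
    """Hypothetical stand-in for nltk's RegexpStemmer, using only the standard library."""

    def __init__(self, regexp: str, min_len: int = 0):
        self._regexp = re.compile(regexp)
        self._min_len = min_len

    def stem(self, word: str) -> str:
        # Words shorter than min_len characters are returned unchanged.
        if len(word) < self._min_len:
            return word
        # Every match is deleted: unanchored alternatives such as 'ing' are
        # removed anywhere in the word, anchored ones ('s$', 'able$') only at the end.
        return self._regexp.sub("", word)


stemmer = SimpleRegexpStemmer("ing|s$|able$", min_len=3)
for w in ["eating", "ingeating", "noteable", "is"]:
    print(w, "->", stemmer.stem(w))
```

With this pattern, eating and ingeating both become eat, noteable becomes note, and the two-letter word is is left alone because it is shorter than `min_len=3`. If you only want to strip a trailing -ing, anchor it as `ing$`.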

### Snowball Stemmer
The Snowball stemmer for English, also known as the Porter2 stemming algorithm, is an improved version of the Porter Stemmer that fixes some of its issues. NLTK's SnowballStemmer also provides stemmers for many other languages.

In [29]:
# Importing SnowballStemmer

from nltk.stem import SnowballStemmer

In [30]:
# Snowball Stemmer

# class SnowballStemmer(
#   language: str,
#   ignore_stopwords: bool = False
# )

# The following languages are supported: 
# Arabic, Danish, Dutch, English, Finnish, French,
# German, Hungarian, Italian, Norwegian, Portuguese, Romanian, 
# Russian, Spanish and Swedish.

snowballstemmer=SnowballStemmer('english')

In [None]:
for word in words:
    print(word + " ---> " + snowballstemmer.stem(word))

In [None]:

# Difference in output between PorterStemmer and SnowballStemmer (Porter2)

# stemming is an instance of the PorterStemmer class
stemming.stem("fairly"),stemming.stem("sportingly")  # ('fairli', 'sportingli')


In [None]:
# snowballstemmer is an instance of the SnowballStemmer class
snowballstemmer.stem("fairly"),snowballstemmer.stem("sportingly")  # ('fair', 'sport')
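One reason for the difference on sportingly: Porter2's step 1b lists the suffixes -ingly, -edly and -eedly alongside -ing, -ed and -eed, while the original Porter step 1b only knows -eed, -ed and -ing, so Porter keeps the -ly ending and produces sportingli. The sketch below compares just the two suffix lists, transcribed from the published algorithm descriptions; the conditions attached to each rule (such as requiring a vowel in the remaining stem) are deliberately left out, so it is an illustration rather than a stemmer. The names `PORTER_STEP_1B`, `PORTER2_STEP_1B` and `longest_suffix` are hypothetical.

```python
from __future__ import annotations

# Step 1b suffixes of the original Porter algorithm vs. the English
# Snowball (Porter2) algorithm. Only the suffix lists are modelled here.
PORTER_STEP_1B = ("eed", "ed", "ing")
PORTER2_STEP_1B = ("eedly", "eed", "ingly", "edly", "ing", "ed")


def longest_suffix(word: str, suffixes: tuple[str, ...]) -> str | None:
    """Return the longest suffix in `suffixes` that `word` ends with, or None."""
    matches = [s for s in suffixes if word.endswith(s)]
    return max(matches, key=len) if matches else None


for word in ["sportingly", "sporting"]:
    print(
        word,
        "| Porter step 1b:", longest_suffix(word, PORTER_STEP_1B),
        "| Porter2 step 1b:", longest_suffix(word, PORTER2_STEP_1B),
    )
```

For sportingly, only the Porter2 list matches (via -ingly), which is why Snowball can reduce it all the way to sport; for sporting, both lists match -ing.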