## **Stemming**
Stemming is a text normalization technique in natural language processing (NLP) that reduces words to a base form called the "stem," usually by stripping suffixes. The stem does not have to be a valid dictionary word (the Porter stemmer below turns "history" into "histori"). What matters is that different inflections or derivations of the same word map to the same stem, so they can be treated as a single term.

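For example, the words "running," "runs," and "run" share the stem "run," so a stemmer maps all of them to that common form:

- running -> run
- runs -> run
- run -> run

Suffix stripping has limits. An irregular form such as "ran" has no suffix to remove, so rule-based stemmers leave it unchanged. Mapping "ran" to "run" requires lemmatization, which relies on a vocabulary and morphological analysis rather than suffix rules.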

The idea is to eliminate variations in word forms so that similar words are treated as equivalent during text analysis, retrieval, or other NLP tasks.
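The effect on retrieval can be sketched in plain Python. `toy_stem` is a hypothetical, deliberately crude stemmer (it only strips "ing" or "s"), and `build_index` and `search` are illustrative helpers, not part of any library. The point is only that stemming the documents and the query in the same way lets "eats" match a document that contains "eating."

```python
from collections import defaultdict
from typing import Callable


def toy_stem(word: str) -> str:
    """Hypothetical toy stemmer: strip one suffix from a tiny fixed list,
    but only if at least 3 letters remain."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: list[str], stem: Callable[[str], str]) -> dict[str, set[int]]:
    """Map each (stemmed) token to the ids of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str, stem: Callable[[str], str]) -> list[int]:
    """Normalize the query with the same stemmer used for the documents."""
    return sorted(index.get(stem(query.lower()), set()))


docs = ["She is eating lunch", "He eats early", "They cooked dinner"]

stemmed = build_index(docs, toy_stem)
exact = build_index(docs, lambda w: w)  # no stemming at all

print(search(stemmed, "eats", toy_stem))   # [0, 1]
print(search(exact, "eats", lambda w: w))  # [1]
print(search(stemmed, "cook", toy_stem))   # [] because "-ed" is not handled
```

The last query shows why real stemmers need many more rules than this toy: every suffix they do not know about splits a word family back into separate terms.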

```python
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]
```

### PorterStemmer
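The Porter stemming algorithm, published by Martin Porter in 1980, removes common suffixes from English words by applying a fixed sequence of rewrite rules. Automatic suffix removal is especially useful in information retrieval, where it lets a query term match documents that contain other inflected forms of the same word.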
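Many of Porter's rules are conditioned on the *measure* m of a stem: the stem is written as [C](VC)^m[V], where C is a run of consonants and V a run of vowels, and, for example, the step 2 rules only fire when the remaining stem has m > 0. The sketch below computes m and applies step 1a (the plural rules SSES -> SS, IES -> I, SS -> SS, S -> removed). It is a from-scratch illustration following Porter's definitions, not NLTK's implementation, and it covers only these two pieces of the algorithm.

```python
from itertools import groupby


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: any letter other than a, e, i, o, u, and other
    than 'y' when that 'y' follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # 'y' at the start, or after a vowel, counts as a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m for a stem of the form [C](VC)^m[V]."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    collapsed = "".join(key for key, _ in groupby(pattern))  # e.g. "ccvvc" -> "cvc"
    return collapsed.count("vc")


def step_1a(word: str) -> str:
    """Porter step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (removed)."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


print([measure(w) for w in ("tree", "trouble", "private")])           # [0, 1, 2]
print([step_1a(w) for w in ("caresses", "ponies", "caress", "cats")])  # ['caress', 'poni', 'caress', 'cat']
```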

```python
from nltk.stem import PorterStemmer

porter_stem = PorterStemmer()
```

```python
for i, word in enumerate(words):
    print(f"{i+1}. {word} --> {porter_stem.stem(word)}")
```

```
1. eating --> eat
2. eats --> eat
3. eaten --> eaten
4. writing --> write
5. writes --> write
6. programming --> program
7. programs --> program
8. history --> histori
9. finally --> final
10. finalized --> final
```


### RegexpStemmer class
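NLTK's `RegexpStemmer` uses a regular expression to identify morphological affixes: any substring that matches the expression is removed. The optional `min` argument sets a minimum word length, and words shorter than `min` are returned unchanged. Because the stemmer only deletes matches, it easily produces non-words such as "writ" and "programm", as the output below shows.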
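To make the mechanism concrete, here is a minimal plain-Python sketch of the same idea using `re.sub`. It assumes `RegexpStemmer` behaves like this (delete every match for words of at least `min` characters); `regexp_stem` and `min_len` are hypothetical names, not NLTK's API. The last call shows why the `$` anchors in the pattern matter.

```python
import re


def regexp_stem(word: str, pattern: str, min_len: int = 0) -> str:
    """Hypothetical helper: delete every match of `pattern` from `word`,
    unless the word is shorter than `min_len` (then return it unchanged)."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)


anchored = "ing$|s$|e$|able$"  # same pattern as the RegexpStemmer example
unanchored = "ing|s"           # no '$': matches anywhere in the word

print(regexp_stem("writing", anchored, 4))          # writ
print(regexp_stem("writes", anchored, 4))           # write (only the final 's' matches)
print(regexp_stem("is", anchored, 4))               # is (shorter than 4 characters)
print(repr(regexp_stem("sings", unanchored, 4)))    # '' (every match is removed)
```

Note that the substitution runs once over the original word, so "writes" loses only its final "s"; the newly exposed "e" is not stripped in a second pass.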

```python
from nltk.stem import RegexpStemmer

reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)
```

```python
for i, word in enumerate(words):
    print(f"{i+1}. {word} --> {reg_stemmer.stem(word)}")
```

```
1. eating --> eat
2. eats --> eat
3. eaten --> eaten
4. writing --> writ
5. writes --> write
6. programming --> programm
7. programs --> program
8. history --> history
9. finally --> finally
10. finalized --> finalized
```


### Snowball Stemmer
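Snowball is a small string-processing language that Martin Porter created for writing stemmers, and NLTK's `SnowballStemmer` provides the stemmers written in it for several languages. The English Snowball stemmer, often called Porter2, is Porter's revised version of his original algorithm. It refines several of the original rules and is often described as slightly more aggressive than the Porter stemmer; for example, it strips "-ly" endings that Porter leaves in place (see the comparison at the end of this section).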
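Many Snowball English rules only delete a suffix if it lies inside region R1 or R2. R1 is the part of the word after the first non-vowel that follows a vowel, and R2 is the same construction applied again inside R1. The sketch below computes both regions. It is a simplified, from-scratch illustration, not NLTK's code: it ignores the special prefixes ("gener", "commun", "arsen") that the English algorithm handles separately, and it does not mark consonantal "y" as "Y".

```python
# Simplification: always treat 'y' as a vowel.
VOWELS = set("aeiouy")


def region_start(word: str, start: int = 0) -> int:
    """Index just after the first non-vowel that follows a vowel,
    looking only at letters from `start` onward; len(word) if none."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def regions(word: str) -> tuple[str, str]:
    """Return (R1, R2) for `word`."""
    r1 = region_start(word)
    r2 = region_start(word, r1)  # same search, restricted to R1
    return word[r1:], word[r2:]


print(regions("beautiful"))  # ('iful', 'ul')
print(regions("beau"))       # ('', '')
print(regions("fairly"))     # ('ly', '')
```

For "fairly", the "-ly" ending falls entirely inside R1, which is consistent with Snowball being allowed to strip it and return "fair".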

```python
from nltk.stem import SnowballStemmer

sn_stemmer = SnowballStemmer('english')
```

```python
for i, word in enumerate(words):
    print(f"{i+1}. {word} --> {sn_stemmer.stem(word)}")
```

```
1. eating --> eat
2. eats --> eat
3. eaten --> eaten
4. writing --> write
5. writes --> write
6. programming --> program
7. programs --> program
8. history --> histori
9. finally --> final
10. finalized --> final
```


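Snowball vs. Porter: on adverbs ending in "-ly", Snowball removes the ending, while Porter only rewrites the final "y" as "i":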

```python
porter_stem.stem("fairly"), sn_stemmer.stem("fairly")
```

```
('fairli', 'fair')
```

```python
porter_stem.stem("sportingly"), sn_stemmer.stem("sportingly")
```

```
('sportingli', 'sport')
```