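Stemming is the process of reducing a word to its root or base form by removing suffixes or prefixes. For example, "running" and "runs" are both reduced to the stem "run". Irregular forms such as "ran" are left unchanged, because a stemmer strips affixes by rule rather than looking words up in a dictionary.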

In [37]:
words = ["eating", "eat", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalize"]

## Porter Stemmer

The Porter stemmer is one of the most widely used stemming algorithms in NLP. It applies a fixed sequence of rule-based steps to strip common morphological and inflectional endings from English words. For example, it reduces “caresses” to “caress” and “ponies” to “poni”.
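
To make the idea of rule-based stripping concrete, here is a minimal sketch of only the first rule group (step 1a) of the original Porter algorithm, which handles plural endings. The function name porter_step_1a is hypothetical, and this is not NLTK's implementation: the real stemmer applies several further steps and conditions after this one.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the original Porter algorithm (plural endings)."""
    # Rules are tried in order, so the longest matching suffix wins.
    rules = [
        ("sses", "ss"),  # caresses -> caress
        ("ies", "i"),    # ponies   -> poni
        ("ss", "ss"),    # caress   -> caress (unchanged)
        ("s", ""),       # cats     -> cat
    ]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The "ies" rule explains why the stem "poni" is not a dictionary word: the algorithm only rewrites the ending and never checks the result against a vocabulary.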


In [38]:
from nltk.stem import PorterStemmer

In [39]:
stemming=PorterStemmer()

In [40]:
for word in words:
    print(word + "----->" + stemming.stem(word))

eating----->eat
eat----->eat
eaten----->eaten
writing----->write
writes----->write
programming----->program
programs----->program
history----->histori
finally----->final
finalize----->final


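A major disadvantage of stemming is that the resulting stem is often not a real word, and the truncation can obscure the original meaning. For example, "congratulations" becomes "congratul", while "sitting" stems cleanly to "sit":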

In [41]:
stemming.stem("congratulations")

'congratul'

In [42]:
stemming.stem("sitting")

'sit'

## RegexpStemmer Class

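RegexpStemmer (Regular Expression Stemmer) is a rule-based stemmer in NLTK that removes any part of a word matching a regular expression you supply, typically anchored prefixes or suffixes. It gives you full control to define custom stemming rules, and words shorter than the min argument are returned unchanged.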
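
The behaviour described above can be approximated with the standard re module. This is a sketch under assumptions, not NLTK's source: the class name SimpleRegexpStemmer and the parameter name min_length are hypothetical, chosen only to mirror the pattern and min arguments used in this notebook.

```python
import re


class SimpleRegexpStemmer:
    """Hypothetical stand-in that mimics the behaviour shown in this notebook."""

    def __init__(self, pattern: str, min_length: int = 0):
        self._regexp = re.compile(pattern)
        self._min_length = min_length

    def stem(self, word: str) -> str:
        # Words shorter than the minimum length are returned untouched.
        if len(word) < self._min_length:
            return word
        # Delete every match of the pattern; "$" ties a rule to the word end.
        return self._regexp.sub("", word)


sketch = SimpleRegexpStemmer("ing$|s$|e$|able$", min_length=4)
for w in ["eating", "ingeating", "writes", "was"]:
    print(w, "->", sketch.stem(w))
```

Because every alternative in the pattern ends with "$", a leading "ing" is left alone, and only one ending is removed per call, which is why "writes" keeps its final "e".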


In [43]:
from nltk.stem import RegexpStemmer

In [44]:
reg_stemming=RegexpStemmer("ing$|s$|e$|able$", min=4)

In [45]:
reg_stemming.stem("eating")

'eat'

In [46]:
reg_stemming.stem("ingeating")

'ingeat'

In [47]:
reg_stemming.stem("writes")

'write'

In [48]:
reg_stemming.stem("congratulating")

'congratulat'

In [49]:
reg_stemming.stem("congratulations")

'congratulation'

## SnowballStemmer

The Snowball stemmer for English (also known as the Porter2 stemmer) is an improved version of the Porter stemmer. It is more consistent, strips some endings that Porter leaves behind (such as "-ly"), and NLTK's SnowballStemmer supports multiple languages, unlike PorterStemmer, which only supports English.

In [50]:
from nltk.stem import SnowballStemmer

In [51]:
snow_stemming=SnowballStemmer("english")

In [52]:
for word in words:
    print(word + "---->" + snow_stemming.stem(word))

eating---->eat
eat---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalize---->final


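Difference between PorterStemmer and SnowballStemmer: in the examples below, Porter only rewrites the "-ly" ending to "-li", while Snowball strips it entirely.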

In [53]:
#PorterStemmer
stemming.stem("fairly"),stemming.stem("sportingly")

('fairli', 'sportingli')

In [54]:
#SnowballStemmer
snow_stemming.stem("fairly"),snow_stemming.stem("sportingly")

('fair', 'sport')
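
To spot such differences across a whole word list rather than one word at a time, a small helper can report only the words on which the stemmers disagree. This is a sketch: the function name stemmer_disagreements is hypothetical, and the two lookup dictionaries are stand-ins copied from the outputs printed above so that the block runs without NLTK.

```python
from typing import Callable, Dict, List, Tuple


def stemmer_disagreements(
    words: List[str], stemmers: Dict[str, Callable[[str], str]]
) -> List[Tuple[str, Dict[str, str]]]:
    """Return (word, {stemmer name: stem}) for words the stemmers stem differently."""
    rows = []
    for word in words:
        stems = {name: stem(word) for name, stem in stemmers.items()}
        if len(set(stems.values())) > 1:
            rows.append((word, stems))
    return rows


# Stand-in lookups taken from the cell outputs in this notebook.
# In the notebook itself, pass stemming.stem and snow_stemming.stem instead.
porter_seen = {"eating": "eat", "history": "histori",
               "fairly": "fairli", "sportingly": "sportingli"}
snowball_seen = {"eating": "eat", "history": "histori",
                 "fairly": "fair", "sportingly": "sport"}

diffs = stemmer_disagreements(
    list(porter_seen),
    {"porter": porter_seen.__getitem__, "snowball": snowball_seen.__getitem__},
)
for word, stems in diffs:
    print(word, stems)
```

Only "fairly" and "sportingly" are reported here, since the two stemmers agree on the other words taken from the outputs above.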