In [3]:
words = ['writing', 'writes', 'written', 'writer', 'wrote', 'eating', 'eaten', 'eats', 'ate', 'runner', 'running', 'ran']

## PorterStemmer


In [4]:
from nltk.stem import PorterStemmer
ps = PorterStemmer()

for word in words:
    print(f"{word} --> {ps.stem(word)}")

writing --> write
writes --> write
written --> written
writer --> writer
wrote --> wrote
eating --> eat
eaten --> eaten
eats --> eat
ate --> ate
runner --> runner
running --> run
ran --> ran


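PorterStemmer does not always reduce a word to its root. Irregular forms such as `wrote`, `written`, `ate`, and `ran` are left unchanged, and some of the stems it produces are not real words. This is its main drawback.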

In [None]:
ps.stem("history")  # history --> histori, which is not a real word

'histori'
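One way to see the effect of this limitation is to group the words by the stem they received. The sketch below is only an illustration: the `porter_stems` dictionary is copied by hand from the PorterStemmer output printed above (so it runs without NLTK), and `group_by_stem` is a hypothetical helper name.

```python
from collections import defaultdict

# Stems copied from the PorterStemmer output shown above (not recomputed here).
porter_stems = {
    "writing": "write", "writes": "write", "written": "written",
    "writer": "writer", "wrote": "wrote", "eating": "eat",
    "eaten": "eaten", "eats": "eat", "ate": "ate",
    "runner": "runner", "running": "run", "ran": "ran",
}

def group_by_stem(stems):
    """Map each stem to the list of words that were reduced to it."""
    groups = defaultdict(list)
    for word, stem in stems.items():
        groups[stem].append(word)
    return dict(groups)

groups = group_by_stem(porter_stems)
for stem, members in groups.items():
    print(f"{stem}: {members}")
```

Only `writing`/`writes` and `eating`/`eats` end up sharing a stem. The remaining words each stay in a group of their own, so the 12 words collapse into 10 groups rather than the 3 word families a reader would expect.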

## RegexpStemmer

In [5]:
from nltk.stem import RegexpStemmer
rs = RegexpStemmer('ing$|s$|e$|r$')
for word in words:
    print(f"{word} --> {rs.stem(word)}")

writing --> writ
writes --> write
written --> written
writer --> write
wrote --> wrot
eating --> eat
eaten --> eaten
eats --> eat
ate --> at
runner --> runne
running --> runn
ran --> ran


This approach is crude: it only strips a trailing `ing`, `s`, `e`, or `r` and makes no attempt to reduce the word to a proper root, which is why it produces stems such as `writ`, `runne`, and `at`.
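The behaviour above can be approximated with the standard `re` module, which makes it easier to see why the stems look the way they do. This is a minimal sketch, not NLTK's implementation: `regex_stem` and `min_length` are hypothetical names chosen for illustration (NLTK's `RegexpStemmer` accepts a comparable `min` argument that leaves short words untouched).

```python
import re

# Same suffix pattern as the RegexpStemmer cell above.
SUFFIXES = re.compile(r"ing$|s$|e$|r$")

def regex_stem(word, min_length=0):
    """Strip a matching suffix; leave words shorter than min_length alone."""
    if len(word) < min_length:
        return word
    return SUFFIXES.sub("", word)

for word in ["writing", "writer", "ate", "runner"]:
    plain = regex_stem(word)
    guarded = regex_stem(word, min_length=4)
    print(f"{word} --> {plain} (min_length=4: {guarded})")
```

A length guard keeps very short words such as `ate` intact, but it does nothing for `runner` or `writing`, because the pattern has no knowledge of doubled consonants or dropped vowels.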

## SnowballStemmer

In [6]:
from nltk.stem import SnowballStemmer
ss = SnowballStemmer('english')
for word in words:
    print(f"{word} --> {ss.stem(word)}")

writing --> write
writes --> write
written --> written
writer --> writer
wrote --> wrote
eating --> eat
eaten --> eaten
eats --> eat
ate --> ate
runner --> runner
running --> run
ran --> ran


On this word list SnowballStemmer gives the same results as PorterStemmer, but it handles some other words better, for example adverbs ending in `-ly`:

In [7]:
ps.stem("fairly"), ps.stem("sportingly")

('fairli', 'sportingli')

In [8]:
ss.stem("fairly"), ss.stem("sportingly")

('fair', 'sport')

SnowballStemmer reduces both words to a sensible stem (`fair` and `sport`), whereas PorterStemmer leaves the non-word stems `fairli` and `sportingli`.
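When comparing two stemmers on a longer word list, it can help to print only the words on which they disagree. The following is a rough sketch under stated assumptions: `disagreements` is a hypothetical helper, and the two lookup tables are stand-ins built from the outputs shown above so that the example runs without NLTK.

```python
def disagreements(words, stem_a, stem_b):
    """Return {word: (stem_a(word), stem_b(word))} for words stemmed differently."""
    result = {}
    for word in words:
        a, b = stem_a(word), stem_b(word)
        if a != b:
            result[word] = (a, b)
    return result

# Stand-ins copied from the Porter and Snowball outputs shown above.
porter_out = {"fairly": "fairli", "sportingly": "sportingli", "running": "run"}
snowball_out = {"fairly": "fair", "sportingly": "sport", "running": "run"}

diff = disagreements(porter_out, porter_out.get, snowball_out.get)
print(diff)
```

Because the helper only needs two callables, the real stemmers can be passed in directly in the notebook, for example `disagreements(words, ps.stem, ss.stem)`.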