## Stemming 

Stemming is a text normalization technique in NLP that strips affixes (mostly suffixes) from words to reduce them to a base or "stem" form. For example, "running" and "runs" both reduce to "run". Grouping related word forms this way simplifies text and helps tasks such as search, classification, and analysis. Because stemmers apply mechanical rules rather than consult a dictionary, the resulting stem is not always a real word (e.g., "arguing" becomes "argu"), and irregular forms such as "ran" are typically left unchanged.

In [1]:
words = ["running", "runs", "runner", "easily", "fairly", "eating", "eats", "eaten", "writing", "writes", "programming", "programs"]

### Implementing PorterStemmer 

In [2]:
from nltk.stem import PorterStemmer
por_stem=PorterStemmer()

In [3]:
for word in words:
    print(word+"----->"+por_stem.stem(word))

running----->run
runs----->run
runner----->runner
easily----->easili
fairly----->fairli
eating----->eat
eats----->eat
eaten----->eaten
writing----->write
writes----->write
programming----->program
programs----->program


In [4]:
por_stem.stem("misunderstanding")

'misunderstand'

In [5]:
por_stem.stem("congratulations")

'congratul'

In [7]:
por_stem.stem("wellbeing")

'wellb'

In [10]:
por_stem.stem("chasing")

'chase'

### Implementing RegexpStemmer

RegexpStemmer uses a regular expression to identify morphological affixes and removes any substring that matches it. Words shorter than min characters are returned unchanged.

In [22]:
from nltk.stem import RegexpStemmer
reg_stem=RegexpStemmer('ing$|s$|e$|able$', min=4)

In [24]:
reg_stem.stem('sleeping')

'sleep'

In [25]:
reg_stem.stem('sleep')

'sleep'

In [26]:
reg_stem.stem('Countable')

'Count'

In [27]:
reg_stem.stem('runs')

'run'

In [28]:
reg_stem.stem('non-negotiable')

'non-negoti'
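
The behaviour in the cells above can be approximated with the standard re module. The following is a minimal sketch, not NLTK's code: the helper name regexp_stem and its min_len argument are made up for illustration, and it assumes the stemmer simply deletes every match of the pattern from words that are at least min_len characters long.

```python
import re

def regexp_stem(word, pattern=r"ing$|s$|e$|able$", min_len=4):
    """Hypothetical stand-in for RegexpStemmer.stem (an assumption, not NLTK code)."""
    # Words shorter than min_len are returned untouched.
    if len(word) < min_len:
        return word
    # Delete whatever the pattern matches; every alternative here is anchored at the end.
    return re.sub(pattern, "", word)

for w in ["sleeping", "sleep", "Countable", "runs", "non-negotiable", "bus"]:
    print(w + "----->" + regexp_stem(w))
```

In this sketch "bus" has only three characters and is returned as is, while "runs" has exactly four and loses its "s".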

### Implementing SnowballStemmer 

In [31]:
from nltk.stem import SnowballStemmer
snowball_stem=SnowballStemmer('english')

In [32]:
for word in words:
    print(word+"---->"+snowball_stem.stem(word))

running---->run
runs---->run
runner---->runner
easily---->easili
fairly---->fair
eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program


### Comparing PorterStemmer and SnowballStemmer

In [38]:
por_stem.stem('easily'),por_stem.stem('successfully'),por_stem.stem("fairly")

('easili', 'success', 'fairli')

In [37]:
snowball_stem.stem('easily'),snowball_stem.stem('successfully'),snowball_stem.stem("fairly")

('easili', 'success', 'fair')

In [39]:
por_stem.stem('congratulations')

'congratul'

In [40]:
snowball_stem.stem('congratulations')

'congratul'

RegexpStemmer gives a different result for the same word, because its pattern only strips the trailing "s":


reg_stem=RegexpStemmer('ing$|s$|e$|able$', min=4)

In [41]:
reg_stem.stem('congratulations')

'congratulation'

### Implementing LancasterStemmer

The Lancaster stemmer applies its rules iteratively, which makes it the most aggressive of the three algorithmic stemmers covered here (Porter, Snowball, and Lancaster). This iteration can lead to over-stemming and to linguistically incorrect roots. It is not as efficient as the Porter or Snowball stemmer, and it only supports English.
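
To see why iteration makes a stemmer more aggressive, here is a toy sketch that compares removing one suffix with removing suffixes until none applies. The suffix list, the minimum stem length, and the function names strip_once and strip_iteratively are all invented for this illustration; this is not the Lancaster rule table or its algorithm.

```python
# Toy suffix list, invented for this sketch; it is NOT the Lancaster rule table.
SUFFIXES = ("ing", "ed", "er", "ly", "s")
MIN_STEM = 3  # never leave fewer than 3 characters (an arbitrary choice)

def strip_once(word):
    """Remove at most one suffix (single pass)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word

def strip_iteratively(word):
    """Keep removing suffixes until no rule applies any more."""
    while True:
        shorter = strip_once(word)
        if shorter == word:
            return word
        word = shorter

for w in ["runners", "lovingly", "eats"]:
    print(w, strip_once(w), strip_iteratively(w))
```

With these toy rules a single pass turns "lovingly" into "loving", while the iterative version keeps going and ends at "lov", which is the kind of over-stemming described above.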

In [43]:
from nltk.stem import LancasterStemmer
lan_stem=LancasterStemmer()

In [44]:
for word in words:
    print(word+"------->"+lan_stem.stem(word))

running------->run
runs------->run
runner------->run
easily------->easy
fairly------->fair
eating------->eat
eats------->eat
eaten------->eat
writing------->writ
writes------->writ
programming------->program
programs------->program


Its aggressive approach leads to linguistically incorrect results, such as "writing" becoming "writ".
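
One rough way to quantify this aggressiveness is to count how many distinct stems each stemmer produces for the same word list. The sketch below does not call NLTK: it reuses the stems printed earlier in this section, and the helper group_by_stem is a name introduced here for illustration only.

```python
from collections import defaultdict

words = ["running", "runs", "runner", "easily", "fairly", "eating",
         "eats", "eaten", "writing", "writes", "programming", "programs"]

# Stems copied from the printed outputs in this section, in the same order as `words`.
porter = ["run", "run", "runner", "easili", "fairli", "eat",
          "eat", "eaten", "write", "write", "program", "program"]
lancaster = ["run", "run", "run", "easy", "fair", "eat",
             "eat", "eat", "writ", "writ", "program", "program"]

def group_by_stem(words, stems):
    """Map each stem to the list of words that were reduced to it."""
    groups = defaultdict(list)
    for word, stem in zip(words, stems):
        groups[stem].append(word)
    return dict(groups)

porter_groups = group_by_stem(words, porter)
lancaster_groups = group_by_stem(words, lancaster)

print(len(porter_groups), "groups with Porter")
print(len(lancaster_groups), "groups with Lancaster")
print(lancaster_groups["run"])
```

Fewer groups means more word forms are merged into the same stem. Whether that is helpful (higher recall in search) or harmful (unrelated words conflated) depends on the task.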