# Stemming
#### In Natural Language Processing (NLP), stemming reduces a word to its root or base form by stripping affixes, usually suffixes. The resulting stem need not be a valid word (for example, "Dancing" becomes "danc"), but mapping related forms to the same stem lets them be grouped together.

In [4]:
words = ["Eat","Eaten","Eating","Drink","Drunk","Drinking","Swimming","Wrote","Written","Dancing","Dance","Sang","Singing"]

# Porter Stemmer

In [5]:
from nltk.stem import PorterStemmer

In [6]:
stemming = PorterStemmer()

In [8]:
for word in words:
    print(word + "------>" + stemming.stem(word))

Eat------>eat
Eaten------>eaten
Eating------>eat
Drink------>drink
Drunk------>drunk
Drinking------>drink
Swimming------>swim
Wrote------>wrote
Written------>written
Dancing------>danc
Dance------>danc
Sang------>sang
Singing------>sing


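### Stemming has limitations: it only strips affixes, so irregular forms such as "Eaten", "Drunk", "Wrote", "Written" and "Sang" are lowercased but not reduced to the same stem as "Eat", "Drink" or "Singing".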
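To make the limitation concrete, the sketch below groups the words by the stems printed above. The stems are copied from that output rather than recomputed, and `porter_output` and `group_by_stem` are illustrative names, not part of NLTK.

```python
from collections import defaultdict

# Stems copied from the Porter output printed above (not recomputed here).
porter_output = {
    "Eat": "eat", "Eaten": "eaten", "Eating": "eat",
    "Drink": "drink", "Drunk": "drunk", "Drinking": "drink",
    "Swimming": "swim", "Wrote": "wrote", "Written": "written",
    "Dancing": "danc", "Dance": "danc", "Sang": "sang", "Singing": "sing",
}

def group_by_stem(word_to_stem):
    """Collect the words that share each stem, preserving input order."""
    groups = defaultdict(list)
    for word, stem in word_to_stem.items():
        groups[stem].append(word)
    return dict(groups)

groups = group_by_stem(porter_output)
for stem, members in groups.items():
    print(f"{stem:8} {members}")
```

Regular forms such as "Eat" and "Eating" end up in one group, while "Eaten", "Drunk", "Wrote", "Written" and "Sang" each sit in a group of their own.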

# Snowball Stemmer
#### The Snowball Stemmer (its English variant is also known as Porter2) is a revised version of the Porter Stemmer. It fixes several weaknesses of the original rules and is available for multiple languages.
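As a quick check of the multi-language claim, the following sketch lists the languages the installed NLTK version supports through the `SnowballStemmer.languages` attribute. The language name `"klingon"` is a deliberately unsupported, made-up value used only to show the error path.

```python
from nltk.stem import SnowballStemmer

# Tuple of lowercase language names the installed NLTK version supports.
supported = SnowballStemmer.languages
print(supported)

# Asking for a language outside that tuple raises ValueError.
try:
    SnowballStemmer("klingon")
    unsupported_rejected = False
except ValueError as err:
    unsupported_rejected = True
    print("unsupported:", err)
```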

In [9]:
from nltk.stem import SnowballStemmer

In [10]:
snowball_stemmer = SnowballStemmer('english')

In [11]:
for word in words:
    print(word + "----->" + snowball_stemmer.stem(word))

Eat----->eat
Eaten----->eaten
Eating----->eat
Drink----->drink
Drunk----->drunk
Drinking----->drink
Swimming----->swim
Wrote----->wrote
Written----->written
Dancing----->danc
Dance----->danc
Sang----->sang
Singing----->sing


### Difference between PorterStemmer and SnowballStemmer

In [12]:
stemming.stem('fairly'), stemming.stem('Sportingly')

('fairli', 'sportingli')

In [13]:
snowball_stemmer.stem('fairly'), snowball_stemmer.stem('Sportingly')

('fair', 'sport')
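The two stemmers agree on every word in the earlier list and differ only on the "-ly" adverbs. A small helper can surface such cases automatically. This is a sketch: `stem_disagreements` and `sample` are illustrative names, and it assumes NLTK is installed as in the cells above.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

def stem_disagreements(words):
    """Return {word: (porter_stem, snowball_stem)} for words the two stemmers treat differently."""
    diffs = {}
    for word in words:
        p, s = porter.stem(word), snowball.stem(word)
        if p != s:
            diffs[word] = (p, s)
    return diffs

sample = ["Eating", "Dancing", "Singing", "fairly", "Sportingly"]
diffs = stem_disagreements(sample)
print(diffs)
```

Running the helper over a larger vocabulary is a cheap way to decide whether the choice of stemmer matters for a given dataset.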

# RegexpStemmer
#### NLTK's `RegexpStemmer` uses a regular expression (regex) to define which parts of a word to strip. Unlike the Porter and Snowball stemmers, which apply a fixed set of built-in rules, it lets you write the rules yourself: any part of the word that matches the pattern is removed, and words shorter than `min` characters are left untouched.

In [14]:
from nltk.stem import RegexpStemmer

In [15]:
# Strip a trailing "ing", "s", "e" or "able"; words shorter than 4 characters are left as-is
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [16]:
reg_stemmer.stem('eating')

'eat'

In [17]:
reg_stemmer.stem('available')

'avail'
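The behavior shown above can be approximated with the standard `re` module alone. This is a minimal sketch of the idea, not NLTK's actual source; `regex_stem` and `min_length` are hypothetical names chosen for illustration.

```python
import re

def regex_stem(word, pattern, min_length=0):
    """Remove every match of `pattern` from `word`, unless the word is shorter than `min_length`."""
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)

pattern = r"ing$|s$|e$|able$"
results = {w: regex_stem(w, pattern, min_length=4)
           for w in ["eating", "available", "was", "sing"]}
for word, stem in results.items():
    print(word, "->", stem)
```

"was" is shorter than four characters and passes through unchanged, while "sing" is reduced to "s". Hand-written patterns over-stem easily, so the suffix list and the minimum length need to be chosen with the target vocabulary in mind.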