#### Stemming

**Stemming** is a text preprocessing technique in Natural Language Processing (NLP) that reduces a word to its root or base form by stripping affixes with heuristic rules.  
The resulting "stem" is not always a valid word (for example, the Porter stemmer turns "does" into "doe"), but it acts as a shared key for related forms of the original word.

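**Purpose of Stemming:**  
Stemming reduces related word forms to a common base, which simplifies text analysis by:

- Removing affixes (for the English stemmers shown here, mostly suffixes such as -ing and -s).
- Mapping regular inflections of a word to one stem (e.g., run, runs, running → run). Irregular forms such as "ran" and some derived forms such as "runner" are typically left unchanged.

**Example:**  
**Input Words:** running, runs, run  
**Stemmed Output:** run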

**Common Stemming Algorithms:**  
- Porter Stemmer: One of the most widely used stemming algorithms for English.
- Lancaster Stemmer: More aggressive than the Porter Stemmer, so it tends to produce shorter stems.
- Snowball Stemmer: An improved version of the Porter Stemmer (also known as Porter2) with support for multiple languages.
- RegexpStemmer: A stemmer that uses a regular expression (the regexp argument, given as a string or a compiled pattern) to identify morphological affixes. Any substring that matches the expression is removed.

In [9]:
words = ["eating","eaten","ate","eat","running","run","runner","ran","doing","does","do","done","finally","finalize",
         "writing","writer","writes","programming","programmer"]
for i in words:
    print(i)

eating
eaten
ate
eat
running
run
runner
ran
doing
does
do
done
finally
finalize
writing
writer
writes
programming
programmer


In [5]:
# Porter Stemmer
from nltk.stem import PorterStemmer

stemming = PorterStemmer()

for word in words:
    print(word+" ---------------> "+stemming.stem(word))

eating ---------------> eat
eaten ---------------> eaten
ate ---------------> ate
eat ---------------> eat
running ---------------> run
run ---------------> run
runner ---------------> runner
ran ---------------> ran
doing ---------------> do
does ---------------> doe
do ---------------> do
done ---------------> done
finally ---------------> final
finalize ---------------> final
writing ---------------> write
writer ---------------> writer
writes ---------------> write
programming ---------------> program
programmer ---------------> programm
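
To make the effect of stemming on vocabulary size concrete, the following sketch groups the words by their stem. So that it runs without NLTK, it uses a lookup table copied from the Porter output above; `group_by_stem` and `PORTER_STEMS` are illustrative names, not part of NLTK. In the notebook, `stemming.stem` could be passed as the `stem` argument instead.

```python
from collections import defaultdict

# Stems copied from the Porter output printed above (not recomputed here).
PORTER_STEMS = {
    "eating": "eat", "eaten": "eaten", "ate": "ate", "eat": "eat",
    "running": "run", "run": "run", "runner": "runner", "ran": "ran",
    "doing": "do", "does": "doe", "do": "do", "done": "done",
    "finally": "final", "finalize": "final",
    "writing": "write", "writer": "writer", "writes": "write",
    "programming": "program", "programmer": "programm",
}


def group_by_stem(words, stem):
    """Group words under the stem returned by the `stem` callable."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


# PORTER_STEMS.get stands in for a real stemmer's stem() method.
groups = group_by_stem(list(PORTER_STEMS), PORTER_STEMS.get)
print(len(PORTER_STEMS), "words ->", len(groups), "stems")
for key, members in groups.items():
    print(f"{key}: {members}")
```

With this word list, 19 words collapse to 14 stems. Forms such as "eaten", "ran", and "runner" keep their own entries because the stemmer leaves them unchanged.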


In [8]:
print(stemming.stem('Congratulations'))
print(stemming.stem('Congratulations!'))

congratul
congratulations!
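
The second call returns the token unchanged apart from lowercasing, because the trailing "!" keeps the suffix rules from matching. One common remedy is to strip punctuation before stemming. The sketch below uses only the standard library; `strip_punctuation` is a hypothetical helper name, and a proper tokenizer would be the more usual choice in a full pipeline.

```python
import string


def strip_punctuation(token):
    """Remove leading and trailing ASCII punctuation from a token."""
    return token.strip(string.punctuation)


# Punctuation inside a token (such as the hyphen in "well-known") is kept.
for raw in ["Congratulations!", "(running)", "well-known", "..."]:
    print(repr(raw), "->", repr(strip_punctuation(raw)))
```

The cleaned token can then be passed to the stemmer, for example `stemming.stem(strip_punctuation('Congratulations!'))`, which should give the same result as the first call above.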


In [10]:
# RegexpStemmer
from nltk.stem import RegexpStemmer

# min: the minimum length a string must have to be stemmed
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$|!$', min=4)

reg_stemmer.stem('Congratulations!')

'Congratulations'

In [12]:
reg_stemmer.stem('Congratulations')

'Congratulation'
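
The behaviour of the two calls above can be approximated in a few lines with Python's `re` module: a word shorter than the minimum length is returned as-is, otherwise every match of the pattern is deleted. This is a simplified sketch rather than NLTK's source, and `regexp_stem` is an illustrative name.

```python
import re

# Same pattern as the RegexpStemmer call above.
SUFFIX_PATTERN = re.compile(r"ing$|s$|e$|able$|!$")


def regexp_stem(word, pattern=SUFFIX_PATTERN, min_length=4):
    """Delete pattern matches from words of at least `min_length` characters."""
    if len(word) < min_length:
        return word  # too short: leave the word untouched
    return pattern.sub("", word)


for word in ["Congratulations!", "Congratulations", "eating", "is"]:
    print(word, "->", regexp_stem(word))
```

Because every alternative is anchored with `$`, only one suffix is removed per call: "Congratulations!" loses the "!" but keeps its "s", which matches the NLTK output above.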

In [15]:
# Snowball Stemmer
from nltk.stem import SnowballStemmer

snowb_stemmer = SnowballStemmer("english")

snowb_stemmer.stem('eating')

'eat'

In [17]:
for i in words:
    print(i+" --------------------> "+snowb_stemmer.stem(i))

eating --------------------> eat
eaten --------------------> eaten
ate --------------------> ate
eat --------------------> eat
running --------------------> run
run --------------------> run
runner --------------------> runner
ran --------------------> ran
doing --------------------> do
does --------------------> doe
do --------------------> do
done --------------------> done
finally --------------------> final
finalize --------------------> final
writing --------------------> write
writer --------------------> writer
writes --------------------> write
programming --------------------> program
programmer --------------------> programm


In [20]:
snowb_stemmer.stem('Congratulations')

'congratul'

In [21]:
snowb_stemmer.stem('sportingly')

'sport'

In [22]:
stemming.stem('sportingly')

'sportingli'

In [23]:
# Lancaster Stemmer
from nltk.stem import LancasterStemmer

lan_stemmer = LancasterStemmer()

lan_stemmer.stem("sportingly")

'sport'

In [24]:
lan_stemmer.stem('Congratulations')

'congrat'
