Stemming is a text normalization technique used in Natural Language Processing (NLP) to reduce words to their root or base form. It is usually applied after tokenization and helps search, text mining, and information retrieval by treating different inflected forms of a word as the same term.



Stemming strips affixes from words to obtain their root form (stem). The stemmers covered here mainly remove suffixes. Because the process is rule-based and heuristic, the result is not always a valid word; for example, the Porter Stemmer reduces "history" to "histori".
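To make the rule-based idea concrete, here is a minimal sketch of a suffix stripper. The rule list, the `toy_stem` name, and the `min_stem_len` parameter are all hypothetical and chosen for illustration; this is not the Porter rule set or any NLTK API.

```python
# Toy suffix rules, tried in order. These are illustrative assumptions,
# not the rules used by Porter, Lancaster, or Snowball.
SUFFIX_RULES = [
    ("ies", "i"),
    ("ing", ""),
    ("ly", ""),
    ("es", ""),
    ("s", ""),
]


def toy_stem(word, min_stem_len=3):
    """Apply the first matching rule that leaves at least min_stem_len characters."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= min_stem_len:
                return candidate
    # No rule applied: return the lowercased word unchanged.
    return word


for w in ["algorithms", "fasting", "removes", "running", "history"]:
    print(f"{w}------> {toy_stem(w)}")
```

Even this small rule set shows the trade-off: "algorithms" becomes the valid word "algorithm", while "removes" and "running" become "remov" and "runn", which are not words. Real stemmers add many more rules and conditions to handle such cases.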



Common Stemming Algorithms <br>
Porter Stemmer – One of the most widely used stemming algorithms, based on a set of heuristic rules.<br>
Lancaster Stemmer – More aggressive and can produce very short stems.<br>
Snowball Stemmer – An improved successor to the Porter Stemmer (its English variant is often called Porter2), supporting multiple languages.<br>

In [8]:
words = ['fairly','fasting','having', 'running','history','historically','final','stemming','algorithms','however','removes']


In [9]:
from nltk import PorterStemmer

porter_stemmer = PorterStemmer()
for word in words:
    print(f"{word}------> {porter_stemmer.stem(word)}")

fairly------> fairli
fasting------> fast
having------> have
running------> run
history------> histori
historically------> histor
final------> final
stemming------> stem
algorithms------> algorithm
however------> howev
removes------> remov


In [11]:
from nltk import SnowballStemmer

snowball_stemmer = SnowballStemmer("english")
for word in words:
    print(f"{word}------> {snowball_stemmer.stem(word)}")

fairly------> fair
fasting------> fast
having------> have
running------> run
history------> histori
historically------> histor
final------> final
stemming------> stem
algorithms------> algorithm
however------> howev
removes------> remov


In [14]:
def stem_formatted(word):
    stem_word = snowball_stemmer.stem(word)
    print(f"{word}------> {stem_word}")
    return stem_word

stems = [stem_formatted(word) for word in words]

fairly------> fair
fasting------> fast
having------> have
running------> run
history------> histori
historically------> histor
final------> final
stemming------> stem
algorithms------> algorithm
however------> howev
removes------> remov


In [None]:
print(stems)

['fair', 'fast', 'have', 'run', 'histori', 'histor', 'final', 'stem', 'algorithm', 'howev', 'remov']
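The Lancaster Stemmer is listed above but not run. The sketch below puts all three stemmers side by side, assuming NLTK is installed. The `stemmers` dictionary, the `sample_words` list, and the `compare_stems` helper are hypothetical names introduced only for this comparison.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# One instance of each stemmer discussed above, keyed by a short label.
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

# A small sample chosen for illustration.
sample_words = ["fairly", "historically", "running", "algorithms"]


def compare_stems(words):
    """Return {word: {stemmer_label: stem}} for every word and stemmer."""
    return {
        word: {label: stemmer.stem(word) for label, stemmer in stemmers.items()}
        for word in words
    }


comparison = compare_stems(sample_words)

# Print a simple aligned table: one row per word, one column per stemmer.
print(f"{'word':<14}" + "".join(f"{label:<12}" for label in stemmers))
for word, row in comparison.items():
    print(f"{word:<14}" + "".join(f"{row[label]:<12}" for label in stemmers))
```

Rows where the columns disagree are the interesting ones: they show where the choice of stemmer changes which words are grouped under the same stem.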

