Stemming is the process of reducing a word to its root or base form, called the "stem." The idea is to strip affixes (in practice mostly suffixes) so that variants of a word collapse to a single form. For instance, "running" and "runs" are both reduced to the stem "run." Irregular forms such as "ran" are generally left unchanged, because stemmers apply spelling rules rather than look words up in a dictionary.

## Stemming technique 1 - Porter stemming

The Porter stemmer is one of the oldest and most commonly used stemming algorithms. It is rule-based: it reduces words by applying a predefined set of suffix-removal rules.

In [2]:
!pip install nltk
from nltk.stem import PorterStemmer

stemmar = PorterStemmer()

words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

for word in words:
    print(word, ":", stemmar.stem(word))

eating : eat
eats : eat
eat : eat
ate : ate
adjustable : adjust
rafting : raft
ability : abil
meeting : meet
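
To make "predefined suffix-removal rules" more concrete, here is a deliberately tiny sketch of a rule-based suffix stripper. It is not the Porter algorithm, which applies several ordered steps with conditions on the remaining stem. The rule list and the names `TOY_RULES` and `toy_stem` are made up for illustration.

```python
# Hypothetical toy rules, checked in order: (suffix, replacement).
# These are illustrative only and are NOT the Porter rule set.
TOY_RULES = [
    ("ing", ""),
    ("able", ""),
    ("s", ""),
]

def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix, replacement in TOY_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: len(word) - len(suffix)] + replacement
    return word

for word in ["eating", "eats", "eat", "ate", "adjustable", "ability"]:
    print(word, ":", toy_stem(word))
```

Like the real stemmer, this sketch leaves the irregular form "ate" unchanged, because no suffix rule can map it to "eat".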


## Disadvantage - Stemming may not return a valid word

For example, the stem of "congratulations" is not an English word:

In [3]:
stemmar.stem("congratulations")

'congratul'

## RegexpStemmer class

NLTK provides a RegexpStemmer class for building simple regular-expression stemmers. It takes a single regular expression and removes every part of the word that matches it, which in practice means an anchored prefix or suffix. The min argument sets the minimum word length for stemming; shorter words are returned unchanged. For example:

In [4]:
from nltk.stem import RegexpStemmer

regexp = RegexpStemmer('ing$|s$|e$|able$', min=4)

# A dollar sign ($) after a pattern anchors it to the end of the word, so a matching suffix is removed.
# To remove a prefix instead, anchor the pattern to the start of the word with a caret (^), e.g. '^un'.

regexp.stem("creates")

'create'

In [5]:
regexp.stem("create")

'creat'
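
The results of the two cells above can be reproduced with the standard re module, which helps explain them. The sketch below approximates what a regular-expression stemmer does: it deletes every match of the pattern unless the word is shorter than a minimum length. The function name `regex_stem` is hypothetical, and this is not NLTK's source code.

```python
import re

def regex_stem(word, pattern, min_len=0):
    """Delete every match of `pattern` from `word`; leave words shorter than min_len alone."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)

suffixes = r"ing$|s$|e$|able$"

print(regex_stem("creates", suffixes, min_len=4))  # only the final "s" matches
print(regex_stem("create", suffixes, min_len=4))   # the final "e" matches
print(regex_stem("she", suffixes, min_len=4))      # shorter than min_len, returned unchanged
print(regex_stem("undo", r"^un", min_len=4))       # "^" anchors at the start, so a prefix is removed
```

Because the pattern knows nothing about the word itself, "create" loses its final "e" even though it is already a base form. This is the main weakness of purely pattern-based stemming.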

## Snowball Stemmer

The Snowball stemmer for English, also known as the Porter2 stemmer, is a revision of the original Porter stemmer by the same author, Martin Porter. Snowball is also the name of the framework in which it is written, which provides stemmers for a number of other languages. In general, the English Snowball stemmer gives slightly more consistent results than the original Porter stemmer.

In [6]:
from nltk.stem import SnowballStemmer

snowball = SnowballStemmer('english')

for word in words:
    print(word, "--->", snowball.stem(word))

eating ---> eat
eats ---> eat
eat ---> eat
ate ---> ate
adjustable ---> adjust
rafting ---> raft
ability ---> abil
meeting ---> meet
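
Since Snowball covers several languages, it can be useful to check which ones the installed NLTK version supports. This is a minimal sketch, assuming the `languages` attribute of `SnowballStemmer` is available in your NLTK version. The German word is an arbitrary example, and its output is not shown here.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names accepted by the SnowballStemmer constructor
supported = SnowballStemmer.languages
print(supported)

# Build a stemmer for another language only if it is listed
if "german" in supported:
    german = SnowballStemmer("german")
    print(german.stem("katzen"))
```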


### Difference between the Porter stemmer and the Snowball stemmer

In [8]:
stemmar.stem('fairly'), snowball.stem('fairly')

('fairli', 'fair')
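
To look for further differences, one option is to run both stemmers over the same word list and keep only the words on which they disagree. The word list below is an arbitrary choice for illustration, and the variable names are local to this sketch.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball_en = SnowballStemmer("english")

# Arbitrary sample words chosen for illustration
sample_words = ["fairly", "eating", "ability", "meeting"]

# Keep only the words where the two stemmers return different stems
disagreements = {
    w: (porter.stem(w), snowball_en.stem(w))
    for w in sample_words
    if porter.stem(w) != snowball_en.stem(w)
}

for w, (p, s) in disagreements.items():
    print(f"{w}: porter={p!r}, snowball={s!r}")
```

Of these sample words, only "fairly" is stemmed differently. The other three get the same stem from both stemmers, as the earlier outputs show.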

In [9]:
# Even the Snowball stemmer can return stems that are not valid words

snowball.stem('goes')

'goe'

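Because stems are not guaranteed to be valid words, these techniques are a poor fit for use cases such as chatbots, where the output must consist of real words. For those use cases, lemmatization is the better choice.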