# Stemming

Stemming is the process of reducing a word to its stem, the base form to which affixes such as suffixes and prefixes attach. Unlike a lemma, the stem does not have to be a valid dictionary word. Stemming is important in natural language understanding (NLU) and natural language processing (NLP).

In [1]:
## Classification problem:
## decide whether a product comment is a positive or a negative review
## Reviews ----> eating, eat, eaten
## The root (stem) of the words above is "eat"

words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalize"]

## PorterStemmer

In [2]:
from nltk.stem import PorterStemmer

In [3]:
stemming = PorterStemmer()

In [4]:
for word in words:
    print(word + " : " + stemming.stem(word))

eating : eat
eats : eat
eaten : eaten
writing : write
writes : write
programming : program
programs : program
history : histori
finally : final
finalize : final
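
To make the classification motivation concrete, the sketch below groups the ten words by stem and counts how many distinct stems remain. It is only an illustration: the `stem_of` mapping is copied from the output printed above rather than recomputed, so it runs without NLTK, and the names `stem_of` and `groups` are made up for this example.

```python
from collections import defaultdict

# Stems copied from the PorterStemmer output above (not recomputed here).
stem_of = {
    "eating": "eat",
    "eats": "eat",
    "eaten": "eaten",
    "writing": "write",
    "writes": "write",
    "programming": "program",
    "programs": "program",
    "history": "histori",
    "finally": "final",
    "finalize": "final",
}

# Group the original words under their shared stem.
groups = defaultdict(list)
for word, stem in stem_of.items():
    groups[stem].append(word)

for stem, members in groups.items():
    print(f"{stem:8} <- {members}")

print(len(stem_of), "words collapse into", len(groups), "stems")
```

The ten words collapse into six stems, so a classifier built on stems sees fewer distinct features. Note that "eaten" keeps its own stem, and that "finally" and "finalize" are merged even though they mean different things.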


In [5]:
stemming.stem("congratulations")

'congratul'

In [6]:
stemming.stem("sitting")

'sit'

## Drawbacks of stemming

Some words lose their meaning when stemmed: "congratulations" becomes "congratul" and "history" becomes "histori", neither of which is a real word. Lemmatization solves this problem. Keep this limitation in mind when using stemming.

## RegexpStemmer class

NLTK provides the RegexpStemmer class, which makes it easy to implement stemmers based on regular expressions. It takes a single regular expression and removes any prefix or suffix that matches it. Words shorter than the `min` argument are returned unchanged. Let us see an example.

In [7]:
from nltk.stem import RegexpStemmer

In [None]:
# Strip a trailing "ing", "s", "e", or "able"
# The dollar sign anchors each alternative at the end of the word
# min=4 leaves words shorter than four characters untouched
regex_stemmer = RegexpStemmer("ing$|s$|e$|able$", min=4)

In [10]:
regex_stemmer.stem('eating')

'eat'

In [11]:
regex_stemmer.stem('ingeating')

'ingeat'
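
What happens in the two calls above can be approximated with the standard `re` module. The function below is a minimal sketch, not NLTK's code: `regexp_stem` and `min_length` are hypothetical names, and the assumption is that the stemmer deletes every match of the pattern from words that are at least `min` characters long.

```python
import re


def regexp_stem(word, pattern, min_length=0):
    """Delete every match of `pattern` from `word` (illustrative sketch)."""
    # Words shorter than min_length are returned unchanged.
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


suffixes = r"ing$|s$|e$|able$"

print(regexp_stem("eating", suffixes, min_length=4))     # eat
print(regexp_stem("ingeating", suffixes, min_length=4))  # ingeat
print(regexp_stem("is", suffixes, min_length=4))         # is (too short to stem)

# Adding "^ing" to the pattern also strips a leading "ing".
print(regexp_stem("ingeating", r"^ing|ing$", min_length=4))  # eat
```

Because `$` anchors each alternative at the end of the word, the leading "ing" in "ingeating" survives unless the pattern also contains `^ing`.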

## SnowballStemmer class

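The Snowball stemmer (its English variant is also known as Porter2) is a refinement of the Porter stemmer and usually produces better stems, as the comparison at the end of this section shows.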

In [12]:
from nltk.stem import SnowballStemmer

In [13]:
snowball_stemmer = SnowballStemmer("english")

In [15]:
for word in words:
    print(word + " --------> " + snowball_stemmer.stem(word))

eating --------> eat
eats --------> eat
eaten --------> eaten
writing --------> write
writes --------> write
programming --------> program
programs --------> program
history --------> histori
finally --------> final
finalize --------> final


In [16]:
## Difference between PorterStemmer and SnowballStemmer
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [17]:
snowball_stemmer.stem("fairly"), snowball_stemmer.stem("sportingly")

('fair', 'sport')