# Stemming
Stemming is the process of reducing a word to its stem, the base form that remains after affixes (suffixes and prefixes) are stripped. Unlike a lemma, a stem does not have to be a valid dictionary word. Stemming is a common preprocessing step in natural language understanding (NLU) and natural language processing (NLP).

In [4]:
## Classification problem:
## is a product comment a positive review or a negative review?
## Reviews ----> [eating, eats, eaten] ---> eat, [going, gone, goes] ---> go

words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

### PorterStemmer

In [5]:
from nltk.stem import PorterStemmer
stemming = PorterStemmer()
for word in words:
    print(word+"----->"+stemming.stem(word))


eating----->eat
eats----->eat
eaten----->eaten
writing----->write
writes----->write
programming----->program
programs----->program
history----->histori
finally----->final
finalized----->final


In [6]:
stemming.stem("Congratulations")

'congratul'

In [7]:
stemming.stem("sitting")

'sit'
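
To tie the stems above back to the review-classification motivation, here is a minimal sketch that groups the sample words by their Porter stem, so that several surface forms collapse into a single feature. The helper name group_by_stem is an assumption made for illustration and is not part of NLTK.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(words):
    """Map each Porter stem to the list of input words that reduce to it.

    Hypothetical helper for illustration; not an NLTK function.
    """
    stemmer = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]

groups = group_by_stem(words)
for stem, members in groups.items():
    print(stem + " <----- " + ", ".join(members))
print(len(words), "surface forms ->", len(groups), "stems")
```

With the ten sample words this should leave six distinct stems, in line with the PorterStemmer output above. A classifier built on stems therefore sees a smaller vocabulary than one built on raw tokens.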

### RegexpStemmer class
NLTK provides a RegexpStemmer class for building simple regular-expression stemmers. It takes a single regular expression and removes every substring that matches it, so the pattern itself decides whether prefixes, suffixes, or both are stripped. The optional min argument leaves words shorter than that length unchanged. Let us see an example:

In [8]:
from nltk.stem import RegexpStemmer
rege_stemmer = RegexpStemmer("ing$|s$|ly$|able$",min=4)
for word in words:
    print(word+"----->"+rege_stemmer.stem(word))


eating----->eat
eats----->eat
eaten----->eaten
writing----->writ
writes----->write
programming----->programm
programs----->program
history----->history
finally----->final
finalized----->finalized


In [9]:
rege_stemmer.stem("writing")


'writ'

In [10]:
rege_stemmer.stem("eating")

'eat'

In [11]:
rege_stemmer.stem('ingeating')

'ingeat'
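
The 'ingeating' result shows that only the trailing match is removed, because every alternative in the pattern ends with $. The sketch below imitates that behaviour with the standard re module alone. The names regexp_stem and min_length are hypothetical, and the function is an approximation for illustration rather than NLTK's own code.

```python
import re


def regexp_stem(word, pattern, min_length=0):
    """Delete every match of `pattern` from `word`.

    Words shorter than `min_length` are returned unchanged.
    Hypothetical stand-in for RegexpStemmer, for illustration only.
    """
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


anchored = r"ing$|s$|ly$|able$"    # every alternative is tied to the end of the word
unanchored = r"ing|s$|ly$|able$"   # "ing" may now match anywhere in the word

print(regexp_stem("ingeating", anchored, 4))    # ingeat
print(regexp_stem("ingeating", unanchored, 4))  # eat
print(regexp_stem("sly", anchored, 4))          # sly (shorter than min_length)
```

Dropping the $ anchor from a single alternative is enough to strip the leading "ing" as well, which is why suffix patterns are normally anchored.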

### Snowball Stemmer
The Snowball stemmer for English implements what is also known as the Porter2 stemming algorithm, a revised version of the Porter stemmer that fixes some of its known issues.

In [12]:
from nltk.stem import SnowballStemmer
snow_stemmer = SnowballStemmer("english")
for word in words:
    print(word+"----->"+snow_stemmer.stem(word))

eating----->eat
eats----->eat
eaten----->eaten
writing----->write
writes----->write
programming----->program
programs----->program
history----->histori
finally----->final
finalized----->final


In [13]:
stemming.stem("fairly"),stemming.stem("sportingly")

('fairli', 'sportingli')

In [14]:
snow_stemmer.stem("fairly"),snow_stemmer.stem("sportingly")

('fair', 'sport')

In [15]:
snow_stemmer.stem('goes')


'goe'

In [16]:
stemming.stem('goes')

'goe'
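
The individual calls above can be gathered into one side-by-side comparison. This is a minimal sketch under the assumption that both stemmers are used with their default settings; compare_stems is a hypothetical helper name, not an NLTK function.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")


def compare_stems(words):
    """Return (word, porter_stem, snowball_stem) triples, one per input word."""
    return [(w, porter.stem(w), snowball.stem(w)) for w in words]


rows = compare_stems(["fairly", "sportingly", "goes"])
print(f"{'word':<12}{'porter':<12}{'snowball':<12}")
for word, p, s in rows:
    # Flag the rows where the two algorithms disagree.
    marker = "" if p == s else "  <-- differs"
    print(f"{word:<12}{p:<12}{s:<12}{marker}")
```

The two stemmers disagree on the -ly adverbs, while both reduce "goes" to "goe" instead of "go", which is the kind of result that motivates lemmatization.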

## WordNet Lemmatizer
Lemmatization is similar to stemming, but its output is a lemma: a root word that is a valid dictionary entry, whereas stemming produces a root stem that may not be a real word. The lemma carries the same meaning as the original word.

NLTK provides the WordNetLemmatizer class, which is a thin wrapper around the WordNet corpus. This class uses the morphy() function of the WordNet CorpusReader class to find a lemma. The pos argument tells the lemmatizer which part of speech to assume (here 'v' for verb); the default is noun. Let us understand it with an example:

In [20]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [21]:
for word in words:
    print(word + "------>" + lemmatizer.lemmatize(word, pos='v'))


eating------>eat
eats------>eat
eaten------>eat
writing------>write
writes------>write
programming------>program
programs------>program
history------>history
finally------>finally
finalized------>finalize
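
To illustrate why the output is always a real word and why the part of speech matters, here is a toy sketch of dictionary-based lemmatization: an exception table for irregular forms, plus suffix rules whose results are accepted only if they appear in a vocabulary. The vocabulary, exception table, rules, and the name toy_lemmatize are all made up for illustration. They are far smaller than WordNet's data and do not reproduce morphy() exactly.

```python
# Toy data invented for this example; WordNet's real tables are much larger.
VOCAB = {
    "v": {"eat", "write", "program", "finalize", "go"},
    "n": {"program", "history", "eating", "writing"},
}
EXCEPTIONS = {
    "v": {"eaten": "eat", "gone": "go", "went": "go"},
    "n": {},
}
# (suffix, replacement) pairs, tried in order for the given part of speech.
RULES = {
    "v": [("ing", ""), ("ing", "e"), ("es", ""), ("es", "e"),
          ("s", ""), ("ed", ""), ("ed", "e")],
    "n": [("s", ""), ("es", "")],
}


def toy_lemmatize(word, pos="n"):
    """Return a lemma from the toy vocabulary, or the word unchanged."""
    if word in EXCEPTIONS[pos]:          # irregular forms are looked up first
        return EXCEPTIONS[pos][word]
    if word in VOCAB[pos]:               # already a dictionary form
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[:-len(suffix)] + replacement
            if candidate in VOCAB[pos]:  # accept only real dictionary words
                return candidate
    return word                          # nothing matched: leave untouched


for w in ["eaten", "writing", "finalized", "goes", "finally"]:
    print(w + "------>" + toy_lemmatize(w, pos="v"))
print("writing (as noun)------>" + toy_lemmatize("writing", pos="n"))
```

Because every candidate is checked against the vocabulary, the sketch never returns a truncated form such as "writ" or "goe", and the same word can map to different lemmas depending on pos.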
