# Stemming

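Stemming is the process of reducing a word to its word stem by stripping affixes (suffixes and prefixes). The stem does not have to be a valid word: the Porter stemmer below turns "history" into "histori". Mapping a word to its dictionary form, the lemma, is the job of lemmatization, covered later in this notebook. Stemming is an important normalization step in natural language understanding (NLU) and natural language processing (NLP), because it lets inflected forms such as "eats" and "eating" be treated as the same word.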
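To see the idea behind stemming without any library, here is a deliberately naive suffix stripper. It is a sketch, not the Porter algorithm: the `SUFFIXES` list, the `min_stem` threshold and the `naive_stem` helper are hypothetical choices made for illustration. Real stemmers apply ordered rule sets with extra conditions, which is why Porter turns "programming" into "program" while this sketch leaves the non-word "programm".

```python
# A deliberately naive suffix-stripping stemmer (hypothetical, NOT the Porter algorithm).
# It removes the first matching suffix, trying longer suffixes first,
# and only if at least `min_stem` characters would remain.

SUFFIXES = ["ations", "ation", "ing", "ed", "es", "ly", "s"]  # illustrative choice


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix from SUFFIXES, keeping a stem of at least min_stem letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["eating", "eats", "programming", "finally", "congratulations", "history"]:
    print(w, "->", naive_stem(w))
# eating -> eat
# eats -> eat
# programming -> programm
# finally -> final
# congratulations -> congratul
# history -> history
```

The length check is what keeps short words such as "is" from being reduced to a single letter.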



```python
from nltk.stem import PorterStemmer

# Porter stemmer: applies a fixed sequence of suffix-stripping rules
stemming = PorterStemmer()

words = ["eating", "eats", "eaten", "writing", "writes", "programming",
         "programs", "history", "finally", "finalized"]

for word in words:
    print(word + "---->" + stemming.stem(word))
```

```
eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final
```


```python
print(stemming.stem("congratulations"))  # congratul
print(stemming.stem("sitting"))          # sit
```

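## Lancaster Stemming Algorithm

The Lancaster (Paice/Husk) stemmer is more aggressive than Porter: compare "writ", "hist" and "fin" below with Porter's "write", "histori" and "final".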


```python
from nltk.stem import LancasterStemmer

lancaster = LancasterStemmer()

for word in words:
    print(word + "---->" + lancaster.stem(word))
```

```
eating---->eat
eats---->eat
eaten---->eat
writing---->writ
writes---->writ
programming---->program
programs---->program
history---->hist
finally---->fin
finalized---->fin
```


## RegexpStemmer Class

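NLTK's RegexpStemmer class makes it easy to implement regular-expression stemmers. It takes a single regular expression and deletes every substring of the word that matches it; words shorter than `min` characters are returned unchanged. Because an unanchored alternative such as `ing` can match anywhere in the word, it removes prefixes and infixes as well as suffixes, which is why "ingplaying" becomes "play". Let us see an example: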
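A minimal pure-Python sketch of the behavior just described, assuming (per our reading of NLTK's source) that `stem()` returns words shorter than `min` unchanged and otherwise deletes all matches with a single `re.sub` pass. `regexp_stem` is a hypothetical helper, not part of NLTK. The "singing" case shows the pitfall of an unanchored `ing`: both occurrences are removed, leaving just "s".

```python
import re


def regexp_stem(word: str, pattern: str, min_len: int = 0) -> str:
    """Hypothetical RegexpStemmer-style stemmer: words shorter than min_len
    are returned unchanged; otherwise every non-overlapping match of
    `pattern` is deleted in a single pass."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)


pattern = r"ing|s$|e$|able$"
for w in ["eating", "ingplaying", "singing", "cats", "is"]:
    print(w, "->", regexp_stem(w, pattern, min_len=4))
# eating -> eat
# ingplaying -> play
# singing -> s
# cats -> cat
# is -> is
```

Anchoring every alternative with `$` (for example `ing$|s$|e$|able$`) restricts the stemmer to suffixes only.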



```python
from nltk.stem import RegexpStemmer

# Remove "ing" anywhere, and "s", "e" or "able" at the end, from words of 4+ characters
reg_stemmer = RegexpStemmer(r"ing|s$|e$|able$", min=4)
```

```python
print(reg_stemmer.stem("eating"))      # eat
print(reg_stemmer.stem("ingplaying"))  # play
```

## Snowball Stemmer

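```python
from nltk.stem import SnowballStemmer

# Snowball's English stemmer ("Porter2"); ignore_stopwords=True would return stopwords unstemmed
snowballstemmer = SnowballStemmer("english", ignore_stopwords=False)

for word in words:
    print(word + "---->" + snowballstemmer.stem(word))
```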

```
eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final
```


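Snowball handles some suffixes better than the original Porter stemmer, for example adverbs ending in "-ly":

```python
print(stemming.stem("fairly"), stemming.stem("sportingly"))                # fairli sportingli
print(snowballstemmer.stem("fairly"), snowballstemmer.stem("sportingly"))  # fair sport
```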

# WordNet Lemmatizer

Lemmatization is similar to stemming, but its output, called a lemma, is a root word (the dictionary form) rather than a root stem. A lemma is always a valid word with the same meaning: for example, the lemmatizer below leaves "history" unchanged, whereas the Porter stemmer produced "histori".

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. It calls the morphy() function of the WordNet CorpusReader class to find a lemma. Let us understand it with an example:
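The idea behind morphy() can be sketched in a few lines: look the word up in a list of irregular forms, otherwise try suffix-substitution rules and accept a candidate only if it is a real word. The verb rules below follow the detachment rules WordNet documents for verbs, but `VOCAB`, `EXCEPTIONS` and `toy_lemmatize_verb` are tiny hypothetical stand-ins for WordNet's data, so this is an illustration, not NLTK's implementation.

```python
# Hypothetical, heavily simplified morphy-style lemmatizer for verbs.
# VOCAB and EXCEPTIONS stand in for WordNet's word list and verb exception
# list; they are tiny made-up samples, not WordNet data.

VERB_RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]

VOCAB = {"eat", "write", "program", "finalize", "study"}
EXCEPTIONS = {"eaten": "eat", "ate": "eat", "wrote": "write"}


def toy_lemmatize_verb(word: str) -> str:
    """Return a lemma for `word` treated as a verb, or the word itself if none is found."""
    if word in EXCEPTIONS:          # irregular forms are looked up directly
        return EXCEPTIONS[word]
    if word in VOCAB:               # already a base form
        return word
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCAB:  # only accept real dictionary words
                return candidate
    return word


for w in ["eating", "eaten", "writing", "programs", "finalized", "studies", "history"]:
    print(w, "->", toy_lemmatize_verb(w))
# eating -> eat
# eaten -> eat
# writing -> write
# programs -> program
# finalized -> finalize
# studies -> study
# history -> history
```

The dictionary check is what separates this from stemming: "eating" first yields the candidate "eate", which is rejected because it is not a word.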

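```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")  # the WordNet data must be available locally (needed once)

lemmatizer = WordNetLemmatizer()

# pos="v" treats every word as a verb; the default pos is "n" (noun)
for word in words:
    print(word + "---->" + lemmatizer.lemmatize(word, pos="v"))
```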

```
eating---->eat
eats---->eat
eaten---->eat
writing---->write
writes---->write
programming---->program
programs---->program
history---->history
finally---->finally
finalized---->finalize
```


```python
print(lemmatizer.lemmatize("good", pos="v"))  # good
```

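As a rule of thumb, stemming is commonly used for sentiment analysis, and lemmatization for chatbots: a sentiment model only needs related word forms to share a feature, while a chatbot's output must consist of real words.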
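One reason stemming is often good enough for sentiment analysis can be sketched with a bag-of-words step: the classifier only needs "loved", "loving" and "loves" to land on one feature, and nobody ever reads the stem "lov". The `crude_stem` and `bag_of_stems` helpers and the sample review are hypothetical; in practice you would plug in `PorterStemmer` or `SnowballStemmer` from the cells above.

```python
import re
from collections import Counter

# Hypothetical crude stemmer used only for this illustration (not Porter/Snowball)
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order, longest first


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_stems(text: str) -> Counter:
    """Lowercase, keep alphabetic tokens, stem them and count occurrences."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens)


review = "Loved the acting, loving the music, and the cast loves it"
counts = bag_of_stems(review)
print(counts["lov"], counts["the"])  # 3 3
```

Three different surface forms now contribute to a single count, which is exactly the kind of feature merging a sentiment classifier benefits from.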