# STEMMING
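Stemming is the process of reducing a word to its stem by stripping affixes (prefixes and suffixes). The stem is not necessarily a valid dictionary word, which distinguishes it from a lemma: the Porter stemmer used in this section, for example, reduces "history" to "histori". Stemming is a common preprocessing step in natural language processing (NLP) and natural language understanding (NLU).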
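
To make the idea of affix stripping concrete, here is a minimal sketch of a suffix stripper. It is a toy written for illustration, not the Porter algorithm; the names `SUFFIXES` and `naive_stem` and the `min_stem_len` parameter are hypothetical.

```python
# Toy suffix stripper: illustrative only, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical list, checked in this order


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no suffix matched: return the word unchanged


for w in ["eating", "eats", "writes", "history"]:
    print(w, "-", naive_stem(w))
```

Even this crude version shows why a stem need not be a real word: "writes" becomes "writ". Real stemmers add many more rules to handle cases like this.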

In [1]:
from nltk.stem import PorterStemmer

In [2]:
stemming = PorterStemmer()

In [3]:
words = ["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

In [4]:
for word in words:
    print(word,"-",stemming.stem(word))

eating - eat
eats - eat
eaten - eaten
writing - write
writes - write
programming - program
programs - program
history - histori
finally - final
finalized - final


In [5]:
stemming.stem('congratulated')

'congratul'

In [6]:
stemming.stem('understanding')

'understand'

In [7]:
stemming.stem('sitting')

'sit'

# LANCASTER STEMMING ALGORITHM

In [8]:
from nltk.stem import LancasterStemmer

In [9]:
lancaster = LancasterStemmer()

In [10]:
for word in words:
    print(word,"-",lancaster.stem(word))

eating - eat
eats - eat
eaten - eat
writing - writ
writes - writ
programming - program
programs - program
history - hist
finally - fin
finalized - fin


# REGEXP STEMMER
NLTK provides the RegexpStemmer class for building a stemmer from a regular expression. It takes a single regular expression and removes every substring of the word that matches it, so an alternative without an anchor such as `$` can match anywhere in the word, not only at the end. Let us see an example:

In [11]:
from nltk.stem import RegexpStemmer

In [12]:
# min=4: words shorter than 4 characters are returned unchanged
reg_stemmer = RegexpStemmer('ing|s$|e$|able$', min=4)

In [13]:
reg_stemmer.stem('eating')

'eat'

In [14]:
reg_stemmer.stem('ingplaying')

'play'
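
The 'ingplaying' result above follows from how the pattern is applied: `ing` has no `$` anchor, so it is removed wherever it occurs. The sketch below reproduces that behaviour with the standard `re` module. It assumes the stemmer does little more than a length check followed by a substitution; `regexp_stem` and `min_len` are hypothetical names, not part of NLTK.

```python
import re

# Same pattern as in the notebook; only "ing" is unanchored.
pattern = re.compile(r"ing|s$|e$|able$")


def regexp_stem(word, min_len=4):
    """Rough stand-in for a regexp stemmer (assumed behaviour, not NLTK code)."""
    if len(word) < min_len:
        return word  # short words are left untouched
    return pattern.sub("", word)  # delete every match, wherever it occurs


for w in ["eating", "ingplaying", "singer", "is"]:
    print(w, "-", regexp_stem(w))
```

Note that "singer" turns into "ser", because the unanchored `ing` matches in the middle of the word. Writing `ing$` instead would restrict the rule to suffixes.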

# SNOWBALL STEMMER

In [15]:
from nltk.stem import SnowballStemmer

In [16]:
# ignore_stopwords=True leaves English stop words unstemmed (needs the NLTK stopwords corpus)
snowballstemmer = SnowballStemmer('english', ignore_stopwords=True)

In [17]:
for word in words:
    print(word,"-",snowballstemmer.stem(word))

eating - eat
eats - eat
eaten - eaten
writing - write
writes - write
programming - program
programs - program
history - histori
finally - final
finalized - final


In [18]:
stemming.stem("fairly"),stemming.stem("sportingly")

('fairli', 'sportingli')

In [19]:
snowballstemmer.stem("fairly"),snowballstemmer.stem("sportingly")

('fair', 'sport')
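
The two cells above compare stemmers one call at a time. A small helper can line up all three stemmers from this notebook for the same words. This is a sketch: `STEMMERS` and `compare_stemmers` are hypothetical names, and the Snowball stemmer is created without `ignore_stopwords` so that no extra corpus is assumed.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# One instance of each stemmer used in this notebook.
STEMMERS = {
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
    "snowball": SnowballStemmer("english"),
}


def compare_stemmers(words):
    """Return {word: {stemmer_name: stem}} for every word."""
    return {
        word: {name: stemmer.stem(word) for name, stemmer in STEMMERS.items()}
        for word in words
    }


for word, stems in compare_stemmers(["fairly", "sportingly"]).items():
    print(word, "-", stems)
```

Seeing the stems side by side makes it easier to judge which stemmer is too aggressive or too conservative for a given vocabulary.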

# WORDNET LEMMATIZER
Lemmatization is similar to stemming, but its output, called a lemma, is a valid dictionary word rather than a truncated stem. The lemma is the base form of the word and carries the same meaning.

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. It uses the morphy() function of the WordNet CorpusReader class to find a lemma. The pos argument tells the lemmatizer which part of speech to assume; the default is noun ('n'). Let us understand it with an example:

In [20]:
import nltk
from nltk.stem import WordNetLemmatizer

# nltk.download('wordnet')  # run once if the WordNet corpus is not installed
lemmatizer = WordNetLemmatizer()

In [21]:
# POS tags:
#   n - noun
#   v - verb
#   a - adjective
#   r - adverb
for word in words:
    print(word,"-",lemmatizer.lemmatize(word,pos='v'))

eating - eat
eats - eat
eaten - eat
writing - write
writes - write
programming - program
programs - program
history - history
finally - finally
finalized - finalize


In [22]:
for word in words:
    print(word,"-",lemmatizer.lemmatize(word,pos='a'))

eating - eating
eats - eats
eaten - eaten
writing - writing
writes - writes
programming - programming
programs - programs
history - history
finally - finally
finalized - finalized


In [23]:
for word in words:
    print(word,"-",lemmatizer.lemmatize(word,pos='r'))

eating - eating
eats - eats
eaten - eaten
writing - writing
writes - writes
programming - programming
programs - programs
history - history
finally - finally
finalized - finalized
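
The outputs above change with `pos` because a dictionary-based lemmatizer only accepts a candidate that exists as a lemma for the requested part of speech. The sketch below illustrates that lookup idea with tiny hand-made tables. It is a simplified picture, not NLTK's implementation; `VOCAB`, `EXCEPTIONS`, `RULES`, and `toy_lemmatize` are all hypothetical.

```python
# Tiny hand-made stand-ins for WordNet data (hypothetical, for illustration only).
VOCAB = {
    "v": {"eat", "write", "finalize"},
    "n": {"program", "history"},
    "a": {"good"},
}
EXCEPTIONS = {"v": {"eaten": "eat"}, "a": {"better": "good"}}
# (suffix to remove, replacement) rules, tried in order for each part of speech.
RULES = {
    "v": [("s", ""), ("es", "e"), ("ed", "e"), ("ing", ""), ("ing", "e")],
    "n": [("s", "")],
    "a": [("er", ""), ("est", "")],
}


def toy_lemmatize(word, pos="n"):
    # 1. Irregular forms are resolved by direct lookup.
    if word in EXCEPTIONS.get(pos, {}):
        return EXCEPTIONS[pos][word]
    # 2. A word that is already a known lemma is returned as-is.
    if word in VOCAB.get(pos, set()):
        return word
    # 3. Try suffix rules, accepting only candidates found in the vocabulary.
    for old, new in RULES.get(pos, []):
        if word.endswith(old):
            candidate = word[: len(word) - len(old)] + new
            if candidate in VOCAB.get(pos, set()):
                return candidate
    # 4. Nothing matched: return the word unchanged.
    return word


for w in ["eating", "eaten", "writing", "finalized", "better"]:
    print(w, "-", toy_lemmatize(w, pos="v"), "|", toy_lemmatize(w, pos="a"))
```

With the wrong part of speech no rule produces a known lemma, so the word comes back unchanged. That is the same pattern seen in the `pos='a'` and `pos='r'` cells above.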


In [24]:
lemmatizer.lemmatize("better",pos='v')

'better'

In [25]:
lemmatizer.lemmatize("good",pos='v')

'good'

In [26]:
lemmatizer.lemmatize("best",pos='v')

'best'