## Stemming

Stemming is the process of reducing a word to its stem by stripping affixes (suffixes and prefixes). The stem does not have to be a valid dictionary word; the valid base form of a word is called a lemma and is produced by lemmatization instead. Stemming is widely used in natural language understanding (NLU) and natural language processing (NLP).

In [1]:
from nltk.stem import PorterStemmer

In [3]:
stemming = PorterStemmer()

In [27]:
words = ["eating", "driving", "riding", "programming", "sucking", "eats", "eaten", "writes","history"]

In [5]:
for word in words:
    print(word + "--->"+stemming.stem(word))

eating--->eat
driving--->drive
riding--->ride
programming--->program
sucking--->suck
eats--->eat
eaten--->eaten
writes--->write


In [8]:
# The stem is not always a valid word: here the result is cut down to 'congratul'
stemming.stem("congratulations")

'congratul'

In [9]:
# Lancaster Stemming Algorithm
from nltk.stem import LancasterStemmer

In [10]:
lan = LancasterStemmer()

In [12]:
# Lancaster is more aggressive than Porter and tends to over-stem (e.g. 'driv', 'rid', 'writ')
for word in words:
    print(word + "--->"+lan.stem(word))

eating--->eat
driving--->driv
riding--->rid
programming--->program
sucking--->suck
eats--->eat
eaten--->eat
writes--->writ


In [13]:
from nltk.stem import RegexpStemmer

In [20]:
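# '$' anchors each pattern to the end of the word, so only suffixes are removed
# (a leading '^' would anchor to the start instead); min=4 leaves words shorter than 4 characters untouched
reg = RegexpStemmer('ing$|s$|e$|able$', min=4)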

In [22]:
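# Only the value of the last expression in a cell is displayed
reg.stem("eating")      # 'eat'
reg.stem("history")     # 'history' (no suffix matches)
reg.stem("ingplaying")  # 'ingplay' (only the trailing 'ing' is removed)
reg.stem("eaten")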

'eaten'
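
As a rough sketch of what the pattern and the `min` argument do, similar behaviour can be approximated with the standard `re` module. The helper name `regexp_stem` and its `min_length` parameter are illustrative assumptions and not part of NLTK.

```python
import re

# Same suffix pattern as above; '$' ties every alternative to the end of the word.
SUFFIXES = re.compile(r"ing$|s$|e$|able$")


def regexp_stem(word: str, min_length: int = 4) -> str:
    """Remove a matching suffix; words shorter than min_length are returned as-is."""
    if len(word) < min_length:
        return word
    return SUFFIXES.sub("", word)


for w in ["eating", "history", "ingplaying", "eaten", "was"]:
    print(w + "--->" + regexp_stem(w))
```

Because every alternative ends in `$`, the leading `ing` in `ingplaying` is never touched, and `was` is skipped because it is shorter than four characters.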

In [24]:
# Snowball Stemmer
from nltk.stem import SnowballStemmer

In [25]:
snow = SnowballStemmer('english', ignore_stopwords=False)

In [29]:
for word in words:
    print(word + "--->"+snow.stem(word))

eating--->eat
driving--->drive
riding--->ride
programming--->program
sucking--->suck
eats--->eat
eaten--->eaten
writes--->write
history--->histori


In [31]:
snow.stem("sportingly")
snow.stem("fairly")

'fair'

## Wordnet Lemmatizer
Lemmatization is similar to stemming, but its output, called a lemma, is a valid dictionary word (the base form) rather than a stem that may not be a word at all. The lemma keeps the meaning of the original word.

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. This
class uses the morphy() function of the WordNet CorpusReader class to find a lemma. Let us
understand it with an example:

Lemmatization differs from stemming, which strips prefixes and suffixes to reduce a word to a stem. A lemmatizer instead uses a vocabulary and, when supplied, the word's part of speech to map the word to its dictionary form.


In [32]:
from nltk.stem import WordNetLemmatizer
lemm = WordNetLemmatizer()

In [36]:
# pos defaults to 'n' (noun), so verb forms such as "eating" come back unchanged
for word in words:
    print(word+"--->"+lemm.lemmatize(word))

eating--->eating
driving--->driving
riding--->riding
programming--->programming
sucking--->sucking
eats--->eats
eaten--->eaten
writes--->writes
history--->history


In [38]:
for word in words:
    print(word+"--->"+lemm.lemmatize(word, pos='v')) #pos = part of speech, here v = verb

eating--->eat
driving--->drive
riding--->rid
programming--->program
sucking--->suck
eats--->eat
eaten--->eat
writes--->write
history--->history
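
To make the dictionary lookup idea concrete, here is a toy sketch of a rule-plus-vocabulary lemmatizer for verbs. It is not NLTK's implementation: the vocabulary, the exception table, the rewrite rules, and the choice of the shortest valid candidate are all assumptions made for illustration.

```python
# Hypothetical mini-vocabulary standing in for a real verb dictionary.
VERB_VOCAB = {"eat", "drive", "ride", "rid", "program", "suck", "write"}

# Hypothetical exception table for forms the suffix rules cannot handle.
VERB_EXCEPTIONS = {"eaten": "eat", "programming": "program"}

# (suffix, replacement) rewrite rules, chosen for illustration only.
VERB_RULES = [("es", "e"), ("es", ""), ("s", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]


def toy_lemmatize(word: str) -> str:
    """Return a dictionary form of `word`, or `word` itself if none is found."""
    if word in VERB_EXCEPTIONS:
        return VERB_EXCEPTIONS[word]
    candidates = [word] if word in VERB_VOCAB else []
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Only keep rewrites that are real entries in the vocabulary.
            if candidate in VERB_VOCAB:
                candidates.append(candidate)
    # Assumption of this sketch: prefer the shortest valid candidate.
    return min(candidates, key=len) if candidates else word


for w in ["eating", "driving", "riding", "eats", "eaten", "writes", "history"]:
    print(w + "--->" + toy_lemmatize(w))
```

Unlike a stemmer, the sketch only returns strings that exist in its vocabulary. Under these assumed rules `riding` produces both `ride` and `rid`, and the shortest-candidate choice returns `rid`, which shows how a lookup-based lemmatizer can still pick an unintended but valid word.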


In [None]:
# Typical use cases:
# sentiment analysis ---> stemming (fast; the stems need not be valid words)
# chatbot ---> lemmatization (the output should be valid words)