# Stemming vs Lemmatization

| Stemming | Lemmatization |
|----------|---------------|
| Strips affixes by rule to get a stem, which may not be a real word | Maps a word to its dictionary form (lemma) |
| Faster | Slower |
| Used in sentiment analysis | Used for answering human queries |


Sentiment analysis (or opinion mining) is a natural language processing (NLP) technique used to determine whether a piece of text expresses positive, negative, or neutral sentiment.

# Stemming

- Reducing a word to its root form (stem)
	Eg: `computer` and `computing` share the same stem
- Not supported in spaCy
- Can be done using the `nltk` library
- Two commonly used stemmers in `nltk`: Porter and Snowball
- Porter vs Snowball stemmer:
  Snowball (also known as Porter2) is an improved version of Porter and is generally preferred over it

In [18]:
import nltk

In [2]:
# Porter Stemmer
from nltk.stem.porter import PorterStemmer

In [3]:
stemmer = PorterStemmer()

In [6]:
tokens = ["Computer", "computing"]

In [7]:
for token in tokens:
    print(token, "==", stemmer.stem(token))

Computer == comput
computing == comput


In [19]:
tokens = ["thought", "thinking", "think"]

In [20]:
for token in tokens:
    print(token, "==", stemmer.stem(token))

thought == thought
thinking == think
think == think


This is a problematic case for stemming: `thought` is not reduced to `think`. Lemmatization handles it.
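
To see why a suffix-stripping stemmer misses `thought`, here is a toy sketch. It is not NLTK's or spaCy's actual algorithm: `SUFFIXES`, `LEMMA_TABLE`, `toy_stem` and `toy_lemmatize` are hypothetical names made up for illustration. The stemmer only knows a few suffix rules, while the lemmatizer looks words up in a small hand-written dictionary.

```python
# Toy illustration only: neither function is NLTK's or spaCy's real algorithm.

SUFFIXES = ("ing", "ed", "s")  # hypothetical, tiny rule set

# Hypothetical lookup table mapping inflected forms to dictionary forms
LEMMA_TABLE = {"thought": "think", "thinking": "think", "was": "be"}


def toy_stem(word):
    """Strip the first matching suffix; knows nothing about irregular forms."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least 3 characters so very short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the table; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["thought", "thinking", "think"]:
    print(w, "==", toy_stem(w), "|", toy_lemmatize(w))
```

No suffix rule applies to `thought`, so the toy stemmer leaves it unchanged, while the lookup table maps it to `think`. Real lemmatizers use much larger vocabularies plus part-of-speech information.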

In [21]:
tokens = ["house", "tree", "treehouse"]

In [22]:
for token in tokens:
    print(token, "==", stemmer.stem(token))

house == hous
tree == tree
treehouse == treehous


### Result

Here, both `Computer` and `computing` were reduced to the same stem, `comput`.\
The Snowball stemmer gives the same output in this case.
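
A minimal sketch for checking the Snowball claim yourself, assuming `nltk` is installed. `compare_stemmers` is a hypothetical helper name, and the extra words `generously` and `fairly` are my own picks as cases where the two stemmers are commonly reported to differ, so run it rather than trusting that.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")  # Snowball needs a language name


def compare_stemmers(words):
    """Return (word, porter_stem, snowball_stem) for each word."""
    return [(w, porter.stem(w), snowball.stem(w)) for w in words]


# "generously" and "fairly" are illustrative additions, not from the notes above
for word, p, s in compare_stemmers(["Computer", "computing", "generously", "fairly"]):
    print(f"{word}: porter={p} snowball={s}")
```

Neither stemmer needs any extra `nltk` data download for this.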


# Lemmatization

- Reducing a word to its dictionary form (lemma)
- In spaCy, each token exposes its lemma through the `lemma_` attribute

In [11]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [12]:
text = "Computer computing"
doc = nlp(text)

In [13]:
for token in doc:
    print(token.text, token.lemma_)

Computer computer
computing computing


In [14]:
text = "Computer computing computed"
doc = nlp(text)

In [15]:
for token in doc:
    print(token.text, token.lemma_)

Computer computer
computing computing
computed compute


In [16]:
text = "He was having a stroke"
doc = nlp(text)

In [17]:
for token in doc:
    print(token.text, token.lemma_)

He he
was be
having have
a a
stroke stroke


In [25]:
text = "thought thinking think"
doc = nlp(text)

Here, it gave different outputs for `thinking` and `think` :(

In [26]:
for token in doc:
    print(token.text, token.lemma_)

thought think
thinking thinking
think think
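
A likely reason (my assumption, not verified here) is that spaCy's lemmatizer depends on the part-of-speech tag, and a bare word list gives the tagger no sentence context, so `thinking` may be tagged as a noun. The sketch below prints `pos_` next to `lemma_` for the bare list and for a full sentence so the two can be compared. `lemmas_with_pos` is a hypothetical helper name and the example sentence is made up.

```python
import spacy


def lemmas_with_pos(nlp, text):
    """Return (text, coarse POS tag, lemma) for each token in `text`."""
    return [(token.text, token.pos_, token.lemma_) for token in nlp(text)]


if __name__ == "__main__":
    try:
        nlp = spacy.load("en_core_web_sm")
    except OSError:
        # spacy.load raises OSError when the model package is not installed
        print("Model not found; install it with: python -m spacy download en_core_web_sm")
    else:
        # Bare word list vs. the same verb inside a sentence (example sentence is made up)
        for text in ["thought thinking think", "I am thinking about the problem"]:
            for row in lemmas_with_pos(nlp, text):
                print(row)
            print()
```

If the tags differ between the two inputs, that would explain the different lemmas.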


In [27]:
text = "house tree treehouse"
doc = nlp(text)

In [28]:
for token in doc:
    print(token.text, token.lemma_)

house house
tree tree
treehouse treehouse


### Result 

- Verb forms are converted to their base form (`was` becomes `be`, `having` becomes `have`)
- Similarly, `has` becomes `have`, and so on

# Some more examples of stemming vs lemmatization I found online

Stemming:  
Walking  :  walk  
is  :  is  
one  :  one  
of  :  of  
the  :  the  
main  :  main  
gaits  :  gait  
of  :  of  
terrestrial  :  terrestri  
locomotion  :  locomot  
among  :  among  
legged  :  leg  
animals  :  anim  

Lemmatization:  
Walking  :  Walking  
is  :  is  
one  :  one  
of  :  of  
the  :  the  
main  :  main  
gaits  :  gait  
of  :  of  
terrestrial  :  terrestrial  
locomotion  :  locomotion  
among  :  among  
legged  :  legged  
animals  :  animal
