# Stemming

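Stemming finds the stem of a word, a shortened base form shared by related words: history -> histori <- historical. The stem is not always a real word. It is also not the same thing as a lemma, which is the dictionary form produced by lemmatization.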

## Why use it?
Stemming reduces words to a root by removing inflection, usually by dropping a suffix. It helps when extracting meaningful information from vast sources such as big data or the Internet, because a search on a subject often has to match the other forms of a word to return the best results.
#### For example, changing, changed, change -> after stemming -> one shared stem
Sometimes the resulting stem is not a real word.
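
To make the "dropping a suffix" idea concrete, here is a minimal sketch of a suffix-stripping stemmer. It is a toy for illustration only and is not the Porter algorithm; the `SUFFIXES` list, the `toy_stem` name, and the `min_stem_len` parameter are all assumptions made up for this example.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s", "e")  # hypothetical rule list, longest first


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Drop the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["changing", "changed", "changes", "change"]:
    print(w, "->", toy_stem(w))
```

All four forms collapse to the same string, which is useful for matching even though that string is not a dictionary word. A real stemmer applies many more rules and conditions than this sketch.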

## When to use it?
Go with stemming when the vocabulary space is small and the documents are large. Conversely, go with word embeddings when the vocabulary space is large but the documents are small. Typical uses are positive/negative sentiment analysis and email classification (for example, a Gmail-style classifier).
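
For classification tasks like these, the practical effect of stemming is a smaller feature vocabulary. The sketch below counts tokens before and after normalization; `bag_of_words` and `strip_ing_ed_s` are hypothetical helpers written for this example, and the second one is only a stand-in for a real stemmer such as `PorterStemmer().stem`.

```python
from collections import Counter


def bag_of_words(tokens, normalize=lambda t: t):
    """Count tokens after mapping each one through `normalize` (e.g. a stemmer)."""
    return Counter(normalize(t.lower()) for t in tokens)


def strip_ing_ed_s(token):
    # Hypothetical stand-in for a real stemmer; it only knows three suffixes.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


tokens = "She played well he plays well they are playing well".split()
raw = bag_of_words(tokens)
stemmed = bag_of_words(tokens, strip_ing_ed_s)
print(len(raw), "distinct tokens before,", len(stemmed), "after")
print(stemmed.most_common(2))
```

The three forms of "play" end up in a single count, so a classifier sees one feature instead of three.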

# Lemmatization
Lemmatization is a bit more complex: it can group together words that do not share a stem but are still inflected forms of the same dictionary word (the lemma).

#### With stemming, the result may not be a meaningful word, whereas lemmatization returns a meaningful word. Because it has to find a real word, lemmatization takes more time than stemming.

#### Lemmatization provides better results by performing an analysis that depends on the word's part of speech and by producing real dictionary words. As a result, lemmatization is harder to implement and slower than stemming.

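#### (eating, eats, ate) -> eat , children -> child. Derived words usually keep their own lemma: historical stays historical rather than becoming history, and finally stays finally rather than becoming final.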
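
The dependence on part of speech can be sketched with a plain dictionary lookup. This is only an illustration of the idea, not how spaCy or NLTK implement lemmatization; `LEMMA_TABLE`, `toy_lemmatize`, and the tag names used as keys are assumptions for this example.

```python
# Hypothetical mini-dictionary keyed by (word, part of speech).
LEMMA_TABLE = {
    ("ate", "VERB"): "eat",
    ("eating", "VERB"): "eat",
    ("children", "NOUN"): "child",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form for (word, pos), or the word itself if unknown."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


print(toy_lemmatize("meeting", "NOUN"))
print(toy_lemmatize("meeting", "VERB"))
print(toy_lemmatize("ate", "VERB"))
```

The same surface form, "meeting", maps to different lemmas depending on its tag. A stemmer never looks at the tag, so it cannot make this distinction.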


# Why and where to use it?

Chatbots and question answering. In these applications you have to understand what the text is saying and extract its meaning, so stemming is not a good fit.

spaCy doesn't support stemming; NLTK has both stemming and lemmatization.

In [51]:
import spacy
import nltk 

In [52]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

In [53]:
words= ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

In [54]:
for word in words:
    print(word, "  -> after stemming  -> ", stemmer.stem(word))

eating   -> after stemming  ->  eat
eats   -> after stemming  ->  eat
eat   -> after stemming  ->  eat
ate   -> after stemming  ->  ate
adjustable   -> after stemming  ->  adjust
rafting   -> after stemming  ->  raft
ability   -> after stemming  ->  abil
meeting   -> after stemming  ->  meet


#### Above you can see that ability was transformed into abil, which is not a real word. That is why people often prefer lemmatization over stemming when the output needs to be meaningful.

In [55]:
import en_core_web_sm
nlp = en_core_web_sm.load()

In [56]:
doc = nlp("eating eats eat ate adjustable rafting ability meeting better")

In [57]:
for token in doc:
    print(token, " | ", token.lemma_)

eating  |  eating
eats  |  eat
eat  |  eat
ate  |  eat
adjustable  |  adjustable
rafting  |  raft
ability  |  ability
meeting  |  meeting
better  |  well


#### The model sometimes doesn't understand slang words such as bro, bruh, or brah, so it leaves them unchanged. In such cases we can customize the model to fit our needs.

In [58]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [59]:
ar = nlp.get_pipe('attribute_ruler')

ar.add([[{"TEXT":"Bro"}],[{"TEXT":"Brah"}]],{"LEMMA":"Brother"})

doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, "|", token.lemma_)

Bro | Brother
, | ,
you | you
wanna | wanna
go | go
? | ?
Brah | Brother
, | ,
do | do
n't | not
say | say
no | no
! | !
I | I
am | be
exhausted | exhaust
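
When there are many slang variants, the rule arguments can be generated from a small table instead of being typed by hand. The sketch below only builds the `(patterns, attrs)` pairs in the shape passed to `ar.add(...)` above; `build_lemma_rules` and the `slang` table are hypothetical names for this example, and matching on `LOWER` rather than `TEXT` is a design choice assumed here to make the rules case-insensitive.

```python
def build_lemma_rules(lemma_to_variants):
    """Turn {lemma: [variant, ...]} into (patterns, attrs) pairs.

    Each pair has the shape that the notebook passes to ar.add(patterns, attrs).
    Matching on "LOWER" instead of "TEXT" makes the rule case-insensitive.
    """
    rules = []
    for lemma, variants in lemma_to_variants.items():
        patterns = [[{"LOWER": v.lower()}] for v in variants]
        rules.append((patterns, {"LEMMA": lemma}))
    return rules


slang = {"Brother": ["Bro", "Brah", "Bruh"]}  # hypothetical slang table
for patterns, attrs in build_lemma_rules(slang):
    print(patterns, attrs)
    # With the pipeline from the notebook loaded, each pair would be registered as:
    # ar.add(patterns, attrs)
```

Keeping the table separate from the pipeline code makes it easier to review and extend the list of variants.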


# Exercise

In [60]:
lst_words = ['running', 'painting', 'walking', 'dressing', 'likely', 'children', 'whom', 'good', 'ate', 'fishing']

In [61]:
#using stemming in nltk
for word in lst_words:
    print(word, "  ->  ", stemmer.stem(word))

running   ->   run
painting   ->   paint
walking   ->   walk
dressing   ->   dress
likely   ->   like
children   ->   children
whom   ->   whom
good   ->   good
ate   ->   ate
fishing   ->   fish


In [62]:
#using lemmatization in spacy
doc = nlp("running painting walking dressing likely children who good ate fishing")

In [63]:
for token in doc:
    print(token, " | ", token.lemma_)

running  |  run
painting  |  painting
walking  |  walking
dressing  |  dress
likely  |  likely
children  |  child
who  |  who
good  |  good
ate  |  eat
fishing  |  fishing
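
The two result lists above are easier to compare side by side. Here is a minimal sketch of a helper that takes any stemming function and any lemmatizing function and lines up their output; `compare` and `print_table` are hypothetical names, and the two demo dictionaries are stand-ins filled with values copied from the outputs above so that the block runs without NLTK or spaCy.

```python
def compare(words, stem_fn, lemma_fn):
    """Return (word, stem, lemma) rows for two caller-supplied normalizers."""
    return [(w, stem_fn(w), lemma_fn(w)) for w in words]


def print_table(rows):
    print(f"{'word':<12}{'stem':<12}{'lemma':<12}")
    for word, stem, lemma in rows:
        print(f"{word:<12}{stem:<12}{lemma:<12}")


# Stand-in normalizers so the sketch runs without any NLP library installed.
demo_stems = {"running": "run", "children": "children", "ate": "ate"}
demo_lemmas = {"running": "run", "children": "child", "ate": "eat"}

rows = compare(list(demo_stems), demo_stems.get, demo_lemmas.get)
print_table(rows)
```

In the notebook, `stem_fn` could be `stemmer.stem` and `lemma_fn` could be something like `lambda w: nlp(w)[0].lemma_`. Lemmatizing isolated words drops the sentence context, so those results may differ from the ones printed above.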


# Exercise 2

Convert the given text into its base form using both stemming and lemmatization.

In [64]:
nlp1 = spacy.blank("en")

In [65]:
text = """Latha is very multi talented girl.She is good at many skills like dancing, running, singing, playing.She also likes eating Pav Bhagi. she has a 
habit of fishing and swimming too.Besides all this, she is a wonderful at cooking too.
"""
doc = nlp1(text)

In [66]:
#using stemming in nltk

#step1: word tokenizing with the blank spaCy pipeline
list_ = [token.text for token in doc]

#step2: getting the base form for each token using stemmer
stems = []
for word in list_:
    stems.append(stemmer.stem(word))
    print(word, "  ->  ", stems[-1])

#step3: joining all words in a list into string using 'join()'
stemmed_text = " ".join(stems)

Latha   ->   latha
is   ->   is
very   ->   veri
multi   ->   multi
talented   ->   talent
girl   ->   girl
.   ->   .
She   ->   she
is   ->   is
good   ->   good
at   ->   at
many   ->   mani
skills   ->   skill
like   ->   like
dancing   ->   danc
,   ->   ,
running   ->   run
,   ->   ,
singing   ->   sing
,   ->   ,
playing   ->   play
.   ->   .
She   ->   she
also   ->   also
likes   ->   like
eating   ->   eat
Pav   ->   pav
Bhagi   ->   bhagi
.   ->   .
she   ->   she
has   ->   ha
a   ->   a

   ->   

habit   ->   habit
of   ->   of
fishing   ->   fish
and   ->   and
swimming   ->   swim
too   ->   too
.   ->   .
Besides   ->   besid
all   ->   all
this   ->   thi
,   ->   ,
she   ->   she
is   ->   is
a   ->   a
wonderful   ->   wonder
at   ->   at
cooking   ->   cook
too   ->   too
.   ->   .

   ->   



In [67]:
# lemmatization
doc3 = nlp(text)

In [69]:
for token in doc3:
    print(token, " | ", token.lemma_)

Latha  |  Latha
is  |  be
very  |  very
multi  |  multi
talented  |  talented
girl  |  girl
.  |  .
She  |  she
is  |  be
good  |  good
at  |  at
many  |  many
skills  |  skill
like  |  like
dancing  |  dancing
,  |  ,
running  |  running
,  |  ,
singing  |  singing
,  |  ,
playing  |  play
.  |  .
She  |  she
also  |  also
likes  |  like
eating  |  eat
Pav  |  Pav
Bhagi  |  Bhagi
.  |  .
she  |  she
has  |  have
a  |  a

  |  

habit  |  habit
of  |  of
fishing  |  fishing
and  |  and
swimming  |  swim
too  |  too
.  |  .
Besides  |  besides
all  |  all
this  |  this
,  |  ,
she  |  she
is  |  be
a  |  a
wonderful  |  wonderful
at  |  at
cooking  |  cook
too  |  too
.  |  .

  |  

