<h3>Stemming in NLTK</h3> 

In [1]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

In [4]:
words = ["Hugging","Sleeping","eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

for word in words:
    print(word, "|", stemmer.stem(word))

Hugging | hug
Sleeping | sleep
eating | eat
eats | eat
eat | eat
ate | ate
adjustable | adjust
rafting | raft
ability | abil
meeting | meet
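These stems come from suffix-stripping rules, not a dictionary, which is why `ate` passes through unchanged. To make that concrete, here is a simplified re-creation of one stage only: Step 1b of the original Porter algorithm, which strips `-ed`/`-ing` and then tidies the remaining stem. It is a study sketch, not NLTK's code. NLTK's `PorterStemmer` runs several further steps and adds a few extensions of its own. The helper names (`step1b`, `measure`, and so on) are hypothetical.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a, e, i, o, u are vowels; 'y' is a vowel only after a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """m in Porter's [C](VC)^m[V] form, i.e. the number of vowel->consonant transitions."""
    m, prev_is_vowel = 0, False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_is_vowel:
            m += 1
        prev_is_vowel = not cons
    return m


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem: str) -> bool:
    return len(stem) >= 2 and stem[-1] == stem[-2] and is_consonant(stem, len(stem) - 1)


def ends_cvc(stem: str) -> bool:
    """Consonant-vowel-consonant ending where the last letter is not w, x or y."""
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")


def step1b(word: str) -> str:
    word = word.lower()
    if word.endswith("eed"):
        # (m > 0) EED -> EE, otherwise the word is left alone
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        # the remaining stem must contain a vowel for ED/ING to be removed
        if word.endswith(suffix) and contains_vowel(word[: -len(suffix)]):
            stem = word[: -len(suffix)]
            break
    else:
        return word  # no applicable suffix
    # clean-up rules, applied only after ED/ING was removed
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"                      # conflat -> conflate
    if ends_double_consonant(stem) and stem[-1] not in "lsz":
        return stem[:-1]                       # hugg -> hug
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"                      # fil -> file
    return stem


for w in ["Hugging", "Sleeping", "eating", "rafting", "meeting", "ate", "filing", "feed"]:
    print(w, "|", step1b(w))
```

For `Hugging`, `Sleeping`, `eating`, `rafting` and `meeting` this single step already yields the same stems NLTK printed above, while `ate` is untouched because no rule matches it.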


<h3>Lemmatization in spaCy</h3>

In [6]:
import spacy

In [7]:
nlp = spacy.load("en_core_web_sm")

# Alternative input (it was overwritten by the next line, so it produced no output):
# doc = nlp("Mando talked for 3 hours although talking isn't his thing")
doc = nlp("eating eats eat ate adjustable rafting ability meeting better")
for token in doc:
    print(token, " | ", token.lemma_)

eating  |  eat
eats  |  eat
eat  |  eat
ate  |  eat
adjustable  |  adjustable
rafting  |  raft
ability  |  ability
meeting  |  meeting
better  |  well
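Unlike the stemmer, the lemmatizer maps `ate` to `eat` and `better` to `well`. That needs two things a suffix stripper lacks: a table of irregular forms and the word's part of speech. The sketch below illustrates that idea with tiny, hand-written, hypothetical tables (`EXCEPTIONS`, `RULES`, `LEXICON`). It is only loosely in the spirit of a rule-plus-exception lemmatizer such as the one in `en_core_web_sm`, whose real data is far larger.

```python
# Hypothetical, hand-written tables for illustration only.
EXCEPTIONS = {
    ("VERB", "ate"): "eat",
    ("ADJ", "better"): "good",
    ("ADV", "better"): "well",
}
RULES = {  # (suffix to strip, replacement), tried in order
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
    "NOUN": [("s", "")],
}
LEXICON = {  # known base forms per part of speech
    "VERB": {"eat", "raft", "meet", "adjust"},
    "NOUN": {"ability", "meeting", "skill"},
    "ADJ": {"adjustable", "good"},
    "ADV": {"well"},
}


def lemmatize(word: str, pos: str) -> str:
    word = word.lower()
    known = LEXICON.get(pos, set())
    if (pos, word) in EXCEPTIONS:          # 1. irregular forms
        return EXCEPTIONS[(pos, word)]
    if word in known:                      # 2. already a base form
        return word
    for old, new in RULES.get(pos, []):    # 3. suffix rules, accepted only if the result is a known word
        if word.endswith(old):
            candidate = word[: -len(old)] + new
            if candidate in known:
                return candidate
    return word                            # 4. unknown: leave unchanged


for word, pos in [("ate", "VERB"), ("eating", "VERB"), ("meeting", "NOUN"),
                  ("meeting", "VERB"), ("better", "ADV")]:
    print(word, pos, "|", lemmatize(word, pos))
```

Checking candidates against a lexicon is what keeps a lemma a real word (`ability`, not `abil`), and keying on part of speech is why the same surface form, such as `meeting`, can have different lemmas.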


<h3>Customizing the lemmatizer</h3>

In [8]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [9]:
ar = nlp.get_pipe('attribute_ruler')

# Map the tokens "Bro" and "Brah" (exact, case-sensitive TEXT match) to the lemma "Brother"
ar.add([[{"TEXT": "Bro"}], [{"TEXT": "Brah"}]], {"LEMMA": "Brother"})

doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, "|", token.lemma_)

Bro | Brother
, | ,
you | you
wanna | wanna
go | go
? | ?
Brah | Brother
, | ,
do | do
n't | not
say | say
no | no
! | !
I | I
am | be
exhausted | exhaust
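The custom lemma works because `attribute_ruler` runs before `lemmatizer` in `nlp.pipe_names`. The toy pipeline below models that interaction under simplifying assumptions: every pattern matches a single token (spaCy patterns are lists of token dicts, flattened here to one dict), only `TEXT` and `LOWER` keys are supported, and the hypothetical `lemmatizer` only fills lemmas that are still empty, so the ruler's value survives. `make_attribute_ruler`, `LOOKUP` and `run_pipeline` are illustrative names, not spaCy APIs.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical token representation: {"text": "...", "lemma": None}
Token = Dict[str, Optional[str]]
Component = Callable[[List[Token]], List[Token]]


def make_attribute_ruler(patterns: List[Dict[str, str]], attrs: Dict[str, str]) -> Component:
    """Toy ruler: each pattern matches ONE token; TEXT is exact, LOWER ignores case."""
    def matches(tok: Token, pattern: Dict[str, str]) -> bool:
        for key, value in pattern.items():
            if key == "TEXT":
                if tok["text"] != value:
                    return False
            elif key == "LOWER":
                if tok["text"].lower() != value:
                    return False
            else:
                raise ValueError(f"unsupported pattern key: {key}")
        return True

    def ruler(tokens: List[Token]) -> List[Token]:
        for tok in tokens:
            if any(matches(tok, p) for p in patterns):
                for attr, value in attrs.items():
                    tok[attr.lower()] = value  # "LEMMA" -> tok["lemma"]
        return tokens
    return ruler


# Hypothetical lookup table standing in for the real lemmatizer's data.
LOOKUP = {"am": "be", "exhausted": "exhaust", "n't": "not"}


def lemmatizer(tokens: List[Token]) -> List[Token]:
    """Fills in lemmas that earlier components left empty; never overwrites."""
    for tok in tokens:
        if tok["lemma"] is None:
            tok["lemma"] = LOOKUP.get(tok["text"].lower(), tok["text"])
    return tokens


def run_pipeline(texts: List[str], components: List[Component]) -> List[Token]:
    tokens: List[Token] = [{"text": t, "lemma": None} for t in texts]
    for component in components:  # components run in order, like nlp.pipe_names
        tokens = component(tokens)
    return tokens


ruler = make_attribute_ruler([{"TEXT": "Bro"}, {"TEXT": "Brah"}], {"LEMMA": "Brother"})
for tok in run_pipeline(["Bro", ",", "bro", "I", "am", "exhausted"], [ruler, lemmatizer]):
    print(tok["text"], "|", tok["lemma"])
```

Note that lowercase `bro` keeps its own lemma here because `TEXT` is matched verbatim; a `LOWER` pattern would catch every casing.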


In [10]:
doc[5]

?

In [11]:
doc[6]

Brah

In [12]:
doc[6].lemma

4347558510128575363

In [13]:
doc[6].lemma_

'Brother'
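`lemma` is a 64-bit integer id and `lemma_` is its string. spaCy keeps the mapping in a string store (`nlp.vocab.strings[doc[6].lemma]` gives back `'Brother'`). The class below is a hypothetical miniature of that two-way lookup. It uses BLAKE2b from `hashlib` purely for illustration; spaCy uses its own MurmurHash-based function, so the ids will not match spaCy's.

```python
import hashlib
from typing import Dict, Union


class ToyStringStore:
    """Hypothetical stand-in for a string store: string <-> 64-bit id."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}

    @staticmethod
    def hash_string(s: str) -> int:
        # 8-byte digest -> unsigned 64-bit integer (illustrative hash, not spaCy's)
        digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, s: str) -> int:
        key = self.hash_string(s)
        self._by_id[key] = s
        return key

    def __getitem__(self, key: Union[str, int]) -> Union[int, str]:
        if isinstance(key, str):
            return self.hash_string(key)  # string -> id (computable without storing)
        return self._by_id[key]           # id -> string (only works once added)


store = ToyStringStore()
lemma_id = store.add("Brother")
print(lemma_id, store[lemma_id])
```

Storing ids on tokens keeps them compact, and because the hash is deterministic, the same string always maps to the same id.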

<h3>Exercise</h3>

In [17]:
words = ["Hugging", "Sleeping", "eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]
words = [stemmer.stem(word) for word in words]

In [18]:
words

['hug', 'sleep', 'eat', 'eat', 'eat', 'ate', 'adjust', 'raft', 'abil', 'meet']

In [20]:
words = ["Hugging", "Sleeping", "eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]
words = ' '.join(words)

In [22]:
words = nlp(words)

In [23]:
words = [token.lemma_ for token in words]

In [24]:
words

['hug',
 'sleep',
 'eating',
 'eat',
 'eat',
 'eat',
 'adjustable',
 'raft',
 'ability',
 'meeting']
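To see where the two approaches part ways, the sketch below lines up the two result lists. The `stems` and `lemmas` lists are copied verbatim from the outputs above; `disagreements` is a hypothetical helper.

```python
words = ["Hugging", "Sleeping", "eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]
# Copied from the two outputs above (PorterStemmer, then en_core_web_sm).
stems = ["hug", "sleep", "eat", "eat", "eat", "ate", "adjust", "raft", "abil", "meet"]
lemmas = ["hug", "sleep", "eating", "eat", "eat", "eat", "adjustable", "raft", "ability", "meeting"]


def disagreements(words, stems, lemmas):
    """Return (word, stem, lemma) triples where stemming and lemmatization differ."""
    if not len(words) == len(stems) == len(lemmas):
        raise ValueError("lists must have the same length")
    return [(w, s, l) for w, s, l in zip(words, stems, lemmas) if s != l]


for word, stem, lemma in disagreements(words, stems, lemmas):
    print(f"{word:<11} stem={stem:<7} lemma={lemma}")
```

Five of the ten words differ: the stemmer clips to non-words (`abil`) and misses the irregular `ate`, while the lemmatizer keeps `eating` and `meeting` whole, presumably because, with no sentence context, the tagger did not treat them as verbs.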

<h3>Convert the given text into its base form using both stemming and lemmatization</h3>

In [55]:
text = """Latha is very multi talented girl.She is good at many skills like dancing, running, singing, playing.She also likes eating Pav Bhagi. she has a 
habit of fishing and swimming too.Besides all this, she is a wonderful at cooking too."""

In [27]:
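import nltk
# word_tokenize relies on NLTK's Punkt models; if they are missing, download them first with
# nltk.download('punkt') (recent NLTK releases ask for 'punkt_tab' instead)
words = nltk.word_tokenize(text)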

In [28]:
words

['Latha',
 'is',
 'very',
 'multi',
 'talented',
 'girl.She',
 'is',
 'good',
 'at',
 'many',
 'skills',
 'like',
 'dancing',
 ',',
 'running',
 ',',
 'singing',
 ',',
 'playing.She',
 'also',
 'likes',
 'eating',
 'Pav',
 'Bhagi',
 '.',
 'she',
 'has',
 'a',
 'habit',
 'of',
 'fishing',
 'and',
 'swimming',
 'too.Besides',
 'all',
 'this',
 ',',
 'she',
 'is',
 'a',
 'wonderful',
 'at',
 'cooking',
 'too',
 '.']
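Tokens such as `'girl.She'`, `'playing.She'` and `'too.Besides'` survive because the input has no space after those periods, and the stemmer then turns them into `girl.sh` and similar. One possible pre-processing fix, sketched below with a hypothetical `space_after_periods` helper, inserts the missing space before tokenizing. It only targets a lowercase letter, a period and an uppercase letter in a row, so abbreviations like `U.S.A.` are left alone.

```python
import re

text = """Latha is very multi talented girl.She is good at many skills like dancing, running, singing, playing.She also likes eating Pav Bhagi. she has a 
habit of fishing and swimming too.Besides all this, she is a wonderful at cooking too."""


def space_after_periods(s: str) -> str:
    # "girl.She" -> "girl. She"; the lookbehind requires a lowercase letter
    # before the period, so "U.S.A." (uppercase before each period) is untouched.
    return re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", s)


print(space_after_periods(text))
```

Passing the cleaned string to `nltk.word_tokenize` should then let the tokenizer separate the period from the following word.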

In [29]:
word_stem=[stemmer.stem(word) for word in words]

In [32]:
word_stem=' '.join(word_stem)

In [33]:
word_stem

'latha is veri multi talent girl.sh is good at mani skill like danc , run , sing , playing.sh also like eat pav bhagi . she ha a habit of fish and swim too.besid all thi , she is a wonder at cook too .'
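Odd stems like `ha` (from `has`) and `thi` (from `this`) come from the stemmer's first rule set, which treats any final `s` as a plural ending. Below is a sketch of Step 1a of the original Porter algorithm. It also copies one guard that, as far as we know, NLTK's `PorterStemmer` applies: words of one or two letters are returned unchanged, which is why `is` stays `is` in the output above. `step1a` is a hypothetical name.

```python
def step1a(word: str) -> str:
    """Step 1a of the original Porter algorithm (plural-style -s endings)."""
    word = word.lower()
    if len(word) <= 2:
        return word           # very short words are skipped (NLTK-style guard)
    if word.endswith("sses"):
        return word[:-2]      # SSES -> SS   (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]      # IES  -> I    (ponies -> poni)
    if word.endswith("ss"):
        return word           # SS   -> SS   (caress -> caress)
    if word.endswith("s"):
        return word[:-1]      # S    -> ""   (has -> ha, this -> thi)
    return word


for w in ["has", "this", "skills", "likes", "is", "caresses", "ponies"]:
    print(w, "|", step1a(w))
```

The rule cannot tell a plural `-s` from a word that simply ends in `s`, which is the price of a dictionary-free stemmer.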

In [56]:
words=nlp(text)
print(words)

Latha is very multi talented girl.She is good at many skills like dancing, running, singing, playing.She also likes eating Pav Bhagi. she has a 
habit of fishing and swimming too.Besides all this, she is a wonderful at cooking too.


In [57]:
words=[token.lemma_ for token in words]

In [58]:
words

['Latha',
 'be',
 'very',
 'multi',
 'talented',
 'girl',
 '.',
 'she',
 'be',
 'good',
 'at',
 'many',
 'skill',
 'like',
 'dancing',
 ',',
 'running',
 ',',
 'singing',
 ',',
 'play',
 '.',
 'she',
 'also',
 'like',
 'eat',
 'Pav',
 'Bhagi',
 '.',
 'she',
 'have',
 'a',
 '\n',
 'habit',
 'of',
 'fishing',
 'and',
 'swim',
 'too',
 '.',
 'besides',
 'all',
 'this',
 ',',
 'she',
 'be',
 'a',
 'wonderful',
 'at',
 'cook',
 'too',
 '.']

In [60]:
words=' '.join(words)

In [61]:
words

'Latha be very multi talented girl . she be good at many skill like dancing , running , singing , play . she also like eat Pav Bhagi . she have a \n habit of fishing and swim too . besides all this , she be a wonderful at cook too .'
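Joining lemmas with `' '.join` leaves a space before every punctuation mark and keeps the `'\n'` token. A small hypothetical helper can tidy that up, as sketched below. It drops whitespace-only tokens and glues pure-punctuation tokens onto the previous word, so it would mishandle opening brackets or quotes. When the spaCy `Doc` is still available, `"".join(tok.lemma_ + tok.whitespace_ for tok in doc)` is an alternative that reuses the original spacing.

```python
import string
from typing import List


def join_tokens(tokens: List[str]) -> str:
    """Rebuild readable text from tokens: skip whitespace tokens, attach punctuation."""
    out: List[str] = []
    for tok in tokens:
        if not tok.strip():
            continue                  # e.g. the '\n' token in the lemma list
        if out and all(ch in string.punctuation for ch in tok):
            out[-1] += tok            # "girl" + "." -> "girl."
        else:
            out.append(tok)
    return " ".join(out)


lemmas = ["Latha", "be", "very", "multi", "talented", "girl", ".", "she", "have", "a", "\n", "habit", "."]
print(join_tokens(lemmas))
```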