Stemming in NLTK

In [1]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

In [2]:
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

for word in words:
    print(word, "|", stemmer.stem(word))

eating | eat
eats | eat
eat | eat
ate | ate
adjustable | adjust
rafting | raft
ability | abil
meeting | meet
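Porter stemming strips suffixes by rule and uses no dictionary. That is why "ate" is left unchanged and "ability" becomes the non-word "abil". The sketch below is a deliberately crude suffix stripper, not the Porter algorithm, which applies several ordered steps with conditions on the remaining stem. The function `naive_stem` and its hand-picked `SUFFIXES` list are hypothetical. For this particular word list it happens to give the same results as the Porter outputs above.

```python
# Hypothetical, heavily simplified suffix stripper (NOT the Porter algorithm).
SUFFIXES = ("ing", "able", "ity", "s")  # checked in this order
MIN_STEM_LEN = 3  # never cut a word down to fewer than 3 characters


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM_LEN characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word  # no rule applies: irregular forms like "ate" pass through unchanged


words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]
for word in words:
    print(word, "|", naive_stem(word))
```

Because the rules only look at the end of the string, no amount of suffix stripping can turn "ate" into "eat". Handling that case requires knowledge of the vocabulary.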


Lemmatization in spaCy

In [4]:
import spacy

In [8]:
nlp = spacy.load("en_core_web_sm")

# Alternative input (unused): doc = nlp("Mando talked for 3 hours although talking isn't his thing")
doc = nlp("eating eats eat ate adjustable rafting ability meeting better")
for token in doc:
    print(token, " | ", token.lemma_)

eating  |  eat
eats  |  eat
eat  |  eat
ate  |  eat
adjustable  |  adjustable
rafting  |  raft
ability  |  ability
meeting  |  meet
better  |  well
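Unlike the stemmer, the lemmatizer maps "ate" to "eat". It can do this because it looks words up rather than only cutting suffixes, and the correct lemma can depend on the part of speech: "better" is "good" as an adjective but "well" as an adverb. So the "well" above likely reflects the tag the model assigned. Below is a minimal sketch of a POS-aware lookup. The table `LEMMA_TABLE` and the function `lookup_lemma` are hypothetical. spaCy's real lemmatizer combines much larger lookup tables with rules and takes the POS from its tagger.

```python
# Hypothetical lookup table keyed on (word, part of speech).
LEMMA_TABLE = {
    ("ate", "VERB"): "eat",
    ("eats", "VERB"): "eat",
    ("eating", "VERB"): "eat",
    ("better", "ADJ"): "good",
    ("better", "ADV"): "well",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}


def lookup_lemma(word: str, pos: str) -> str:
    """Return the table entry for (word, pos), falling back to the lowercased word."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())


for word, pos in [("ate", "VERB"), ("better", "ADJ"), ("better", "ADV"),
                  ("meeting", "VERB"), ("meeting", "NOUN"), ("ability", "NOUN")]:
    print(word, pos, "|", lookup_lemma(word, pos))
```

Words missing from the table ("ability" here) come back unchanged, which matches the behaviour seen for "ability" and "adjustable" in the spaCy output above.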


Customizing the lemmatizer

In [9]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [10]:
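# Get the attribute_ruler component (it runs before 'lemmatizer' in nlp.pipe_names)
ar = nlp.get_pipe('attribute_ruler')

# Map both slang tokens "Bro" and "Brah" to the lemma "Brother"
ar.add([[{"TEXT": "Bro"}], [{"TEXT": "Brah"}]], {"LEMMA": "Brother"})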

doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, "|", token.lemma_)

Bro | Brother
, | ,
you | you
wanna | wanna
go | go
? | ?
Brah | Brother
, | ,
do | do
n't | not
say | say
no | no
! | !
I | I
am | be
exhausted | exhaust


In [11]:
doc[6]

Brah

In [14]:
doc[6].lemma_

'Brother'
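The custom lemma survives because of pipeline order. The `attribute_ruler` runs before the `lemmatizer`, and spaCy v3's lemmatizer component has an `overwrite` option that defaults to `False`, so it skips tokens that already have a lemma. The toy model below imitates that behaviour in plain Python. `Token`, `attribute_ruler`, `lemmatizer`, `run_pipeline` and the two lemma dictionaries are hypothetical stand-ins, not spaCy's classes or data.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    lemma: str = ""  # empty string means "no lemma assigned yet"


# Hypothetical stand-ins for the attribute_ruler pattern and the lemmatizer's data
CUSTOM_LEMMAS = {"Bro": "Brother", "Brah": "Brother"}
BASE_LEMMAS = {"am": "be", "exhausted": "exhaust", "n't": "not"}


def attribute_ruler(tokens):
    """Earlier component: set lemmas only for tokens matching a custom pattern."""
    for tok in tokens:
        if tok.text in CUSTOM_LEMMAS:
            tok.lemma = CUSTOM_LEMMAS[tok.text]


def lemmatizer(tokens, overwrite=False):
    """Later component: fill in lemmas, skipping ones already set unless overwrite=True."""
    for tok in tokens:
        if tok.lemma and not overwrite:
            continue  # keep the lemma set by the earlier component
        tok.lemma = BASE_LEMMAS.get(tok.text, tok.text.lower())


def run_pipeline(words, overwrite=False):
    tokens = [Token(w) for w in words]
    attribute_ruler(tokens)  # runs first, as in nlp.pipe_names
    lemmatizer(tokens, overwrite=overwrite)
    return [(tok.text, tok.lemma) for tok in tokens]


print(run_pipeline(["Bro", "am", "exhausted"]))
print(run_pipeline(["Bro", "am", "exhausted"], overwrite=True))
```

With `overwrite=True`, the later component replaces "Brother" with its own fallback. This shows why the order of components matters when they write the same attribute.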

Stemming and Lemmatization: Exercises

Run this cell to import all necessary packages

In [None]:
# Import the necessary libraries and create the objects

# For NLTK
import nltk
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

# Download all NLTK data packages (PorterStemmer itself needs no downloaded data)
nltk.download('all')


# For spaCy
import spacy
nlp = spacy.load("en_core_web_sm")

Exercise 1:

Convert this list of words into their base forms using stemming and lemmatization, and observe the transformations.


Write a short note on the words whose base forms differ between stemming and lemmatization.

In [17]:
# Using stemming in NLTK
lst_words = ['running', 'painting', 'walking', 'dressing', 'likely', 'children', 'whom', 'good', 'ate', 'fishing']

for word in lst_words:
    print(word, "|", stemmer.stem(word))

running | run
painting | paint
walking | walk
dressing | dress
likely | like
children | children
whom | whom
good | good
ate | ate
fishing | fish


In [18]:
# Using lemmatization in spaCy

doc = nlp("running painting walking dressing likely children who good ate fishing")

for token in doc:
    print(token.text, "|", token.lemma_)

running | run
painting | painting
walking | walking
dressing | dress
likely | likely
children | child
who | who
good | good
ate | eat
fishing | fish
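To draft the note, it helps to list only the words where the two methods disagree. The sketch below assumes the outputs recorded in the two cells above are copied into plain dictionaries. The names `stems`, `lemmas` and `differing_words` are illustrative. "whom"/"who" is left out because the two cells were run on different words.

```python
# Outputs copied from the stemming and lemmatization cells above
stems = {
    "running": "run", "painting": "paint", "walking": "walk", "dressing": "dress",
    "likely": "like", "children": "children", "good": "good", "ate": "ate",
    "fishing": "fish",
}
lemmas = {
    "running": "run", "painting": "painting", "walking": "walking", "dressing": "dress",
    "likely": "likely", "children": "child", "good": "good", "ate": "eat",
    "fishing": "fish",
}


def differing_words(stems, lemmas):
    """Return (word, stem, lemma) for each word whose stem and lemma differ."""
    return [(w, stems[w], lemmas[w]) for w in stems if w in lemmas and stems[w] != lemmas[w]]


for word, stem, lemma in differing_words(stems, lemmas):
    print(f"{word:10} stem={stem:10} lemma={lemma}")
```

The list shows two patterns. The stemmer strips suffixes even when that changes the word ("likely" becomes "like"), while the lemmatizer handles irregular forms ("ate" becomes "eat", "children" becomes "child") that suffix stripping cannot reach.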
