**Difference between Stemming and Lemmatization**

*   A stemmer operates on a single word with no knowledge of its context, so it cannot distinguish between words whose meaning depends on part of speech.
*   When reducing a word to its root or base form, stemming can produce non-existent words, whereas lemmatization produces actual dictionary words.
*   Stemmers are typically easier to implement than lemmatizers.
*   Stemmers typically run faster than lemmatizers.
*   Stemming is generally less accurate than lemmatization.
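
The contrast in the list above can be illustrated with a small, dependency-free sketch. The functions `toy_stem` and `toy_lemmatize`, the `SUFFIXES` rule list, and the `LEMMA_TABLE` dictionary are hypothetical stand-ins invented for this illustration. They are not NLTK's algorithms, only a simplified picture of rule-based suffix stripping versus a dictionary lookup keyed on part of speech.

```python
# Toy illustration only: real stemmers and lemmatizers use far richer
# rules and vocabularies than the hypothetical ones below.

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix rules, tried in order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, with no dictionary and no context."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table keyed on (word, part of speech).
LEMMA_TABLE = {
    ("studies", "n"): "study",
    ("studies", "v"): "study",
    ("meeting", "n"): "meeting",
    ("meeting", "v"): "meet",
    ("was", "v"): "be",
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Look the word up by part of speech; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get((word, pos), word)


for word, pos in [("studies", "n"), ("meeting", "n"), ("meeting", "v"), ("was", "v")]:
    print(f"{word} ({pos}): stem={toy_stem(word)}  lemma={toy_lemmatize(word, pos)}")
```

In this sketch the stemmer turns `studies` into the non-word `studi` and gives `meeting` the same stem regardless of part of speech, while the lookup returns `meeting` for the noun and `meet` for the verb.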







![alt text](https://miro.medium.com/max/585/1*uVgEZI7UFLMjHqemI_MzGA.png)

# Using NLTK

In [0]:
import nltk
nltk.download('punkt')
nltk.download('wordnet')

from nltk.tokenize import word_tokenize 
from nltk.stem import WordNetLemmatizer

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [0]:
wordnet_lemmatizer = WordNetLemmatizer()

word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
nltk_tokens = nltk.word_tokenize(word_data)
# With no pos argument, lemmatize() treats every token as a noun
for w in nltk_tokens:
    print("Actual: %s  Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))

Actual: It  Lemma: It
Actual: originated  Lemma: originated
Actual: from  Lemma: from
Actual: the  Lemma: the
Actual: idea  Lemma: idea
Actual: that  Lemma: that
Actual: there  Lemma: there
Actual: are  Lemma: are
Actual: readers  Lemma: reader
Actual: who  Lemma: who
Actual: prefer  Lemma: prefer
Actual: learning  Lemma: learning
Actual: new  Lemma: new
Actual: skills  Lemma: skill
Actual: from  Lemma: from
Actual: the  Lemma: the
Actual: comforts  Lemma: comfort
Actual: of  Lemma: of
Actual: their  Lemma: their
Actual: drawing  Lemma: drawing
Actual: rooms  Lemma: room


In [0]:
text = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
text = text.lower()
 
words = word_tokenize(text)
print("Tokenization :", words)
 
lemmatizer = WordNetLemmatizer()
words_lemma = [lemmatizer.lemmatize(word) for word in words]
print("Lemmatization :", words_lemma)

Tokenization : ['it', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are', 'readers', 'who', 'prefer', 'learning', 'new', 'skills', 'from', 'the', 'comforts', 'of', 'their', 'drawing', 'rooms']
Lemmatization : ['it', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are', 'reader', 'who', 'prefer', 'learning', 'new', 'skill', 'from', 'the', 'comfort', 'of', 'their', 'drawing', 'room']


In [0]:
from nltk.corpus import wordnet as wn
from nltk.stem.wordnet import WordNetLemmatizer
from nltk import word_tokenize, pos_tag
from collections import defaultdict
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [0]:
tag_map = defaultdict(lambda : wn.NOUN)
tag_map['J'] = wn.ADJ
tag_map['V'] = wn.VERB
tag_map['R'] = wn.ADV

text = "lemmatizer minimizes text ambiguity. Example words like bicycle or bicycles are converted to base word bicycle"
tokens = word_tokenize(text)
lemma_function = WordNetLemmatizer()

# Tag each token, then use the first letter of its tag to pick the WordNet part of speech
for token, tag in pos_tag(tokens):
    lemma = lemma_function.lemmatize(token, tag_map[tag[0]])
    print(token, "=>", lemma)

lemmatizer => lemmatizer
minimizes => minimizes
text => text
ambiguity => ambiguity
. => .
Example => Example
words => word
like => like
bicycle => bicycle
or => or
bicycles => bicycle
are => be
converted => convert
to => to
base => base
word => word
bicycle => bicycle
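
The `tag_map` lookup above relies on the tagger's Penn Treebank style tags grouping by first letter: `J` for adjectives, `V` for verbs, `R` for adverbs, with everything else falling back to noun. The following is a minimal sketch of that mapping on its own, without NLTK. It assumes the WordNet part-of-speech constants are the single-letter strings `'a'`, `'v'`, `'r'` and `'n'`, and the helper name `penn_to_wordnet` is hypothetical.

```python
from collections import defaultdict

# Single-letter part-of-speech codes, assumed to match
# wn.ADJ, wn.VERB, wn.ADV and wn.NOUN in NLTK.
ADJ, VERB, ADV, NOUN = "a", "v", "r", "n"

# Any first letter that is not listed falls back to NOUN.
tag_map = defaultdict(lambda: NOUN, {"J": ADJ, "V": VERB, "R": ADV})


def penn_to_wordnet(tag: str) -> str:
    """Map a tag to a WordNet part of speech using only its first letter."""
    return tag_map[tag[:1]]  # tag[:1] avoids an IndexError on an empty tag


for tag in ["NNS", "VBP", "VBN", "JJ", "RB", "TO", "."]:
    print(tag, "->", penn_to_wordnet(tag))
```

One consequence of keying on the first letter is that every tag starting with `R` is treated as an adverb, including the particle tag `RP`. A stricter mapping would match on full tag prefixes such as `RB` instead.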


# Using spaCy

In [0]:
%pip install spacy
!python -m spacy download en_core_web_sm



In [0]:
import spacy

sp = spacy.load('en_core_web_sm')
kalimat = 'She had been with her father and sister when she was attacked and received first aid at the scene, an official said'
lemma_sent = sp(kalimat)
for word in lemma_sent:
  print(word.text + '  ===>', word.lemma_)

She  ===> -PRON-
had  ===> have
been  ===> be
with  ===> with
her  ===> -PRON-
father  ===> father
and  ===> and
sister  ===> sister
when  ===> when
she  ===> -PRON-
was  ===> be
attacked  ===> attack
and  ===> and
received  ===> receive
first  ===> first
aid  ===> aid
at  ===> at
the  ===> the
scene  ===> scene
,  ===> ,
an  ===> an
official  ===> official
said  ===> say


In [0]:
from spacy import displacy

nlp = spacy.load('en_core_web_sm')
about_interest_text = 'He is interested in learning, Natural Language Processing.'
about_interest_doc = nlp(about_interest_text)
displacy.render(about_interest_doc, style='dep', jupyter=True)