In [None]:
#Stemming usually operates on a single word, without knowledge of its context
#Lemmatization usually considers the word together with its context in the sentence

Stemming is faster than lemmatization because it strips affixes, mainly suffixes (-ed, -ing, -es, -ity, -ty, -ship, -ness, etc.) and, in some stemmers, prefixes (pre-, extra-, in-, im-, ir-, etc.), without considering the context of the word. The Porter stemmer used below removes suffixes only. Because stemming is this aggressive, its output may not be a valid word: for example, "once" becomes "onc".

Why?
A stemmer applies a fixed sequence of rule-based steps to each word, which makes it fast. A lemmatizer additionally consults a corpus or dictionary to supply the lemma, which makes it slower than stemming. You may also have to specify the part of speech to get the proper lemma.
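
To make the idea of a step-by-step stemming algorithm concrete, here is a minimal sketch of a suffix-stripping stemmer. It is a toy and not the Porter algorithm: the suffix list, the function `toy_stem`, and the `min_stem_len` parameter are all assumptions made up for illustration. Note that it never looks anything up in a dictionary, which is why it is cheap and why it can return non-words.

```python
# Toy suffix-stripping stemmer (hypothetical illustration, NOT the Porter algorithm).
# Rules are tried in order, so "es" must come before the bare "s".
SUFFIXES = ("ies", "ing", "ness", "ed", "es", "s")


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem_len:
            return lowered[: -len(suffix)]
    return lowered


for w in ["latencies", "ended", "headsets", "melodies", "plain"]:
    print(w, "->", toy_stem(w))
```

With these made-up rules, "latencies" is cut down to "latenc", which is not a valid word. That is the kind of result the paragraph above describes.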

In [None]:
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer

# Requires the 'punkt' and 'wordnet' NLTK data packages (see the nltk.download calls in a later cell).
porter_stemmer = PorterStemmer()
wordnet_lemmatizer = WordNetLemmatizer()
word_data = "I also once encountered bluetooth on the plain of Vidya. Alas his latencies ended him. This headsets was my victorious saving graces.Heard some melodies and symphonies."
# First, tokenize the text into words
nltk_tokens = nltk.word_tokenize(word_data)
# Next, print the stem and the lemma of each token
for w in nltk_tokens:
    print("Actual: %s  Stem: %s" % (w, porter_stemmer.stem(w)))
    print("Actual: %s  Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))

Actual: I  Stem: i
Actual: I  Lemma: I
Actual: also  Stem: also
Actual: also  Lemma: also
Actual: once  Stem: onc
Actual: once  Lemma: once
Actual: encountered  Stem: encount
Actual: encountered  Lemma: encountered
Actual: bluetooth  Stem: bluetooth
Actual: bluetooth  Lemma: bluetooth
Actual: on  Stem: on
Actual: on  Lemma: on
Actual: the  Stem: the
Actual: the  Lemma: the
Actual: plain  Stem: plain
Actual: plain  Lemma: plain
Actual: of  Stem: of
Actual: of  Lemma: of
Actual: Vidya  Stem: vidya
Actual: Vidya  Lemma: Vidya
Actual: .  Stem: .
Actual: .  Lemma: .
Actual: Alas  Stem: ala
Actual: Alas  Lemma: Alas
Actual: his  Stem: hi
Actual: his  Lemma: his
Actual: latencies  Stem: latenc
Actual: latencies  Lemma: latency
Actual: ended  Stem: end
Actual: ended  Lemma: ended
Actual: him  Stem: him
Actual: him  Lemma: him
Actual: .  Stem: .
Actual: .  Lemma: .
Actual: This  Stem: thi
Actual: This  Lemma: This
Actual: headsets  Stem: headset
Actual: headsets  Lemma: headset
Actual: was  Ste

In [None]:
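#STEMMING DOES NOT USE POS TAGS
#LEMMATIZATION CAN USE POS TAGS (WordNetLemmatizer treats every word as a noun unless a tag is passed, which is why "encountered" and "ended" were left unchanged above)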

In [None]:
#Worked example showing the use of WordNet lemmatization with POS tagging in Python

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [None]:
from nltk.corpus import wordnet as wn
from nltk.stem.wordnet import WordNetLemmatizer
from nltk import word_tokenize, pos_tag
from collections import defaultdict


In [None]:
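# Map the first letter of a Penn Treebank tag to a WordNet POS; anything else falls back to noun
tag_map = defaultdict(lambda: wn.NOUN)
tag_map['J'] = wn.ADJ
tag_map['V'] = wn.VERB
tag_map['R'] = wn.ADV
# Note: the missing space in "graces.Heard" makes the tokenizer keep it as a single token
text = "I also once encountered bluetooth on the plain of Vidya. Alas his latencies ended him. This headsets was my victorious saving graces.Heard some melodies and symphonies."
tokens = word_tokenize(text)
lemma_function = WordNetLemmatizer()
# Tag each token, then lemmatize it using the mapped part of speech
for token, tag in pos_tag(tokens):
    lemma = lemma_function.lemmatize(token, tag_map[tag[0]])
    print(token, "=>", lemma)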

I => I
also => also
once => once
encountered => encounter
bluetooth => bluetooth
on => on
the => the
plain => plain
of => of
Vidya => Vidya
. => .
Alas => Alas
his => his
latencies => latency
ended => end
him => him
. => .
This => This
headsets => headset
was => be
my => my
victorious => victorious
saving => save
graces.Heard => graces.Heard
some => some
melodies => melody
and => and
symphonies => symphony
. => .
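
The `tag_map` lookup in the cell above can also be written as a small standalone function, which makes the fallback to noun explicit. This is only a sketch: the name `penn_to_wordnet` is hypothetical, and the single-letter strings are used here in place of `wn.NOUN`, `wn.VERB`, `wn.ADJ` and `wn.ADV` so that the snippet runs without NLTK installed.

```python
# Plain-string stand-ins for wn.NOUN, wn.VERB, wn.ADJ and wn.ADV.
NOUN, VERB, ADJ, ADV = "n", "v", "a", "r"


def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBD', 'JJ') to a WordNet POS letter.

    Only the first letter of the tag is inspected; every tag that is not an
    adjective, verb or adverb falls back to noun, like the defaultdict above.
    """
    return {"J": ADJ, "V": VERB, "R": ADV}.get(tag[:1], NOUN)


for tag in ["VBD", "JJ", "RB", "NNS", "DT"]:
    print(tag, "->", penn_to_wordnet(tag))
```

Passing the mapped value as the second argument of `lemmatize` is what lets "encountered" become "encounter" and "was" become "be" in the output above.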


One thing to note about lemmatization is that it is harder to build a lemmatizer for a new language than a stemming algorithm, because a lemmatizer requires much more knowledge about the structure of the language.
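
A rough sketch of why this is so: even a miniature dictionary-backed lemmatizer needs a list of irregular forms, a vocabulary of valid lemmas, and rewrite rules per part of speech. Everything below (the tables `IRREGULAR`, `KNOWN_LEMMAS`, `RULES` and the function `toy_lemmatize`) is a made-up illustration under those assumptions, not how WordNetLemmatizer is implemented.

```python
# Hypothetical miniature lexicon; a real lemmatizer needs one covering the whole language.
IRREGULAR = {("was", "v"): "be", ("were", "v"): "be", ("better", "a"): "good"}
KNOWN_LEMMAS = {
    "n": {"latency", "melody", "headset"},
    "v": {"end", "encounter", "save"},
}
# (suffix, replacement) rewrite rules per part of speech, tried in order.
RULES = {
    "n": [("ies", "y"), ("s", "")],
    "v": [("ed", ""), ("ing", "e"), ("ing", "")],
}


def toy_lemmatize(word, pos="n"):
    """Return a lemma for word, or word unchanged if the lexicon cannot confirm one."""
    lowered = word.lower()
    # 1. Irregular forms cannot be derived by rules, so they must be listed.
    if (lowered, pos) in IRREGULAR:
        return IRREGULAR[(lowered, pos)]
    # 2. Apply a rewrite rule only if the result is a known lemma for this POS.
    for suffix, replacement in RULES.get(pos, []):
        if lowered.endswith(suffix):
            candidate = lowered[: -len(suffix)] + replacement
            if candidate in KNOWN_LEMMAS.get(pos, set()):
                return candidate
    return word


print(toy_lemmatize("was", "v"))
print(toy_lemmatize("latencies", "n"))
print(toy_lemmatize("ended", "v"))
print(toy_lemmatize("ended", "n"))
```

Each of the three tables has to be written by someone who knows the language, whereas the stemmer only needs a list of suffixes.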

Conclusion: If speed is the priority, stemming should be used, since lemmatizers consult a corpus or dictionary, which costs time and processing.
Whether to use a stemmer or a lemmatizer depends on the problem you're working on.