# Lemmatization
In contrast to stemming, lemmatization goes beyond trimming word endings: it draws on a language's full vocabulary and applies a *morphological analysis* to each word. The lemma of 'was' is 'be', and the lemma of 'mice' is 'mouse'. Further, the lemma of 'meeting' might be 'meet' or 'meeting', depending on how the word is used in a sentence.
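To make the part-of-speech dependence concrete, here is a minimal sketch of a lookup-based lemmatizer in plain Python. The table `TOY_LEMMAS` and the function `toy_lemma` are hypothetical names invented for illustration, and the entries are hand-picked from the examples above. This is not how spaCy stores or computes lemmas; it only shows why the same surface form can map to different lemmas.

```python
# Toy lookup table keyed by (lowercased word, coarse part-of-speech tag).
# The entries are illustrative assumptions, not spaCy's actual data.
TOY_LEMMAS = {
    ("was", "VERB"): "be",
    ("mice", "NOUN"): "mouse",
    ("meeting", "VERB"): "meet",     # "We are meeting tomorrow."
    ("meeting", "NOUN"): "meeting",  # "The meeting ran long."
}


def toy_lemma(word, pos):
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    return TOY_LEMMAS.get((word.lower(), pos), word.lower())


examples = [("was", "VERB"), ("mice", "NOUN"), ("meeting", "VERB"), ("meeting", "NOUN")]
for word, pos in examples:
    print(f"{word:10} {pos:6} {toy_lemma(word, pos)}")
```

A real lemmatizer has to determine the part of speech from context first, which is why the cells below run the full spaCy pipeline rather than a bare lookup.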

In [1]:
import spacy

In [2]:
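# 'en' is a spaCy v2 shortcut link; spaCy v3+ requires a full package name such as 'en_core_web_sm'
nlp = spacy.load('en')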

In [3]:
doc1 = nlp(u"I am a runner running in a race because I love to run since I ran today.")

In [9]:
# print each token's text, part-of-speech tag, lemma hash (the lemma's ID in the vocabulary), and lemma string
for token in doc1:
    print(token.text,'\t\t',token.pos_,'\t\t',token.lemma,'\t\t',token.lemma_)

I 		 PRON 		 561228191312463089 		 -PRON-
am 		 VERB 		 10382539506755952630 		 be
a 		 DET 		 11901859001352538922 		 a
runner 		 NOUN 		 12640964157389618806 		 runner
running 		 VERB 		 12767647472892411841 		 run
in 		 ADP 		 3002984154512732771 		 in
a 		 DET 		 11901859001352538922 		 a
race 		 NOUN 		 8048469955494714898 		 race
because 		 ADP 		 16950148841647037698 		 because
I 		 PRON 		 561228191312463089 		 -PRON-
love 		 VERB 		 3702023516439754181 		 love
to 		 PART 		 3791531372978436496 		 to
run 		 VERB 		 12767647472892411841 		 run
since 		 ADP 		 10066841407251338481 		 since
I 		 PRON 		 561228191312463089 		 -PRON-
ran 		 VERB 		 12767647472892411841 		 run
today 		 NOUN 		 11042482332948150395 		 today
. 		 PUNCT 		 12646065887601541794 		 .


In [10]:
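# print the same four fields in aligned, fixed-width columns
def show_lemmas(doc):
    """Print text, POS tag, lemma hash, and lemma string for each token in a spaCy Doc."""
    for token in doc:
        print(f'{token.text:{12}} {token.pos_:{6}} {token.lemma:<{22}} {token.lemma_}')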

In [11]:
doc2 = nlp(u"I saw ten mice today!")

In [12]:
show_lemmas(doc2)

I            PRON   561228191312463089     -PRON-
saw          VERB   11925638236994514241   see
ten          NUM    7970704286052693043    ten
mice         NOUN   1384165645700560590    mouse
today        NOUN   11042482332948150395   today
!            PUNCT  17494803046312582752   !
