# Lemmatization


In [None]:
"""

In contrast to stemming, lemmatization goes beyond simple word reduction:
it draws on a language's full vocabulary and applies morphological analysis
to each word.

The lemma of "was" is "be" and the lemma of "mice" is "mouse".
Further, the lemma of "meeting" might be "meet" (when used as a verb) or
"meeting" (when used as a noun), depending on its use in a sentence.

Here we are not just shortening words or cutting off their endings.
Instead, we are looking at the full context of the word.

Lemmatization is typically seen as much more informative than simple
stemming, which is why spaCy has opted to offer only lemmatization
and not stemming.

Lemmatization looks at the surrounding text to determine a given word's
part of speech; it does not categorize phrases.


"""

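The contrast described above can be sketched with a toy example. This is only an illustration under stated assumptions: `naive_stem`, `toy_lemmatize`, and `LEMMA_TABLE` are hypothetical names, and neither function reflects how spaCy works internally. The stemmer only strips suffixes, while the lemmatizer consults a small vocabulary keyed by word and part of speech.

```python
# Toy illustration only: neither function reflects spaCy's real implementation.

def naive_stem(word):
    """Crude stemmer: strip a few common suffixes, ignoring vocabulary and context."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table keyed by (word, part of speech).
LEMMA_TABLE = {
    ("was", "AUX"): "be",
    ("mice", "NOUN"): "mouse",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}


def toy_lemmatize(word, pos):
    """Look up the lemma for a word and its part of speech; fall back to the word."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


examples = [("was", "AUX"), ("mice", "NOUN"), ("meeting", "VERB"), ("meeting", "NOUN")]
for word, pos in examples:
    print(f"{word:10} {pos:6} stem={naive_stem(word):10} lemma={toy_lemmatize(word, pos)}")
```

The stemmer returns the same result for "meeting" regardless of how the word is used and leaves "mice" and "was" untouched, whereas the lookup can return a different lemma once the part of speech is known.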


In [1]:
import spacy

In [4]:
nlp = spacy.load('en_core_web_sm')

In [10]:
doc1 = nlp(
    'To get more healthy body I do running and walking. '
    'After running I do some yoga as well.'
)



In [11]:
# token.lemma is the lemma's integer hash ID; token.lemma_ is its string form.
for token in doc1:
    print(token.text, " \t", token.pos_, '\t', token.lemma, '\t', token.lemma_)



To  	 PART 	 3791531372978436496 	 to
get  	 VERB 	 2013399242189103424 	 get
more  	 ADV 	 2160362229054775535 	 more
healthy  	 ADJ 	 5379644128261274187 	 healthy
body  	 NOUN 	 12275956947298454125 	 body
I  	 PRON 	 4690420944186131903 	 I
do  	 AUX 	 2158845516055552166 	 do
running  	 VERB 	 12767647472892411841 	 run
and  	 CCONJ 	 2283656566040971221 	 and
walking  	 VERB 	 1674876016505392235 	 walk
.  	 PUNCT 	 12646065887601541794 	 .
After  	 ADP 	 13428508259213873547 	 after
running  	 VERB 	 12767647472892411841 	 run
I  	 PRON 	 4690420944186131903 	 I
do  	 VERB 	 2158845516055552166 	 do
some  	 DET 	 7000492816108906599 	 some
yoga  	 NOUN 	 6756860817772158373 	 yoga
as  	 ADV 	 7437575085468336610 	 as
well  	 ADV 	 4525988469032889948 	 well
.  	 PUNCT 	 12646065887601541794 	 .
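In the output above, both occurrences of "running" share one lemma hash and one lemma string. A minimal sketch of checking that relationship, using a few rows copied by hand from the output rather than calling the model again (the names `rows`, `hashes_by_lemma`, and `changed` are illustrative only):

```python
from collections import defaultdict

# A few (text, lemma hash, lemma string) rows copied from the output above;
# nothing here is recomputed by the model.
rows = [
    ("do", 2158845516055552166, "do"),
    ("running", 12767647472892411841, "run"),
    ("walking", 1674876016505392235, "walk"),
    ("After", 13428508259213873547, "after"),
    ("running", 12767647472892411841, "run"),
    ("do", 2158845516055552166, "do"),
]

# Collect the distinct hash IDs seen for each lemma string.
hashes_by_lemma = defaultdict(set)
for _text, lemma_hash, lemma_str in rows:
    hashes_by_lemma[lemma_str].add(lemma_hash)

for lemma_str, hashes in hashes_by_lemma.items():
    print(f"{lemma_str:6} {sorted(hashes)}")

# Tokens whose lemma differs from their surface text.
changed = sorted({(text, lemma_str) for text, _h, lemma_str in rows if text != lemma_str})
print(changed)
```

In these rows each lemma string corresponds to exactly one hash, which is why the two columns carry the same information in different forms.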


In [15]:
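def lemma(text):
    """Print each token's text, POS tag, lemma hash, and lemma string in aligned columns."""
    for token in text:
        # Widths: 12 for the text, 6 for the POS tag, 22 (left-aligned) for the hash.
        print(f'{token.text:{12}} {token.pos_:{6}} {token.lemma:<{22}}'
              f'        {token.lemma_}')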

lemma(doc1)

To           PART   3791531372978436496           to
get          VERB   2013399242189103424           get
more         ADV    2160362229054775535           more
healthy      ADJ    5379644128261274187           healthy
body         NOUN   12275956947298454125          body
I            PRON   4690420944186131903           I
do           AUX    2158845516055552166           do
running      VERB   12767647472892411841          run
and          CCONJ  2283656566040971221           and
walking      VERB   1674876016505392235           walk
.            PUNCT  12646065887601541794          .
After        ADP    13428508259213873547          after
running      VERB   12767647472892411841          run
I            PRON   4690420944186131903           I
do           VERB   2158845516055552166           do
some         DET    7000492816108906599           some
yoga         NOUN   6756860817772158373           yoga
as           ADV    7437575085468336610           as
well         ADV    4525988469032889948           well
.            PUNCT  12646065887601541794          .
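The aligned columns come from the width fields in the f-string. A small sketch of how those fields behave, using rows copied from the output above instead of spaCy tokens; `format_row` is a hypothetical helper, and it separates the last column with a single space rather than the wider gap seen above:

```python
# Rows copied from the output above: (text, pos_, lemma hash, lemma_).
rows = [
    ("To", "PART", 3791531372978436496, "to"),
    ("running", "VERB", 12767647472892411841, "run"),
    ("body", "NOUN", 12275956947298454125, "body"),
]


def format_row(text, pos, lemma_hash, lemma_str):
    """Format one row with the same column widths as lemma(): 12, 6, and 22."""
    # Strings are left-aligned by default, but integers are right-aligned,
    # so the hash column needs an explicit '<' to line up on the left.
    return f"{text:{12}} {pos:{6}} {lemma_hash:<{22}} {lemma_str}"


for row in rows:
    print(format_row(*row))

# Without '<', an integer is padded on the left instead.
print(repr(f"{42:{6}}"), repr(f"{42:<{6}}"))
```

Dropping the `<` from the hash field would push shorter hashes to the right edge of the column, so the lemma strings would still line up but the hashes would not start at the same position.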

In [18]:
doc2 = nlp(
    'Artificial Intelligence (AI) enables machines to mimic human cognitive '
    'functions like learning, reasoning, and problem-solving. It powers '
    'applications such as voice assistants, autonomous vehicles, and data '
    'analytics. AI is transforming industries by automating tasks and '
    'providing intelligent insights.'
)


In [19]:
lemma(doc2)

Artificial   PROPN  6055699347906093817           Artificial
Intelligence PROPN  9808343991309517954           Intelligence
(            PUNCT  12638816674900267446          (
AI           PROPN  5530044837203964789           AI
)            PUNCT  3842344029291005339           )
enables      VERB   1080083029942854337           enable
machines     NOUN   1826470356240629538           machine
to           PART   3791531372978436496           to
mimic        VERB   2520723944239179494           mimic
human        ADJ    5674190450704392893           human
cognitive    ADJ    16654971456144141498          cognitive
functions    NOUN   598416036538124536            function
like         ADP    18194338103975822726          like
learning     NOUN   7342778914265824300           learning
,            PUNCT  2593208677638477497           ,
reasoning    NOUN   7174139258353040852           reasoning
,            PUNCT  2593208677638477497           ,
and          CCONJ  2283656566040971221   