# Lemmatization

1. Lemmatization converts a word to its meaningful base form, the lemma.
2. The lemma depends on the word's grammatical role in the sentence.
3. An example of such grammatical information is the part of speech (POS).

In [9]:
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

In [15]:
s = "We are putting in efforts to enhance our understanding of Lemmatization"
tokens = nltk.tokenize.word_tokenize(s)
tokens

['We',
 'are',
 'putting',
 'in',
 'efforts',
 'to',
 'enhance',
 'our',
 'understanding',
 'of',
 'Lemmatization']

In [16]:
lemmatizer = WordNetLemmatizer()
" ".join([lemmatizer.lemmatize(word) for word in tokens])

'We are putting in effort to enhance our understanding of Lemmatization'

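The WordNet lemmatizer works well only if POS tags are also provided as input. By default it treats every word as a noun, which is why only `efforts` changed in the output above.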

In [17]:
pos_tags = nltk.pos_tag(tokens)
pos_tags

[('We', 'PRP'),
 ('are', 'VBP'),
 ('putting', 'VBG'),
 ('in', 'IN'),
 ('efforts', 'NNS'),
 ('to', 'TO'),
 ('enhance', 'VB'),
 ('our', 'PRP$'),
 ('understanding', 'NN'),
 ('of', 'IN'),
 ('Lemmatization', 'NN')]

In [18]:
help(lemmatizer.lemmatize)

Help on method lemmatize in module nltk.stem.wordnet:

lemmatize(word: str, pos: str = 'n') -> str method of nltk.stem.wordnet.WordNetLemmatizer instance
    Lemmatize `word` using WordNet's built-in morphy function.
    Returns the input word unchanged if it cannot be found in WordNet.
    
    :param word: The input word to lemmatize.
    :type word: str
    :param pos: The Part Of Speech tag. Valid options are `"n"` for nouns,
        `"v"` for verbs, `"a"` for adjectives, `"r"` for adverbs and `"s"`
        for satellite adjectives.
    :param pos: str
    :return: The lemma of `word`, for the given `pos`.



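`lemmatize` expects a WordNet POS code (`n`, `v`, `a`, `r` or `s`), whereas `nltk.pos_tag` returns Penn Treebank tags such as `VBG` or `NNS`. The first letter of a Penn Treebank tag identifies the broad word class, so we map that letter to the corresponding WordNet code.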

In [19]:
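def get_POS(token):
    # Map the first letter of the Penn Treebank tag to a WordNet POS constant.
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    # Note: the token is tagged on its own, without sentence context.
    tag = nltk.pos_tag([token])[0][1][0].upper()
    # Fall back to noun, the lemmatizer's default, for any other tag.
    return tag_dict.get(tag, wordnet.NOUN)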

In [21]:
lemma_out = [lemmatizer.lemmatize(token, pos=get_POS(token)) for token in tokens]

In [22]:
" ".join(lemma_out)

'We be put in effort to enhance our understand of Lemmatization'
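
Note that `understanding` became `understand` here even though the sentence-level `pos_tags` output marks it as `NN`. This appears to be because `get_POS` tags each token in isolation. A minimal sketch of an alternative is to reuse the sentence-level tags instead. The helper name `penn_to_wordnet` is hypothetical, and the single-letter codes are the ones listed in the `lemmatize` help text, written as plain strings so that the block runs without NLTK.

```python
# Penn Treebank tag prefixes mapped to the single-letter POS codes
# that WordNetLemmatizer.lemmatize accepts ("a", "n", "v", "r").
PENN_TO_WORDNET = {"J": "a", "N": "n", "V": "v", "R": "r"}


def penn_to_wordnet(tag, default="n"):
    """Return the WordNet POS code for a Penn Treebank tag (hypothetical helper)."""
    # Any prefix not in the table (PRP, IN, TO, ...) falls back to noun.
    return PENN_TO_WORDNET.get(tag[:1].upper(), default)


# Sentence-level tags copied from the nltk.pos_tag(tokens) output above.
pos_tags = [("We", "PRP"), ("are", "VBP"), ("putting", "VBG"), ("in", "IN"),
            ("efforts", "NNS"), ("to", "TO"), ("enhance", "VB"),
            ("our", "PRP$"), ("understanding", "NN"), ("of", "IN"),
            ("Lemmatization", "NN")]

# Pair each token with the WordNet code derived from its in-context tag.
wordnet_tags = [(token, penn_to_wordnet(tag)) for token, tag in pos_tags]
print(wordnet_tags)
```

Each pair can then be passed on as `lemmatizer.lemmatize(token, pos=code)`, so that `understanding` is looked up as a noun rather than as a verb.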

In [23]:
# For comparison, run the Snowball stemmer on the same tokens
stemmer = nltk.stem.SnowballStemmer(language="english")
stem_sent = [stemmer.stem(token) for token in tokens]
" ".join(stem_sent)

'we are put in effort to enhanc our understand of lemmat'
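
To make the difference easier to see, the two output strings can be compared token by token. The following is only an illustrative sketch that works on the strings printed above. It ignores case, since the stemmer lowercases its output, and the variable names are chosen for this example.

```python
# Output strings copied from the lemmatizer and stemmer cells above.
lemma_sent = "We be put in effort to enhance our understand of Lemmatization"
stem_sent = "we are put in effort to enhanc our understand of lemmat"

# Collect the positions where the two outputs disagree, ignoring case.
differences = [
    (lemma, stem)
    for lemma, stem in zip(lemma_sent.split(), stem_sent.split())
    if lemma.lower() != stem.lower()
]

for lemma, stem in differences:
    print(f"lemmatizer: {lemma:<15} stemmer: {stem}")
```

The three differing pairs show the trade-off: the lemmatizer returns dictionary words (`be`, `enhance`), while the stemmer strips suffixes by rule and can produce non-words (`enhanc`, `lemmat`) or leave irregular forms such as `are` untouched.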