# POS-Lemma

> Specifying the `Part-of-Speech` (POS) of a word to the `WordNetLemmatizer` makes it more accurate: without a POS argument, the lemmatizer treats every word as a noun. Run the code below to see the difference.

In [1]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# Without a `pos` argument the word is looked up as a noun
print("Without POS tag :", lemmatizer.lemmatize("loving"))
# With pos="v" the word is looked up as a verb
print("With POS tag :", lemmatizer.lemmatize("loving", pos="v"))


Without POS tag : loving
With POS tag : love


🧑🏻‍🎓 Understanding the `pos_tag` from `nltk`.

Run the following cells:

In [11]:
import nltk
nltk.download('averaged_perceptron_tagger')
from nltk.tag import pos_tag


[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/reecepalmer/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


In [12]:
noun = "love"
adjective = "big"
adverb = "lovely"  # usually an adjective, but tagged as an adverb in isolation (see the output below)
verb = "loving"


In [13]:
pos_tag([noun])[0][1][0].upper() # --> N for noun


'N'

In [5]:
pos_tag([adjective])[0][1][0].upper() # --> J for adjective


'J'

In [6]:
pos_tag([adverb])[0][1][0].upper() # --> R for adverb


'R'

In [7]:
pos_tag([verb])[0][1][0].upper() # --> V for verb


'V'
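
The indexing chain `[0][1][0]` used in the cells above can be unpacked step by step. `pos_tag` returns a list of `(token, tag)` tuples with Penn Treebank tags. The sketch below hardcodes one such list (an assumed output for the word "love", so that it runs without the tagger data) and names each intermediate value. The variable names are illustrative only.

```python
# Assumed tagger output for ["love"]; hardcoded so the sketch runs without NLTK data.
tagged = [("love", "NN")]

first_pair = tagged[0]                  # first (token, tag) tuple
treebank_tag = first_pair[1]            # the Penn Treebank tag of that token
first_letter = treebank_tag[0].upper()  # first letter = coarse word class

print(first_pair)
print(treebank_tag)
print(first_letter)
```

Only the first letter is kept because the Treebank tags for one word class share it: `NN`, `NNS` and `NNP` are all nouns, `VB`, `VBG` and `VBD` are all verbs.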

❓ **Question** ❓

Create a function that lemmatizes your text, taking into account the associated POS tags. 

💡 Hint: The `WordNetLemmatizer` requires the POS tags to be specified in a certain form, different from the tags output by `nltk.pos_tag`. You will need to map them to the correct form.
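
One possible shape for that mapping is a lookup table keyed on the first letter of the Treebank tag. This is a minimal sketch, not the only way to do it: the names `TREEBANK_TO_WORDNET` and `treebank_to_wordnet` are hypothetical, and the WordNet POS values are written as plain strings so the example runs without NLTK (in NLTK they are available as `wordnet.ADJ`, `wordnet.VERB`, `wordnet.ADV` and `wordnet.NOUN`).

```python
# First letter of a Penn Treebank tag -> WordNet POS letter.
# The plain strings mirror wordnet.ADJ, wordnet.VERB, wordnet.ADV, wordnet.NOUN.
TREEBANK_TO_WORDNET = {"J": "a", "V": "v", "R": "r", "N": "n"}


def treebank_to_wordnet(treebank_tag, default="n"):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter.

    Tags with no WordNet counterpart (pronouns, determiners, ...) fall
    back to `default`, which is assumed here to be the noun POS.
    """
    return TREEBANK_TO_WORDNET.get(treebank_tag[:1].upper(), default)


for tag in ["JJ", "VBG", "RB", "NNP", "PRP"]:
    print(tag, "->", treebank_to_wordnet(tag))
```

Falling back to the noun POS matches what the lemmatizer does when no `pos` argument is given, so unmapped tags behave as they would without tagging.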

In [16]:
# ------
# Map a POS tag to a format WordNetLemmatizer accepts:
# ------

from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize

def get_wordnet_pos(treebank_tag):
    """Convert a Penn Treebank tag (e.g. 'VBG') to a WordNet POS constant."""
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # Nouns and every other tag fall back to NOUN
        return wordnet.NOUN

def pos_lemma(text):
    """Lemmatize every token of `text` using its POS tag."""
    lemmatizer = WordNetLemmatizer()

    tokens = word_tokenize(text)
    tagged_tokens = pos_tag(tokens)

    # `tag` (rather than `pos_tag`) avoids reusing the name of the imported function
    lemmatized_words = [lemmatizer.lemmatize(word, pos=get_wordnet_pos(tag))
                        for word, tag in tagged_tokens]

    return ' '.join(lemmatized_words)


👇 Try your function:

In [17]:
sentence = "I am loving Paris"


In [19]:
lemmatized_text = pos_lemma(sentence)
print(lemmatized_text)


I be love Paris


🏁 Congratulations! With this mini-challenge, you have learned how to find the root of a word, whether it is a noun, an adjective, an adverb or a verb.

💾 Don't forget to `git add / commit / push`