# POS-Lemma

> By default, the `WordNetLemmatizer` treats every word as a noun. Specifying the word's part of speech (POS) lets it find the correct lemma. Run the code below to see the difference.

In [33]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# Without a POS argument, the word is looked up as a noun
print(f"Without POS tag : {lemmatizer.lemmatize('loving')}")
# With pos="v", the word is looked up as a verb
print(f"With POS tag : {lemmatizer.lemmatize('loving', pos='v')}")

Without POS tag : loving
With POS tag : love


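🧑🏻‍🎓 Understanding `pos_tag` from `nltk`.

`pos_tag` takes a list of tokens and returns a list of `(word, tag)` tuples. Run the following cells, which keep only the first letter of each tag: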

In [34]:
from nltk import pos_tag
import nltk

In [35]:
noun = "love"
adjective = "big"
adverb = "lovely"
verb = "loving"
# Uncomment on first run if the tagger model is missing:
# nltk.download('averaged_perceptron_tagger')

In [36]:
pos_tag([noun])[0][1][0].upper()     # --> N for noun

'N'

In [37]:
pos_tag([adjective])[0][1][0].upper() # --> J for adjective

'J'

In [38]:
pos_tag([adverb])[0][1][0].upper() # --> R for adverb

'R'

In [39]:
pos_tag([verb])[0][1][0].upper() # --> V for verb

'V'
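
The indexing chain `[0][1][0]` used in the cells above can be read one step at a time: `[0]` picks the first `(word, tag)` tuple, `[1]` picks the tag, and the last `[0]` picks the tag's first character. Here is a minimal sketch that runs without `nltk`, assuming a hand-written list shaped like the output of `pos_tag`. The full tags and the helper name `first_letter` are illustrative assumptions, not captured tagger output.

```python
# Hand-written sample shaped like the output of nltk.pos_tag:
# a list of (word, tag) tuples. The full tags are illustrative assumptions.
sample_tagged = [("love", "NN"), ("big", "JJ"), ("lovely", "RB"), ("loving", "VBG")]

def first_letter(tagged, index=0):
    """Return the upper-cased first letter of the tag at position `index`."""
    word, tag = tagged[index]  # [index] selects one (word, tag) tuple
    return tag[0].upper()      # tag[0] is the first character of the tag

letters = [first_letter(sample_tagged, i) for i in range(len(sample_tagged))]
print(letters)  # ['N', 'J', 'R', 'V']
```

Only the first letter matters here because it identifies the broad word class, which is all the lemmatizer needs.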

❓ **Question** ❓

Create a function that lemmatizes your text, taking into account the associated POS tags. 

💡 Hint: The `WordNetLemmatizer` requires the POS tags to be specified in a certain form, different from the tags output by `nltk.pos_tag`. You will need to map them to the correct form.

In [40]:
# ------
# Map a POS tag to a format WordNetLemmatizer accepts:
# ------
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
from nltk import pos_tag

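def get_wordnet_pos(tag):
    """Map a tag returned by pos_tag (e.g. 'VBG') to the matching WordNet POS constant."""
    if tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('R'):
        return wordnet.ADV
    elif tag.startswith('J'):
        return wordnet.ADJ
    else:
        # Any other tag (pronoun, determiner, ...) falls back to noun,
        # which is also the lemmatizer's default
        return wordnet.NOUN

def pos_lemma(text):
    """Lemmatize `text` word by word, using each word's POS tag."""
    lemmatizer = WordNetLemmatizer()
    tokens = word_tokenize(text)
    # Tag the whole sentence at once so the tagger can use the surrounding words
    pos_tags = pos_tag(tokens)
    lemmas = []
    for word, tag in pos_tags:
        pos = get_wordnet_pos(tag)
        lemma = lemmatizer.lemmatize(word, pos=pos)
        lemmas.append(lemma)
    return ' '.join(lemmas)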
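
The `if`/`elif` chain in `get_wordnet_pos` can also be written as a dictionary lookup. The sketch below is one possible alternative, not a required solution. It assumes the single-letter strings `"n"`, `"v"`, `"r"` and `"a"`, which are the values behind `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADV` and `wordnet.ADJ`, so it runs without importing `nltk`. The names `TAG_PREFIX_TO_WORDNET` and `get_wordnet_pos_from_dict` are hypothetical.

```python
# First letter of a pos_tag tag -> POS string accepted by WordNetLemmatizer.
# Assumption: these strings equal wordnet.NOUN, wordnet.VERB, wordnet.ADV, wordnet.ADJ.
TAG_PREFIX_TO_WORDNET = {"N": "n", "V": "v", "R": "r", "J": "a"}

def get_wordnet_pos_from_dict(tag):
    """Return the WordNet POS string for `tag`, falling back to noun."""
    # tag[:1] is safe on an empty string, unlike tag[0]
    return TAG_PREFIX_TO_WORDNET.get(tag[:1].upper(), "n")

for tag in ["NNP", "VBG", "RB", "JJ", "PRP"]:
    print(tag, "->", get_wordnet_pos_from_dict(tag))
```

The returned string can be passed as the `pos` argument of `lemmatizer.lemmatize`, in the same way as `pos = "v"` in the first cell.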

👇 Try your function:

In [41]:
sentence = "I am loving Paris"

In [42]:
# YOUR CODE HERE
lemmatized_sentence = pos_lemma(sentence)
print(lemmatized_sentence)

I be love Paris


🏁 Congratulations. With this mini-challenge, you've learned how to find the root form (lemma) of a word, whether it is a noun, an adjective, an adverb or a verb.

💾 Don't forget to `git add / commit / push`