## Wordnet Lemmatizer
Lemmatization is similar to stemming, but its output, called a ‘lemma’, is a valid dictionary word rather than a root stem, which is what stemming produces. The lemma carries the same meaning as the original word.

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. It uses the morphy() function of the WordNet CorpusReader class to find a lemma. The cells below walk through some examples.
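
To make the lookup idea concrete, here is a toy sketch of a morphy-style search: check an exception list, then try suffix rules and keep the first candidate that exists in a vocabulary for the requested part of speech. This is not NLTK's implementation. The name `toy_lemmatize`, the tiny `VOCABULARY`, and the `EXCEPTIONS` table are hypothetical stand-ins for WordNet, and the suffix rules are only a subset modeled on WordNet's detachment rules.

```python
# Toy sketch of a morphy-style lookup. All tables below are illustrative
# stand-ins, not WordNet data and not NLTK's actual implementation.

# (suffix, replacement) pairs tried in order, per part of speech.
SUFFIX_RULES = {
    "n": [("ies", "y"), ("ches", "ch"), ("shes", "sh"),
          ("ses", "s"), ("xes", "x"), ("s", "")],
    "v": [("ies", "y"), ("es", "e"), ("es", ""), ("ed", "e"),
          ("ed", ""), ("ing", "e"), ("ing", ""), ("s", "")],
}

# Hypothetical irregular forms that no suffix rule can recover.
EXCEPTIONS = {
    "n": {"feet": "foot"},
    "v": {"went": "go", "eaten": "eat"},
}

# Hypothetical tiny vocabulary standing in for the WordNet lexicon.
VOCABULARY = {
    "n": {"bat", "foot", "practice", "history"},
    "v": {"go", "eat", "write", "hang", "fly"},
}


def toy_lemmatize(word, pos="n"):
    """Return a lemma for `word` under `pos`, or `word` itself if none is found."""
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    if word in VOCABULARY[pos]:
        return word
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept a candidate only if it is a known word for this POS.
            if candidate in VOCABULARY[pos]:
                return candidate
    return word  # nothing matched: hand the input back unchanged


for word, pos in [("bats", "n"), ("hanging", "v"), ("feet", "n"), ("writing", "n")]:
    print(word, pos, "->", toy_lemmatize(word, pos))
```

The last call shows why the part of speech matters in this sketch: "writing" is left alone when treated as a noun, but `toy_lemmatize("writing", "v")` reduces it to "write".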


In [12]:
# Typical use cases: Q&A, chatbots, text summarization
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk import pos_tag, word_tokenize
import nltk

In [18]:
nltk.download('wordnet')
nltk.download('omw-1.4')  # Multilingual WordNet support
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('punkt')

[nltk_data] Downloading package wordnet to /home/vpsr/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /home/vpsr/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/vpsr/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/vpsr/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package punkt to /home/vpsr/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [6]:
lemmatizer = WordNetLemmatizer()

In [14]:
# Map the Penn Treebank tags returned by pos_tag to WordNet POS constants
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN  # Default to noun
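
The same mapping can also be written as a dictionary lookup on the first letter of the tag. The sketch below is an alternative under stated assumptions, not a replacement for the function above: the name `penn_to_wordnet` is hypothetical, and it uses the plain string values of the WordNet constants (`wordnet.ADJ` is `'a'`, `wordnet.VERB` is `'v'`, `wordnet.NOUN` is `'n'`, `wordnet.ADV` is `'r'`) so that it runs without any NLTK data.

```python
# First letter of a Penn Treebank tag -> WordNet POS code.
# The codes are the string values of wordnet.ADJ, VERB, NOUN and ADV.
PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag, default="n"):
    """Map a Penn Treebank tag such as 'VBG' to a WordNet POS code.

    Tags with no WordNet counterpart (determiners, prepositions,
    punctuation, ...) fall back to `default`, which is noun here.
    """
    return PENN_PREFIX_TO_WORDNET.get(tag[:1], default)


for tag in ["JJS", "VBG", "NNS", "RB", "DT"]:
    print(tag, "->", penn_to_wordnet(tag))
```

Slicing with `tag[:1]` instead of indexing with `tag[0]` keeps the function safe on an empty tag string, which simply falls through to the default.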

In [7]:
# The pos argument accepts the following WordNet part-of-speech codes:
#   'n' - noun (the default)
#   'v' - verb
#   'a' - adjective
#   'r' - adverb
lemmatizer.lemmatize("going", pos='v')

'go'

In [8]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

In [9]:
for word in words:
    print(f"{word}---->{lemmatizer.lemmatize(word, pos='v')}")

eating---->eat
eats---->eat
eaten---->eat
writing---->write
writes---->write
programming---->program
programs---->program
history---->history
finally---->finally
finalized---->finalize


In [10]:
lemmatizer.lemmatize("goes", pos='v')

'go'

In [11]:
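# Neither adverb has a lemma under the POS requested here (verb, then the
# default noun), so both words are returned unchanged
lemmatizer.lemmatize("fairly", pos='v'), lemmatizer.lemmatize("sportingly")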

('fairly', 'sportingly')

In [16]:
# Example sentence
sentence = "The striped bats are hanging on their feet for best flying practices."
words = word_tokenize(sentence)

In [19]:
# Lemmatize each word with its POS tag
lemmatized_words = [
    lemmatizer.lemmatize(word, get_wordnet_pos(pos))
    for word, pos in pos_tag(words)
]

In [20]:
print("\nOriginal Sentence:")
print(sentence)

print("\nLemmatized Words:")
print(lemmatized_words)


Original Sentence:
The striped bats are hanging on their feet for best flying practices.

Lemmatized Words:
['The', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot', 'for', 'best', 'fly', 'practice', '.']
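
The tag-then-lemmatize step above can be wrapped in a small helper that receives the tagger output, the lemmatizing function, and the tag mapper as arguments. This is a sketch under assumptions: `lemmatize_tagged` is a hypothetical name, and `stub_pos` and `stub_lemmatize` are made-up stand-ins that only exist so the block runs without NLTK data.

```python
def lemmatize_tagged(tagged_tokens, lemmatize, to_wordnet_pos):
    """Lemmatize (word, tag) pairs using the supplied callables."""
    return [lemmatize(word, to_wordnet_pos(tag)) for word, tag in tagged_tokens]


# Hypothetical stand-ins, used only to exercise the helper without NLTK.
def stub_pos(tag):
    # Treat verb tags as verbs and everything else as a noun.
    return "v" if tag.startswith("V") else "n"


STUB_LEMMAS = {("bats", "n"): "bat", ("hanging", "v"): "hang"}


def stub_lemmatize(word, pos="n"):
    # Unknown (word, pos) pairs come back unchanged.
    return STUB_LEMMAS.get((word, pos), word)


tagged = [("bats", "NNS"), ("are", "VBP"), ("hanging", "VBG")]
print(lemmatize_tagged(tagged, stub_lemmatize, stub_pos))
```

With the objects defined in this notebook, the equivalent call would be `lemmatize_tagged(pos_tag(words), lemmatizer.lemmatize, get_wordnet_pos)`.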
