## Wordnet Lemmatizer
Lemmatization is similar to stemming, but its output, called a ‘lemma’, is a root **word** rather than a root **stem**. A stem may not be a real word, whereas a lemma is always a valid word that carries the same meaning as the original form.

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. It uses the morphy() function of the WordNet CorpusReader class to find a lemma. Let us understand it with an example:
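Before the NLTK cells, the idea behind a morphy-style lookup can be sketched in plain Python: check a table of irregular forms, then detach suffixes and keep a candidate only if it is a known word for that part of speech. This is a simplified illustration, not NLTK's implementation. The names `SUFFIX_RULES`, `TOY_EXCEPTIONS`, `TOY_LEXICON`, and `toy_morphy` are hypothetical, and the tiny word lists stand in for the WordNet corpus.

```python
# Toy illustration of a morphy-style lookup. All names and word lists here
# are hypothetical stand-ins, not NLTK objects.

# Suffix rules per part of speech: (suffix to detach, replacement).
# A small subset chosen for illustration.
SUFFIX_RULES = {
    "n": [("s", ""), ("ses", "s"), ("ies", "y"), ("men", "man")],
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
}

# Irregular forms that no suffix rule can recover.
TOY_EXCEPTIONS = {
    "n": {"feet": "foot"},
    "v": {"eaten": "eat", "went": "go"},
}

# Words the toy lexicon knows, per part of speech.
TOY_LEXICON = {
    "n": {"bat", "foot", "program", "history"},
    "v": {"eat", "write", "go", "program", "finalize"},
}


def toy_morphy(word, pos="n"):
    """Return a lemma for `word`, or `word` unchanged if nothing matches."""
    # 1. Irregular forms come from the exception table.
    if word in TOY_EXCEPTIONS[pos]:
        return TOY_EXCEPTIONS[pos][word]
    # 2. A word already in the lexicon is its own lemma.
    if word in TOY_LEXICON[pos]:
        return word
    # 3. Try each suffix rule; accept a candidate only if the lexicon knows it.
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in TOY_LEXICON[pos]:
                return candidate
    # 4. No valid candidate: return the input unchanged.
    return word


for word, pos in [("writing", "v"), ("finalized", "v"), ("eaten", "v"),
                  ("feet", "n"), ("going", "n"), ("fairly", "v")]:
    print(f"{word} ({pos}) ----> {toy_morphy(word, pos)}")
```

In this sketch, "going" looked up as a noun comes back unchanged because no noun rule produces a known noun. The part of speech passed to the lemmatizer matters for the same reason.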


In [1]:
# Common uses of lemmatization: Q&A, chatbots, text summarization
from nltk.stem import WordNetLemmatizer

In [2]:
lemmatizer=WordNetLemmatizer()

In [4]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to C:\Users\Artur
[nltk_data]     Dragunov\AppData\Roaming\nltk_data...


True

In [5]:
# POS codes accepted by lemmatize():
#   noun 'n', verb 'v', adjective 'a', adverb 'r'
lemmatizer.lemmatize("going",pos='v')

'go'

In [6]:
words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

In [7]:
for word in words:
    print(word+"---->"+lemmatizer.lemmatize(word,pos='v')) # by default, lemmatize(word,pos='n')

eating---->eat
eats---->eat
eaten---->eat
writing---->write
writes---->write
programming---->program
programs---->program
history---->history
finally---->finally
finalized---->finalize


In [8]:
lemmatizer.lemmatize("goes",pos='v')

'go'

In [9]:
lemmatizer.lemmatize("fairly",pos='v'),lemmatizer.lemmatize("sportingly")

('fairly', 'sportingly')

## Full example with NLTK

💡 In Practice:
- You tokenize the text.
- You POS-tag each token.
- You map POS tags to the format WordNet expects.
- Then you lemmatize with that info.

Tools like spaCy simplify this further: lemmatization is built in and uses the POS tags that the pipeline assigns.

In [11]:
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk import pos_tag, word_tokenize

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('wordnet')

# Map NLTK POS tags to WordNet format
def get_wordnet_pos(treebank_tag):
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN  # default fallback

lemmatizer = WordNetLemmatizer()

text = "The striped bats are hanging on their feet for best"
tokens = word_tokenize(text)
tagged = pos_tag(tokens)

lemmatized = [lemmatizer.lemmatize(word, get_wordnet_pos(pos)) for word, pos in tagged]

print(lemmatized)
# Output: ['The', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot', 'for', 'best']
# Note: 'best' is not reduced to 'good' in this run.


[nltk_data] Downloading package punkt to C:\Users\Artur
[nltk_data]     Dragunov\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\Artur Dragunov\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package wordnet to C:\Users\Artur
[nltk_data]     Dragunov\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


['The', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot', 'for', 'best']
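The tag-mapping step can also be checked on its own, without downloading any corpus. The sketch below is a dictionary-based variant of the mapping used above. `PREFIX_TO_WORDNET` and `penn_to_wordnet` are hypothetical names, and the single-letter codes are written as plain strings, on the assumption that they match the values of `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN`, and `wordnet.ADV`.

```python
# Hypothetical helper: maps the first letter of a Penn Treebank tag to the
# single-letter POS codes that lemmatize() accepts ('a', 'v', 'n', 'r').
PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag, default="n"):
    """Return the WordNet POS code for a Treebank tag, or `default`."""
    # tag[:1] is safe on an empty string, unlike tag[0].
    return PREFIX_TO_WORDNET.get(tag[:1], default)


for tag in ["JJ", "JJS", "VBG", "NNS", "RB", "DT", "PRP$"]:
    print(tag, "->", penn_to_wordnet(tag))
```

Tags with no WordNet counterpart, such as determiners (`DT`), fall back to the noun code. As in the function above, any tag starting with `R` maps to the adverb code.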


## spaCy
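spaCy is simpler to use for this task because tokenization, POS tagging, and lemmatization all run in a single pipeline call. It is also commonly used in production pipelines.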

In [12]:
%pip install spacy




In [14]:
import spacy
import spacy.cli
spacy.cli.download("en_core_web_sm")
# Load the English model
nlp = spacy.load("en_core_web_sm")

text = "The striped bats are hanging on their feet for best"

# Process the text
doc = nlp(text)

# Lemmatize with POS info automatically
lemmatized = [token.lemma_ for token in doc]

print(lemmatized)
# Output: ['the', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot', 'for', 'good']


✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
['the', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot', 'for', 'good']
