## WordNet Lemmatizer
Lemmatization is similar to stemming: both reduce an inflected word to a base form. The output of lemmatization is called a ‘lemma’, which is a root word (a dictionary form), whereas stemming outputs a root stem that need not be a valid word. After lemmatization, we therefore get a real word with the same meaning.
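A toy comparison makes the difference concrete. Neither function below is NLTK code: crude_stem is a hypothetical suffix chopper that never checks whether its output is a word, and lookup_lemma consults a small hand-written table standing in for a real lexicon such as WordNet.

```python
# Toy illustration only: crude_stem, SUFFIXES and LEMMA_TABLE are hypothetical,
# not part of NLTK or of any real stemmer.

SUFFIXES = ("ing", "es", "ed", "s")

def crude_stem(word: str) -> str:
    """Strip the first matching suffix without checking that the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A tiny dictionary standing in for a real lexicon such as WordNet.
LEMMA_TABLE = {"studies": "study", "studying": "study", "caring": "care"}

def lookup_lemma(word: str) -> str:
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMA_TABLE.get(word, word)

for w in ["studies", "studying", "caring"]:
    print(f"{w}: stem={crude_stem(w)!r}, lemma={lookup_lemma(w)!r}")
# studies: stem='studi', lemma='study'
# studying: stem='study', lemma='study'
# caring: stem='car', lemma='care'
```

Note how crude_stem turns caring into car, a real word with the wrong meaning, while the dictionary lookup keeps the meaning. Real stemmers such as Porter's use more careful rules, but they still do not consult a dictionary.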

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. To find a lemma, it calls the morphy() function of the WordNet corpus reader (WordNetCorpusReader). Let us understand it with an example:
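The idea behind morphy() can be sketched as: generate candidate base forms by swapping suffixes for the requested part of speech, keep only candidates found in the lexicon, and fall back to the input word if none match. This is a simplified illustration, not NLTK's source: VOCAB is a tiny hypothetical lexicon standing in for the WordNet database, the substitution table is a subset modeled on WordNet's detachment rules, choosing the shortest match is this sketch's own tie-break, and the real function also consults exception lists for irregular forms.

```python
# Simplified sketch of WordNet-style lemmatization (not NLTK's implementation).
# VOCAB is a tiny hypothetical lexicon standing in for the WordNet database.
VOCAB = {
    "n": {"dog", "box", "church", "man", "history", "study"},
    "v": {"eat", "write", "program", "finalize", "go", "study"},
    "a": {"fast", "large"},
}

# Suffix substitutions per POS, modeled on WordNet's detachment rules (subset only).
SUBSTITUTIONS = {
    "n": [("s", ""), ("xes", "x"), ("ches", "ch"), ("men", "man"), ("ies", "y")],
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "r": [],  # no suffix rules for adverbs
}

def morphy_sketch(word: str, pos: str = "n") -> str:
    """Return the shortest known base form of `word` for `pos`, else `word` itself."""
    candidates = [word] + [
        word[: -len(old)] + new
        for old, new in SUBSTITUTIONS[pos]
        if word.endswith(old)
    ]
    known = [c for c in candidates if c in VOCAB.get(pos, set())]
    return min(known, key=len) if known else word

print(morphy_sketch("writing", "v"))  # write
print(morphy_sketch("goes", "v"))     # go
print(morphy_sketch("churches"))      # church (default POS is noun)
print(morphy_sketch("history", "v"))  # history (not a verb in VOCAB)
print(morphy_sketch("went", "v"))     # went (irregular: suffix rules cannot reach "go")
```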


### Why use lemmatization?
- Needed when word meaning matters (e.g., sentiment analysis, machine translation, question answering).
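- Handles irregular forms that suffix-stripping stemmers miss (e.g., went → go as a verb, better → good as an adjective), provided the correct part of speech is supplied.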
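Irregular forms such as went or better cannot be reached by stripping suffixes, so WordNet keeps per-POS exception lists that map them to their lemmas, and the POS you pass decides which list is consulted. A minimal sketch of that lookup, using a small hypothetical table modeled on those lists (the real ones are far larger):

```python
# Hypothetical, hand-picked table modeled on WordNet's per-POS exception lists.
EXCEPTIONS = {
    "v": {"went": "go", "gone": "go", "ate": "eat", "wrote": "write"},
    "a": {"better": "good", "best": "good", "worse": "bad"},
    "n": {"mice": "mouse", "children": "child"},
}

def lemma_from_exceptions(word: str, pos: str = "n") -> str:
    """Look up an irregular form for the given POS; fall back to the word itself."""
    return EXCEPTIONS.get(pos, {}).get(word, word)

print(lemma_from_exceptions("went", pos="v"))    # go
print(lemma_from_exceptions("better", pos="a"))  # good
print(lemma_from_exceptions("better"))           # better (looked up as a noun)
```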

```python
# Typical applications: Q&A, chatbots, text summarization
import nltk
nltk.download('punkt')       # tokenizer models (sentence/word tokenization)
nltk.download('punkt_tab')   # tokenizer tables used by newer NLTK releases
nltk.download('wordnet')     # main WordNet corpus
nltk.download('omw-1.4')     # optional: Open Multilingual Wordnet

from nltk.stem import WordNetLemmatizer
```

```
[nltk_data] Downloading package punkt to /Users/shyamsonu/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/shyamsonu/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/shyamsonu/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/shyamsonu/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
```


```python
lemmatizer = WordNetLemmatizer()
```

```python
# POS values accepted by lemmatize():
#   'n' = noun (default), 'v' = verb, 'a' = adjective, 'r' = adverb
lemmatizer.lemmatize("going", pos='v')
```

```
'go'
```
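Because lemmatize() defaults to pos='n', text is often POS-tagged first (for example with nltk.pos_tag, which emits Penn Treebank tags such as VBG or JJR), and each tag is then mapped to one of the four WordNet letters. A minimal mapping helper; the name penn_to_wordnet is hypothetical, and falling back to noun mirrors the lemmatizer's own default:

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBG', 'JJR') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective (JJ, JJR, JJS)
    if tag.startswith("V"):
        return "v"  # verb (VB, VBD, VBG, ...)
    if tag.startswith("RB"):
        return "r"  # adverb (RB, RBR, RBS)
    return "n"      # nouns and everything else fall back to noun

tagged = [("going", "VBG"), ("better", "JJR"), ("finally", "RB"), ("programs", "NNS")]
print([(word, penn_to_wordnet(tag)) for word, tag in tagged])
# [('going', 'v'), ('better', 'a'), ('finally', 'r'), ('programs', 'n')]
```

Each pair would then be passed to the real lemmatizer as lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)).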

```python
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]
```

```python
# pos='v' treats every word as a verb, so non-verbs such as "history" and "finally" come back unchanged
for word in words:
    print(word + "---->" + lemmatizer.lemmatize(word, pos='v'))
```

```
eating---->eat
eats---->eat
eaten---->eat
writing---->write
writes---->write
programming---->program
programs---->program
history---->history
finally---->finally
finalized---->finalize
```


```python
lemmatizer.lemmatize("goes", pos='v')
```

```
'go'
```

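```python
# "fairly" is looked up as a verb; "sportingly" uses the default pos='n'.
# Neither is found in that form, so both are returned unchanged.
lemmatizer.lemmatize("fairly", pos='v'), lemmatizer.lemmatize("sportingly")
```

```
('fairly', 'sportingly')
```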

### Typical modern NLP pipeline

**If using classical ML (Bag-of-Words, TF-IDF, Naive Bayes, SVM):**

- Lowercase

- Remove punctuation & special chars

- Tokenize

- Lemmatize (or stem)

- Remove stopwords
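The classical-ML steps above can be sketched as one function. This is a minimal, dependency-free sketch under stated assumptions: STOPWORDS and TOY_LEMMAS are tiny hypothetical stand-ins (in practice you would use a full stopword list such as NLTK's and pass lemmatizer.lemmatize), and tokenization is plain whitespace splitting rather than a trained tokenizer.

```python
import re
from typing import Callable, Iterable

# Hypothetical, tiny stopword set for illustration only.
STOPWORDS = {"the", "is", "are", "a", "an", "and", "at", "of", "to"}

# Hypothetical lookup table standing in for WordNetLemmatizer.
TOY_LEMMAS = {"dogs": "dog", "barking": "bark", "cats": "cat"}

def preprocess(text: str,
               lemmatize: Callable[[str], str] = lambda w: w,
               stopwords: Iterable[str] = STOPWORDS) -> list[str]:
    """Lowercase -> strip punctuation -> tokenize -> lemmatize -> drop stopwords."""
    text = text.lower()                                 # 1. lowercase
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # 2. punctuation/special chars -> spaces
    tokens = text.split()                               # 3. whitespace tokenization
    lemmas = [lemmatize(tok) for tok in tokens]         # 4. lemmatize each token
    stop = set(stopwords)
    return [tok for tok in lemmas if tok not in stop]   # 5. remove stopwords

print(preprocess("The dogs are barking at the cats!",
                 lemmatize=lambda w: TOY_LEMMAS.get(w, w)))
# ['dog', 'bark', 'cat']
```

Passing lemmatize=lemmatizer.lemmatize would plug in the WordNet lemmatizer with its default noun POS. The regex in step 2 also drops non-ASCII letters, which may be undesirable for accented text.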



**If using deep learning with embeddings (Transformers, LSTMs):**

- Lowercase only if the model is uncased (cased models expect the original casing)

- Minimal cleaning (keep meaning intact)

- Tokenize (model-specific tokenizer)

- No stemming/lemmatization: the model's subword tokenizer and learned embeddings handle word variants



**Rule of thumb:**

- Classical ML models → Lemmatization is usually preferred (it yields real words and distorts meaning less than stemming, at the cost of being slower).

- Transformer-based / pretrained embeddings → Usually skip stemming/lemmatization.