# Lemmatization - Text Processing

Lemmatization is the process of reducing a word to its base (dictionary) form, known as its lemma. It is widely used in natural language processing (NLP) to normalize text, so that inflected variants such as "eats", "eating", and "eaten" are treated as the same term by downstream algorithms.
Lemmatization takes the part of speech of a word into account and returns a valid dictionary word, which makes it more accurate than stemming, which simply strips suffixes and can produce non-words.
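
To make the contrast with stemming concrete, here is a deliberately naive suffix stripper. It is a toy written for illustration only (the name `naive_stem` and the suffix list are assumptions, and it is not the Porter algorithm or any NLTK stemmer), but it shows how pure truncation can yield strings that are not words.

```python
# Toy suffix-stripping "stemmer" for illustration only.
# The suffix list is an assumption; real stemmers use far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["running", "likes", "ran"]:
    print(w, "----->", naive_stem(w))
# running -----> runn   (truncated, not a dictionary word)
# likes -----> lik      (truncated, not a dictionary word)
# ran -----> ran        (irregular form is left untouched)
```

A lemmatizer, by contrast, looks words up in a dictionary, so it can map an irregular form such as "ran" to "run" when told the word is a verb.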

Example applications: Q&A, Chatbots, Text Summarization

### WordNet Lemmatizer
- WordNet is a large lexical database of English that groups words into sets of synonyms called synsets. It provides short definitions and usage examples, making it a valuable resource for natural language processing tasks.
- The WordNet Lemmatizer looks up the lemma of a word in the WordNet database for a given part of speech (POS). The POS tag is supplied by the caller (it is not inferred from the sentence), and when it matches the word's actual role the result is more accurate than methods that ignore POS.


In [1]:
from nltk.stem import WordNetLemmatizer

lemmitizer = WordNetLemmatizer()

In [2]:

"""
Parameters
word : str
The input word to lemmatize.

pos : str
The Part Of Speech tag. Valid options are "n" for nouns, "v" for verbs, "a" for adjectives, "r" for adverbs and "s" for satellite adjectives.
"""
lemmitizer.lemmatize('going')  # by default the word is treated as a noun

'going'

In [None]:
lemmitizer.lemmatize('going', pos='v')  # now pass the verb tag

'go'

In [4]:
lemmitizer.lemmatize('going', pos='a')

'going'

In [5]:
lemmitizer.lemmatize('going', pos='r')

'going'
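
Since the result depends on the `pos` argument, the tag usually has to come from somewhere. POS taggers such as `nltk.pos_tag` emit Penn Treebank tags (for example `VBG` or `NNS`), while `lemmatize` expects the single-letter WordNet codes listed above. The helper below is a minimal sketch of one way to translate between the two; the function name `penn_to_wordnet` and the choice of noun as the fallback are assumptions, not part of NLTK.

```python
def penn_to_wordnet(tag, default="n"):
    """Map a Penn Treebank tag to a WordNet POS code (hypothetical helper).

    Unknown tags fall back to `default`, mirroring the lemmatizer's
    own noun default.
    """
    if tag.startswith("JJ"):   # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if tag.startswith("NN"):   # NN, NNS, NNP, NNPS: nouns
        return "n"
    if tag.startswith("RB"):   # RB, RBR, RBS: adverbs
        return "r"
    return default


print(penn_to_wordnet("VBG"))  # v
print(penn_to_wordnet("NNS"))  # n
print(penn_to_wordnet("DT"))   # n (fallback)
```

With a tagger available, each `(word, tag)` pair could then be passed on as `lemmitizer.lemmatize(word, pos=penn_to_wordnet(tag))`.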

In [6]:
words = ['eats', 'eaten', 'eating', 'liked', 'liking', 'likes', 'programming', 'programmed', 'programs', 'programmer', 'running', 'runner', 'ran', 'better', 'best', 'good', 'history']

In [7]:
for word in words:
    print(word + " -----> " + lemmitizer.lemmatize(word))  # default POS is noun

eats -----> eats
eaten -----> eaten
eating -----> eating
liked -----> liked
liking -----> liking
likes -----> like
programming -----> programming
programmed -----> programmed
programs -----> program
programmer -----> programmer
running -----> running
runner -----> runner
ran -----> ran
better -----> better
best -----> best
good -----> good
history -----> history


In [8]:
for word in words:
    print(word + " -----> " + lemmitizer.lemmatize(word, pos='v'))  # using the verb tag

eats -----> eat
eaten -----> eat
eating -----> eat
liked -----> like
liking -----> like
likes -----> like
programming -----> program
programmed -----> program
programs -----> program
programmer -----> programmer
running -----> run
runner -----> runner
ran -----> run
better -----> better
best -----> best
good -----> good
history -----> history
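
The two runs above show that no single tag handles every word: the noun tag leaves "going" alone, while the verb tag reduces it to "go". When no POS tagger is available, one rough heuristic is to try several tags in turn and keep the first result that differs from the input. The sketch below assumes this heuristic; `lemmatize_any_pos` is a hypothetical helper, and the stand-in `fake_lemmatize` only replays a few results copied from the outputs above so the block runs without NLTK.

```python
def lemmatize_any_pos(word, lemmatize, pos_order=("n", "v", "a", "r")):
    """Return the first lemma that differs from `word`, trying each POS in order.

    `lemmatize` is any callable with the signature lemmatize(word, pos=...).
    This is a heuristic: for ambiguous words the tag order decides the result.
    """
    for pos in pos_order:
        lemma = lemmatize(word, pos=pos)
        if lemma != word:
            return lemma
    return word


# Stand-in for the real lemmatizer (assumption: values copied from the cells above).
_KNOWN = {("going", "v"): "go", ("programs", "n"): "program", ("ran", "v"): "run"}


def fake_lemmatize(word, pos="n"):
    return _KNOWN.get((word, pos), word)


for w in ["going", "programs", "ran", "history"]:
    print(w, "----->", lemmatize_any_pos(w, fake_lemmatize))
# going -----> go
# programs -----> program
# ran -----> run
# history -----> history
```

In the notebook, the real method can be passed in directly, as in `lemmatize_any_pos('going', lemmitizer.lemmatize)`. Supplying the correct tag per word remains the more reliable option.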


In [17]:
lemmitizer.lemmatize('congratulations', pos='n')

'congratulation'

In [None]:
lemmitizer.lemmatize('fairly', pos='r'), lemmitizer.lemmatize('sportingly', pos='v')  # adverbs come back unchanged here

('fairly', 'sportingly')