### <center><b><i>Lemmatization</i></b></center>

Lemmatization is a process used in natural language processing (NLP) to reduce words to their base or dictionary form, known as the lemma. The main objective of lemmatization is to normalize words so that different inflected forms of the same word are treated as the same token.

For example, consider the words "run," "running," and "ran." The lemma for all of these words is "run." By lemmatizing these words, we can treat them as the same token, which can simplify tasks such as text analysis, sentiment analysis, and document classification.

Lemmatization typically combines morphological analysis with a dictionary lookup to map each word to its lemma. Unlike stemming, which strips affixes by rule and may produce a string that is not a real word, lemmatization uses the word's part of speech (e.g., noun, verb, adjective), and in some tools its surrounding context, to determine the correct lemma.

For example:

* The lemma of "running" (verb) is "run."
* The lemma of "better" (adjective) is "good."
* The lemma of "mice" (noun) is "mouse."
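
To make the contrast with stemming concrete, here is a minimal pure-Python sketch. It is a toy, not how NLTK or spaCy work internally: `TOY_LEMMAS`, `SUFFIXES`, `toy_stem`, and `toy_lemmatize` are hypothetical names invented for this illustration, and the tiny lookup table stands in for a full lexicon such as WordNet.

```python
# Toy lexicon for illustration only; real lemmatizers use a full dictionary.
TOY_LEMMAS = {
    ("running", "v"): "run",
    ("ran", "v"): "run",
    ("better", "a"): "good",
    ("mice", "n"): "mouse",
}

# Crude suffix list for the toy stemmer (real stemmers use richer rules).
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, using no dictionary and no POS."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str, pos: str) -> str:
    """Look up (word, pos) in the toy lexicon; fall back to the word itself."""
    return TOY_LEMMAS.get((word.lower(), pos), word)


for word, pos in [("running", "v"), ("ran", "v"), ("better", "a"), ("mice", "n")]:
    print(f"{word:8} stem={toy_stem(word):8} lemma={toy_lemmatize(word, pos)}")
```

The toy stemmer turns "running" into "runn" and leaves "ran", "better", and "mice" untouched, because none of them can be fixed by cutting off a suffix. The lookup, keyed on both the word and its part of speech, recovers "run", "good", and "mouse".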

Lemmatization reduces vocabulary size and groups related word forms together, which can improve the accuracy of text analysis tasks. It is commonly used in NLP applications such as text preprocessing, information retrieval, and machine translation. Libraries such as NLTK (Natural Language Toolkit) and spaCy provide lemmatization for Python-based NLP projects.
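
The effect on vocabulary size can be sketched with a few lines of standard-library Python. The lemma table `LEMMA_OF` and the token list are made up for illustration; a real pipeline would call an actual lemmatizer instead of a hand-written dictionary.

```python
from collections import Counter

# Hypothetical lemma table for illustration; not produced by any library.
LEMMA_OF = {"running": "run", "ran": "run", "runs": "run", "mice": "mouse"}

tokens = ["run", "running", "ran", "runs", "mouse", "mice", "cat"]

# Count distinct surface forms, then distinct lemmas.
raw_counts = Counter(tokens)
lemma_counts = Counter(LEMMA_OF.get(tok, tok) for tok in tokens)

print(len(raw_counts), "distinct tokens before lemmatization")
print(len(lemma_counts), "distinct tokens after lemmatization")
print(lemma_counts.most_common())
```

Here seven distinct surface forms collapse into three lemmas, and the four forms of "run" are counted together.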

#### 1. WordNet Lemmatizer

The WordNet Lemmatizer is a lemmatization tool provided by the NLTK (Natural Language Toolkit) library in Python. WordNet is a lexical database of the English language that organizes words into synsets (sets of synonyms) and provides semantic relationships between words.

The WordNet Lemmatizer utilizes WordNet's information to lemmatize words by mapping them to their base or dictionary forms (lemmas). It considers the part of speech (POS) of each word to determine the appropriate lemma.

Here's an example of how to use the WordNet Lemmatizer in Python with NLTK:

In [1]:
from nltk.stem import WordNetLemmatizer

In [2]:
lemmatizer = WordNetLemmatizer()

In [5]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...


True

In [7]:
# POS tags accepted by lemmatizer.lemmatize:
#   noun      - 'n'
#   verb      - 'v'
#   adjective - 'a'
#   adverb    - 'r'
# If pos is omitted, it defaults to 'n' (noun).
lemmatizer.lemmatize('going', pos='v')

'go'

In [8]:
words = ["running", "runner", "runs", "walked", "Universal", "University", "walking", "eats", "eating", "jumped", "jumping", "swimmer"]

In [12]:
for word in words:
    print(word + " ---> " + lemmatizer.lemmatize(word, pos='v'))

running ---> run
runner ---> runner
runs ---> run
walked ---> walk
Universal ---> Universal
University ---> University
walking ---> walk
eats ---> eat
eating ---> eat
jumped ---> jump
jumping ---> jump
swimmer ---> swimmer
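
The loop above passes `'v'` for every word, although the list mixes verbs, nouns, and adjectives. In practice the POS usually comes from a tagger, and taggers such as `nltk.pos_tag` emit Penn Treebank tags (for example `VBD` or `NN`) rather than the single letters listed earlier. A common workaround is a small mapping helper. The sketch below is one possible version: `penn_to_wordnet` is a hypothetical name, and the tagged pairs are written by hand for illustration rather than produced by a real tagger.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # nouns and everything else fall back to noun


# Hand-written (word, tag) pairs; a real pipeline would get these from a tagger.
tagged = [("runner", "NN"), ("walked", "VBD"), ("Universal", "JJ"), ("quickly", "RB")]

for word, tag in tagged:
    print(word, tag, "->", penn_to_wordnet(tag))
```

The returned letter can then be passed as the `pos` argument to `lemmatizer.lemmatize`. Falling back to `'n'` mirrors the lemmatizer's own default.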


Lemmatizers are particularly useful in Q&A systems, chatbots, and text summarization.