# Lemmatization using NLTK

Lemmatization converts words to their dictionary form, or lemma (e.g., “running” → “run”, “better” → “good”).

Method: Uses a word's part of speech, and in some tools its surrounding context, to pick the right dictionary entry.

Tool: WordNetLemmatizer in NLTK, nlp.lemmatizer in spaCy.

Example: “done” → “do”.

In contrast, stemming simply chops off word endings, often producing fragments that are not real words, whereas lemmatization uses a dictionary to return a valid base form.

In [15]:
# Import necessary libraries
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer


In [16]:
# Download necessary NLTK resources
nltk.download('punkt_tab', quiet=True)
nltk.download('wordnet', quiet=True)
nltk.download('omw-1.4', quiet=True)  # For additional wordnet support

True

In [17]:
# Initialize the Lemmatizer
lemmatizer = WordNetLemmatizer()

def lemmatize_text(text: str) -> str:
    """
    Lemmatize the text using WordNetLemmatizer from NLTK.
    Parameters: text (str): The text to be lemmatized.
    Returns: str: The lemmatized text.
    """
    # Tokenize the text
    tokens = word_tokenize(text)

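    # Lemmatize each token. No POS tag is passed, so lemmatize() uses its
    # default pos="n" and treats every token as a noun; verb and adjective
    # forms such as "running" or "better" are left as they are.
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in tokens]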

    # Join the lemmatized tokens back into a single string
    return " ".join(lemmatized_tokens)



In [18]:
# Example usage
if __name__ == "__main__":
    sample_text = "The cats are running faster than the dogs, but they are better at playing."
    lemmatized_text = lemmatize_text(sample_text)

    print("Original Text:", sample_text)
    print("Lemmatized Text:", lemmatized_text)


Original Text: The cats are running faster than the dogs, but they are better at playing.
Lemmatized Text: The cat are running faster than the dog , but they are better at playing .
