# Stemming using NLTK

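Stemming is a process in Natural Language Processing (NLP) that reduces words to a base form, called the stem, by stripping suffixes according to heuristic rules. The stem is not always a dictionary word. The Porter stemmer, for example, reduces "happy" to "happi".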
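Suffix stripping can be illustrated with a deliberately naive stemmer. The sketch below is a hypothetical toy, not the Porter algorithm used later in this notebook. `naive_stem` and its three-entry `SUFFIXES` tuple are assumptions made for illustration. It removes one suffix ("ing", "ed" or "s") when at least three characters would remain, then collapses a doubled final consonant, as in "running" to "runn" to "run".

```python
# A deliberately naive suffix stripper (hypothetical, NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")  # checked in this order; only one suffix is removed


def naive_stem(word: str) -> str:
    """Strip one common suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least 3 characters remain, so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # "runn" -> "run"; vowels and l/s/z are skipped, as in Porter's similar rule.
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeioulsz":
                word = word[:-1]
            break
    return word


for w in ["running", "plays", "played", "falling", "ran"]:
    print(w, "->", naive_stem(w))
```

Real stemmers such as Porter's apply several ordered rule steps with conditions on the remaining stem, so they cover far more cases than this toy. Neither approach maps an irregular form like "ran" to "run", because there is no suffix to strip.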

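For example, "runs" and "running" are both reduced to "run". Because a stemmer only strips suffixes, it does not map irregular forms such as "ran" to "run" (that requires lemmatization), and it may leave derived words such as "runner" unchanged. The purpose of stemming is to simplify text for further analysis, often in tasks like search, information retrieval, or text classification, where different inflected forms of a word should count as the same term.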
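To see why this matters for search, here is a minimal sketch of stem-based matching over a toy two-document collection. The document texts, the `stem_terms` and `search` helpers, and the regular-expression tokenizer (used instead of `word_tokenize` so that no NLTK data download is needed) are illustrative assumptions. Only `PorterStemmer` comes from NLTK.

```python
import re

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical toy collection used only for this illustration.
documents = {
    "doc1": "The runner keeps running every morning.",
    "doc2": "She plays chess on weekends.",
}


def stem_terms(text: str) -> set[str]:
    """Lowercase, split on letter runs (simple regex tokenizer), and stem each term."""
    return {stemmer.stem(tok) for tok in re.findall(r"[a-z]+", text.lower())}


def search(query: str) -> list[str]:
    """Return ids of documents sharing at least one stemmed term with the query."""
    query_terms = stem_terms(query)
    return [doc_id for doc_id, text in documents.items() if query_terms & stem_terms(text)]


print(search("runs"))     # ['doc1'], matched through the shared stem "run"
print(search("playing"))  # ['doc2'], matched through the shared stem "play"
```

A real search engine would typically precompute an inverted index and drop stop words, but the core idea is the same: queries and documents are stemmed identically, so "runs" finds a document that only says "running".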

```python
import nltk

from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
```

```python
# Download the Punkt tokenizer data used by word_tokenize (skipped if already up to date)
nltk.download('punkt_tab')
```

```
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
True
```

```python
# Initialize the Porter Stemmer
stemmer = PorterStemmer()

def stem_text(text: str) -> str:
    """
    Stem the text using the Porter Stemmer.

    Parameters:
        text (str): The text to be stemmed.

    Returns:
        str: The stemmed text.
    """
    # Tokenize the text
    tokens = word_tokenize(text)

    # Apply stemming to each token
    stemmed_tokens = [stemmer.stem(token) for token in tokens]

    # Join the stemmed tokens back into a single string
    return " ".join(stemmed_tokens)
```
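Because `stem_text` simply calls `stemmer.stem` on each token, the word-level behaviour is easiest to check by calling `PorterStemmer.stem` directly, which needs no tokenizer data. The sketch below runs the word family from the introduction through it. The word lists are chosen for illustration.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Regular inflections of "run" collapse to one stem.
regular = {w: stemmer.stem(w) for w in ["run", "runs", "running"]}
# The irregular past tense and the derived noun "runner" are left unchanged.
other = {w: stemmer.stem(w) for w in ["ran", "runner"]}

print(regular)  # {'run': 'run', 'runs': 'run', 'running': 'run'}
print(other)    # {'ran': 'ran', 'runner': 'runner'}
```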



```python
# Example usage
if __name__ == "__main__":
    sample_text = "I am running and playing in the park, but I ran out of time."
    stemmed_text = stem_text(sample_text)

    print("Original Text:", sample_text)
    print("Stemmed Text:", stemmed_text)
```

```
Original Text: I am running and playing in the park, but I ran out of time.
Stemmed Text: i am run and play in the park , but i ran out of time .
```
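The output shows three things: "running" and "playing" lose their suffix, "ran" is untouched, and `PorterStemmer.stem` lowercases its input by default ("I" becomes "i"). It also shows that `word_tokenize` splits punctuation into separate tokens, which `stem_text` joins back with spaces ("park ,"). For indexing, such punctuation tokens are often just noise. The sketch below uses a hypothetical helper, `stem_tokens`, that skips tokens made only of punctuation. It takes an already-tokenized list, here the tokens visible in the output above, so it runs without the `punkt_tab` data.

```python
import string

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def stem_tokens(tokens: list[str], drop_punctuation: bool = True) -> list[str]:
    """Stem each token, optionally skipping tokens made only of punctuation characters."""
    result = []
    for token in tokens:
        if drop_punctuation and all(ch in string.punctuation for ch in token):
            continue  # e.g. the "," and "." tokens produced by word_tokenize
        result.append(stemmer.stem(token))
    return result


# The tokens word_tokenize produced for the sample sentence, as seen in the output above.
tokens = ["I", "am", "running", "and", "playing", "in", "the", "park", ",",
          "but", "I", "ran", "out", "of", "time", "."]
print(" ".join(stem_tokens(tokens)))
# i am run and play in the park but i ran out of time
```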
