Stemming is an NLP technique that reduces words to a root or base form by stripping affixes, most commonly suffixes.
It helps treat different word forms as the same token (e.g., running, runs → run).
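
To make the idea of suffix removal concrete, here is a deliberately naive sketch in plain Python. It is not the Porter algorithm: the `SUFFIXES` tuple, the `naive_stem` function, and the `min_stem_len` parameter are all hypothetical names chosen for illustration, and real stemmers apply many more rules and conditions.

```python
# Toy suffix stripper: illustrates the idea only, NOT the Porter algorithm.
# SUFFIXES and naive_stem are hypothetical names made up for this sketch.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def naive_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem
    return word  # no suffix matched, or the stem would be too short


for w in ["runs", "running", "played", "studies", "run"]:
    print(w, "->", naive_stem(w))
```

Note that "running" comes out as "runn" here, because nothing in this toy version undoes the doubled consonant. Handling cases like that is part of what a full stemmer such as Porter's adds on top of plain suffix stripping.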

Stemming is typically applied as a normalization step before tasks such as word counting or vectorization.
It is widely used in search engines, text classification, sentiment analysis, and information retrieval.
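While fast and simple, stemming is cruder than lemmatization: it applies suffix rules without consulting a dictionary, so a stem may not be a valid word (for example, the Porter stemmer maps "studies" to "studi").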
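
The accuracy trade-off can be checked directly with NLTK's `PorterStemmer`. The sketch below stems a few word families (the `families` dictionary and its contents are chosen here for illustration) and collects the distinct stems each family produces. `PorterStemmer` itself needs no extra data download.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative word families (an assumption of this sketch, not a fixed dataset)
families = {
    "connect": ["connect", "connected", "connection"],
    "study": ["study", "studies", "studying"],
    "fail": ["failed", "failing", "failure"],
}

# For each family, collect the distinct stems the Porter stemmer produces
stems_by_family = {
    name: sorted({stemmer.stem(word) for word in words})
    for name, words in families.items()
}

for name, stems in stems_by_family.items():
    print(f"{name}: {stems}")
```

The first family collapses to the valid word "connect", the second collapses to the non-word "studi", and the third does not fully collapse: "failure" becomes "failur" rather than "fail". Whether that matters depends on the task, since a non-word stem is harmless as an internal token but awkward if shown to users.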



Practical examples of where stemming is used in NLP systems:

Search Engines (Google-like search)
When you search “running shoes”, the system also matches run or runs so you don’t miss relevant pages.

Chatbots & Virtual Assistants
User inputs like “booking”, “booked”, “book” are stemmed to understand the same intent.

Spam Email Filtering
Words like “win”, “wins”, “winning” reduce to the stem “win”, so spam patterns are detected more reliably. Not every related form merges: the Porter stemmer leaves “winner” unchanged.

Sentiment Analysis
Reviews containing “liked”, “liking”, “likes” are normalized to improve sentiment accuracy.

Document Classification & Topic Modeling
News articles with “connect”, “connected”, “connection” are grouped under the same topic.

Information Retrieval Systems (Enterprise search, RAG pipelines)
Stemming improves recall when retrieving documents from large knowledge bases.

Log & Ticket Analysis in IT Systems
Error logs with “failed”, “failing”, “fails” are normalized to “fail” to identify common issues. With the Porter stemmer, “failure” becomes “failur” and stays a separate token.
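
The recall effect described in the search and retrieval examples above can be sketched with a tiny in-memory corpus. The documents, the `tokens` and `stems` helpers, and the regex tokenizer are assumptions made for this illustration (a real system would use a proper tokenizer and an inverted index); only `PorterStemmer` comes from NLTK.

```python
import re
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def tokens(text):
    """Lowercase and split on non-letters (simplified, hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def stems(text):
    """Return the set of Porter stems for the words in text."""
    return {stemmer.stem(t) for t in tokens(text)}


# Made-up documents for illustration
docs = [
    "Best shoes for a morning run",
    "She runs every day",
    "Cooking pasta at home",
]
query = "running shoes"

# A document matches if it shares at least one term with the query
exact_hits = [i for i, d in enumerate(docs) if set(tokens(query)) & set(tokens(d))]
stemmed_hits = [i for i, d in enumerate(docs) if stems(query) & stems(d)]

print("Exact-match hits:", exact_hits)
print("Stemmed hits:", stemmed_hits)
```

Without stemming, only the first document matches (through "shoes"). With stemming, "running", "runs", and "run" share the stem "run", so the second document is retrieved as well. The same mechanism can also pull in irrelevant documents, which is why stemming tends to raise recall at some cost to precision.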

In [1]:
import nltk
from nltk.stem import PorterStemmer
from collections import Counter


In [2]:

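# Download the tokenizer data used by nltk.word_tokenize.
# Recent NLTK releases look for 'punkt_tab' instead; if word_tokenize raises
# a LookupError, run nltk.download('punkt_tab') as well.
nltk.download('punkt')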

text = "Running runners run and played playing plays"

# Tokenize text into words
words = nltk.word_tokenize(text.lower())

# Initialize stemmer
stemmer = PorterStemmer()

# Apply stemming
stemmed_words = [stemmer.stem(word) for word in words]

# Count stemmed words
word_counts = Counter(stemmed_words)

print("Original words:")
print(words)

print("\nStemmed words:")
print(stemmed_words)

print("\nWord counts after stemming:")
print(word_counts)


Original words:
['running', 'runners', 'run', 'and', 'played', 'playing', 'plays']

Stemmed words:
['run', 'runner', 'run', 'and', 'play', 'play', 'play']

Word counts after stemming:
Counter({'play': 3, 'run': 2, 'runner': 1, 'and': 1})


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\admin\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
