# Text Pre-Processing (Stemming)

#### This practice file is designed to provide a hands-on introduction to stemming, an essential preprocessing step in Natural Language Processing (NLP). 

1) Stemming: Stemming is the process of reducing words to their root form by stripping prefixes or suffixes. The root form may not always be a valid word, but it effectively groups similar words together for text processing tasks. For example: running and runs → run. Irregular forms such as ran are usually left unchanged, because there is no suffix to strip.


2) Stemmer: A stemmer is an algorithm or tool used to perform stemming.


3) Lemmatization: Often confused with stemming, lemmatization reduces words to their base form (lemma) by considering the word's meaning and grammatical structure. For example: running → run (same as stemming), better → good (different from stemming, since no suffix rule can turn "better" into "good").
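To make the difference between definitions 1 and 3 concrete, here is a minimal pure-Python sketch. It does not use NLTK. The names `TOY_SUFFIXES`, `toy_stem`, `TOY_LEMMAS`, and `toy_lemmatize` are hypothetical and exist only for illustration; real stemmers and lemmatizers use far richer rules and dictionaries.

```python
# Hypothetical toy rules, for illustration only (not NLTK's behaviour).
TOY_SUFFIXES = ("ing", "s")

# Hypothetical hand-written lemma table standing in for a real dictionary.
TOY_LEMMAS = {"running": "run", "runs": "run", "ran": "run", "better": "good"}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the lemma table; fall back to the lowercased word."""
    word = word.lower()
    return TOY_LEMMAS.get(word, word)


for w in ["Running", "runs", "ran", "Better"]:
    print(w, "| stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

The toy stemmer only looks at the end of the word, so it leaves "ran" and "better" untouched and turns "running" into the non-word "runn". The lookup-based lemmatizer can map all four words to a dictionary form, but only because those forms were listed in advance.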

## PorterStemmer 

In [2]:
from nltk.stem import PorterStemmer

In [4]:
words = ['run', 'runner', 'running', 'ran', 'runs', 'easily', 'fairly']

In [3]:
stemming = PorterStemmer()

In [5]:
for word in words:
    print(word + ' -> ' + stemming.stem(word))

run -> run
runner -> runner
running -> run
ran -> ran
runs -> run
easily -> easili
fairly -> fairli
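The `runs -> run` result above is the kind of output produced by the plural-stripping rules in Porter's original algorithm (Step 1a: SSES → SS, IES → I, SS → SS, S → nothing). The sketch below implements only that one step in plain Python. The function name `porter_step_1a_sketch` is hypothetical, and NLTK's `PorterStemmer` applies many more steps (and some extensions), so its output can differ from this simplified version.

```python
def porter_step_1a_sketch(word):
    """Simplified version of Step 1a of the original Porter algorithm."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]  # runs -> run
    return word


for w in ["caresses", "ponies", "caress", "runs", "run"]:
    print(w + " -> " + porter_step_1a_sketch(w))
```

The rules are tried from longest suffix to shortest, so that "caress" is not shortened to "cares" by the plain "s" rule.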


## RegexpStemmer

In [6]:
from nltk.stem import RegexpStemmer

In [8]:
# Remove the suffixes 'ing', 'er', 'ly', 'es' and 's'; words shorter than 4 characters are left unchanged (min=4).
stemmer = RegexpStemmer('ing$|er$|ly$|es$|s$', min=4)

In [9]:
words = ['run', 'runner', 'running', 'ran', 'runs', 'easily', 'fairly']

In [10]:
stemmed_words = [stemmer.stem(word) for word in words]

print("Original Words:", words)
print("Stemmed Words:", stemmed_words)

Original Words: ['run', 'runner', 'running', 'ran', 'runs', 'easily', 'fairly']
Stemmed Words: ['run', 'runn', 'runn', 'ran', 'run', 'easi', 'fair']
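The output above can be reproduced with the standard `re` module alone, which may help show what the pattern and the `min` argument are doing. This is a minimal sketch of the observed behaviour, not NLTK's actual source; the names `PATTERN`, `MIN_LENGTH`, and `regexp_stem_sketch` are hypothetical.

```python
import re

# Same pattern and minimum length as in the RegexpStemmer cell above.
PATTERN = re.compile(r"ing$|er$|ly$|es$|s$")
MIN_LENGTH = 4


def regexp_stem_sketch(word, pattern=PATTERN, min_length=MIN_LENGTH):
    """Return short words unchanged; otherwise delete the matching suffix."""
    if len(word) < min_length:
        return word
    return pattern.sub("", word)


words = ['run', 'runner', 'running', 'ran', 'runs', 'easily', 'fairly']
print([regexp_stem_sketch(w) for w in words])
```

Because the pattern knows nothing about doubled consonants, "runner" and "running" both become "runn" rather than "run". The length check is why "run" and "ran" pass through untouched while the four-letter "runs" is still stemmed.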


## SnowballStemmer

In [11]:
from nltk.stem import SnowballStemmer

In [12]:
snowballstemmer = SnowballStemmer('english')

In [13]:
for word in words:
    print(word + ' -> ' + snowballstemmer.stem(word))

run -> run
runner -> runner
running -> run
ran -> ran
runs -> run
easily -> easili
fairly -> fair
