## Text Preprocessing for Machine Learning - Stemming

Stemming is the process of reducing a word to its base or root form by chopping off the end (and sometimes the beginning) of the word.

The goal is to group together different forms of the same word, even if the root isn't a real word itself.

Simple Analogy:
Think of a gardener trimming a bush back to its main stems. They cut off the branches and leaves (the suffixes and prefixes) to get to the core structure.

Key Point:
Stemming is a crude but fast method. It often creates roots that are not actual words.

Examples:

- running → run

- happily → happili (Note: this isn't a real word, but it's the stem)

- cats → cat

- argued, arguing, argues → argu
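To make the "chopping off the end" idea concrete, here is a deliberately naive sketch. It is not the Porter algorithm: the suffix list, the `toy_stem` name, and the `min_stem_len` parameter are all invented for illustration.

```python
# Toy suffix stripper: NOT the Porter algorithm, only an illustration of
# rule-based suffix removal with a few hand-picked endings.
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order, longest first


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem
    return word  # no rule applied


for w in ["cats", "argued", "arguing", "argues", "running"]:
    print(w, "->", toy_stem(w))
```

With these rules, "argued", "arguing", and "argues" all collapse to "argu", but "running" becomes "runn" rather than "run". Real stemmers add extra rules (such as undoubling consonants) to handle cases like this.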

Why it's used:

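Stemming is mainly used in search engines and text analysis.

If you search for "running," stemming helps the engine also return results for "run" and "runs," because they all reduce to the same stem. Irregular forms such as "ran" are not matched, because a stemmer only strips affixes and leaves "ran" unchanged.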
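The search use case can be sketched with a tiny stem-based index. This is a minimal illustration, assuming NLTK is installed; the `docs` corpus and the `build_index` and `search` helpers are hypothetical names made up for this example, not part of NLTK.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical mini-corpus, invented for this illustration.
docs = {
    1: "She was running in the park",
    2: "He runs every morning",
    3: "They ran a marathon",
}


def build_index(documents):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[stemmer.stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents that share the query word's stem."""
    return sorted(index.get(stemmer.stem(query.lower()), set()))


index = build_index(docs)
print(search(index, "running"))  # [1, 2]
print(search(index, "ran"))      # [3]
```

Searching for "running" finds the documents containing "running" and "runs", since both stem to "run". The document containing "ran" is only found by a query for "ran" itself, which shows the limit of suffix stripping on irregular verbs.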

## Stemming Code Implementation

In [None]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

### PorterStemmer

In [2]:
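from nltk.stem import PorterStemmer

# Create one Porter stemmer and reuse it in the cells below.
stemming = PorterStemmer()

# Print each word next to its Porter stem.
for word in words:
    print(word + "---->" + stemming.stem(word))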


In [6]:
stemming.stem('congratulations')

'congratul'

In [7]:
stemming.stem("sitting")

'sit'
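To see which of the sample words the Porter stemmer treats as the same term, the words can be grouped by stem. This is a small sketch assuming NLTK is installed; the `groups` dictionary is an illustrative name, and the word list is repeated so the block runs on its own.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]

stemmer = PorterStemmer()

# Collect every word under the stem it reduces to.
groups = defaultdict(list)
for word in words:
    groups[stemmer.stem(word)].append(word)

for stem, members in groups.items():
    print(f"{stem}: {members}")
```

"eating" and "eats" end up under "eat", while "eaten" stays in its own group, and "history" is filed under the non-word "histori".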

### Snowball Stemmer

The English Snowball stemmer, also known as the Porter2 stemmer, is an improved version of the Porter stemmer that fixes some of the original algorithm's known issues.

In [5]:
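from nltk.stem import SnowballStemmer

# Snowball supports several languages, so the language must be named.
snowballsstemmer = SnowballStemmer('english')

# Print each word next to its Snowball stem.
for word in words:
    print(word + "---->" + snowballsstemmer.stem(word))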


In [11]:
# Porter stemmer (created in the previous section), for comparison
stemming.stem("fairly")

'fairli'

In [12]:
stemming.stem("sportingly")

'sportingli'

In [13]:
# Snowball stemmer on the same word
snowballsstemmer.stem("fairly")

'fair'

In [14]:
snowballsstemmer.stem("sportingly")

'sport'

In [15]:
snowballsstemmer.stem('goes')

'goe'

In [16]:
stemming.stem('goes')

'goe'
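The individual cells above can be condensed into one side-by-side comparison. This is a minimal sketch assuming NLTK is installed; the `compare` helper and the `porter` and `snowball` variable names are introduced only for this example.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")


def compare(word_list):
    """Return (word, Porter stem, Snowball stem) for each word."""
    return [(w, porter.stem(w), snowball.stem(w)) for w in word_list]


print(f"{'word':<12}{'porter':<12}snowball")
for word, p, s in compare(["fairly", "sportingly", "goes"]):
    print(f"{word:<12}{p:<12}{s}")
```

Snowball handles the "-ly" adverbs more cleanly ("fair", "sport"), but both stemmers still reduce "goes" to "goe", so neither one recovers the dictionary form "go".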