## What is Stemming?
Stemming is a text-processing technique used in Natural Language Processing (NLP) to reduce words to a base or root form, called the stem, usually by stripping suffixes according to fixed rules. The goal is to group different forms of a word so they can be analyzed as a single item. For example, "running" and "runs" can both be reduced to "run". Because the rules only look at spelling, irregular forms such as "ran" and derived words such as "runner" are usually left unchanged, as the outputs below show. Stemming simplifies text data and can improve NLP tasks such as search and text analysis by treating related words as equivalent.

### Use case: classifying product reviews as positive or negative
Suppose a review contains the words "eating", "eat", "going", "gone", and "go". These forms share the same meaning (context), so keeping each one as a separate feature adds little to the classification. Instead of using all of them, we can replace each word with its base form, such as "eat" or "go". Reducing every word to its stem (root) in this way is stemming.
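To make the idea concrete, here is a minimal sketch in plain Python, assuming a bag-of-words feature representation. `toy_stem` and `bag_of_words` are hypothetical helpers written only for illustration; `toy_stem` is a deliberately naive suffix stripper, not NLTK's algorithm. It shows the five surface forms collapsing into fewer features, and also that "gone" survives on its own, because no suffix rule can relate an irregular form to "go".

```python
from collections import Counter


def toy_stem(word: str) -> str:
    """Hypothetical, deliberately naive stemmer: strip one common suffix."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Keep at least two characters so that very short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def bag_of_words(tokens, stem=False):
    """Count each token, optionally reducing it to its stem first."""
    return Counter(toy_stem(t) if stem else t.lower() for t in tokens)


review_tokens = ["eating", "eat", "going", "gone", "go"]
print(len(bag_of_words(review_tokens)))        # 5 distinct features without stemming
print(bag_of_words(review_tokens, stem=True))  # "gone" stays separate from "go"
```

With stemming, "eating"/"eat" and "going"/"go" each become a single feature, so a classifier sees three features instead of five.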

```python
WORDS = [
    "Running", "Runner", "Ran", "Runs",
    "Happily", "Happiness", "Happy",
    "Studies", "Studying", "Studied", "Study",
    "Cars", "Car",
    "Caring", "Cared", "Cares",
    "Leaves", "Leaving", "Left", "Leaf"
]
```

```python
from nltk.stem import PorterStemmer

# stem() lowercases each word by default, then strips suffixes with the Porter algorithm
stemming = PorterStemmer()
for word in WORDS:
    print(f"{word} ----> {stemming.stem(word)}")
```

```text
Running ----> run
Runner ----> runner
Ran ----> ran
Runs ----> run
Happily ----> happili
Happiness ----> happi
Happy ----> happi
Studies ----> studi
Studying ----> studi
Studied ----> studi
Study ----> studi
Cars ----> car
Car ----> car
Caring ----> care
Cared ----> care
Cares ----> care
Leaves ----> leav
Leaving ----> leav
Left ----> left
Leaf ----> leaf
```


#### Issues with stemming
A stem is not always a real word, so it can lose its readable meaning: for example, "Study" becomes "studi" and "Happiness" becomes "happi". Lemmatization resolves this by reducing each word to its dictionary form (lemma) instead.

```python
# Disadvantage example: the resulting stem is not an English word
stemming.stem("Congratulations")
```

```text
'congratul'
```
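A minimal sketch of how lemmatization differs, assuming a lookup-table lemmatizer: `LEMMA_TABLE` and `lookup_lemmatize` are hypothetical names, and the hand-written table stands in for a real lexicon such as WordNet. Unlike a stemmer, it returns real dictionary forms and can even map the irregular "Ran" to "run", but it only knows the words listed in its table.

```python
# Hypothetical hand-written table standing in for a real lexicon such as WordNet
LEMMA_TABLE = {
    "studies": "study",
    "studying": "study",
    "studied": "study",
    "ran": "run",
    "running": "run",
    "runs": "run",
}


def lookup_lemmatize(word: str) -> str:
    """Return the lemma if the word is in the table, else the lowercased word."""
    lowered = word.lower()
    return LEMMA_TABLE.get(lowered, lowered)


for word in ["Studies", "Studying", "Ran", "Runner"]:
    # PorterStemmer gives "studi" for the first two; the lookup returns the real word "study"
    print(f"{word} ----> {lookup_lemmatize(word)}")
```

The trade-off is coverage: "Runner" is not in the table, so it comes back unchanged, whereas a rule-based stemmer handles any word without a dictionary.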

### RegexpStemmer
`RegexpStemmer` in NLTK is a customizable stemmer that removes the parts of a word matching a user-defined regular expression. Words shorter than the `min` length are returned unchanged.

```python
from nltk.stem import RegexpStemmer

# Remove a trailing "ing", "s", "e", or "able"; words shorter than 4 characters are left unchanged
regexp_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)
print(regexp_stemmer.stem('eating'))
# Only the trailing "ing" is removed, because each alternative is anchored to the end of the word by $
print(regexp_stemmer.stem('ingeating'))
```

```text
eat
ingeat
```
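The behavior above can be reproduced with Python's `re` module alone. The sketch below reflects our reading of how `RegexpStemmer` works (return the word unchanged if it is shorter than `min`, otherwise delete every match of the pattern with `re.sub`); `regexp_stem` is a hypothetical helper, not part of NLTK. The last call shows the main pitfall of hand-written patterns: the regex engine takes the leftmost position where any alternative matches, so in "table" the alternative `able$` wins over `e$` and the word shrinks to "t".

```python
import re


def regexp_stem(word: str, pattern: str, min_len: int = 0) -> str:
    """Hypothetical re-implementation of a regex-based stemmer.

    Words shorter than min_len are returned unchanged; otherwise every
    substring that matches `pattern` is deleted.
    """
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)


PATTERN = r"ing$|s$|e$|able$"

print(regexp_stem("eating", PATTERN, min_len=4))     # eat
print(regexp_stem("ingeating", PATTERN, min_len=4))  # ingeat: only the final "ing" matches "ing$"
print(regexp_stem("runs", PATTERN, min_len=4))       # run
print(regexp_stem("was", PATTERN, min_len=4))        # was: shorter than 4, left unchanged
print(regexp_stem("table", PATTERN, min_len=4))      # t: "able$" matches before "e$" is reached
```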


### Snowball Stemmer
`SnowballStemmer` in NLTK is a versatile stemmer that supports multiple languages. For English it uses the Porter2 algorithm, a revised and extended version of the original Porter algorithm.

```python
from nltk.stem import SnowballStemmer

# The language is passed explicitly; 'english' selects the English (Porter2) rules
snowball_stemmer = SnowballStemmer('english')

for word in WORDS:
    print(f"{word} ---> {snowball_stemmer.stem(word)}")
```

```text
Running ---> run
Runner ---> runner
Ran ---> ran
Runs ---> run
Happily ---> happili
Happiness ---> happi
Happy ---> happi
Studies ---> studi
Studying ---> studi
Studied ---> studi
Study ---> studi
Cars ---> car
Car ---> car
Caring ---> care
Cared ---> care
Cares ---> care
Leaves ---> leav
Leaving ---> leav
Left ---> left
Leaf ---> leaf
```


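#### PorterStemmer vs SnowballStemmer
For the two adverbs below, the Snowball stemmer removes the "-ly" ending, while the Porter stemmer only changes the final "y" to "i".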

```python
print(stemming.stem('fairly'), stemming.stem('sportingly'))                  # PorterStemmer
print(snowball_stemmer.stem('fairly'), snowball_stemmer.stem('sportingly'))  # SnowballStemmer
```

```text
fairli sportingli
fair sport
```
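The difference comes from rules that exist in the English Snowball (Porter2) algorithm but not in the original Porter algorithm. The sketch below implements only three Porter2 rules, simplified: step 1b deletes "ingly" or "edly" when the remaining stem contains a vowel, step 1c turns a final "y" after a non-vowel into "i", and step 2 deletes "li" when it lies in the region R1 and follows one of the letters c, d, e, g, h, k, m, n, r, t. `porter2_fragment` and `r1_start` are hypothetical helpers, not NLTK code, and they skip the rest of the algorithm (for example the clean-up checks after step 1b), so this is an explanation aid rather than a working stemmer. For the three words tried here, its results agree with the NLTK Snowball outputs shown above.

```python
VOWELS = set("aeiouy")
LI_ENDINGS = set("cdeghkmnrt")  # letters that may precede a deletable "li"


def r1_start(word: str) -> int:
    """Index where region R1 begins: after the first non-vowel that follows a vowel."""
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def porter2_fragment(word: str) -> str:
    """Apply a simplified subset of Porter2 rules (parts of steps 1b, 1c and 2)."""
    word = word.lower()

    # Step 1b (partial): delete "ingly"/"edly" if the remaining stem contains a vowel.
    # Real Porter2 then runs clean-up checks and the later steps; omitted here.
    for suffix in ("ingly", "edly"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if any(ch in VOWELS for ch in stem):
                return stem
            break

    # Step 1c: a final "y" after a non-vowel that is not the first letter becomes "i"
    if len(word) > 2 and word.endswith("y") and word[-2] not in VOWELS:
        word = word[:-1] + "i"

    # Step 2 (only the "li" rule): delete "li" when it lies in R1 after a valid li-ending
    if (
        word.endswith("li")
        and len(word) >= 3
        and len(word) - 2 >= r1_start(word)
        and word[-3] in LI_ENDINGS
    ):
        word = word[:-2]

    return word


for w in ["fairly", "sportingly", "happily"]:
    print(w, "--->", porter2_fragment(w))
```

"fairly" reaches step 2 as "fairli" and loses "li" because it follows "r"; "sportingly" is handled earlier by the "ingly" rule; "happily" keeps "li" because the preceding "i" is not a valid li-ending. The original Porter algorithm has neither the "ingly" rule nor the plain "li" rule, which is why `PorterStemmer` stops at "fairli" and "sportingli".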
