## Stemming in NLTK  
Stemming is the process of reducing a word to its root form (its stem) by stripping affixes such as suffixes and prefixes.  
It is an important preprocessing step for Natural Language Understanding (NLU) and Natural Language Processing (NLP).

Let's say I have words like:  
- eat, eating, eaten  
- go, going, gone  

The root words here are "eat" and "go" respectively. Stemming is done to find those root words, which are crucial for capturing context. At the end of the day, every other form of a root word just enlarges the vocabulary, and with it the number of parameters of the model. A stemmer tries to reach a common base form by applying rules, without consulting a real dictionary.  
The more parameters a model has, the longer it takes to train.

Stemming works well for classification problems such as deciding whether product feedback is positive or negative, or whether an email is spam or ham.
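
To make the point about extra word forms concrete, here is a minimal sketch that counts distinct tokens before and after stemming. The token list is a tiny made-up sample, and counting distinct tokens is only a rough proxy for how many features a bag-of-words style model would need.

```python
from nltk.stem import PorterStemmer

# Tiny illustrative sample (an assumption, not a real corpus).
tokens = ["go", "going", "gone", "eat", "eating", "eaten"]

stemmer = PorterStemmer()

# Vocabulary = set of distinct tokens the model would have to represent.
raw_vocab = set(tokens)
stemmed_vocab = {stemmer.stem(token) for token in tokens}

print(f"distinct tokens before stemming: {len(raw_vocab)}")
print(f"distinct tokens after stemming:  {len(stemmed_vocab)}")
print(sorted(stemmed_vocab))
```

The vocabulary shrinks because regular forms such as "going" and "eating" collapse onto their roots. On a real corpus the saving depends on the text and on the stemmer used.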

In [2]:
words = ["go", "going", "gone", "eat", "eating", "eaten", "programming", "fairly", "fairness", "history"]

#### PorterStemmer

In [3]:
from nltk.stem import PorterStemmer
stemming = PorterStemmer()

for word in words:
    print(f"{word} ---> {stemming.stem(word)}")

go ---> go
going ---> go
gone ---> gone
eat ---> eat
eating ---> eat
eaten ---> eaten
programming ---> program
fairly ---> fairli
fairness ---> fair
history ---> histori


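When stemming is applied, some words do not come out as valid words. For example:  
- fairly ---> fairli
- history ---> histori  

This is the major problem with stemming: the stem is not guaranteed to be a real word, so the meaning of the original word can be lost. Irregular forms are also missed: "gone" and "eaten" are left unchanged instead of being reduced to "go" and "eat".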

#### RegexpStemmer  
NLTK has a RegexpStemmer class that implements a regular-expression stemmer. It takes a regular expression and removes any part of the word that matches it, typically a suffix or prefix.

In [4]:
from nltk.stem import RegexpStemmer
reg_stemmer = RegexpStemmer("ing$|s$|e$|able$|ed$|ness$|ly$", min=5)

## The min parameter leaves words shorter than min characters untouched, so short words are not over-stemmed.
## Example: without min, "gone" ---> "gon" because "e$" matches. With min=5, "gone" is not stemmed because its length is less than 5.

for word in words:
    ans = reg_stemmer.stem(word)
    print(f"{word} --> {ans}")


go --> go
going --> go
gone --> gone
eat --> eat
eating --> eat
eaten --> eaten
programming --> programm
fairly --> fair
fairness --> fair
history --> history
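
To see which words `min` actually protects, here is a small stand-in written with Python's `re` module. It is a sketch, not NLTK's code: `regex_stem` and `min_length` are hypothetical names, and it assumes the stemmer simply deletes every regex match from words that are at least `min_length` characters long.

```python
import re

# Same pattern as in the cell above.
PATTERN = r"ing$|s$|e$|able$|ed$|ness$|ly$"


def regex_stem(word, pattern=PATTERN, min_length=0):
    """Delete every match of `pattern` unless the word is shorter than `min_length`."""
    if len(word) < min_length:
        return word  # too short: returned unchanged
    return re.sub(pattern, "", word)


for word in ["go", "gone", "going"]:
    without_min = regex_stem(word)
    with_min = regex_stem(word, min_length=5)
    print(f"{word}: without min -> {without_min}, with min=5 -> {with_min}")
```

With this pattern, "go" has no matching suffix and is never changed. The word that `min_length=5` protects is "gone", which would otherwise lose its final "e".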


#### Snowball Stemmer


In [None]:
from nltk.stem import SnowballStemmer
snow_stemmer = SnowballStemmer("english")

In [6]:
for word in words:
    print(f"{word} ---> {snow_stemmer.stem(word)}")

go ---> go
going ---> go
gone ---> gone
eat ---> eat
eating ---> eat
eaten ---> eaten
programming ---> program
fairly ---> fair
fairness ---> fair
history ---> histori
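
Comparing this output with the PorterStemmer output, the two only differ on "fairly". The sketch below checks that programmatically by running both stemmers over the same word list and keeping the words where they disagree. The `disagreements` dictionary is just an illustrative helper name.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

words = ["go", "going", "gone", "eat", "eating", "eaten",
         "programming", "fairly", "fairness", "history"]

porter = PorterStemmer()
snowball = SnowballStemmer("english")

# Map each word to (porter_stem, snowball_stem) when the two stems differ.
disagreements = {}
for word in words:
    porter_stem = porter.stem(word)
    snowball_stem = snowball.stem(word)
    if porter_stem != snowball_stem:
        disagreements[word] = (porter_stem, snowball_stem)

for word, (porter_stem, snowball_stem) in disagreements.items():
    print(f"{word}: porter={porter_stem}, snowball={snowball_stem}")
```

Both stemmers still produce "histori" for "history", so Snowball reduces the non-word problem on this list but does not remove it.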
