# Text Pre-processing using NLTK
The accompanying notes on **Notion** cover the same topic.<br>
First, we declare a list of words, several of which share the same *stem*.

In [5]:
words = ["eat", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalize"]

*Stemming* can be performed with several different algorithms.<br>
#### PorterStemmer
The first one is **PorterStemmer**.

In [6]:
from nltk.stem import PorterStemmer

stemming = PorterStemmer()
for word in words:
    print(word + " --> " + stemming.stem(word=word))

eat --> eat
eats --> eat
eaten --> eaten
writing --> write
writes --> write
programming --> program
programs --> program
history --> histori
finally --> final
finalize --> final


The result is not always a valid word: "history" becomes "histori", and "eaten" is not reduced to "eat". This is the main disadvantage of **Stemming**.

In [7]:
stemming.stem("congratulations")

'congratul'
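
To make the trade-off concrete, here is a minimal sketch that groups the word list from above by its Porter stem. The helper name `group_by_stem` is hypothetical and not part of NLTK; the only library call used is `PorterStemmer.stem`.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(words, stemmer):
    """Map each stem to the list of input words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


words = ["eat", "eats", "eaten", "writing", "writes", "programming",
         "programs", "history", "finally", "finalize"]

groups = group_by_stem(words, PorterStemmer())
for stem, members in groups.items():
    print(stem, "<--", members)
```

Words that share a stem, such as "writing" and "writes", end up in one group, while "eaten" stays in a group of its own because the stemmer does not map it to "eat".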

There are other stemming techniques as well. One of them is **RegexpStemmer**.<br>
#### RegexpStemmer class
NLTK provides the `RegexpStemmer` class, which makes it easy to implement a regular-expression-based stemmer. It takes a single regular expression and removes any part of the word that matches it, typically a prefix or suffix. Let us see an example.

In [9]:
from nltk.stem import RegexpStemmer

reg_stemmer = RegexpStemmer('ing$|s$|able$', min=4)
reg_stemmer.stem("eating")

'eat'

In [10]:
reg_stemmer.stem("ingeating")

'ingeat'

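In the above, the suffixes *ing*, *s* and *able* are stripped because the pattern lists them, each anchored to the end of the word with `$`. That is why the leading *ing* in "ingeating" is left untouched. Any other affix has to be added to the pattern by hand when creating the `RegexpStemmer`.<br>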
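
The behaviour described above can be approximated with the standard `re` module. This is only a sketch of the idea, not NLTK's actual implementation; the function name `regex_stem` and its parameter names are made up for illustration.

```python
import re


def regex_stem(word, pattern=r"ing$|s$|able$", min_length=4):
    """Remove every match of `pattern` from `word`.

    Words shorter than `min_length` are returned unchanged,
    mirroring the `min=4` argument used above.
    """
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


print(regex_stem("eating"))      # eat
print(regex_stem("ingeating"))   # ingeat ("$" only matches at the end)
print(regex_stem("was"))         # was (shorter than min_length)

# Adding "^ing" to the pattern also strips a leading "ing".
print(regex_stem("ingeating", pattern=r"^ing|ing$"))  # eat
```

Keeping the affixes in one pattern string makes the rule set easy to inspect, but it also means the stemmer knows nothing beyond what the pattern spells out.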
#### Snowball Stemmer
Another stemmer that we are going to use is the `SnowballStemmer`. Its English variant (also known as Porter2) is a refinement of the Porter algorithm and generally produces cleaner stems than the `PorterStemmer`.

In [12]:
from nltk.stem import SnowballStemmer

snowball_stemmer = SnowballStemmer("english")
for word in words:
    print(word + " --> " + snowball_stemmer.stem(word))

eat --> eat
eats --> eat
eaten --> eaten
writing --> write
writes --> write
programming --> program
programs --> program
history --> histori
finally --> final
finalize --> final


The next cells compare `PorterStemmer` and `SnowballStemmer` on a few words.

In [14]:
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [15]:
snowball_stemmer.stem("fairly"), snowball_stemmer.stem("sportingly")

('fair', 'sport')

In [16]:
stemming.stem("goes"), snowball_stemmer.stem("goes")

('goe', 'goe')
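
The same comparison can be automated. The sketch below lists only the words on which the two stemmers disagree; `stemmer_disagreements` is a hypothetical helper written for this notebook, not an NLTK function.

```python
from nltk.stem import PorterStemmer, SnowballStemmer


def stemmer_disagreements(words):
    """Return (word, porter_stem, snowball_stem) for words where the two differ."""
    porter = PorterStemmer()
    snowball = SnowballStemmer("english")
    rows = []
    for word in words:
        porter_stem = porter.stem(word)
        snowball_stem = snowball.stem(word)
        if porter_stem != snowball_stem:
            rows.append((word, porter_stem, snowball_stem))
    return rows


diffs = stemmer_disagreements(["fairly", "sportingly", "goes", "eating"])
for word, porter_stem, snowball_stem in diffs:
    print(f"{word}: porter={porter_stem}, snowball={snowball_stem}")
```

Words such as "goes", where both stemmers return the same (still imperfect) stem, are filtered out, so the output highlights only the cases where the choice of stemmer matters.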

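**Stemming** does not always work well when processing text in NLP, because the stems it produces (such as "goe" or "histori") are often not real words.<br>
That makes it a poor fit for applications such as *chatbots*, where the processed text needs to stay meaningful. The better technique there is **Lemmatization**.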