# Stemming with NLTK for Text Preprocessing

### Explanation of Stemming in NLP

Stemming is the process of reducing words to a base or root form (the stem) by stripping affixes, most commonly suffixes. It is useful in Natural Language Processing (NLP) and Natural Language Understanding (NLU) because inflected variants such as "eating" and "eats" collapse to a single token, which lets text be analyzed with similar words grouped together.

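### How Stemming Works

A stemmer applies rules that strip affixes (prefixes and suffixes) from a word to obtain its root form; the stemmers shown here mostly remove suffixes. The resulting stem is not guaranteed to be a valid dictionary word: the Porter stemmer, for example, reduces "history" to "histori".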
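To make the idea concrete, here is a minimal sketch of suffix stripping in plain Python. It is not the Porter algorithm: the suffix list, the minimum stem length, and the helper name toy_stem are all assumptions chosen for illustration.

```python
# Toy suffix-stripping stemmer: an illustration only, far simpler than Porter.
# SUFFIXES and MIN_STEM_LENGTH are assumptions made for this sketch.
SUFFIXES = ("ing", "ed", "es", "ly", "s")
MIN_STEM_LENGTH = 3


def toy_stem(word):
    """Strip the first listed suffix that leaves at least MIN_STEM_LENGTH characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


for word in ["eating", "eats", "programs", "finally", "finalized"]:
    print(word + "---->" + toy_stem(word))
```

With these rules "eating" and "eats" both become "eat", while "finalized" becomes "finaliz", a stem that is not a dictionary word.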

### Common Stemming Algorithms

##### 1. Porter Stemmer (most common, rule-based)
##### 2. RegexpStemmer (pattern matching with a regular expression)
##### 3. Snowball Stemmer (improved version of Porter)

In [23]:
## Classification problem:
## decide whether a product comment is a positive or a negative review.
## Reviews contain inflected forms that should ideally map to one root:
## [eating, eat, eaten] ---> eat, [going, gone, goes] ---> go

words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

### PorterStemmer

In [24]:
from nltk.stem import PorterStemmer
stemming=PorterStemmer()

In [25]:
for word in words:
    print(word+"---->"+stemming.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final
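The grouping effect can be made visible by bucketing words under their stem. The sketch below is only an illustration: group_by_stem is a hypothetical helper, and the stems are copied from the PorterStemmer output above so that it runs without NLTK. With NLTK installed, stemming.stem could be passed as the stemming function instead.

```python
from collections import defaultdict

# Stems copied from the PorterStemmer output above (stand-in for stemming.stem).
porter_stems = {
    "eating": "eat", "eats": "eat", "eaten": "eaten",
    "writing": "write", "writes": "write",
    "programming": "program", "programs": "program",
    "history": "histori", "finally": "final", "finalized": "final",
}


def group_by_stem(words, stem_fn):
    """Map each stem to the list of input words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stem_fn(word)].append(word)
    return dict(groups)


groups = group_by_stem(list(porter_stems), porter_stems.get)
for stem, members in groups.items():
    print(stem + "---->" + ", ".join(members))
```

The ten words fall into six groups. "eaten" stays in a group of its own because the stemmer does not reduce it to "eat".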


In [26]:
stemming.stem('congratulations')

'congratul'

In [27]:
stemming.stem("sitting")

'sit'

In [28]:
## Sometimes PorterStemmer produces a stem that is not a real word (for example 'congratul'), so the original meaning is lost

### RegexpStemmer

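NLTK provides a RegexpStemmer class for building a stemmer from a regular expression. It takes a single regular expression and removes every part of the word that matches it, whether that is a prefix or a suffix; words shorter than the min argument are returned unchanged. In the pattern used here every alternative ends in $, so only suffixes are stripped. Let us see an example.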

In [29]:
from nltk.stem import RegexpStemmer

In [30]:
reg_stemmer=RegexpStemmer('ing$|s$|e$|able$', min=4)

In [31]:
reg_stemmer.stem('eating')

'eat'

In [32]:
reg_stemmer.stem('ingeating')

'ingeat'
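The leading "ing" survives in 'ingeat' because every alternative in the pattern is anchored to the end of the word with $. The sketch below mimics the described behaviour with the standard re module; regex_stem is a hypothetical stand-in written for illustration, not NLTK's own code, and the unanchored pattern is an assumption added to show the contrast.

```python
import re


def regex_stem(word, pattern, min_length=0):
    """Delete every match of pattern, unless the word is shorter than min_length."""
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


anchored = r"ing$|s$|e$|able$"   # same pattern as the cell above
unanchored = r"ing|s$|e$|able$"  # 'ing' may now match anywhere in the word

print(regex_stem("ingeating", anchored, min_length=4))    # ingeat
print(regex_stem("ingeating", unanchored, min_length=4))  # eat
print(regex_stem("is", anchored, min_length=4))           # is (too short, left alone)
print(regex_stem("is", anchored))                         # i (no minimum length)
```

The last two calls show why a minimum length is useful: without it, a short word such as "is" loses its final "s".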

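### Snowball Stemmer

The English Snowball stemmer is also known as the Porter2 stemming algorithm. It is a revised version of the Porter stemmer that fixes some of its issues; for example, it handles adverbs such as "fairly" better. Snowball also supports several languages, which is why the language name is passed to the constructor.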

In [33]:
from nltk.stem import SnowballStemmer
snowballsstemmer=SnowballStemmer('english')

In [34]:
for word in words:
    print(word+"---->"+snowballsstemmer.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [35]:
print(stemming.stem("fairly"))
print(stemming.stem("sportingly"))

fairli
sportingli


In [36]:
print(snowballsstemmer.stem("fairly"))
print(snowballsstemmer.stem("sportingly"))

fair
sport


In [37]:
snowballsstemmer.stem('goes')

'goe'

In [38]:
stemming.stem('goes')

'goe'
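To see at a glance where the two stemmers differ, the outputs can be compared word by word. This is a minimal sketch: disagreements is a hypothetical helper, and the stems are copied from the cells above so that it runs without NLTK. With NLTK installed, stemming.stem and snowballsstemmer.stem could be passed in directly.

```python
# Outputs recorded in the cells above (stand-ins for the two NLTK stemmers).
porter_out = {"fairly": "fairli", "sportingly": "sportingli",
              "goes": "goe", "history": "histori"}
snowball_out = {"fairly": "fair", "sportingly": "sport",
                "goes": "goe", "history": "histori"}


def disagreements(words, stem_a, stem_b):
    """Return (word, stem_a(word), stem_b(word)) for words the stemmers treat differently."""
    return [(w, stem_a(w), stem_b(w)) for w in words if stem_a(w) != stem_b(w)]


diffs = disagreements(list(porter_out), porter_out.get, snowball_out.get)
for word, porter, snowball in diffs:
    print(word + "---->" + porter + " vs " + snowball)
```

Only "fairly" and "sportingly" are reported. Both stemmers agree on "goes" and "history", including the non-word stems 'goe' and 'histori'.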