# Stemming

Stemming is the process of reducing a word to its word stem by stripping affixes such as suffixes and prefixes. Unlike a lemma, the stem does not have to be a valid dictionary word. Stemming is important in natural language understanding (NLU) and natural language processing (NLP).

In [14]:
from nltk.stem import PorterStemmer

In [15]:
stemming = PorterStemmer()

In [16]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

In [17]:
for word in words:
    print(word+"---->"+stemming.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [18]:
stemming.stem('congratulation')

'congratul'

In [19]:
stemming.stem("understanding")

'understand'

## Lancaster Stemming Algorithm

In [20]:
# Lancaster Stemming Algorithm
from nltk.stem import LancasterStemmer

In [21]:
lancaster = LancasterStemmer()

In [22]:
for word in words:
    print(word+"---->"+lancaster.stem(word))

eating---->eat
eats---->eat
eaten---->eat
writing---->writ
writes---->writ
programming---->program
programs---->program
history---->hist
finally---->fin
finalized---->fin
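
To make the difference between the two stemmers easier to see, the following sketch lines up the stems printed above and keeps only the words on which Porter and Lancaster disagree. The stems are copied by hand from the outputs in this notebook rather than recomputed, and the variable names (`sample_words`, `porter_stems`, `lancaster_stems`, `disagreements`) are illustrative only.

```python
# Stems copied from the printed Porter and Lancaster outputs above.
sample_words = ["eating", "eats", "eaten", "writing", "writes",
                "programming", "programs", "history", "finally", "finalized"]
porter_stems = ["eat", "eat", "eaten", "write", "write",
                "program", "program", "histori", "final", "final"]
lancaster_stems = ["eat", "eat", "eat", "writ", "writ",
                   "program", "program", "hist", "fin", "fin"]

# Keep only the words on which the two stemmers disagree.
disagreements = [
    (word, p_stem, l_stem)
    for word, p_stem, l_stem in zip(sample_words, porter_stems, lancaster_stems)
    if p_stem != l_stem
]

for word, p_stem, l_stem in disagreements:
    print(f"{word}: porter={p_stem}, lancaster={l_stem}")
```

On this word list, every disagreement has Lancaster producing the shorter stem, which suggests it strips more aggressively than Porter here.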


## RegexpStemmer Class

NLTK provides the `RegexpStemmer` class, which makes it easy to build a stemmer from a regular expression.
It takes a single regular expression and removes any prefix or suffix that matches it. Words shorter than the `min` argument are returned unchanged.

In [23]:
from nltk.stem import RegexpStemmer

In [24]:
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [25]:
reg_stemmer.stem("eating")

'eat'

In [26]:
reg_stemmer.stem("ingplaying")

'ingplay'
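
The behaviour shown above can be approximated with Python's standard `re` module. This is a minimal sketch of the idea, not NLTK's actual implementation: the function name `simple_regexp_stem` and its `min_len` parameter are hypothetical, chosen to mirror the `min` argument used above.

```python
import re

def simple_regexp_stem(word, pattern, min_len=0):
    """Remove every match of `pattern` from `word`.

    Words shorter than `min_len` are returned unchanged
    (assumed here to mirror the `min` argument of RegexpStemmer).
    """
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)

pattern = r"ing$|s$|e$|able$"

print(simple_regexp_stem("eating", pattern, min_len=4))      # eat
print(simple_regexp_stem("ingplaying", pattern, min_len=4))  # ingplay
print(simple_regexp_stem("bus", pattern, min_len=4))         # bus (too short)
```

Because every alternative in the pattern ends with `$`, only the end of the word can match, which is why the leading `ing` in `ingplaying` is kept.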

## Snowball Stemmer

In [28]:
from nltk.stem import SnowballStemmer

In [29]:
snowballStemmer = SnowballStemmer('english', ignore_stopwords=False)

In [30]:
for word in words:
    print(word+"---->"+snowballStemmer.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [31]:
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [32]:
snowballStemmer.stem("fairly"), snowballStemmer.stem("sportingly")

('fair', 'sport')

### Advantage of Stemming

Stemming reduces words to their root form, which helps in reducing vocabulary size and improving processing speed in NLP tasks.
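
As a rough illustration of the vocabulary reduction, the sketch below groups the ten example words by the Porter stems printed earlier in this notebook. The mapping is copied from that output rather than recomputed, and the names `porter_output` and `groups` are illustrative.

```python
from collections import defaultdict

# Word -> stem pairs copied from the PorterStemmer output earlier in the notebook.
porter_output = {
    "eating": "eat", "eats": "eat", "eaten": "eaten",
    "writing": "write", "writes": "write",
    "programming": "program", "programs": "program",
    "history": "histori", "finally": "final", "finalized": "final",
}

# Group the original words under their shared stem.
groups = defaultdict(list)
for word, stem in porter_output.items():
    groups[stem].append(word)

print(f"{len(porter_output)} words -> {len(groups)} stems")
for stem, members in groups.items():
    print(f"{stem}: {members}")
```

Ten distinct surface forms collapse into six stems in this small sample; the size of the reduction on a real corpus depends on the text and the stemmer.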

### Disadvantage of Stemming

Stemming may produce stems that are not real words, because it chops affixes by rule without understanding context. In the outputs above, for example, the Porter stemmer turns `history` into `histori` and `fairly` into `fairli`.

### Use Case of Stemming

Stemming is commonly used in search engines and information retrieval systems, where exact word meaning is less important than matching similar word forms.
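
The retrieval use case can be sketched with a tiny inverted index keyed by stem. Everything here is an assumption made for illustration: `toy_stem` is a simplified suffix stripper that reuses the regular expression from the `RegexpStemmer` example, the three documents are made up, and `build_index` and `search` are hypothetical helpers, not part of NLTK.

```python
import re
from collections import defaultdict

SUFFIXES = re.compile(r"ing$|s$|e$|able$")

def toy_stem(word, min_len=4):
    """Simplified stemmer: lowercase, then strip one matching suffix."""
    word = word.lower()
    if len(word) < min_len:
        return word
    return SUFFIXES.sub("", word)

def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents whose stems match the query's stem."""
    return sorted(index.get(toy_stem(query), set()))

# Made-up documents for illustration.
docs = {
    1: "she eats apples",
    2: "eating well matters",
    3: "history of programs",
}
index = build_index(docs)

print(search(index, "eating"))   # [1, 2]
print(search(index, "program"))  # [3]
```

Indexing and querying through the same stemmer is what lets the query `eating` find a document that only contains `eats`.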