## Stemming
    Stemming is a text preprocessing technique used in Natural Language Processing (NLP) that reduces words to a root or base form (the stem), mainly by stripping suffixes. The goal is to map the inflected and derived variants of a word to a single token, which shrinks the vocabulary an NLP model has to handle. The stem is not guaranteed to be a valid dictionary word.

In [7]:
## Example: a classification problem
## Is a comment on a product a positive review or a negative review?
## reviews --------> eating, eat, eaten (here 'eat' is the stem/root word),
##   so ideally each form is reduced to 'eat'
## going, goes, gone (here 'go' is the stem/root word) ---> ideally each is reduced to 'go'

words = ['eating', 'eats', 'eaten', 'writing', 'write', 'writes', 'programming',
         'programmer', 'programs', 'finally', 'finalized', 'history']

### PorterStemmer
    The Porter Stemmer, also known as the Porter Stemming Algorithm, is a widely used stemming algorithm for English in Natural Language Processing (NLP). Developed by Martin Porter and published in 1980, it reduces words to their base or root form, known as the stem, by removing common morphological and inflexional endings.

##### How it works
    The Porter Stemmer applies a fixed set of suffix-rewriting rules to a word. The rules are grouped into steps that run in order, and a rule fires only if the stem that would remain satisfies a condition, such as containing enough vowel-consonant sequences. Because each step works on the output of the previous one, a compound suffix is removed one layer at a time until only the stem is left.
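The idea of ordered suffix rules with a guard on the remaining stem can be sketched in a few lines. This is a toy illustration only, not the Porter algorithm: the rule list, the `toy_stem` name, and the `min_stem` length check are all hypothetical, and the real algorithm uses more steps and a different condition.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
# RULES, toy_stem and min_stem are hypothetical names chosen for this sketch.
RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ing", ""),     # eating   -> eat
    ("ed", ""),      # jumped   -> jump
    ("s", ""),       # cats     -> cat
]

def toy_stem(word, rules=RULES, min_stem=3):
    """Apply the first matching rule that leaves at least min_stem characters."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if len(stem) >= min_stem:
                return stem
    return word  # no rule applied

for w in ["caresses", "ponies", "eating", "jumped", "cats", "sing"]:
    print(f"{w}  ---------->  {toy_stem(w)}")
```

The guard is what keeps 'sing' intact: stripping 'ing' would leave only 's', so the rule is skipped.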

In [4]:
## PorterStemmer: create one stemmer instance and reuse it
from nltk.stem import PorterStemmer
stemming = PorterStemmer()

In [8]:
for word in words:
    print(f'{word}  ---------->  {stemming.stem(word)}')

eating  ---------->  eat
eats  ---------->  eat
eaten  ---------->  eaten
writing  ---------->  write
write  ---------->  write
writes  ---------->  write
programming  ---------->  program
programmer  ---------->  programm
programs  ---------->  program
finally  ---------->  final
finalized  ---------->  final
history  ---------->  histori
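For the review-classification example, the practical effect of stemming is a smaller vocabulary. The sketch below groups the words by the Porter stems printed above; the stems are copied from that output rather than recomputed, and `group_by_stem` is a hypothetical helper name.

```python
from collections import defaultdict

# Porter stems copied from the output above (not recomputed here).
porter_stems = {
    'eating': 'eat', 'eats': 'eat', 'eaten': 'eaten',
    'writing': 'write', 'write': 'write', 'writes': 'write',
    'programming': 'program', 'programmer': 'programm', 'programs': 'program',
    'finally': 'final', 'finalized': 'final', 'history': 'histori',
}

def group_by_stem(word_to_stem):
    """Collect the words that share a stem (hypothetical helper)."""
    groups = defaultdict(list)
    for word, stem in word_to_stem.items():
        groups[stem].append(word)
    return dict(groups)

groups = group_by_stem(porter_stems)
for stem, members in groups.items():
    print(f'{stem}  <----------  {members}')
print(f'{len(porter_stems)} words  ---------->  {len(groups)} stems')
```

Twelve surface forms collapse to seven stems. 'eaten' and 'programmer' stay in groups of their own, which shows that the reduction is only approximate.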


In [10]:
stemming.stem('congratulation')

'congratul'

In [12]:
stemming.stem('sitting')

'sit'

##### Disadvantage
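    The stem is sometimes not a valid word, so its meaning can be lost. In the output above, 'history' becomes 'histori' and 'congratulation' becomes 'congratul'. Irregular forms can also be missed: 'eaten' is left unchanged instead of being reduced to 'eat'.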

### RegexpStemmer Class
    The RegexpStemmer class is a stemmer that uses a regular expression to identify and remove morphological affixes: any part of the word that matches the pattern is deleted. Words shorter than the min argument are returned unchanged. It is part of the NLTK (Natural Language Toolkit) library in Python.

In [1]:
from nltk.stem import RegexpStemmer
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [2]:
reg_stemmer.stem('eating')

'eat'

In [3]:
reg_stemmer.stem('eatable')

'eat'

In [4]:
reg_stemmer.stem('history')

'history'

In [5]:
reg_stemmer.stem('congratulations')

'congratulation'

In [8]:
for word in words:
    print(f"{word}  ---------->  {reg_stemmer.stem(word)}")

eating  ---------->  eat
eats  ---------->  eat
eaten  ---------->  eaten
writing  ---------->  writ
write  ---------->  writ
writes  ---------->  write
programming  ---------->  programm
programmer  ---------->  programmer
programs  ---------->  program
finally  ---------->  finally
finalized  ---------->  finalized
history  ---------->  history
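The results above follow directly from the pattern: 'write' loses its final 'e', while 'finally' and 'history' match none of the alternatives. The behaviour can be approximated with the standard `re` module alone. This is a sketch that assumes the stemmer simply deletes the match and skips words shorter than the minimum length; `regex_stem` is a hypothetical name, not part of NLTK.

```python
import re

# Same pattern and minimum length as the RegexpStemmer call above.
PATTERN = re.compile(r'ing$|s$|e$|able$')
MIN_LENGTH = 4

def regex_stem(word):
    """Delete the pattern match; leave words shorter than MIN_LENGTH alone."""
    if len(word) < MIN_LENGTH:
        return word
    return PATTERN.sub('', word)

for w in ['write', 'writes', 'programming', 'finally', 'the']:
    print(f'{w}  ---------->  {regex_stem(w)}')
```

Each alternative is anchored with `$`, so only one suffix is removed per call: 'writes' becomes 'write', not 'writ'.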


### Snowball Stemmer
    Snowball is a framework for writing stemming algorithms, created by Martin Porter, with stemmers for many languages, including English, French, German, and Spanish. Its English stemmer, also known as the Porter2 Stemmer, is a revised version of the original Porter Stemmer. Like the original, it reduces words to their base or root form, known as the stem.

In [9]:
from nltk.stem import SnowballStemmer
ball = SnowballStemmer(language = 'english')

In [10]:
for word in words:
    print(f'{word}   ----------->     {ball.stem(word)}')

eating   ----------->     eat
eats   ----------->     eat
eaten   ----------->     eaten
writing   ----------->     write
write   ----------->     write
writes   ----------->     write
programming   ----------->     program
programmer   ----------->     programm
programs   ----------->     program
finally   ----------->     final
finalized   ----------->     final
history   ----------->     histori


In [25]:
stemming.stem('fairly'), stemming.stem('sportingly')

('fairli', 'sportingli')

In [26]:
ball.stem('fairly'), ball.stem('sportingly')

('fair', 'sport')

In [28]:
ball.stem('going'), ball.stem('goes')

('go', 'goe')

In [29]:
stemming.stem('going'), stemming.stem('goes')

('go', 'goe')