# **Stemming**
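* Stemming is the process of reducing a word to its stem by stripping affixes (suffixes and prefixes). The resulting stem need not be a valid dictionary word; reducing a word to its dictionary form, the "lemma", is a separate technique called lemmatization.
* Stemming is a common preprocessing step in Natural Language Understanding (NLU) and Natural Language Processing (NLP).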

In [None]:
## Example problem: text classification
## Decide whether a product review is positive or negative
## Ideally, word forms in reviews map to one base form: [eating, eat, eaten] ---> eat, [going, gone, goes] ---> go

words = ["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

## **1. Porter Stemmer**
* It is based on the idea that suffixes in the English language are made up of combinations of smaller and simpler suffixes, so they can be stripped in a sequence of rule-based steps.
* The main applications of the Porter Stemmer include data mining and information retrieval.

In [None]:
from nltk.stem import PorterStemmer

In [None]:
stemming = PorterStemmer()

In [None]:
for word in words:
  print(word + '---->' + stemming.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program


In [None]:
stemming.stem('congratulations')

'congratul'

In [None]:
stemming.stem('sitting')

'sit'
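
The idea that long suffixes are combinations of simpler ones can be illustrated with a toy sketch. This is not the Porter algorithm: the suffix list and the `strip_layers` helper are hypothetical, chosen only to show suffixes being peeled off one layer at a time.

```python
# Toy illustration only: NOT the Porter algorithm, just the
# layered-suffix idea it builds on.
SUFFIX_LAYERS = ["ness", "ful", "ing", "s"]  # hypothetical, tiny rule set


def strip_layers(word, min_stem_len=3):
    """Strip one known suffix at a time while the remainder stays long enough."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIX_LAYERS:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                changed = True
                break  # restart the scan on the shortened word
    return word


for w in ["hopefulness", "eating", "programs"]:
    print(w + "---->" + strip_layers(w))
```

Here "hopefulness" loses "ness" first and "ful" second, which is the kind of compound suffix the real algorithm handles with a much larger, carefully ordered rule set.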

## **2. RegexpStemmer class**

* NLTK provides the `RegexpStemmer` class, which makes it easy to implement stemming based on regular expressions.
* It takes a single regular expression and removes every part of the word that matches it; anchoring the pattern with `$` or `^` restricts matches to suffixes or prefixes. Words shorter than the `min` argument are returned unchanged.
* For example:

In [None]:
from nltk.stem import RegexpStemmer

In [None]:
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [None]:
reg_stemmer.stem('eating')

'eat'

In [None]:
reg_stemmer.stem('ineating')

'ineat'
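
A few more calls may help show how the pattern and the `min` argument interact. This is a minimal sketch; the sample words and the unanchored `'ing'` pattern are illustrative assumptions, not part of the original notebook.

```python
from nltk.stem import RegexpStemmer

# Same pattern as above: strip a trailing "ing", "s", "e" or "able".
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

short_word = reg_stemmer.stem('bee')       # shorter than min, so returned unchanged
plural = reg_stemmer.stem('cars')          # trailing "s" removed
adjective = reg_stemmer.stem('advisable')  # trailing "able" removed

# Without the "$" anchor, every occurrence of the pattern is removed,
# not only the one at the end of the word.
unanchored = RegexpStemmer('ing').stem('singing')

print(short_word, plural, adjective, unanchored)
```

The last call shows why the anchors matter: an unanchored pattern can delete material from the middle of a word.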

## **3. Snowball Stemmer**
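* Unlike the Porter Stemmer, which targets English only, the Snowball Stemmer is multilingual: it also provides stemmers for several non-English languages.
* Snowball is also the name of the small string-processing language in which these stemming algorithms are written.
* The English Snowball stemmer is also referred to as the Porter2 Stemmer. It is a revision of the original Porter algorithm and is slightly more aggressive; for example, it strips the "-ly" suffix that the Porter Stemmer only rewrites to "-li".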

In [None]:
from nltk.stem import SnowballStemmer

In [None]:
snowballstemmer = SnowballStemmer('english')

In [None]:
for word in words:
  print(word + "---->" + snowballstemmer.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program


In [None]:
# Porter stemmer, for comparison
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [None]:
# Snowball stemmer on the same words
snowballstemmer.stem("fairly"), snowballstemmer.stem("sportingly")

('fair', 'sport')

In [None]:
snowballstemmer.stem('goes')

'goe'

In [None]:
stemming.stem('goes')

'goe'
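
The multilingual side of the Snowball Stemmer can be sketched as follows. The choice of German and the two sample words are assumptions made for illustration; the list of supported languages depends on the installed NLTK version.

```python
from nltk.stem import SnowballStemmer

# Languages supported by the installed NLTK version.
print(SnowballStemmer.languages)

# Build a stemmer for another language; the German sample words are
# illustrative and not taken from the notebook.
german_stemmer = SnowballStemmer('german')
for word in ["laufen", "häuser"]:
    print(word + "---->" + german_stemmer.stem(word))
```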

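# Applications of Stemming:
1. Stemming is used in information retrieval systems such as search engines, so that a query matches documents containing other forms of the same word.
2. It is used to determine domain vocabularies in domain analysis.
3. It is used in sentiment analysis, where inflected forms of a word are treated as a single feature.
4. It is used in document clustering (also known as text clustering), a method of grouping textual materials by similarity.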


# Advantages of Stemming:
1. Stemming normalizes text by reducing the variations of a word to a common base form.
2. It aids information retrieval and text mining, and reduces feature dimensionality in machine learning.
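
The effect on feature dimensionality can be checked with a rough sketch that reuses the notebook's word list: the vocabulary is counted before and after stemming. The variable names `vocab_before` and `vocab_after` are introduced here for illustration only.

```python
from nltk.stem import PorterStemmer

words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]

stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]

# Each distinct token would become one feature in a bag-of-words model.
vocab_before = set(words)
vocab_after = set(stems)
print(len(vocab_before), "distinct words ->", len(vocab_after), "distinct stems")
```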


# Disadvantages of Stemming:
**1. Over-Stemming:** Over-stemming occurs when a stemmer strips too much of a word, so that words with different meanings are reduced to the same stem or to a non-valid word. This can result in a loss of meaning and readability.

**2. Under-Stemming:** Under-stemming occurs when a stemmer strips too little, so that related forms of the same word are left with different stems, as with "eats" (stemmed to "eat") and "eaten" (left unchanged) in the outputs above. This can result in a loss of information and hinder text analysis.
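
Both failure modes can be observed with the Porter Stemmer in a short sketch. The "universe / university / universal" trio is a commonly cited over-stemming example and is an assumption added here; the "eats / eaten" pair comes from the word list used earlier.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Over-stemming: words with different meanings end up with the same stem.
over = {w: stemmer.stem(w) for w in ["universe", "university", "universal"]}
print(over)

# Under-stemming: forms of the same verb keep different stems.
under = {w: stemmer.stem(w) for w in ["eats", "eaten"]}
print(under)
```

When errors of this kind matter for the task, lemmatization is the usual alternative, at the cost of extra processing.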