# Stemming
- Stemming is a process in Natural Language Processing (NLP) that reduces words to their base or root form, typically by stripping suffixes.
- The goal is to simplify text by mapping different inflected forms of a word (e.g., "connected," "connecting," "connections") to a common stem (e.g., "connect"). Irregular forms such as "ran" are generally not reduced to "run", because stemmers strip suffixes by rule rather than looking words up in a dictionary.
- Stemming is particularly useful in tasks like text mining, information retrieval, and search engines, where matching a word's core meaning matters more than its grammatical form.

### Types of Stemming Algorithms

#### Porter Stemmer:
- One of the most popular stemming algorithms, developed by Martin Porter in 1980.
- It applies a fixed sequence of suffix-rewriting rules in several steps, each step stripping or replacing one class of suffix until the root form remains.
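To make the idea of rule-based suffix rewriting concrete, here is a minimal sketch of only the first step (step 1a, plural handling) of the original 1980 Porter algorithm. The function name `porter_step_1a` is made up for this illustration, and NLTK's `PorterStemmer` applies further steps and some extensions, so its results can differ from this fragment.

```python
# Step 1a of the original Porter algorithm: plural handling.
# Rules are tried in order and only the first (longest) matching suffix is applied.
STEP_1A_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (left unchanged)
    ("s", ""),       # cats     -> cat
]

def porter_step_1a(word):
    """Apply the first matching step-1a rule; return the word unchanged otherwise."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word

for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The full algorithm chains several such rule tables, and later steps also check the shape of the remaining stem before removing a suffix.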

In [1]:
from nltk.stem import PorterStemmer
porter_stemmer = PorterStemmer()
print(porter_stemmer.stem("running"))  # Output: run
print(porter_stemmer.stem("connected"))  # Output: connect


run
connect


#### Lancaster Stemmer:
- Also known as the Paice/Husk stemmer. It is more aggressive than the Porter Stemmer, so it tends to over-stem: stems can become very short, and unrelated words may collapse to the same stem.

In [2]:
from nltk.stem import LancasterStemmer
lancaster_stemmer = LancasterStemmer()
print(lancaster_stemmer.stem("running"))  # Output: run
print(lancaster_stemmer.stem("connected"))  # Output: connect
print(lancaster_stemmer.stem("responsiveness"))  # Output: respond


run
connect
respond


#### Snowball Stemmer:
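- Snowball is a framework for writing stemmers. Its English stemmer, also known as "Porter2", is a revision of the original Porter Stemmer.
- It provides stemmers for many languages besides English, and the English version is generally considered more consistent than the original Porter rules.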


In [3]:
from nltk.stem import SnowballStemmer
snowball_stemmer = SnowballStemmer("english")
print(snowball_stemmer.stem("running"))  # Output: run
print(snowball_stemmer.stem("connected"))  # Output: connect


run
connect


#### Regex-based Stemmer:
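- A simple approach where regular expressions define the stemming rules by hand. The rule below strips only a trailing "ing", "ed", or "s", so it leaves the doubled consonant in "running" ("runn") where the stemmers above return "run".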

In [4]:
import re
def simple_stemmer(word):
    return re.sub(r'(ing|ed|s)$', '', word)

print(simple_stemmer("running"))  # Output: runn
print(simple_stemmer("connected"))  # Output: connect


runn
connect
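The "runn" result above can be repaired with one more rule. The following is a rough sketch, not a replacement for a real stemmer: the name `regex_stemmer` and both patterns are assumptions made for this example. It borrows one idea from Porter's rules: after a suffix is removed, a doubled final consonant other than "l", "s", or "z" is reduced to a single letter.

```python
import re

# Strip -ing, -ed, or a final -s that is not part of an -ss ending.
SUFFIX = re.compile(r"(ing|ed|(?<!s)s)$")
# The same consonant twice at the end of the stem (l, s and z are excluded).
DOUBLED = re.compile(r"([bcdfghjkmnpqrtvwx])\1$")

def regex_stemmer(word):
    stem = SUFFIX.sub("", word)
    if stem != word:
        # Only undo the doubling when a suffix was actually removed.
        stem = DOUBLED.sub(r"\1", stem)
    return stem

print(regex_stemmer("running"))    # run
print(regex_stemmer("stopped"))    # stop
print(regex_stemmer("connected"))  # connect
print(regex_stemmer("class"))      # class
```

Short words are still over-stripped (for example, "sing" becomes "s"), which is one reason the rule-based stemmers above also check the length and shape of the remaining stem.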


#### Customized Stemmer:
- In some cases, a custom stemmer may be developed to handle specific stemming needs, especially when working with domain-specific texts.

In [5]:
from nltk.stem import PorterStemmer

class CustomStemmer(PorterStemmer):
    def stem(self, word):
        if word.endswith('ness'):
            return word[:-4]
        return super().stem(word)

custom_stemmer = CustomStemmer()
print(custom_stemmer.stem("happiness"))  # Output: happi (the "-ness" branch only slices off four characters)
print(custom_stemmer.stem("running"))  # Output: run


happi
run
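If a domain needs specific stems such as "happy" rather than "happi", one possible approach is an exception table that is checked before any rule-based stemmer runs. The sketch below is hypothetical: `OverrideStemmer`, `strip_plural`, and the example overrides are all invented for illustration. The fallback can be any function that maps a string to a string, for instance the `stem` method of an NLTK stemmer.

```python
class OverrideStemmer:
    """Wrap any stemming function with a table of domain-specific exceptions."""

    def __init__(self, fallback, overrides):
        self.fallback = fallback    # any callable str -> str
        self.overrides = overrides  # exact-match exceptions, checked first

    def stem(self, word):
        word = word.lower()
        if word in self.overrides:
            return self.overrides[word]
        return self.fallback(word)

def strip_plural(word):
    """Toy fallback for this example: drop a final 's' unless the word ends in 'ss'."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

stemmer = OverrideStemmer(
    strip_plural,
    {"happiness": "happy", "analyses": "analysis"},
)
print(stemmer.stem("happiness"))  # happy
print(stemmer.stem("Analyses"))   # analysis
print(stemmer.stem("genes"))      # gene
```

Keeping the exceptions in plain data rather than in code makes the list easy to extend as new domain terms turn up.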


## Use Cases and Considerations

#### *Search Engines:* 
- Stemming improves recall by mapping different forms of a word to the same stem, so a query for "connect" also matches documents containing "connected" or "connecting".
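As a rough illustration of the search use case, the sketch below builds a tiny inverted index keyed by stem. Everything in it is assumed for the example: `toy_stem` is a hypothetical stand-in for a real stemmer, and the documents are made up.

```python
import re
from collections import defaultdict

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    return re.sub(r"(ions|ion|ing|ed|s)$", "", word)

def build_index(docs):
    """Map each stem to the set of document ids that contain a form of it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            index[toy_stem(token)].add(doc_id)
    return index

def search(index, query):
    """Stem the query the same way as the documents, then look it up."""
    return sorted(index.get(toy_stem(query.lower()), set()))

docs = {
    1: "The server connected to the database",
    2: "Connecting clients over TLS",
    3: "Lost connections are retried",
    4: "Unrelated document about caching",
}
index = build_index(docs)
print(search(index, "connect"))      # [1, 2, 3]
print(search(index, "connections"))  # [1, 2, 3]
```

The key design point is that queries and documents pass through the same stemmer. If only one side is stemmed, the lookup keys no longer line up.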
#### *Text Classification:* 
- Stemming merges inflected forms into a single feature, which reduces the vocabulary size, shrinks the feature space, and can make training more efficient.
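The effect on vocabulary size can be checked on a toy corpus. This is only a sketch: `toy_stem` is a hypothetical stand-in for a real stemmer, and the three sentences are invented for the example.

```python
import re

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips -ing, -ed, or -s."""
    return re.sub(r"(ing|ed|s)$", "", word)

corpus = [
    "the model learns patterns",
    "models learned a pattern",
    "learning the pattern helped the model",
]
tokens = [token for line in corpus for token in line.split()]

raw_vocab = set(tokens)
stemmed_vocab = {toy_stem(token) for token in tokens}

print(len(raw_vocab))         # 10
print(len(stemmed_vocab))     # 6
print(sorted(stemmed_vocab))  # ['a', 'help', 'learn', 'model', 'pattern', 'the']
```

Here "learns", "learned", and "learning" collapse into one feature, so a bag-of-words model would need six columns instead of ten.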

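#### *Limitations:*
- Stemming can produce stems that are not real words (e.g., "studies" to "studi") and does not handle irregular forms such as "ran" or "better". It can also merge unrelated words (over-stemming) or fail to merge related ones (under-stemming). When valid dictionary forms are needed, lemmatization, which relies on a vocabulary and part-of-speech information, is usually preferred.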