# Stemming

Roughly speaking, stemming reduces several similar words to a common base form by stripping their affixes. It can be more complex than this, but essentially it groups similar words into a family that shares a common root, i.e., the stem. The stem does not have to be a valid word itself.
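To make the idea concrete, here is a minimal sketch of suffix stripping. The `naive_stem` function, its suffix list, and the `min_stem_len` parameter are made up for illustration. This is not the Porter algorithm, which applies many more rules and conditions.

```python
# Hypothetical suffix list, checked in order (illustration only).
SUFFIXES = ("ing", "ly", "ed", "s")

def naive_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

family = ["jump", "jumps", "jumped", "jumping"]
stems = {naive_stem(w) for w in family}
print(stems)  # {'jump'}
```

All four words collapse to the single stem `jump`. An irregular form such as `ran` is left untouched, because no suffix rule connects it to `run`.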

## Porter Stemmer
The Porter stemmer is one of the most common stemming algorithms. It was developed by [Martin Porter](https://en.wikipedia.org/wiki/Martin_Porter).

In [2]:
import nltk
from nltk.stem.porter import PorterStemmer

p_stemmer = PorterStemmer()
words = ['run', 'runner', 'ran', 'runs', 'easily', 'fairly']

In [5]:
for word in words:
    print(f"{word:20}--> {p_stemmer.stem(word)}")

run                 --> run
runner              --> runner
ran                 --> ran
runs                --> run
easily              --> easili
fairly              --> fairli


## Snowball
The Snowball stemmer for English, also known as Porter2, is an improvement over the original Porter stemmer, written by the same author.

In [10]:
from nltk.stem.snowball import SnowballStemmer

s_stemmer = SnowballStemmer(language='english')
for word in words:
    print(f"{word:20}--> {s_stemmer.stem(word)}")

run                 --> run
runner              --> runner
ran                 --> ran
runs                --> run
easily              --> easili
fairly              --> fair


Notice the difference for the word `fairly`: Porter returns `fairli`, while Snowball returns `fair`, which is a better and more general stem.
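One rule that seems to explain this difference: as we understand the Snowball English (Porter2) definition, a trailing `li` is removed when it follows one of a fixed set of consonants. The sketch below isolates only that rule. The name `strip_li` is hypothetical, and the sketch omits the region checks that the real algorithm also applies.

```python
# Letters that may precede a removable "li" (our reading of the Porter2 rule).
VALID_LI_ENDINGS = set("cdeghkmnrt")

def strip_li(word: str) -> str:
    """Remove a trailing 'li' when the letter before it is a valid li-ending."""
    if word.endswith("li") and len(word) > 2 and word[-3] in VALID_LI_ENDINGS:
        return word[:-2]
    return word

print(strip_li("fairli"))  # fair   ('r' is a valid li-ending)
print(strip_li("easili"))  # easili ('i' is not, so nothing is removed)
```

This matches the output above, where `easily` still ends up as `easili` under both stemmers.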

Let's see how it behaves with a harder set of words.

In [11]:
from typing import List

def stem(words: List[str], stemmer) -> List[str]:
    return [stemmer.stem(word) for word in words]

In [15]:
import pandas as pd

# These words look alike, but 'generous' and 'generate' have unrelated meanings
words = ['generous', 'generation', 'generously', 'generate']
with_porter = stem(words, p_stemmer)
with_snowball = stem(words, s_stemmer)

word_list = list(zip(words, with_porter, with_snowball))
words_df = pd.DataFrame(word_list, columns=['word', 'porter', 'snowball'])
words_df

|   | word       | porter | snowball |
|---|------------|--------|----------|
| 0 | generous   | gener  | generous |
| 1 | generation | gener  | generat  |
| 2 | generously | gener  | generous |
| 3 | generate   | gener  | generat  |


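We can observe here that the Snowball stemmer behaves better than the Porter stemmer: Porter collapses all four words to `gener`, while Snowball keeps `generous` and `generously` apart from `generation` and `generate`.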
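A quick way to see these collisions is to group words by their stem. The sketch below uses a hypothetical helper, `group_by_stem`, that accepts any function mapping a word to its stem. To keep it runnable without NLTK, the stems are copied from the table above into plain dictionaries.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

def group_by_stem(words: Iterable[str], stem_fn: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each stem to the list of words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stem_fn(word)].append(word)
    return dict(groups)

# Stems copied from the table above (not recomputed here).
porter_stems = {'generous': 'gener', 'generation': 'gener',
                'generously': 'gener', 'generate': 'gener'}
snowball_stems = {'generous': 'generous', 'generation': 'generat',
                  'generously': 'generous', 'generate': 'generat'}

porter_groups = group_by_stem(porter_stems, porter_stems.__getitem__)
snowball_groups = group_by_stem(snowball_stems, snowball_stems.__getitem__)

print(porter_groups)    # one group containing all four words
print(snowball_groups)  # two groups: 'generous' and 'generat'
```

With the stemmers defined earlier, the same helper can be called as `group_by_stem(words, p_stemmer.stem)` or `group_by_stem(words, s_stemmer.stem)`.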