## Stemming
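Stemming is the process of reducing a word to its stem by stripping affixes (suffixes and prefixes). The resulting stem need not be a valid dictionary word; reducing a word to its dictionary form, the lemma, is a related but separate technique called lemmatization. Stemming is widely used in natural language understanding (NLU) and natural language processing (NLP).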

In [23]:
words = [
    "running", "runner", "runs",
    "happily", "happiness", "happy",
    "flies", "flying", "fly",
    "denied", "denying", "denies",
    "lying", "lied", "lies",
    "relational", "relation", "relations"
]
print(words)

['running', 'runner', 'runs', 'happily', 'happiness', 'happy', 'flies', 'flying', 'fly', 'denied', 'denying', 'denies', 'lying', 'lied', 'lies', 'relational', 'relation', 'relations']


### Porter Stemmer

In [24]:
from nltk.stem import PorterStemmer

In [25]:
porter_stemmer = PorterStemmer()

In [26]:
for word in words:
    stemmed_word = porter_stemmer.stem(word)
    print(f"Original: {word} --> Stemmed: {stemmed_word}")

Original: running --> Stemmed: run
Original: runner --> Stemmed: runner
Original: runs --> Stemmed: run
Original: happily --> Stemmed: happili
Original: happiness --> Stemmed: happi
Original: happy --> Stemmed: happi
Original: flies --> Stemmed: fli
Original: flying --> Stemmed: fli
Original: fly --> Stemmed: fli
Original: denied --> Stemmed: deni
Original: denying --> Stemmed: deni
Original: denies --> Stemmed: deni
Original: lying --> Stemmed: lie
Original: lied --> Stemmed: lie
Original: lies --> Stemmed: lie
Original: relational --> Stemmed: relat
Original: relation --> Stemmed: relat
Original: relations --> Stemmed: relat
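One way to read the output above is to group the original words by the stem they share, since mapping related forms to one token is the point of stemming. The following is a minimal sketch that rebuilds the same word list; the name `stem_groups` is introduced here for illustration only.

```python
from collections import defaultdict
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()

words = [
    "running", "runner", "runs",
    "happily", "happiness", "happy",
    "flies", "flying", "fly",
    "denied", "denying", "denies",
    "lying", "lied", "lies",
    "relational", "relation", "relations",
]

# Map each stem to the list of original words that reduce to it.
stem_groups = defaultdict(list)
for word in words:
    stem_groups[porter_stemmer.stem(word)].append(word)

for stem, members in stem_groups.items():
    print(f"{stem:8} <- {', '.join(members)}")
```

Words that land in the same group would be treated as the same term by a downstream index or model, while "runner" and "happily" stay in groups of their own.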


### RegexpStemmer class
NLTK provides a `RegexpStemmer` class for building simple stemmers from regular expressions. It takes a single regular expression and removes every part of the word that matches it, so a pattern can target prefixes, suffixes, or both. The pattern used below is anchored with `$`, so it strips suffixes only. Let us see an example.

In [27]:
from nltk.stem import RegexpStemmer

In [28]:
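# Strip a trailing 'ing', 's', 'e', or 'able'; words shorter than 4 characters are left unchanged
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)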

In [29]:
reg_stemmer.stem('turnable')

'turn'

In [30]:
reg_stemmer.stem('ableturnable')

'ableturn'
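The effect of the `min` argument and of anchoring can be checked with a few more inputs. This is a small sketch: the first stemmer reuses the pattern from the cell above, while the second pattern (`'^un|ing$'`) and the names `suffix_stemmer` and `affix_stemmer` are made up here purely to illustrate prefix removal.

```python
from nltk.stem import RegexpStemmer

# Same pattern and length threshold as the example above.
suffix_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

# Words shorter than `min` characters are returned untouched;
# longer words lose whichever suffix alternative matches.
short_results = {w: suffix_stemmer.stem(w) for w in ["was", "bee", "cars", "mass"]}
print(short_results)

# Illustrative pattern: ^ anchors 'un' to the start of the word,
# so both a prefix and a suffix can be removed in one pass.
affix_stemmer = RegexpStemmer('^un|ing$', min=4)
print(affix_stemmer.stem("undoing"))
```

Because the stemmer applies the pattern blindly, "mass" loses its final "s" even though it is not a plural, which is the kind of over-stemming a purely rule-based approach is prone to.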

### Snowball Stemmer
The Snowball stemmer, also known as the Porter2 stemming algorithm, is a revised version of the Porter stemmer that fixes several of its known issues.

In [31]:
from nltk.stem import SnowballStemmer

In [32]:
snowball = SnowballStemmer("english")
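The `"english"` argument selects one of several language-specific stemmers. As a quick sketch, the supported names can be listed through the `languages` attribute of the class; the variable name `supported` is just for illustration.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names accepted by the SnowballStemmer constructor.
supported = SnowballStemmer.languages
print(supported)
```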

In [33]:
for word in words:
    stemmed_word = snowball.stem(word)
    print(f"Original: {word} --> Stemmed: {stemmed_word}")

Original: running --> Stemmed: run
Original: runner --> Stemmed: runner
Original: runs --> Stemmed: run
Original: happily --> Stemmed: happili
Original: happiness --> Stemmed: happi
Original: happy --> Stemmed: happi
Original: flies --> Stemmed: fli
Original: flying --> Stemmed: fli
Original: fly --> Stemmed: fli
Original: denied --> Stemmed: deni
Original: denying --> Stemmed: deni
Original: denies --> Stemmed: deni
Original: lying --> Stemmed: lie
Original: lied --> Stemmed: lie
Original: lies --> Stemmed: lie
Original: relational --> Stemmed: relat
Original: relation --> Stemmed: relat
Original: relations --> Stemmed: relat


In [38]:
examples = ["sportingly", "fairly", "occasionally", "goes"]

for word in examples:
    porter_result = porter_stemmer.stem(word)
    snowball_result = snowball.stem(word)
    print(f"{word:12} | Porter: {porter_result:10} | Snowball: {snowball_result}")

sportingly   | Porter: sportingli | Snowball: sport
fairly       | Porter: fairli     | Snowball: fair
occasionally | Porter: occasion   | Snowball: occasion
goes         | Porter: goe        | Snowball: goe
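To find where the two stemmers disagree without scanning the table by eye, the comparison can be filtered down to the differing rows. This is a minimal sketch; the `candidates` list mixes the examples above with two words from the earlier list, and the name `differing` is introduced here for illustration.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter_stemmer = PorterStemmer()
snowball = SnowballStemmer("english")

candidates = ["sportingly", "fairly", "occasionally", "goes", "running", "happily"]

# Keep only the words for which the two stemmers return different stems.
differing = [
    (word, porter_stemmer.stem(word), snowball.stem(word))
    for word in candidates
    if porter_stemmer.stem(word) != snowball.stem(word)
]

for word, porter_result, snowball_result in differing:
    print(f"{word:12} | Porter: {porter_result:10} | Snowball: {snowball_result}")
```

Running the same filter over a larger vocabulary is a cheap way to judge whether switching stemmers would change results for a given corpus.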
