# Stemming

Stemming is the process of reducing a word to its base or root form, known as a stem. The goal is to group related word forms together by stripping off their suffixes. ✂️ For example, the words "running" and "runs" share the stem "run." An irregular form such as "ran" has no suffix to strip, so a rule-based stemmer leaves it unchanged. The stem itself does not have to be a dictionary word.
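To make the idea of suffix stripping concrete, here is a minimal sketch in plain Python. It is a toy and not the Porter algorithm: the `SUFFIXES` list, the `naive_stem` function, and the `min_stem_len` parameter are all hypothetical names made up for this illustration.

```python
# Toy suffix stripper: an illustration only, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny rule list


def naive_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied, so the word is returned unchanged


for w in ["connecting", "connected", "connects", "connect", "ran"]:
    print(w, "->", naive_stem(w))
```

A rule list this small breaks quickly: `naive_stem("running")` returns "runn" because nothing handles the doubled consonant. Real stemmers such as Porter's add ordered rule steps and conditions on the remaining stem to cover cases like that.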

In [20]:
from nltk.stem import PorterStemmer

In [21]:
ps = PorterStemmer()

In [22]:
connect_tokens = ["connecting", "connected", "connectivity", "connect", "connects"]
for t in connect_tokens: 
    print(t, ": ", ps.stem(t))

connecting :  connect
connected :  connect
connectivity :  connect
connect :  connect
connects :  connect


In [23]:
learn_tokens = ["learning", "learned", "learner", "learns", "learn"]
for t in learn_tokens: 
    print(t, ": ", ps.stem(t))

learning :  learn
learned :  learn
learner :  learner
learns :  learn
learn :  learn


In [24]:
likes_tokens = ["likes", "better", "worse"]
for t in likes_tokens: 
    print(t, ": ", ps.stem(t))

likes :  like
better :  better
worse :  wors


# Lemmatization

Lemmatization is a text processing technique that reduces a word to its base or dictionary form, known as a lemma. It uses a vocabulary and morphological analysis so that the result is a valid dictionary word. For example, the lemma of "running" is "run," and the lemma of "better" is "good." NLTK's WordNetLemmatizer treats every word as a noun unless a part of speech is passed, which is why most of the tokens below come back unchanged.

In [30]:
import nltk
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\johnn\AppData\Roaming\nltk_data...


In [31]:
lemmatizer = WordNetLemmatizer()

In [33]:
for t in connect_tokens: 
    print(t, ": ", lemmatizer.lemmatize(t))

connecting :  connecting
connected :  connected
connectivity :  connectivity
connect :  connect
connects :  connects


In [34]:
for t in likes_tokens: 
    print(t, ": ", lemmatizer.lemmatize(t))

likes :  like
better :  better
worse :  worse
