
## **Stemming**

Stemming reduces words to a root form by chopping off affixes, usually suffixes, according to fixed rules. It is often more aggressive than lemmatization and can produce stems that are not real words (e.g., "flies" becomes "fli" and "has" becomes "ha").

In [1]:
from nltk.stem import PorterStemmer

# Initialize the stemmer
stemmer = PorterStemmer()

# List of words
words = ["talking", "eating", "adjustable", "ate", "running", "flies", "better", "has", "geese", "swimming"]

# Apply stemming
for word in words:
    print(f"Original: {word} -> Stemmed: {stemmer.stem(word)}")

Original: talking -> Stemmed: talk
Original: eating -> Stemmed: eat
Original: adjustable -> Stemmed: adjust
Original: ate -> Stemmed: ate
Original: running -> Stemmed: run
Original: flies -> Stemmed: fli
Original: better -> Stemmed: better
Original: has -> Stemmed: ha
Original: geese -> Stemmed: gees
Original: swimming -> Stemmed: swim
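
To see why stems such as "fli" and "ha" appear, it can help to look at a toy suffix stripper. The sketch below is only an illustration under stated assumptions: `SUFFIX_RULES` and `toy_stem` are hypothetical names, the rule list is a hand-picked sample, and this is not the Porter algorithm that `PorterStemmer` implements.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
# SUFFIX_RULES is a hypothetical, hand-picked rule list, checked in order.
SUFFIX_RULES = [
    ("ies", "i"),   # flies -> fli
    ("ing", ""),    # talking -> talk
    ("able", ""),   # adjustable -> adjust
    ("s", ""),      # has -> ha
]


def toy_stem(word: str) -> str:
    """Apply the first matching rule, keeping at least two characters of stem."""
    for suffix, replacement in SUFFIX_RULES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= 2:
            return stem + replacement
    # No rule matched: irregular forms such as "ate" pass through unchanged.
    return word


for word in ["talking", "adjustable", "flies", "has", "ate"]:
    print(f"Original: {word} -> Toy stem: {toy_stem(word)}")
```

The rules only look at spelling, so they cannot tell that "has" is a form of "have" or that "ate" is a form of "eat". A real stemmer has many more rules and conditions, but it works on the same principle.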


## **Lemmatization**

Lemmatization reduces words to their base or dictionary form (the lemma) by taking context and part of speech into account. Unlike stemming, it is designed to return real words that exist in the language (e.g., "ate" becomes "eat").

In [3]:
import spacy

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# List of words to lemmatize
words = ["talking", "eating", "adjustable", "ate", "running", "flies", "better", "has", "geese", "swimming"]

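# Process each word on its own and take the lemma of its single token.
# With no surrounding sentence, spaCy has to guess the part of speech.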
for word in words:
    doc = nlp(word)
    lemma = doc[0].lemma_
    print(f"Original: {word} -> Lemmatized: {lemma}")

Original: talking -> Lemmatized: talk
Original: eating -> Lemmatized: eat
Original: adjustable -> Lemmatized: adjustable
Original: ate -> Lemmatized: eat
Original: running -> Lemmatized: run
Original: flies -> Lemmatized: fly
Original: better -> Lemmatized: well
Original: has -> Lemmatized: have
Original: geese -> Lemmatized: geese
Original: swimming -> Lemmatized: swim
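
The role of part of speech can be sketched with a toy lookup keyed on the word and its tag. This is a minimal illustration, not how spaCy works internally: `LEMMA_TABLE` and `toy_lemmatize` are hypothetical names, and the entries are a small hand-written sample.

```python
# Toy lemmatizer: a dictionary lookup keyed on (word, part of speech).
# LEMMA_TABLE is a hypothetical, hand-written sample, not spaCy's data or method.
LEMMA_TABLE = {
    ("ate", "VERB"): "eat",
    ("has", "VERB"): "have",
    ("better", "ADJ"): "good",
    ("better", "ADV"): "well",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form for (word, pos), or the word itself if unknown."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


examples = [("better", "ADJ"), ("better", "ADV"), ("saw", "VERB"), ("saw", "NOUN")]
for word, pos in examples:
    print(f"Original: {word} ({pos}) -> Lemma: {toy_lemmatize(word, pos)}")
```

The same surface form maps to different lemmas depending on its tag, which is why a lemmatizer needs part-of-speech information. In the spaCy run above, "better" came out as "well", which matches the adverb reading.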
