Stemming and lemmatization are two natural language processing techniques for reducing words to a base form. Stemming strips suffixes using heuristic rules, so the resulting stem is not always a real word (for example, `reducing` becomes `reduc`). Lemmatization maps each word to its dictionary form, the lemma. The sections below implement both in Python.
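
Before turning to NLTK, the difference between the two ideas can be illustrated with a deliberately tiny sketch. It is not the Porter algorithm and it does not use WordNet: the suffix list `SUFFIXES`, the lookup table `LEMMA_TABLE`, and both function names are hypothetical and exist only for this illustration.

```python
# Toy illustration only: real stemmers and lemmatizers are far more elaborate.
SUFFIXES = ("ing", "es", "s")  # hypothetical, tiny suffix list
LEMMA_TABLE = {"studies": "study", "better": "good", "was": "be"}  # hypothetical


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in a dictionary of known forms; unknown words pass through."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["studies", "reducing", "better"]:
    print(w, "| stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

The stemmer turns `studies` into the non-word `studi`, while the lookup returns `study`. The lookup in turn can do nothing with `reducing`, because that form is not in its table. The NLTK sections that follow show the same trade-off with real implementations.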

**1. Stemming with NLTK:**
NLTK (Natural Language Toolkit) is a popular library for text processing in Python. It provides various stemming algorithms, including Porter and Lancaster stemmers.

In [1]:
from nltk.stem import PorterStemmer

def stem_text(text):
    stemmer = PorterStemmer()
    stemmed_words = [stemmer.stem(word) for word in text.split()]
    return ' '.join(stemmed_words)

text = "Stemming is the process of reducing words to their base or root forms"
stemmed_text = stem_text(text)
print(stemmed_text)

stem is the process of reduc word to their base or root form


**Porter Stemmer:**
The Porter algorithm is one of the most widely used stemmers. This is the same `PorterStemmer` class used above, shown again for side-by-side comparison with the stemmers that follow.

In [6]:
from nltk.stem import PorterStemmer

def porter_stem(text):
    stemmer = PorterStemmer()
    stemmed_words = [stemmer.stem(word) for word in text.split()]
    return ' '.join(stemmed_words)

text = "Stemming is the process of reducing words to their base or root forms"
stemmed_text_porter = porter_stem(text)
print(stemmed_text_porter)

stem is the process of reduc word to their base or root form


**Snowball Stemmer (Porter2 or English Stemmer):**
The Snowball English stemmer (also known as Porter2) is a revised version of the Porter stemmer by the same author. On this sample sentence it produces the same output as Porter.

In [7]:
from nltk.stem import SnowballStemmer

def snowball_stem(text):
    stemmer = SnowballStemmer("english")
    stemmed_words = [stemmer.stem(word) for word in text.split()]
    return ' '.join(stemmed_words)

text = "Stemming is the process of reducing words to their base or root forms"
stemmed_text_snowball = snowball_stem(text)
print(stemmed_text_snowball)

stem is the process of reduc word to their base or root form


**Lancaster Stemmer:**
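The Lancaster stemmer is more aggressive than the Porter stemmer and tends to produce shorter stems. In the output below, `base` is cut to `bas`, whereas Porter and Snowball leave it unchanged.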

In [8]:
from nltk.stem import LancasterStemmer

def lancaster_stem(text):
    stemmer = LancasterStemmer()
    stemmed_words = [stemmer.stem(word) for word in text.split()]
    return ' '.join(stemmed_words)

text = "Stemming is the process of reducing words to their base or root forms"
stemmed_text_lancaster = lancaster_stem(text)
print(stemmed_text_lancaster)

stem is the process of reduc word to their bas or root form
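
The sample sentence barely separates the three stemmers, so running them side by side on individual words can make the differences easier to see. The following is a minimal sketch: the helper `compare_stemmers`, the `STEMMERS` dictionary, and the word list are illustrative choices, not part of NLTK.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# One instance of each stemmer, keyed by a label chosen for this example.
STEMMERS = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}


def compare_stemmers(word):
    """Return a dict mapping each stemmer label to its stem for one word."""
    return {name: stemmer.stem(word) for name, stemmer in STEMMERS.items()}


# Words picked for illustration; extend the list with your own vocabulary.
for word in ["generously", "fairly", "base", "running"]:
    print(word, compare_stemmers(word))
```

Adverbs such as `generously` are a useful probe, since Porter and Snowball treat the `-ly` ending differently. None of the stemmers needs a corpus download, so this runs with only the `nltk` package installed.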


**2. Lemmatization with NLTK:**
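NLTK provides a WordNet-based lemmatizer, `WordNetLemmatizer`. It needs each word's part of speech to pick the right lemma, so the code below tags every word first and falls back to noun when the tag is not recognized.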

In [4]:
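!pip install nltk
import nltk  # must be imported before calling nltk.download
# Tagger model for nltk.pos_tag; newer NLTK releases may need 'averaged_perceptron_tagger_eng' instead
nltk.download('averaged_perceptron_tagger')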



[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [5]:
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

nltk.download('wordnet')

def get_wordnet_pos(word):
    """Map POS tag to first character lemmatize() accepts"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    lemmatized_words = [lemmatizer.lemmatize(word, get_wordnet_pos(word)) for word in text.split()]
    return ' '.join(lemmatized_words)

text = "Lemmatization reduces words to their dictionary or lemma form"
lemmatized_text = lemmatize_text(text)
print(lemmatized_text)

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Lemmatization reduces word to their dictionary or lemma form
