# Stemming using NLTK
Stemming reduces a word to its stem by stripping affixes with fixed heuristic rules, without considering context or part of speech. As a result, the stem is often not a real dictionary word (for example, "studies" becomes "studi").
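Here is a minimal sketch of context-free suffix stripping. It is a toy, not the Porter algorithm: the rule list `SUFFIX_RULES` and the function `toy_stem` are hypothetical names chosen for illustration. Each word is stemmed on its own and the first matching rule wins, which is how a non-word stem like "studi" can appear.

```python
# Toy rule-based stemmer (hypothetical rules, NOT the Porter algorithm).
# Each rule is (suffix, replacement); the first matching rule is applied.
SUFFIX_RULES = [
    ("ies", "i"),  # studies -> studi
    ("ing", ""),   # jumping -> jump
    ("ed", ""),    # jumped  -> jump
    ("s", ""),     # cats    -> cat
]


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: len(w) - len(suffix)] + replacement
    return w  # no rule applies: return the word unchanged


for word in ["studies", "jumping", "jumped", "cats", "geese"]:
    print(word, "->", toy_stem(word))
```

No dictionary is consulted, so irregular forms such as "geese" pass through untouched, while regular ones may end up as stems that are not words.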

In [1]:
!pip install nltk

In [2]:
from nltk.stem import PorterStemmer

# Initialize the Porter stemmer (a rule-based suffix stripper)
stemmer = PorterStemmer()

words = ["running", "studies", "flying", "better", "geese", "children"]
# Stem each word independently; no context or part of speech is used
stemmed_words = [stemmer.stem(word) for word in words]

# Print the results
print(stemmed_words)


['run', 'studi', 'fli', 'better', 'gees', 'children']


# Lemmatization
Lemmatization reduces a word to its base or dictionary form (lemma) using a vocabulary and the word's part of speech; context-aware tools such as spaCy infer the part of speech from the surrounding sentence. Unlike stemming, lemmatization returns real words.
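The difference from stemming can be sketched with a toy lemmatizer that looks words up by surface form *and* part of speech. The tables `EXCEPTIONS`, `LEXICON`, and `RULES` are hypothetical stand-ins for a real resource such as WordNet; the sketch only shows why the same word can lemmatize differently as a noun, verb, or adjective.

```python
# Hypothetical exception table: (surface form, POS) -> lemma
EXCEPTIONS = {
    ("geese", "n"): "goose",
    ("mice", "n"): "mouse",
    ("children", "n"): "child",
    ("went", "v"): "go",
    ("was", "v"): "be",
    ("better", "a"): "good",
}

# Hypothetical vocabulary of valid base forms per POS ("n", "v", "a")
LEXICON = {
    "n": {"goose", "mouse", "child", "study", "fly", "cat"},
    "v": {"go", "be", "run", "study", "fly"},
    "a": {"good"},
}

# Hypothetical suffix rules per POS: (suffix, replacement)
RULES = {
    "n": [("ies", "y"), ("s", "")],
    "v": [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")],
    "a": [],
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Return a lemma only if it is a known word for the given POS."""
    w = word.lower()
    if (w, pos) in EXCEPTIONS:  # irregular forms first
        return EXCEPTIONS[(w, pos)]
    for suffix, replacement in RULES.get(pos, []):
        if w.endswith(suffix):
            candidate = w[: len(w) - len(suffix)] + replacement
            if candidate in LEXICON.get(pos, set()):  # must be a real word
                return candidate
    return w  # nothing matched: return the (lowercased) input unchanged


print(toy_lemmatize("geese", "n"), toy_lemmatize("geese", "v"))
print(toy_lemmatize("studies", "v"), toy_lemmatize("better", "a"))
```

Unlike the suffix stripper, every candidate is checked against the vocabulary, so the output is either a real word or the input itself.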

## Lemmatization with NLTK

In [3]:
import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # Download the WordNet data if it is not already present

lemmatizer = WordNetLemmatizer()

# Example words
words = ["running", "flies", "better", "geese", "mice", "studies", "children", "was", "went"]

# Lemmatize every word as a verb (pos=wordnet.VERB). Nouns such as "geese",
# "mice", and "children" are therefore returned unchanged.
lemmatized_words = [lemmatizer.lemmatize(word, wordnet.VERB) for word in words]

print(lemmatized_words)


[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/bilgesipal/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


['run', 'fly', 'better', 'geese', 'mice', 'study', 'children', 'be', 'go']
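Because the cell above passes `wordnet.VERB` for every word, the nouns "geese", "mice", and "children" come back unchanged. A common remedy is to POS-tag each word first and convert the tag to a WordNet part of speech. The helper `penn_to_wordnet` below is a hypothetical sketch of that conversion; it relies only on NLTK's WordNet POS constants being the single letters `'n'`, `'v'`, `'a'`, and `'r'`.

```python
# Hypothetical helper: map a Penn Treebank tag (as produced by nltk.pos_tag)
# to the single-letter WordNet POS codes used by NLTK:
# wordnet.NOUN == "n", wordnet.VERB == "v", wordnet.ADJ == "a", wordnet.ADV == "r".
def penn_to_wordnet(tag: str) -> str:
    if tag.startswith("J"):  # JJ, JJR, JJS -> adjective
        return "a"
    if tag.startswith("V"):  # VB, VBD, VBG, VBN, VBP, VBZ -> verb
        return "v"
    if tag.startswith("R"):  # RB, RBR, RBS (and RP) -> adverb
        return "r"
    return "n"  # everything else falls back to noun, the lemmatizer's default


# Example wiring with NLTK (not run here; needs the POS-tagger resource):
# from nltk import pos_tag
# from nltk.stem import WordNetLemmatizer
# lemmatizer = WordNetLemmatizer()
# tokens = ["The", "geese", "were", "flying"]
# print([lemmatizer.lemmatize(w, penn_to_wordnet(t)) for w, t in pos_tag(tokens)])
```

`nltk.pos_tag` needs the tagger data (`averaged_perceptron_tagger`, named `averaged_perceptron_tagger_eng` in recent NLTK releases). With an adjective tag, WordNet's exception list also maps "better" to "good".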


## Lemmatization with spaCy

In [None]:
!pip install spacy

In [5]:
import spacy
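# Uncomment on first run to download both pretrained pipelines:
# spacy.cli.download("en_core_web_sm")
# spacy.cli.download("en_core_web_lg")

In [6]:
# Small model
nlp_sm = spacy.load("en_core_web_sm")
# Large model
nlp_lg = spacy.load("en_core_web_lg")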

text = "The children were running faster than the mice, but the geese had already flown."

doc_sm = nlp_sm(text)
doc_lg = nlp_lg(text)

# Join each token's lemma into a single space-separated string
lemmatized_text_sm = " ".join([token.lemma_ for token in doc_sm])
lemmatized_text_lg = " ".join([token.lemma_ for token in doc_lg])

print('Result of the small model:', lemmatized_text_sm)

print('Result of the large model:', lemmatized_text_lg)

Result of the small model: the child be run fast than the mouse , but the geese have already fly .
Result of the large model: the child be run fast than the mouse , but the goose have already fly .
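The two outputs differ in a single token: the small model leaves "geese" unchanged while the large model returns "goose". A small helper can surface such disagreements by position. `lemma_differences` is a hypothetical name, and the sketch assumes both pipelines tokenized the text identically, so the i-th lemma in each list refers to the same token.

```python
def lemma_differences(lemmas_a, lemmas_b):
    """Return (position, lemma_a, lemma_b) for each token where the lists disagree.

    Assumes both lists come from the same tokenization (aligned by position).
    """
    if len(lemmas_a) != len(lemmas_b):
        raise ValueError("token counts differ; cannot align lemmas by position")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(lemmas_a, lemmas_b))
        if a != b
    ]


# Reuse the two printed outputs, split back into tokens
small = "the child be run fast than the mouse , but the geese have already fly .".split()
large = "the child be run fast than the mouse , but the goose have already fly .".split()
print(lemma_differences(small, large))  # → [(11, 'geese', 'goose')]
```

With the spaCy objects above you would call it as `lemma_differences([t.lemma_ for t in doc_sm], [t.lemma_ for t in doc_lg])`; the hard-coded lists only let the sketch run without spaCy installed.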
