# Stemming:

<p> Stemming reduces words to a base or root form by stripping affixes, mostly suffixes.
It applies simple rules that remove common suffixes such as "-s", "-es", "-ing", and "-ed", without consulting a dictionary.
As a result, a stem is not always a real word and may carry no meaning on its own.
Stemming is generally faster than lemmatization but less accurate.
Example: the Porter stemmer reduces "running" to "run" and "jumps" to "jump", but it also reduces "natural" to "natur".</p>
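
The rule-based idea can be sketched in a few lines of plain Python. This is only a toy illustration: the suffix list and the `toy_stem` helper are assumptions made for this example, not the Porter algorithm, which adds further rules (for instance, for doubled consonants).

```python
# Toy suffix-stripping stemmer: an illustrative assumption, not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order, longest first


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


for w in ["jumps", "jumped", "jumping", "running", "boxes"]:
    print(w, "==>", toy_stem(w))
```

Because the toy rules know nothing about doubled consonants, "running" comes out as "runn", which shows how a purely rule-based stem can fail to be a real word.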

# Lemmatization:

<p> Lemmatization also reduces words to their base or root form, but it considers the word's context and part of speech (POS) tag.
It applies more complex rules and uses a vocabulary or morphological analysis to transform words to their base form.
The resulting lemmatized words are usually actual words and have a meaningful semantic interpretation.
Lemmatization typically requires more computation than stemming but can produce more accurate results.
Example: Treated as a verb, "running" is lemmatized to "run," and "jumps" is lemmatized to "jump."</p>

<p> Both stemming and lemmatization have advantages and disadvantages, and the choice depends on the task and its requirements.
Stemming is simpler and faster, which suits applications such as search engines or text mining, where the exact base form of a word is not critical.
Lemmatization gives more accurate results because it preserves the meaning of words, and it is commonly used in applications such as text classification or sentiment analysis.</p>
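
To make the role of the part-of-speech tag concrete, here is a minimal sketch of a dictionary-based lemmatizer. The `LEMMA_TABLE` vocabulary and the `toy_lemmatize` helper are hypothetical and exist only for illustration; a real lemmatizer relies on a full lexicon such as WordNet plus morphological rules.

```python
# Hypothetical mini-vocabulary keyed by (word, part of speech); not WordNet data.
LEMMA_TABLE = {
    ("running", "v"): "run",
    ("jumps", "v"): "jump",
    ("jumps", "n"): "jump",
    ("meeting", "v"): "meet",
    ("meeting", "n"): "meeting",
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Return the table entry for (word, pos), or the word unchanged if it is unknown."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


print(toy_lemmatize("meeting", pos="v"))  # looked up as a verb
print(toy_lemmatize("meeting", pos="n"))  # looked up as a noun
print(toy_lemmatize("running"))           # default noun tag, not in the table
```

The same surface form maps to different lemmas depending on the tag, and a word looked up under the wrong tag is returned unchanged.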

- In Python, the NLTK (Natural Language Toolkit) library provides both stemming and lemmatization.
- It includes classes such as PorterStemmer for stemming and WordNetLemmatizer for lemmatization.



In [2]:
import nltk

In [3]:
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to C:\Users\MUHAMMED AJMAL
[nltk_data]     G\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [6]:
from nltk.stem import PorterStemmer, WordNetLemmatizer

In [7]:
# Despite their names, these variables hold the stemmer and lemmatizer objects, not text
stemmed_text = PorterStemmer()
lemmed_text = WordNetLemmatizer()

In [8]:
text = ['I', 'love', 'natural', 'language', 'processing', '!']
type(text)

list

In [9]:
# PorterStemmer lowercases its input by default, so 'I' becomes 'i'
for i in text:
    print(i, '==>', stemmed_text.stem(i))

I ==> i
love ==> love
natural ==> natur
language ==> languag
processing ==> process
! ==> !


In [11]:
# lemmatize() treats every word as a noun by default (pos='n')
for i in text:
    print(i, '==>', lemmed_text.lemmatize(i))

I ==> I
love ==> love
natural ==> natural
language ==> language
processing ==> processing
! ==> !


In [12]:
# Collect the stem and the lemma of every token
stemmed_list = [stemmed_text.stem(i) for i in text]
lemmed_list = [lemmed_text.lemmatize(i) for i in text]

In [13]:
display(stemmed_list)
display(lemmed_list)

['i', 'love', 'natur', 'languag', 'process', '!']

['I', 'love', 'natural', 'language', 'processing', '!']

In [14]:
import pandas as pd

In [15]:
df = pd.DataFrame(zip(text, stemmed_list, lemmed_list), columns=['Original_text', 'Stemmed_text', 'Lemmatized_text'])

In [16]:
df

| | Original_text | Stemmed_text | Lemmatized_text |
|---|---|---|---|
| 0 | I | i | I |
| 1 | love | love | love |
| 2 | natural | natur | natural |
| 3 | language | languag | language |
| 4 | processing | process | processing |
| 5 | ! | ! | ! |
