## Stemming and Lemmatization

Stemming and lemmatization are techniques in natural language processing (NLP) that reduce words to a base or root form.

### Example: "running" and "runs" are both stemmed to "run"; the irregular form "ran" stays "ran".
### Approach: applies predefined rules that strip suffixes (and, in some stemmers, prefixes) without consulting a dictionary.

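Stemming reduces a word to a stem by chopping off suffixes (e.g., "running" → "run"); the result need not be a real word (e.g., "replacement" → "replac"). Lemmatization returns the dictionary form (lemma) of a word using a vocabulary and morphological analysis (e.g., "running" → "run" and, when the word is tagged as an adjective, "better" → "good").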
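
To make the contrast concrete, here is a minimal pure-Python sketch. It is a toy, not the Porter algorithm or WordNet: `SUFFIXES`, `LEMMA_TABLE`, `toy_stem`, and `toy_lemmatize` are hypothetical names with hand-written data, chosen only to show that a stemmer applies rules while a lemmatizer looks words up.

```python
# Toy illustration only: a few hand-written rules and a tiny lookup table.
# This is NOT the Porter algorithm and NOT WordNet.

SUFFIXES = ("ing", "ed", "s")   # assumed, deliberately minimal rule set
LEMMA_TABLE = {                 # assumed, hand-written dictionary
    "running": "run",
    "ran": "run",
    "better": "good",
    "mice": "mouse",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix using rules only (no dictionary)."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "runn" -> "run": collapse a doubled consonant left by stripping.
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["running", "runs", "ran", "better", "mice"]:
    print(f"{w:8} stem={toy_stem(w):8} lemma={toy_lemmatize(w)}")
```

The rule-based function cannot relate "ran" to "run" because no suffix rule applies, whereas the lookup handles irregular forms but only for words it already knows.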


![alt text](image.png) 
source: https://images.app.goo.gl/WRoCZruJLRuCBqqq5

### Stemming

In [1]:
from nltk.stem import PorterStemmer

In [2]:
porter = PorterStemmer()

In [3]:
print(porter.stem("Replacement"))
print(porter.stem("Replaced"))
print(porter.stem("Replacing"))

replac
replac
replac


In [4]:
print(porter.stem("Walked"))
print(porter.stem("Walking"))
print(porter.stem("Walk"))


walk
walk
walk


### Lemmatization

In [6]:
import nltk
nltk.download('punkt')  # Punkt tokenizer models; the stemmers and WordNetLemmatizer used below do not need them

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Dips\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [7]:
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

A corpus is a collection of written or spoken texts used for linguistic analysis, NLP tasks, or training language models.

Think of WordNet as a dictionary + thesaurus + relationship map all rolled into one. It's like a tool that organizes words by their meanings and shows how they are connected.

Example:
Take the word "dog":
- Same meaning (synonyms): "domestic dog", "Canis familiaris"
- Broader category (hypernym): "canine", and further up the hierarchy "mammal" (a dog is a type of mammal)
- Narrower category (hyponym): "puppy" (a puppy is a type of dog)


These relationships let programs reason about word meaning, for example by recognizing that "dog" and "puppy" are closely related.
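
The relationships above can be pictured as a small graph. The sketch below is hand-written and heavily simplified: `SYNONYMS`, `HYPERNYM`, and the helper functions are hypothetical names, and the entries are illustrative rather than real WordNet data (WordNet groups words into synsets and has many more levels and relation types).

```python
# Hand-written toy data for illustration; not loaded from WordNet.
SYNONYMS = {"dog": ["domestic dog"]}
HYPERNYM = {            # word -> one broader category (assumed toy tree)
    "puppy": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}


def hypernym_path(word):
    """Return [word, parent, grandparent, ...] up to the top of the toy tree."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def hyponyms(word):
    """Return the words whose direct broader category is `word`."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == word)


def path_distance(a, b):
    """Count edges between a and b via their nearest shared ancestor.

    Returns None when the two words share no ancestor in the toy tree.
    """
    path_a, path_b = hypernym_path(a), hypernym_path(b)
    for steps_from_a, node in enumerate(path_a):
        if node in path_b:
            return steps_from_a + path_b.index(node)
    return None


print(hypernym_path("puppy"))
print(hyponyms("dog"))
print(path_distance("dog", "puppy"), path_distance("puppy", "cat"))
```

A shorter path is one simple way to express that "dog" and "puppy" are closer in meaning than "puppy" and "cat".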


In [8]:
lemmatizer = WordNetLemmatizer()

In [11]:
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Dips\AppData\Roaming\nltk_data...


True

In [12]:
nltk.download('omw-1.4')

[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\Dips\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

In [13]:
lemmatizer.lemmatize("mice")

'mouse'

In [17]:
lemmatizer.lemmatize("Going")

'Going'

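In [18]:
lemmatizer.lemmatize("Going", pos=wordnet.VERB)

'Going'

The capitalized input comes back unchanged in both calls because WordNet stores its lemmas in lowercase, so "Going" is never matched. Without `pos`, the lemmatizer also treats the word as a noun. Lowercasing the input first (for example `"Going".lower()`) and passing `pos=wordnet.VERB` yields 'go'.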

### Comparing Stemmers

In [9]:
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

lancaster = LancasterStemmer()
porter = PorterStemmer()
snow = SnowballStemmer("english")

a = "hobby"
b = "hobbies"
c = "computation"
d = "computer"
e = "going"
f = "gone"

In [11]:
# LancasterStemmer
print(lancaster.stem(a))
print(lancaster.stem(b))
print(lancaster.stem(c))
print(lancaster.stem(d))
print(lancaster.stem(e))
print(lancaster.stem(f))

hobby
hobby
comput
comput
going
gon


In [12]:
# PorterStemmer
print(porter.stem(a))
print(porter.stem(b))
print(porter.stem(c))
print(porter.stem(d))
print(porter.stem(e))
print(porter.stem(f))

hobbi
hobbi
comput
comput
go
gone


In [None]:
# SnowballStemmer
print(snow.stem(a))
print(snow.stem(b))
print(snow.stem(c))
print(snow.stem(d))
print(snow.stem(e))
print(snow.stem(f))

hobbi
hobbi
comput
comput
go
gone
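
The three cells above repeat the same six `print` calls per stemmer. One way to view the results side by side is a small helper that accepts any mapping of names to stemming functions. This is a sketch: `compare_stemmers`, `format_table`, and the two stand-in functions are hypothetical names, and the stand-ins are not real stemming algorithms; they exist only so the block runs without NLTK installed.

```python
from typing import Callable, Dict, List


def compare_stemmers(
    stemmers: Dict[str, Callable[[str], str]], words: List[str]
) -> List[List[str]]:
    """Build rows of [word, stem_1, stem_2, ...] in the order of `stemmers`."""
    return [[w] + [stem(w) for stem in stemmers.values()] for w in words]


def format_table(header: List[str], rows: List[List[str]]) -> str:
    """Render header and rows as left-aligned, fixed-width columns."""
    table = [header] + rows
    width = max(len(cell) for row in table for cell in row) + 2
    return "\n".join("".join(c.ljust(width) for c in row).rstrip() for row in table)


# Stand-in "stemmers" (assumptions for the demo, not real algorithms).
def strip_s(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


def strip_ing(word: str) -> str:
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


stand_ins = {"strip_s": strip_s, "strip_ing": strip_ing}
rows = compare_stemmers(stand_ins, ["hobbies", "going", "walking"])
print(format_table(["word"] + list(stand_ins), rows))
```

With the objects already defined in this notebook, the call would be `compare_stemmers({"lancaster": lancaster.stem, "porter": porter.stem, "snowball": snow.stem}, [a, b, c, d, e, f])`, which puts the differences (such as "going" versus "go") in a single table.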
