## **What is Stemming?**
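Stemming is a process in natural language processing (NLP) that reduces a word to its base or root form, usually by stripping suffixes according to a set of rules.
For example, the words "playing", "played", and "plays" can all be reduced to the root word "play".
Stemming helps computers recognize that these different forms of a word share a similar meaning. The resulting stem is not always a real word, though: the Porter stemmer, for example, turns "history" into "histori".
- Typical use case: sentiment analysis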
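To make the idea concrete, here is a deliberately naive sketch of rule-based suffix stripping. The suffix list and the `naive_stem` helper are hypothetical and for illustration only; real stemmers such as Porter apply ordered, conditional rule steps rather than a flat list.

```python
# Hypothetical suffix list, ordered longest-first so "ing" is tried before "s"
SUFFIXES = ["ing", "ed", "s"]


def naive_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem_len` characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["playing", "played", "plays", "play", "is"]:
    print(f"{w} ---> {naive_stem(w)}")
```

The `min_stem_len` guard is what keeps short words such as "is" from being cut down to "i"; NLTK's `RegexpStemmer` exposes a similar idea through its `min` parameter.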


### **Popular Stemming Algorithms in NLTK**

- **PorterStemmer**:  
   One of the most widely used stemming algorithms. It applies a fixed sequence of rule-based steps to strip common suffixes from English words. The Porter stemmer is known for its simplicity and effectiveness, but it can be aggressive and does not always produce real words (for example, "history" becomes "histori").

- **LancasterStemmer**:  
   A more aggressive stemmer than Porter. It applies a larger set of rules iteratively and can reduce words to very short stems, sometimes at the cost of accuracy. It is often described as faster, but it tends to over-stem, producing stems that are not always meaningful (for example, "historian" and "history" both become "hist").

- **RegexpStemmer**:  
   This stemmer uses a regular expression to identify and remove suffixes from words. It is highly customizable: you supply your own patterns, plus an optional minimum word length (`min`) below which words are left unchanged. It is useful when a task needs custom stemming rules.

- **SnowballStemmer**:  
   Snowball is a small language created by Martin Porter for writing stemmers, and its English stemmer (often called "Porter2") is an improvement over the original Porter algorithm. NLTK's `SnowballStemmer` supports multiple languages and balances accuracy against aggressiveness, which makes it a common choice for multilingual stemming tasks.
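The differences between these algorithms are easiest to see side by side. The sketch below runs NLTK's Porter, Snowball (English) and Lancaster stemmers over a few words; the word list is illustrative and not part of the original notebook.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")
lancaster = LancasterStemmer()

# Illustrative words on which the algorithms disagree
for word in ["fairly", "generously", "historian"]:
    print(
        f"{word:12} porter={porter.stem(word):10} "
        f"snowball={snowball.stem(word):10} lancaster={lancaster.stem(word)}"
    )
```

For "fairly" and "generously", Porter yields "fairli" and "gener" while Snowball yields "fair" and "generous", which illustrates why the Snowball (Porter2) English stemmer is usually considered an improvement.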


```python
# Sample words for comparing the stemmers
words = ["eating", "eats", "eaten", "eat", "historian", "history", "historical", "historically", "fruits", "fruit", "fruitful", "fruitfulness", "fruitful"]
```

```python
from nltk.stem import PorterStemmer
porter_stemmer = PorterStemmer()

for word in words:
    print(f"{word} ---> {porter_stemmer.stem(word)}")
```

```
eating ---> eat
eats ---> eat
eaten ---> eaten
eat ---> eat
historian ---> historian
history ---> histori
historical ---> histor
historically ---> histor
fruits ---> fruit
fruit ---> fruit
fruitful ---> fruit
fruitfulness ---> fruit
fruitful ---> fruit
```
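As a quick check of the "playing / played / plays" example, the same `PorterStemmer` maps all three forms to one stem inside a sentence. The whitespace tokenization and lowercasing here are simplifying assumptions; a real preprocessing pipeline (for example, for sentiment analysis) would normally use a proper tokenizer such as `nltk.word_tokenize`.

```python
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()

# Naive preprocessing: lowercase, split on whitespace, trim punctuation (illustrative only)
sentence = "She plays, played and is playing"
tokens = [t.strip(",.") for t in sentence.lower().split()]
stems = [porter_stemmer.stem(t) for t in tokens]
print(stems)
```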


```python
from nltk.stem import LancasterStemmer
lancaster = LancasterStemmer()

for word in words:
    print(f"{word} ---> {lancaster.stem(word)}")
```

```
eating ---> eat
eats ---> eat
eaten ---> eat
eat ---> eat
historian ---> hist
history ---> hist
historical ---> hist
historically ---> hist
fruits ---> fruit
fruit ---> fruit
fruitful ---> fruit
fruitfulness ---> fruit
fruitful ---> fruit
```
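One rough way to gauge how aggressive a stemmer is: count how many distinct stems it produces for the same vocabulary. Fewer stems means more words were merged, which can signal over-stemming (such as Lancaster conflating "historian" with "history"). The `group_by_stem` helper is a hypothetical name used only for this sketch.

```python
from collections import defaultdict

from nltk.stem import LancasterStemmer, PorterStemmer

words = ["eating", "eats", "eaten", "eat", "historian", "history",
         "historical", "historically", "fruits", "fruit", "fruitful",
         "fruitfulness"]


def group_by_stem(stemmer, vocabulary):
    """Map each stem to the sorted list of distinct words that produced it."""
    groups = defaultdict(set)
    for word in vocabulary:
        groups[stemmer.stem(word)].add(word)
    return {stem: sorted(ws) for stem, ws in groups.items()}


porter_groups = group_by_stem(PorterStemmer(), words)
lancaster_groups = group_by_stem(LancasterStemmer(), words)

print(f"Porter:    {len(porter_groups)} stems -> {porter_groups}")
print(f"Lancaster: {len(lancaster_groups)} stems -> {lancaster_groups}")
```

A smaller stem count is not automatically worse: merging "fruitful" into "fruit" may help a search engine, while merging "historian" into "history" may hurt a classifier.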


```python
from nltk.stem import RegexpStemmer
# Every suffix is anchored with "$" so it is removed only at the end of a word;
# words shorter than min=6 characters are returned unchanged.
regex = RegexpStemmer('ing$|s$|en$|ally$|fulness$|ful$', min=6)

for word in words:
    print(f"{word} ---> {regex.stem(word)}")
```

```
eating ---> eat
eats ---> eats
eaten ---> eaten
eat ---> eat
historian ---> historian
history ---> history
historical ---> historical
historically ---> historic
fruits ---> fruit
fruit ---> fruit
fruitful ---> fruit
fruitfulness ---> fruit
fruitful ---> fruit
```
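Two details of `RegexpStemmer` matter in practice: the pattern is applied with `re.sub`, so every match is removed wherever it occurs in the word, and words shorter than `min` are returned unchanged. The sketch below contrasts an unanchored and an anchored `ing` pattern; the example words are illustrative.

```python
from nltk.stem import RegexpStemmer

unanchored = RegexpStemmer("ing")      # removes "ing" anywhere in the word
anchored = RegexpStemmer("ing$")       # removes "ing" only at the end
with_min = RegexpStemmer("s$", min=6)  # leaves words shorter than 6 characters alone

print(unanchored.stem("kingdom"), anchored.stem("kingdom"))  # kdom kingdom
print(with_min.stem("eats"), with_min.stem("fruits"))        # eats fruit
```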


```python
from nltk.stem import SnowballStemmer
snowball = SnowballStemmer('english', ignore_stopwords=False)

for word in words:
    print(f"{word} ---> {snowball.stem(word)}")
```

```
eating ---> eat
eats ---> eat
eaten ---> eaten
eat ---> eat
historian ---> historian
history ---> histori
historical ---> histor
historically ---> histor
fruits ---> fruit
fruit ---> fruit
fruitful ---> fruit
fruitfulness ---> fruit
fruitful ---> fruit
```
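Because `SnowballStemmer` takes the language as its first argument, the same code can stem other languages. The German example ("Autobahnen") is taken from NLTK's own documentation; `SnowballStemmer.languages` lists the languages supported by the installed NLTK version.

```python
from nltk.stem import SnowballStemmer

# Languages supported by the installed NLTK version
print(SnowballStemmer.languages)

german = SnowballStemmer("german")
print(german.stem("Autobahnen"))  # the German stemmer lowercases its input first
```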


## WordNet Lemmatizer
Lemmatization is similar to stemming, but its output, called a "lemma", is a root word (the dictionary form) rather than a root stem, which is what stemming produces. After lemmatization we get a valid word with the same meaning: "eats" becomes "eat", whereas a stemmer can produce non-words such as "histori".

NLTK provides the `WordNetLemmatizer` class, a thin wrapper around the WordNet corpus. It uses the `morphy()` method of the `WordNetCorpusReader` class to find a lemma.

- Typical use case: chatbots

```python
import nltk
nltk.download('wordnet')
```

```
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Zainab\AppData\Roaming\nltk_data...
True
```

```python
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

# POS tags accepted by lemmatize():
#   'n' = noun (the default), 'v' = verb, 'a' = adjective, 'r' = adverb
for word in words:
    print(f"{word} --> {lemmatizer.lemmatize(word, pos='v')}")
```

```
eating --> eat
eats --> eat
eaten --> eat
eat --> eat
historian --> historian
history --> history
historical --> historical
historically --> historically
fruits --> fruit
fruit --> fruit
fruitful --> fruitful
fruitfulness --> fruitfulness
fruitful --> fruitful
```
