### Code Demo for Multiple Stemmers 

#### Snowball or Porter2 Stemmer

The Snowball Stemmer is also called the Porter2 Stemmer because it is a revision of the original Porter algorithm. Its rules were refined for greater accuracy and consistency, which reduces (though does not eliminate) over- and under-stemming. It is also available for multiple languages, not only English.

Below is a code demo that shows the difference between the Porter and Snowball stemmers.

In [5]:
# Import necessary libraries
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, SnowballStemmer
import nltk

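# Download the Punkt tokenizer models used by word_tokenize.
# Note: newer NLTK releases may also require nltk.download('punkt_tab').
nltk.download('punkt')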

# Initialize the stemmers.
porter_stemmer = PorterStemmer()
snowball_stemmer = SnowballStemmer("english")

# Create the sample text.
text = "In the summer, the researchers were analyzing various hypotheses about the future development."

# Tokenize the sample text.
tokens = word_tokenize(text)

# Apply stemmers.
porter_tokens = [porter_stemmer.stem(token) for token in tokens]
snowball_tokens = [snowball_stemmer.stem(token) for token in tokens]

# Display results
print("Original Tokens: ", tokens)
print("Porter Stemmed Tokens: ", porter_tokens)
print("Snowball Stemmed Tokens: ", snowball_tokens)

Original Tokens:  ['In', 'the', 'summer', ',', 'the', 'researchers', 'were', 'analyzing', 'various', 'hypotheses', 'about', 'the', 'future', 'development', '.']
Porter Stemmed Tokens:  ['in', 'the', 'summer', ',', 'the', 'research', 'were', 'analyz', 'variou', 'hypothes', 'about', 'the', 'futur', 'develop', '.']
Snowball Stemmed Tokens:  ['in', 'the', 'summer', ',', 'the', 'research', 'were', 'analyz', 'various', 'hypothes', 'about', 'the', 'futur', 'develop', '.']


[nltk_data] Downloading package punkt to C:\Users\Shailendra
[nltk_data]     Kadre\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


- Please note: for the word "various" in the original sentence, the Snowball stemmer returns "various", which better preserves its meaning. The Porter stemmer, on the other hand, reduces the same word to 'variou'.
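The multi-language support mentioned above can be sketched as follows. This is a minimal illustration, not part of the original demo: the sample words ("running", "laufen", "corriendo") are assumptions chosen for this sketch, and it uses only NLTK's `SnowballStemmer` class and its `languages` attribute.

```python
from nltk.stem import SnowballStemmer

# Languages for which NLTK ships a Snowball stemmer.
print("Supported languages:", SnowballStemmer.languages)

# Hypothetical sample words, one per language (chosen for illustration only).
samples = {
    "english": "running",
    "german": "laufen",
    "spanish": "corriendo",
}

# Build one stemmer per language and stem the matching word.
stems = {
    language: SnowballStemmer(language).stem(word)
    for language, word in samples.items()
}

for language, stem in stems.items():
    print(f"{language:<8} {samples[language]:<10} -> {stem}")
```

Each language gets its own stemmer instance because the rules are language-specific. An English stemmer applied to German text would not produce meaningful stems.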

#### Lancaster Stemmer

The Lancaster Stemmer is one of the most aggressive stemming approaches because it applies a set of straightforward rules repeatedly to cut input words down to their root forms. It is mainly useful in NLP applications that require a high level of generalization, and it sacrifices some precision through over-stemming to achieve this. In applications like topic modelling, the focus is on broad themes rather than exact word forms. In such cases, the Lancaster Stemmer is applied to generalise terms to their root forms, which helps identify common topics. Below is a code demo that compares it with the Porter Stemmer.

In [7]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, LancasterStemmer

# Download the Punkt tokenizer models used by word_tokenize.
nltk.download('punkt')

# Initialize both the stemmers.
porter_stemmer = PorterStemmer()
lancaster_stemmer = LancasterStemmer()

# Sample text.
sample_text = "The scientific methodologies used by researchers were aimed at improving accuracy in predictive analytics."

# Tokenize the text
tokens = word_tokenize(sample_text)

# Apply Porter Stemmer.
porter_tokens = [porter_stemmer.stem(token) for token in tokens]

# Apply Lancaster Stemmer.
lancaster_tokens = [lancaster_stemmer.stem(token) for token in tokens]

# Display results
print("Original Tokens: ", tokens)
print("Porter Stemmed Tokens: ", porter_tokens)
print("Lancaster Stemmed Tokens: ", lancaster_tokens)

Original Tokens:  ['The', 'scientific', 'methodologies', 'used', 'by', 'researchers', 'were', 'aimed', 'at', 'improving', 'accuracy', 'in', 'predictive', 'analytics', '.']
Porter Stemmed Tokens:  ['the', 'scientif', 'methodolog', 'use', 'by', 'research', 'were', 'aim', 'at', 'improv', 'accuraci', 'in', 'predict', 'analyt', '.']
Lancaster Stemmed Tokens:  ['the', 'sci', 'methodolog', 'us', 'by', 'research', 'wer', 'aim', 'at', 'improv', 'acc', 'in', 'predict', 'analys', '.']


[nltk_data] Downloading package punkt to C:\Users\Shailendra
[nltk_data]     Kadre\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


- The Porter stemmer leaves 'were' unchanged, preserving more of the original word form.
- The Lancaster stemmer, by contrast, reduces more aggressively: it cuts 'were' to 'wer'.
- The same difference shows up for 'accuracy': the Porter stemmer reduces it to 'accuraci', while the Lancaster stemmer cuts it to 'acc'.

Code Snippet 5.2
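One rough way to quantify how aggressive each stemmer is on the sample above is to compare average stem lengths. The sketch below is an illustration under stated assumptions: the helper `average_stem_length` is a hypothetical name introduced here, and average length is only a crude proxy for aggressiveness, not a standard evaluation metric. The word list is copied from the tokens in the Lancaster demo, so no tokenizer download is needed.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Content words taken from the sample sentence used in the Lancaster demo.
words = [
    "scientific", "methodologies", "used", "researchers", "were",
    "aimed", "improving", "accuracy", "predictive", "analytics",
]


def average_stem_length(stemmer, words):
    """Return the mean length of the stems produced for `words` (hypothetical helper)."""
    stems = [stemmer.stem(word) for word in words]
    return sum(len(stem) for stem in stems) / len(stems)


stemmers = {
    "Porter": PorterStemmer(),
    "Snowball": SnowballStemmer("english"),
    "Lancaster": LancasterStemmer(),
}

# Shorter average stems suggest more aggressive truncation.
average_lengths = {
    name: average_stem_length(stemmer, words)
    for name, stemmer in stemmers.items()
}

for name, average in average_lengths.items():
    print(f"{name:<10} average stem length: {average:.2f}")
```

A lower average for the Lancaster stemmer is consistent with the over-stemming seen in the output above, where 'accuracy' becomes 'acc' and 'scientific' becomes 'sci'.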