__Stemming in NLP__

Stemming is a widely used text-normalization technique in Natural Language Processing (NLP) that reduces a word to its stem by stripping affixes, mostly suffixes. The goal is to map related word forms, such as "connect", "connected", and "connecting", to the same stem so that they can be treated as a single term. This is particularly useful for text analysis and for searching large amounts of text, where a query for one form should also match the others.
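
As a minimal sketch of the search use case, the snippet below builds a tiny inverted index over stemmed tokens with NLTK's `PorterStemmer`, so that a query for one word form finds documents that contain a different form. The toy documents, the `build_index` and `search` helpers, and the naive whitespace tokenization are illustrative assumptions, not part of NLTK.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical toy corpus: document id -> text
docs = {
    1: "the server is connecting to the database",
    2: "a lost connection was reported",
    3: "the cat sat on the mat",
}


def build_index(documents):
    """Hypothetical helper: map each stem to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():  # naive whitespace tokenization
            index[stemmer.stem(token)].add(doc_id)
    return index


def search(index, query):
    """Hypothetical helper: ids of documents containing the stem of every query word."""
    stems = [stemmer.stem(token) for token in query.lower().split()]
    if not stems:
        return set()
    result = set(index.get(stems[0], set()))
    for stem in stems[1:]:
        result &= index.get(stem, set())
    return result


index = build_index(docs)
print(sorted(search(index, "connected")))
```

Without stemming, the query "connected" would match neither document 1 nor document 2, since neither contains that exact string; after stemming, "connected", "connecting", and "connection" all share the stem "connect".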

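There are various stemming algorithms, such as the Porter stemmer, the Snowball stemmer, and the Lancaster stemmer. These English stemmers work by applying an ordered set of rules that strip suffixes such as "-ing", "-ed", and "-s" until only the stem remains. Snowball's English stemmer is a revised version of Porter's algorithm (Snowball also provides stemmers for other languages), while the Lancaster stemmer is more aggressive and tends to produce shorter stems.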
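
To make "applying a set of rules" concrete, here is a deliberately tiny suffix stripper in plain Python. The `RULES` table, the `MIN_STEM` guard, and `toy_stem` are hypothetical and much simpler than Porter, Snowball, or Lancaster, which run several ordered steps with conditions on the shape of the remaining stem.

```python
# Hypothetical rule table, NOT the real Porter rules.
# Each rule is (suffix, replacement); the first suffix that matches wins,
# so longer suffixes such as "sses" are listed before "s".
RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),
    ("s", ""),
    ("ing", ""),
    ("ed", ""),
]

# Hypothetical guard: refuse to produce stems shorter than this.
MIN_STEM = 3


def toy_stem(word: str) -> str:
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)] + replacement
            # Only accept the rewrite if enough of the word is left over.
            return stem if len(stem) >= MIN_STEM else word
    return word


for w in ["caresses", "ponies", "cats", "jumping", "jumped", "running", "is"]:
    print(w, "->", toy_stem(w))
```

The toy rules turn "running" into "runn"; the real Porter algorithm has an extra rule that removes the doubled final consonant, which is why it produces "run".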

For example, the Porter stemmer reduces "jumping" to "jump" and "amazing" to "amaz". The second case shows that a stem does not have to be a dictionary word; it only has to be shared by the related forms.
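
These examples can be reproduced with the stemmer classes NLTK provides in `nltk.stem`. The sketch below runs the same words through Porter, Snowball (English), and Lancaster side by side; only the Porter results are the ones quoted above, and the other columns are printed for comparison rather than claimed here.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

words = ["jumping", "amazing"]

# One row per word, one column per stemmer
print("word".ljust(10) + "".join(name.ljust(12) for name in stemmers))
for word in words:
    row = word.ljust(10) + "".join(s.stem(word).ljust(12) for s in stemmers.values())
    print(row)
```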

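While stemming is fast and simple, it is not always accurate. Because the rules only strip suffixes and ignore context and part of speech, a stemmer cannot relate irregular forms: "ran" is left as "ran" instead of being reduced to "run". Stemmers can also overstem, conflating words with different meanings (Porter reduces both "universe" and "university" to "univers"), or understem, leaving related forms apart. This is where lemmatization can be used instead of stemming: it uses a vocabulary and the word's part of speech to return its dictionary form (lemma), so "ran" used as a verb becomes "run".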
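
A quick way to see both failure modes is to stem a few words with NLTK's `PorterStemmer` and check which pairs end up sharing a stem. The word pairs and the `same_stem` helper are illustrative; lemmatization is not shown because NLTK's `WordNetLemmatizer` needs the WordNet data to be downloaded first.

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()


def same_stem(a: str, b: str) -> bool:
    """Hypothetical helper: True if Porter maps both words to the same stem."""
    return porter.stem(a) == porter.stem(b)


# Irregular form: suffix stripping cannot turn "ran" into "run"
print("run/ran:", porter.stem("run"), porter.stem("ran"), same_stem("run", "ran"))

# Overstemming: words with different meanings collapse to one stem
print("universe/university:", porter.stem("universe"), porter.stem("university"),
      same_stem("universe", "university"))
```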

Overall, stemming is a cheap way to collapse inflected forms of a word into a common stem, which shrinks the vocabulary and makes text analysis and search more efficient.

### PorterStemmer

```python
from nltk.stem import PorterStemmer

# Instantiate the Porter stemmer
porter = PorterStemmer()

# Define a list of words to be stemmed
words = ['studies', 'studying', 'studied', 'wolves', 'cats', 'dogs', 'running', 'runner', 'ran']

# Stem each word in the list
stemmed_words = [porter.stem(word) for word in words]

# Print the stemmed words
print(stemmed_words)
```

```
['studi', 'studi', 'studi', 'wolv', 'cat', 'dog', 'run', 'runner', 'ran']
```


```python
from nltk.stem import PorterStemmer

# Instantiate the Porter stemmer
porter = PorterStemmer()

# Define a list of words to be stemmed
words = ['studies', 'studying', 'studied', 'wolves', 'cats', 'dogs', 'running', 'runner', 'ran']

# Stem each word in the list
stemmed_words = [porter.stem(word) for word in words]

# Print the original words and their stemmed versions
for word, stem in zip(words, stemmed_words):
    print(word, '->', stem)
```

```
studies -> studi
studying -> studi
studied -> studi
wolves -> wolv
cats -> cat
dogs -> dog
running -> run
runner -> runner
ran -> ran
```
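
The Porter output can be summarized by grouping the original words under their stem, which is how stemming is typically used to merge word counts or index terms. This sketch reuses the same word list; grouping with `collections.defaultdict` is an illustrative choice.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

porter = PorterStemmer()
words = ['studies', 'studying', 'studied', 'wolves', 'cats', 'dogs', 'running', 'runner', 'ran']

# Group the original surface forms under their shared stem
groups = defaultdict(list)
for word in words:
    groups[porter.stem(word)].append(word)

for stem, forms in groups.items():
    print(stem, '<-', forms)
```

The three forms of "study" merge into a single group, while "running", "runner", and "ran" remain three separate groups.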


### Snowball Stemming

```python
from nltk.stem import SnowballStemmer

# Instantiate the Snowball stemmer for English
stemmer = SnowballStemmer(language='english')

# Define a list of words to stem
words = ['running', 'runs', 'runner', 'ran', 'am', 'are', 'is', 'was', 'were']

# Stem each word in the list using the Snowball stemmer
stemmed_words = [stemmer.stem(word) for word in words]

# Print the original words and their stemmed versions
for word, stem in zip(words, stemmed_words):
    print(word, '->', stem)
```

```
running -> run
runs -> run
runner -> runner
ran -> ran
am -> am
are -> are
is -> is
was -> was
were -> were
```


```python
from nltk.stem import SnowballStemmer

# Instantiate the Snowball stemmer for English
stemmer = SnowballStemmer(language='english')

# Define a list of words to stem
words = ['running', 'runs', 'runner', 'ran', 'am', 'are', 'is', 'was', 'were']

# Stem each word in the list using the Snowball stemmer
stemmed_words = [stemmer.stem(word) for word in words]

# Print the stemmed words
print(stemmed_words)
```

```
['run', 'run', 'runner', 'ran', 'am', 'are', 'is', 'was', 'were']
```
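
Snowball's English stemmer agrees with Porter on many words but uses revised rules, so the two sometimes differ. The sketch below compares them on "generously", a case where the revised rules keep more of the word, and prints the languages NLTK's `SnowballStemmer` accepts through its `languages` attribute. The word list is an illustrative choice.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer(language='english')

# Compare the two stemmers word by word
for word in ['generously', 'running', 'runner']:
    print(f"{word:12} porter={porter.stem(word):10} snowball={snowball.stem(word)}")

# Languages accepted by the SnowballStemmer constructor
print(SnowballStemmer.languages)
```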
