In [16]:
from nltk.stem import PorterStemmer


In [17]:
paragraph = '''
Stemming in NLTK (Natural Language Toolkit) is a vital technique used in natural language processing to reduce words to their base or root form. By stripping suffixes and prefixes, stemming helps to unify different forms of a word, allowing models to treat them as equivalent. For example, the words "running," "runner," and "ran" may all be stemmed to "run." This normalization process is crucial for tasks like text classification, sentiment analysis, and information retrieval, as it reduces dimensionality and enhances the model's ability to recognize related terms.

In NLTK, stemming can be easily implemented using classes like PorterStemmer or LancasterStemmer. By applying these stemmers to a corpus of text, you can preprocess your data to improve the performance of machine learning algorithms. The effectiveness of stemming is particularly evident in applications such as search engines, where users might input variations of a word. By stemming the query and the documents in the database, the system can return more relevant results, ensuring that variations of a word lead to comprehensive matches.'''

In [18]:
from nltk.corpus import stopwords

In [19]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ambig\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [20]:
stopwords.words('english')

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [21]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

In [22]:
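# Requires the Punkt tokenizer data, e.g. nltk.download('punkt') ('punkt_tab' on recent NLTK releases).
sentences = nltk.sent_tokenize(paragraph)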

In [23]:
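# Build the stop-word set once rather than once per token.
stop_words = set(stopwords.words('english'))
# Stem each sentence in place (this overwrites `sentences`).
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [stemmer.stem(word) for word in words if word.lower() not in stop_words]
    sentences[i] = ' '.join(words)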

In [24]:
sentences

['stem nltk ( natur languag toolkit ) vital techniqu use natur languag process reduc word base root form .',
 'strip suffix prefix , stem help unifi differ form word , allow model treat equival .',
 "exampl , word `` run , '' `` runner , '' `` ran '' may stem `` run . ''",
 "normal process crucial task like text classif , sentiment analysi , inform retriev , reduc dimension enhanc model 's abil recogn relat term .",
 'nltk , stem easili implement use class like porterstemm lancasterstemm .',
 'appli stemmer corpu text , preprocess data improv perform machin learn algorithm .',
 'effect stem particularli evid applic search engin , user might input variat word .',
 'stem queri document databas , system return relev result , ensur variat word lead comprehens match .']
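
The output above can be spot-checked word by word. The sketch below assumes only that NLTK is installed (PorterStemmer is rule-based and needs no downloaded data); the variable name `porter` and the word list are illustrative choices, with the words taken from the sample paragraph. Note that "runner" and "ran" are left unchanged, matching the third sentence of the output: suffix stripping does not map irregular forms to "run".

```python
from nltk.stem import PorterStemmer

# Illustrative name; the notebook itself calls this object `stemmer`.
porter = PorterStemmer()

# Words from the sample paragraph, checked one at a time.
for word in ["running", "runner", "ran", "equivalent", "classification"]:
    print(f"{word} -> {porter.stem(word)}")

# running -> run
# runner -> runner
# ran -> ran
# equivalent -> equival
# classification -> classif
```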

In [25]:
from nltk.stem import SnowballStemmer
snowballstemmer = SnowballStemmer('english')

In [26]:
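# `sentences` was overwritten with Porter stems in In [23], so this pass stems
# already-stemmed tokens. Re-run In [22] first to apply Snowball to the original text.
stop_words = set(stopwords.words('english'))
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [snowballstemmer.stem(word) for word in words if word.lower() not in stop_words]
    sentences[i] = ' '.join(words)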

In [27]:
sentences


['stem nltk ( natur languag toolkit ) vital techniqu use natur languag process reduc word base root form .',
 'strip suffix prefix , stem help unifi differ form word , allow model treat equiv .',
 'exampl , word `` run , `` `` runner , `` `` ran `` may stem `` run . ``',
 "normal process crucial task like text classif , sentiment analysi , inform retriev , reduc dimens enhanc model 's abil recogn relat term .",
 'nltk , stem easili implement use class like porterstemm lancasterstemm .',
 'appli stemmer corpu text , preprocess data improv perform machin learn algorithm .',
 'effect stem particular evid applic search engin , user might input variat word .',
 'stem queri document databa , system return relev result , ensur variat word lead comprehen match .']
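
The few tokens that differ between the two outputs ('equiv', 'dimens', 'particular', 'databa', 'comprehen') appear to come from running Snowball on text that Porter had already stemmed, since In [26] reuses the overwritten `sentences` list. The sketch below is one way to check this, assuming only that NLTK is installed. The helper `stem_tokens` and the word list are hypothetical, not part of the notebook; the helper returns a new list, so the original words stay available for a second stemmer.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")


def stem_tokens(tokens, stemmer):
    """Return a new list of stems; the input list is left untouched."""
    return [stemmer.stem(token) for token in tokens]


# Hypothetical sample: words from the paragraph whose stems differ between the two outputs.
tokens = ["equivalent", "dimensionality", "particularly", "database", "comprehensive"]

porter_stems = stem_tokens(tokens, porter)
snowball_direct = stem_tokens(tokens, snowball)            # Snowball on the original words
snowball_on_porter = stem_tokens(porter_stems, snowball)   # Snowball on Porter output, as in In [26]

# One row per word: original, Porter, Snowball (direct), Snowball (after Porter).
for row in zip(tokens, porter_stems, snowball_direct, snowball_on_porter):
    print(row)
```

Comparing the last two columns shows how much of the difference is due to the second stemming pass rather than to Snowball itself.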