<h1 style="text-align: center;">Handling Stopwords</h1>

### Contents

- [Porter Stemmer](#Porter-Stemmer)
- [Snowball Stemmer](#Snowball-Stemmer)
- [Word Net Lemmatizer](#Word-Net-Lemmatizer)

In [1]:
import nltk
from nltk.corpus import stopwords

In [2]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to C:\Users\Track
[nltk_data]     Computers\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [3]:
paragraph = """
    Your time is limited, so don’t waste it living someone else’s life. 
    Don’t be trapped by dogma — which is living with the results of other people’s thinking. 
    Don’t let the noise of others’ opinions drown out your own inner voice. 
    And most important, have the courage to follow your heart and intuition. 
    They somehow already know what you truly want to become. 
    Everything else is secondary...Stay Hungry. Stay Foolish.
"""

In [4]:
lang = 'english'
stopwords_list = stopwords.words(lang)
print(f"Total {lang.capitalize()} Stopwords: {len(stopwords_list)}\n\n{stopwords_list}")

Total English Stopwords: 179

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 

### Porter Stemmer

In [5]:
from nltk.stem import PorterStemmer

In [6]:
stemmer = PorterStemmer()

In [7]:
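# sent_tokenize needs NLTK's Punkt data: nltk.download('punkt') ('punkt_tab' on newer NLTK releases)
sentences = nltk.sent_tokenize(paragraph)
sentences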

['\n    Your time is limited, so don’t waste it living someone else’s life.',
 'Don’t be trapped by dogma — which is living with the results of other people’s thinking.',
 'Don’t let the noise of others’ opinions drown out your own inner voice.',
 'And most important, have the courage to follow your heart and intuition.',
 'They somehow already know what you truly want to become.',
 'Everything else is secondary...Stay Hungry.',
 'Stay Foolish.']

In [8]:
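# Build the stopword set once instead of once per token.
stop_words = set(stopwords.words('english'))
for i, sentence in enumerate(sentences):
    words = nltk.word_tokenize(sentence)
    # The membership test is case-sensitive, so capitalized tokens such as 'Your' are kept.
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)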

In [9]:
sentences

['your time limit , ’ wast live someon els ’ life .',
 'don ’ trap dogma — live result peopl ’ think .',
 'don ’ let nois other ’ opinion drown inner voic .',
 'and import , courag follow heart intuit .',
 'they somehow alreadi know truli want becom .',
 'everyth els secondari ... stay hungri .',
 'stay foolish .']
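
The output above still contains `your`, `and`, and `they`, because the stopword list is all lowercase while the tokens `Your`, `And`, and `They` are capitalized. A minimal sketch of a case-insensitive filter follows. It is an illustration only: `STOP_WORDS` is a small hand-picked subset standing in for `set(stopwords.words('english'))`, and the regex tokenizer is a simplified stand-in for `nltk.word_tokenize`, so the block runs without any NLTK data.

```python
import re

# Assumption: a tiny subset of the English stopword list, used here so the
# example is self-contained. In the notebook this would be
# set(stopwords.words('english')).
STOP_WORDS = {"your", "is", "so", "it", "and", "they", "the", "to", "have", "most", "what", "you"}


def remove_stopwords(sentence, stop_words=STOP_WORDS):
    """Drop stopwords, comparing in lowercase so 'Your' matches 'your'."""
    # Simplified tokenizer (an assumption): runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return [tok for tok in tokens if tok.lower() not in stop_words]


print(remove_stopwords("And most important, have the courage to follow your heart and intuition."))
# ['important', ',', 'courage', 'follow', 'heart', 'intuition', '.']
```

Lowercasing only inside the membership test keeps the original casing of the surviving tokens, which matters if a later step depends on capitalization.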

### Snowball Stemmer

In [10]:
from nltk.stem import SnowballStemmer

In [11]:
stemmer = SnowballStemmer('english')

In [12]:
sentences = nltk.sent_tokenize(paragraph)

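# Build the stopword set once instead of once per token.
stop_words = set(stopwords.words('english'))
for i, sentence in enumerate(sentences):
    words = nltk.word_tokenize(sentence)
    # The membership test is case-sensitive, so capitalized tokens such as 'Your' are kept.
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)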

In [13]:
sentences

['your time limit , ’ wast live someon els ’ life .',
 'don ’ trap dogma — live result peopl ’ think .',
 'don ’ let nois other ’ opinion drown inner voic .',
 'and import , courag follow heart intuit .',
 'they somehow alreadi know truli want becom .',
 'everyth els secondari ... stay hungri .',
 'stay foolish .']
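
The Porter and Snowball cells repeat the same tokenize, filter, and normalize loop with a different stemmer. One way to avoid the repetition is to pass the normalizer in as a callable. The sketch below is an assumption about how that could be organized, not part of the original notebook: `preprocess` is a hypothetical helper, and its default tokenizer is a simplified regex stand-in so the example runs without NLTK data.

```python
import re


def preprocess(sentences, normalize, stop_words, tokenize=None):
    """Tokenize each sentence, drop stopwords, and normalize the remaining tokens."""
    if tokenize is None:
        # Stand-in tokenizer (an assumption); nltk.word_tokenize could be passed instead.
        def tokenize(text):
            return re.findall(r"\w+|[^\w\s]", text)

    stop_words = set(stop_words)  # build the lookup set once
    return [
        " ".join(normalize(tok) for tok in tokenize(sentence) if tok not in stop_words)
        for sentence in sentences
    ]


# str.lower stands in for a real normalizer such as stemmer.stem.
demo = preprocess(["Stay Hungry.", "Stay Foolish."], normalize=str.lower, stop_words={"the"})
print(demo)
# ['stay hungry .', 'stay foolish .']
```

In the notebook, a call along the lines of `preprocess(nltk.sent_tokenize(paragraph), stemmer.stem, stopwords.words('english'), nltk.word_tokenize)` would cover both stemmer sections, and a small wrapper around `lemmatizer.lemmatize` would cover the lemmatizer.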

### Word Net Lemmatizer

In [14]:
from nltk.stem import WordNetLemmatizer

In [15]:
# The lemmatizer needs the WordNet corpus: nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()

In [16]:
sentences = nltk.sent_tokenize(paragraph)

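# Build the stopword set once instead of once per token.
stop_words = set(stopwords.words('english'))
for i, sentence in enumerate(sentences):
    words = nltk.word_tokenize(sentence)
    # Tokens are lowercased for lemmatizing, but the stopword test itself is case-sensitive.
    words = [lemmatizer.lemmatize(word.lower(), pos='v') for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)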

In [17]:
sentences

['your time limit , ’ waste live someone else ’ life .',
 'don ’ trap dogma — live result people ’ think .',
 'don ’ let noise others ’ opinions drown inner voice .',
 'and important , courage follow heart intuition .',
 'they somehow already know truly want become .',
 'everything else secondary ... stay hungry .',
 'stay foolish .']
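
To see where stemming and lemmatization diverge, the printed results can be compared token by token. The sketch below copies two sentences from the stemmer and lemmatizer outputs above. `token_diffs` is a hypothetical helper introduced for illustration, and it assumes both strings contain the same number of tokens.

```python
def token_diffs(a, b):
    """Return (a_token, b_token) pairs at positions where two space-joined token strings differ.

    Assumes both strings have the same number of tokens; zip() silently
    ignores any extra tokens in the longer one.
    """
    return [(x, y) for x, y in zip(a.split(), b.split()) if x != y]


# Copied from the stemmer and lemmatizer outputs shown in this notebook.
stemmed = "don ’ let nois other ’ opinion drown inner voic ."
lemmatized = "don ’ let noise others ’ opinions drown inner voice ."

print(token_diffs(stemmed, lemmatized))
# [('nois', 'noise'), ('other', 'others'), ('opinion', 'opinions'), ('voic', 'voice')]

print(token_diffs("stay foolish .", "stay foolish ."))
# []
```

The stemmers truncate words to stems that are not always dictionary words (`nois`, `voic`), while the lemmatizer returns real words. Because the lemmatizer is called with `pos='v'`, plural nouns such as `others` and `opinions` are left unchanged.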