# Stop Words

**In natural language processing, words that carry little useful information are referred to as stop words.**

*We can recognize for ourselves that some words carry more meaning than others. Some words contribute almost nothing and act as filler: in English, we use them to "fluff up" a sentence so that it does not sound strange. One of the most common, if unofficial, examples is the filler "umm."*

*We would not want these words taking up space in our database or consuming valuable processing time. We therefore call them "stop words": they add little for our purposes, so we filter them out before further processing.*

In [1]:
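# The stop word corpus must be available locally;
# if it is missing, run nltk.download('stopwords') once.
from nltk.corpus import stopwords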

In [2]:
# Build a set of English stop words (a set gives fast membership tests)
stopWords = set(stopwords.words('english'))
print(stopWords)

{'hers', 'did', 'at', 'by', 'against', "isn't", 'mustn', "weren't", 'here', 'ourselves', 'weren', 'her', 'or', 'shan', 've', 'up', 'from', "you're", 'herself', "wasn't", 'yourselves', 'should', 'had', "aren't", 'more', 'then', 'yourself', 'shouldn', 'above', 's', 'has', 'into', 'it', "hasn't", 'off', 'can', 'own', 'over', 'been', 'all', 'be', "should've", "didn't", 'wasn', 'them', 'any', 'below', 'down', 'were', 'same', 're', 'and', 'where', 'because', 'my', 'wouldn', 'for', 'very', 'as', "hadn't", 'isn', "needn't", 'most', 'other', 'will', 'mightn', 'the', 'not', 'our', "couldn't", "you'd", "you'll", 'whom', 'both', 'again', 'between', 'so', "she's", 'their', "shouldn't", 'i', 'too', 'they', 'him', 'before', 'what', 'having', 'to', 'no', 'with', 'himself', 'o', "it's", 'that', 'about', 'during', 'm', 'do', 'after', 'don', 'themselves', "doesn't", "don't", 'she', 'was', 'until', 'have', 'few', 'hasn', "mustn't", 'd', 'further', "haven't", 'this', 'just', 'doing', 'needn', 'than', "woul

## Remove the Stop Words from Text

In [3]:
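# word_tokenize relies on NLTK's Punkt tokenizer data; if it is missing,
# download it once with nltk.download('punkt') ('punkt_tab' in recent NLTK releases).
from nltk.tokenize import word_tokenize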
Example = """Hello Mr. Maharshi. How are you? Mr. Narendra Modi & Mr. Donald Trump is waiting for you. When will you meet them? They want to collobrate with you for Technology. This is great chance. Let's take it"""

In [4]:
words = word_tokenize(Example)
filtered_words = []

In [5]:
for w in words:
    if w not in stopWords:
        filtered_words.append(w)

In [6]:
print('Before filtering')
print(words)

Before filtering
['Hello', 'Mr.', 'Maharshi', '.', 'How', 'are', 'you', '?', 'Mr.', 'Narendra', 'Modi', '&', 'Mr.', 'Donald', 'Trump', 'is', 'waiting', 'for', 'you', '.', 'When', 'will', 'you', 'meet', 'them', '?', 'They', 'want', 'to', 'collobrate', 'with', 'you', 'for', 'Technology', '.', 'This', 'is', 'great', 'chance', '.', 'Let', "'s", 'take', 'it']


In [7]:
print('After filtering')
print(filtered_words)

After filtering
['Hello', 'Mr.', 'Maharshi', '.', 'How', '?', 'Mr.', 'Narendra', 'Modi', '&', 'Mr.', 'Donald', 'Trump', 'waiting', '.', 'When', 'meet', '?', 'They', 'want', 'collobrate', 'Technology', '.', 'This', 'great', 'chance', '.', 'Let', "'s", 'take']
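
Note that 'How', 'When', 'They' and 'This' survive the filtering above: the NLTK stop words are all lowercase, and the membership test is case-sensitive. One way to handle this is to lowercase each token before the comparison. The sketch below illustrates the idea without requiring NLTK. `STOP_WORDS` is a small hand-picked subset used only for this illustration (in the notebook you would pass the full `stopWords` set), and `remove_stop_words` with its `drop_punctuation` flag is a hypothetical helper, not part of NLTK.

```python
import string

# Assumption: a small illustrative subset of an English stop list.
# In the notebook, use the full `stopWords` set built from NLTK instead.
STOP_WORDS = {"are", "you", "is", "for", "will", "them", "to", "with",
              "it", "how", "when", "they", "this"}

# Tokens copied from the "Before filtering" output above.
tokens = ['Hello', 'Mr.', 'Maharshi', '.', 'How', 'are', 'you', '?', 'Mr.',
          'Narendra', 'Modi', '&', 'Mr.', 'Donald', 'Trump', 'is', 'waiting',
          'for', 'you', '.', 'When', 'will', 'you', 'meet', 'them', '?',
          'They', 'want', 'to', 'collobrate', 'with', 'you', 'for',
          'Technology', '.', 'This', 'is', 'great', 'chance', '.', 'Let',
          "'s", 'take', 'it']


def remove_stop_words(tokens, stop_words, drop_punctuation=False):
    """Keep tokens whose lowercase form is not in stop_words.

    Hypothetical helper for illustration. If drop_punctuation is True,
    tokens made up entirely of punctuation characters are removed too.
    """
    kept = []
    for token in tokens:
        # Compare in lowercase so 'How' matches the stop word 'how'.
        if token.lower() in stop_words:
            continue
        if drop_punctuation and all(ch in string.punctuation for ch in token):
            continue
        kept.append(token)
    return kept


case_insensitive = remove_stop_words(tokens, STOP_WORDS)
no_punct = remove_stop_words(tokens, STOP_WORDS, drop_punctuation=True)

print(case_insensitive)
print(no_punct)
```

With the lowercase comparison, the capitalized stop words are removed while the original casing of the remaining tokens is preserved. The optional punctuation filter also drops standalone tokens such as '.', '?' and '&', but keeps 'Mr.' and "'s" because they contain letters.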
