# Stop Words
Words like "a" and "the" appear so **frequently** that they don't require tagging as thoroughly as nouns, verbs and modifiers. We call these `stop words`, and they can be **filtered** from the text to be processed. `spaCy` ships with **a built-in set of English stop words**; its exact size depends on the spaCy version (326 in the version used here).

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_md')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'doing', 'ours', 'wherever', 'herein', 'side', 'anyone', 'about', 'full', 'their', 'always', 'others', 'already', 'between', 'in', 'me', 'her', 'further', 'thus', "n't", 'who', 'show', 'done', 'ever', 'anyway', 'then', 'been', 'twenty', 'whoever', 'yet', 'being', 'other', 'alone', 'move', 'never', 'at', 'per', 'had', 'indeed', 'most', 'himself', 'my', 'not', 'we', '‘ll', 'seems', 'more', 'n‘t', 'where', 'beforehand', 'third', 'unless', 'anyhow', 'another', 'else', 'after', 'seem', 'whereupon', 'bottom', 'cannot', 'get', 'again', 'by', 'during', 'into', 'sometime', 'therein', 'towards', 'every', 'front', 'much', 'through', 'will', 'everything', 'both', 'thru', 'mostly', 'ourselves', 'elsewhere', 'hereby', 'go', 'above', 'thence', 'which', 'an', 'regarding', "'ve", 'under', 'moreover', 'say', 'our', 'everyone', 'that', 'us', 'be', 'he', 'whither', 'nine', 'down', 'so', 'themselves', 'why', 'fifteen', 'take', 'would', 'somehow', 'if', 'have', 'as', 'neither', 'very', 'with', 'without', '

In [3]:
len(nlp.Defaults.stop_words)

326
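
As noted in the introduction, stop words can be filtered out of a text before further processing. The following is a minimal sketch of that step, assuming a recent spaCy release in which `Token.is_stop` is populated from the language defaults. `spacy.blank('en')` is used only so that the snippet runs without a downloaded model, and the names `nlp_blank` and `content_words` are illustrative.

```python
import spacy

# Assumption: a blank English pipeline is enough here, because is_stop is a
# lexical attribute that does not depend on a trained model.
nlp_blank = spacy.blank('en')

doc = nlp_blank('The cat ran over the hill and to my lap')

# Keep only the tokens that spaCy does not flag as stop words
content_words = [token.text for token in doc if not token.is_stop]
print(content_words)
```

The same list comprehension works on a `Doc` produced by the `nlp` object loaded above.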

## To see if a word is a stop word

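In some spaCy versions the `is_stop` flag is not set correctly on every stop word lexeme. The first cell below works around this by checking membership in `STOP_WORDS` directly; the test is case-sensitive, which is why `The` is reported as `False`. The second cell manually sets `is_stop` to `True` for the entries in `STOP_WORDS`.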

In [4]:
from spacy.lang.en.stop_words import STOP_WORDS

doc = nlp('The cat ran over the hill and to my lap')

# Case-sensitive membership test against the raw stop word set
for word in doc:
    print(f'{word} | {word.text in STOP_WORDS}')

The | False
cat | False
ran | False
over | True
the | True
hill | False
and | True
to | True
my | True
lap | False
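
The `The | False` row above is a consequence of the case-sensitive lookup: the set holds `'the'`, not `'The'`. A small standard-library sketch of a case-insensitive check follows. `SAMPLE_STOP_WORDS` is a hypothetical hand-picked set standing in for spaCy's `STOP_WORDS`, and `is_stop_word` is an illustrative helper, not part of spaCy.

```python
# Hypothetical miniature stop word set standing in for spaCy's STOP_WORDS
SAMPLE_STOP_WORDS = {'the', 'over', 'and', 'to', 'my'}


def is_stop_word(word, stop_words=SAMPLE_STOP_WORDS):
    """Case-insensitive membership test: lowercase the word before the lookup."""
    return word.lower() in stop_words


sentence = 'The cat ran over the hill and to my lap'

# Pair each whitespace-separated word with its stop word flag
flags = [(word, is_stop_word(word)) for word in sentence.split()]
for word, flag in flags:
    print(f'{word} | {flag}')
```

With the real set, the equivalent check would be `word.text.lower() in STOP_WORDS`.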


In [8]:
# Flag each stop word, plus its capitalized and uppercase forms, in the vocab
for word in STOP_WORDS:
    for w in (word, word.capitalize(), word.upper()):
        lex = nlp.vocab[w]
        lex.is_stop = True

In [5]:
nlp.vocab['myself'].is_stop

True

In [6]:
nlp.vocab['mystery'].is_stop

False

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [7]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the stop_word tag on the lexeme
nlp.vocab['btw'].is_stop = True

In [8]:
len(nlp.Defaults.stop_words)

327

In [9]:
nlp.vocab['btw'].is_stop

True

<font color=green>When adding stop words, always use **lowercase**. The entries in the default stop word set are lowercase, and the **vocab** stores a separate lexeme for each casing of a word.</font>

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [10]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Remove the stop_word tag from the lexeme
nlp.vocab['beyond'].is_stop = False

In [11]:
len(nlp.Defaults.stop_words)

326

In [12]:
nlp.vocab['beyond'].is_stop

False
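
Both the add and the remove steps above touch two places: the default stop word set and the `is_stop` flag on the lexeme. One way to keep them in sync is a small wrapper. This is a sketch only: `set_stop_word` is a hypothetical helper, not part of spaCy, and `spacy.blank('en')` is assumed here so the snippet runs without a downloaded model.

```python
import spacy

nlp_demo = spacy.blank('en')


def set_stop_word(nlp, word, is_stop=True):
    """Add or remove a stop word, updating both the set and the lexeme flag."""
    word = word.lower()  # the default stop word entries are lowercase
    if is_stop:
        nlp.Defaults.stop_words.add(word)
    else:
        # discard() does not raise if the word is already absent
        nlp.Defaults.stop_words.discard(word)
    nlp.vocab[word].is_stop = is_stop


set_stop_word(nlp_demo, 'BTW')            # stored as 'btw'
print(nlp_demo.vocab['btw'].is_stop)

set_stop_word(nlp_demo, 'btw', is_stop=False)
print(nlp_demo.vocab['btw'].is_stop)
```

Using `discard` rather than `remove` means that removing a word that is not in the set does not raise a `KeyError`.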

Great! Now you should be able to access spaCy's default set of stop words, and add or remove stop words as needed.
## Next up: Vocabulary and Matching