# Stop Words
Words like "a" and "the" appear so frequently that they don't require tagging as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text to be processed. spaCy ships with a built-in list of English stop words. In the version used here it contains 305 entries, though the exact count differs between spaCy releases.
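The filtering step itself can be sketched as follows. This sketch assumes spaCy v2 or later is installed. It uses `spacy.blank("en")`, which loads the English language defaults (including the stop-word list) without needing a trained model. It then keeps only the tokens whose `is_stop` flag is false. The helper name `remove_stop_words` is hypothetical.

```python
import spacy

# A blank English pipeline: tokenizer plus language defaults (including the
# stop-word list); no trained model download is required.
nlp = spacy.blank("en")

def remove_stop_words(text):
    """Hypothetical helper: return the tokens of `text` that are not stop words."""
    doc = nlp(text)
    return [token.text for token in doc if not token.is_stop]

print(remove_stop_words("the cat sat on the mat"))
```

Words like "the" and "on" drop out, while content words such as "cat" and "mat" remain.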

```python
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
```

## To view the default stop words

```python
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)
```
```
{'down', 'everyone', 'itself', 'ca', 'several', 'mine', 'me', 'an', 'themselves', 'done', 'somehow', 'fifty', 'take', 'whom', 'than', 'well', 'and', 'make', 'serious', 'beside', 'below', 'thus', 'until', 'fifteen', 'yet', 'via', 'get', 'whence', 'whether', 'any', 'very', 'above', 'here', 'ourselves', 'be', 'together', 'whenever', 'during', 'on', 'show', 'before', 'cannot', 'what', 'will', 'anyone', 'hereby', 'himself', 'where', 'yours', 'others', 'twelve', 'further', 'another', 'most', 'been', 'always', 'sometime', 'in', 'two', 'seeming', 'eleven', 'nine', 'someone', 'among', 'bottom', 'must', 'thru', 'toward', 'former', 'its', 'beforehand', 'latter', 'such', 'thereupon', 'keep', 'whither', 'through', 'they', 'whoever', 'if', 'part', 'nevertheless', 'already', 'some', 'so', 'being', 'go', 'as', 'still', 'towards', 'move', 'were', 'almost', 'empty', 'there', 'give', 'out', 'besides', 'these', 'three', 'around', 'which', 'thence', 'put', 'anyhow', 'it', 'by', 'we', 'back', 'become', 'for
```

```python
len(nlp.Defaults.stop_words)
```
```
305
```
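Because the stop words live in a Python `set`, the printed order is arbitrary. A minimal sketch for getting a stable, readable listing is to sort a copy. This assumes spaCy is installed and uses `spacy.blank("en")` rather than the `en_core_web_sm` model, which is enough since the stop-word list is part of the language defaults.

```python
import spacy

nlp = spacy.blank("en")

# Sets have no defined order, so sort a copy to get a stable listing.
stop_words = sorted(nlp.Defaults.stop_words)
print(stop_words[:10])
print(len(stop_words))

# Membership tests go straight against the set and are fast (average O(1)).
print("the" in nlp.Defaults.stop_words)
```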

## To see if a word is a stop word

```python
nlp.vocab['myself'].is_stop
```
```
True
```

```python
nlp.vocab['mystery'].is_stop
```
```
False
```
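The same flag is available on every token of a processed `Doc`, which is usually where you check it in practice. This sketch assumes spaCy is installed and uses a blank English pipeline in place of `en_core_web_sm`.

```python
import spacy

nlp = spacy.blank("en")

# Lexeme-level check: look the word up in the shared vocabulary.
print(nlp.vocab["myself"].is_stop)   # True
print(nlp.vocab["mystery"].is_stop)  # False

# Token-level check: each token in a Doc exposes the same is_stop flag.
doc = nlp("myself and the mystery")
for token in doc:
    print(token.text, token.is_stop)
```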

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

```python
nlp.vocab['btw'].is_stop
```
```
False
```

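```python
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Also set the is_stop flag on the lexeme itself: the 'btw' lexeme already
# exists in the vocab (it was looked up above), so set its flag directly
nlp.vocab['btw'].is_stop = True
```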

```python
len(nlp.Defaults.stop_words)
```
```
306
```

```python
nlp.vocab['btw'].is_stop
```
```
True
```

<font color=green>When adding stop words, always use lowercase. spaCy lowercases a word before looking it up in the stop-word set, so an uppercase entry in the set would never be matched.</font>
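Since a stop word has to be registered in two places (the `Defaults.stop_words` set and the lexeme's `is_stop` flag) and must be lowercase, it can help to wrap both steps in one function. The sketch below assumes spaCy is installed and uses `spacy.blank("en")`. The helper name `add_stop_word` is hypothetical, not part of spaCy's API.

```python
import spacy

nlp = spacy.blank("en")

def add_stop_word(nlp, word):
    """Hypothetical helper: register `word` as a stop word in both places."""
    word = word.lower()  # the stop-word lookup compares lowercased text
    nlp.Defaults.stop_words.add(word)
    # A lexeme already in the vocab keeps the flag it was created with,
    # so set it explicitly as well.
    nlp.vocab[word].is_stop = True

add_stop_word(nlp, "BTW")
print("btw" in nlp.Defaults.stop_words)  # True
print(nlp.vocab["btw"].is_stop)          # True
```

Note that `nlp.Defaults.stop_words` is a set shared by the language class, so changing it in place affects every English pipeline created in the same Python process.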

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

```python
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Clear the is_stop flag on the lexeme
nlp.vocab['beyond'].is_stop = False
```

```python
len(nlp.Defaults.stop_words)
```
```
305
```

```python
nlp.vocab['beyond'].is_stop
```
```
False
```
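One pitfall with the removal step: `set.remove()` raises `KeyError` if the word is no longer in the set, for example when the notebook cell is run a second time. A minimal sketch that avoids this uses `set.discard()`, which silently ignores missing items. It assumes spaCy is installed and uses `spacy.blank("en")`. The helper name `remove_stop_word` is hypothetical.

```python
import spacy

nlp = spacy.blank("en")

def remove_stop_word(nlp, word):
    """Hypothetical helper: unregister `word` as a stop word, if present."""
    word = word.lower()
    # discard() is a no-op for missing items, whereas remove() raises KeyError.
    nlp.Defaults.stop_words.discard(word)
    nlp.vocab[word].is_stop = False

remove_stop_word(nlp, "beyond")
remove_stop_word(nlp, "beyond")  # a second call is safe
print(nlp.vocab["beyond"].is_stop)  # False
```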