# Stop Words
Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. spaCy ships with a built-in list of English stop words; the version used here contains 326 (the exact count varies between spaCy releases).

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'ca', 'just', 'itself', 'own', 'an', 'n’t', 'together', '‘d', 'which', 'have', 'where', 'may', 'herein', '’d', 'myself', 'anyhow', 'thru', 'must', "'m", '’s', 'sometime', 'my', 'almost', 'whether', 'someone', 're', 'hereby', 'himself', 'whereas', 'much', 'what', 'to', 'become', '‘ve', 'us', 'top', 'less', 'had', 'ours', 'via', 'made', 'seeming', 'below', 'third', "n't", 'various', 'towards', 'everyone', 'four', 'with', 'front', 'therefore', 'am', 'else', 'beforehand', 'beside', 'i', 'as', 'twelve', '’ve', 'both', 'will', 'this', 'due', '’ll', 'becoming', 'does', 'hers', 'anyway', 'it', 'if', 'any', 'hereupon', 'sometimes', 'out', 'neither', 'everywhere', 'except', 'your', '‘m', "'re", 'hence', 'many', 'becomes', 'others', 'because', 'can', 'thereafter', 'back', 'whom', 'amount', 'off', 'serious', 'however', 'whereupon', 'anyone', 'until', 'whence', 'a', 'first', 'two', 'something', "'ve", 'seem', 'side', 'me', 'nevertheless', 'while', 'about', 'latter', 'mostly', 'move', 'yours', "'d"

In [3]:
print(len(nlp.Defaults.stop_words))

326
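
Because sets are unordered, the printed output above changes from run to run. A minimal sketch of a stable, alphabetical view follows. It assumes that the English defaults expose the same set as `spacy.lang.en.stop_words.STOP_WORDS`; the name `sorted_stop_words` is only illustrative.

```python
from spacy.lang.en.stop_words import STOP_WORDS

# Sorting the set gives a list with a reproducible order
sorted_stop_words = sorted(STOP_WORDS)

# Show the first few entries and the total count
print(sorted_stop_words[:10])
print(len(sorted_stop_words))
```

The count printed here depends on the installed spaCy version, so it may differ from the figure shown above.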


## Checking Whether a Word Is a Stop Word

In [4]:
nlp.vocab['gaming'].is_stop

False

In [5]:
nlp.vocab['the'].is_stop

True
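
The same `is_stop` flag is available on every token of a processed document, which is how stop words are usually filtered out in practice. The sketch below assumes that a blank English pipeline (tokenizer only) is sufficient, since `is_stop` is a lexical attribute and needs no trained model. The names `blank_nlp` and `remove_stop_words` are hypothetical helpers, not part of spaCy.

```python
import spacy

# Assumption: a tokenizer-only pipeline is enough for stop-word checks
blank_nlp = spacy.blank("en")

def remove_stop_words(text):
    """Return the tokens of `text` that are not flagged as stop words."""
    doc = blank_nlp(text)
    return [token.text for token in doc if not token.is_stop]

content_words = remove_stop_words("The quick brown fox jumps over the lazy dog")
print(content_words)
```

The same list comprehension works on a `Doc` produced by the `nlp` object loaded at the top of this notebook.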

## Adding a Custom Stop Word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [6]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the is_stop flag on the lexeme
nlp.vocab['btw'].is_stop = True

In [7]:
len(nlp.Defaults.stop_words)

327

In [8]:
nlp.vocab['btw'].is_stop

True

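<font color=lightgreen>When adding stop words, always use lowercase. The default stop words are all stored in lowercase, and recent spaCy versions check a word against the set using its lowercased form, so an entry containing capital letters would never match.</font>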

## Removing a Stop Word
Conversely, you may decide that `'beyond'` should not be considered a stop word.

In [9]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Clear the is_stop flag on the lexeme
nlp.vocab['beyond'].is_stop = False

In [10]:
len(nlp.Defaults.stop_words)

326

In [11]:
nlp.vocab['beyond'].is_stop

False
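
Adding and removing both take the same two steps: update the set, then update the lexeme flag. A small helper can keep them in sync for several words at once. This is only a sketch: `set_stop_words` and `blank_nlp` are hypothetical names, and a blank English pipeline stands in for the `nlp` object used above. It uses `set.discard` rather than `set.remove`, because `remove` raises a `KeyError` when the word is already absent (for example, when a cell is re-run).

```python
import spacy

# Assumption: a blank English pipeline stands in for the loaded model
blank_nlp = spacy.blank("en")

def set_stop_words(pipeline, words, is_stop=True):
    """Add (is_stop=True) or remove (is_stop=False) each word, lowercased."""
    for word in words:
        word = word.lower()
        if is_stop:
            pipeline.Defaults.stop_words.add(word)
        else:
            # discard() does nothing if the word is not in the set
            pipeline.Defaults.stop_words.discard(word)
        # Keep the lexeme flag consistent with the set
        pipeline.vocab[word].is_stop = is_stop

# Add two custom stop words, then undo the change
set_stop_words(blank_nlp, ["IMHO", "fyi"])
set_stop_words(blank_nlp, ["IMHO", "fyi"], is_stop=False)
```

Note that the stop-word set is shared by the language defaults, so changes made through one English pipeline are visible to any other English pipeline in the same session.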