# Stop Words
Words like "a" and "the" appear so frequently that they don't require tagging as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered from the text to be processed. spaCy holds a built-in list of English stop words (326 in the version used here).

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'rather', 'seeming', 'two', 'therefore', 'whither', 'even', 'someone', 'from', 'last', 'as', 'empty', 'nowhere', 'through', 'using', 'them', 'anyway', 'hence', 'off', 'behind', 'whom', 'none', 'over', 'no', 'ca', 'your', 'please', 'into', 'make', 'i', 'where', 'should', 'something', 'sixty', 'very', 'more', 'neither', 'about', 'itself', 'therein', 'him', '’re', 'wherein', 'themselves', 'both', 'or', 'latterly', 'has', 'nothing', 'any', 'these', 'done', 'hereby', 'via', 'regarding', 'yours', 'n’t', 'bottom', 'he', 'take', 'ours', 'an', 'because', 're', 'top', 'back', 'three', '‘s', 'four', 'then', 'throughout', 'former', 'became', 'meanwhile', 'besides', 'but', 'have', "'s", 'it', 'five', 'against', 'without', 'already', 'just', 'several', 'ourselves', "'re", 'most', 'how', 'you', 'this', 'few', 'nobody', 'than', 'one', '‘m', 'which', 'fifty', 'was', 'down', 'who', 'we', 'himself', 'sometimes', 'can', 'quite', 'is', "'m", 'nevertheless', 'own', 'could', 'up', 'say', 'our', '’d', 'hereu

In [3]:
len(nlp.Defaults.stop_words)

326

## To see if a word is a stop word

In [5]:
nlp.vocab['myself'].is_stop

True

In [7]:
nlp.vocab['a'].is_stop

True

In [8]:
nlp.vocab['mystery'].is_stop

False
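
Each token in a processed document carries the same `is_stop` flag, so the filtering mentioned at the start of this section can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the sample sentence is made up, and it assumes a blank English pipeline (`spacy.blank('en')`) is sufficient, since `is_stop` is a lexical attribute and does not depend on a trained model.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because is_stop is a lexical attribute rather than a model prediction.
nlp = spacy.blank('en')

# Made-up sample sentence, for illustration only
doc = nlp("the detective solved a mystery in the house")

# Keep only the tokens that are not flagged as stop words
content_words = [token.text for token in doc if not token.is_stop]

print(content_words)
```

The same list comprehension works on a `Doc` produced by `en_core_web_sm`; only the pipeline that creates the `Doc` changes.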

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [13]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the stop_word tag on the lexeme
nlp.vocab['btw'].is_stop = True

In [14]:
nlp.vocab['btw'].is_stop

True

<font color=green>When adding stop words, always use lowercase. The entries in the default stop word set are all lowercase.</font>
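
If you have several words to add, the two steps (updating the set and tagging the lexeme) can be repeated in a loop. The sketch below assumes a hypothetical list of chat abbreviations, `extra_stop_words`, which is not part of spaCy, and uses a blank English pipeline so that it runs on its own.

```python
import spacy

# Assumption: a blank English pipeline stands in for the loaded model
nlp = spacy.blank('en')

# Hypothetical list of extra stop words (lowercase), for illustration only
extra_stop_words = ['btw', 'imo', 'fyi']

for word in extra_stop_words:
    # Add the word to the set of stop words
    nlp.Defaults.stop_words.add(word)
    # Set the stop_word tag on the lexeme
    nlp.vocab[word].is_stop = True

flags = [nlp.vocab[word].is_stop for word in extra_stop_words]
print(flags)
```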

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [15]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Remove the stop_word tag from the lexeme
nlp.vocab['beyond'].is_stop = False

In [16]:
len(nlp.Defaults.stop_words)

326

In [17]:
nlp.vocab['beyond'].is_stop

False
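
One practical note: `set.remove()` raises a `KeyError` if the word is no longer in the set, which can happen when a notebook cell is run a second time. A hedged alternative is Python's built-in `set.discard()`, which does nothing when the word is absent. The sketch below again assumes a blank English pipeline so it is self-contained.

```python
import spacy

# Assumption: a blank English pipeline stands in for the loaded model
nlp = spacy.blank('en')

word = 'beyond'

# discard() removes the word if present and is a no-op otherwise
nlp.Defaults.stop_words.discard(word)

# Remove the stop_word tag from the lexeme
nlp.vocab[word].is_stop = False

# Running the removal again does not raise an error
nlp.Defaults.stop_words.discard(word)

still_stop = nlp.vocab[word].is_stop
print(still_stop)
```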