# Stop Words
Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs, and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. spaCy ships with a built-in list of English stop words; the version used here has 326.

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'therefore', 'for', 'forty', 're', 'him', 'well', 'had', 'while', 'no', 'via', 'between', 'however', 'around', 'third', 'move', 'whenever', 'keep', 'anywhere', 'yourself', 'below', 'been', 'there', 'latter', 'thereupon', 'many', 'too', 'indeed', 'although', 'moreover', 'but', 'until', 'nor', 'becoming', 'though', 'what', 'yet', 'by', 'rather', 'hundred', "'s", 'others', 'per', 'mine', 'ever', 'nevertheless', 'afterwards', 'its', 'much', 'twelve', 'hereafter', 'as', '’d', 'anyway', 'hereby', 'those', 'thus', 'we', 'always', 'bottom', 'whom', 'less', '‘ve', 'itself', 'everywhere', 'further', 'please', 'whole', 'anyone', 'wherein', 'various', 'get', 'she', 'whereby', 'sometimes', 'behind', 'beforehand', 'three', '’s', 'besides', "'ve", 'again', '‘re', 'against', 'upon', 'if', 'hence', 'onto', 'using', 'few', '‘d', 'often', '’ll', 'go', 'only', 'whereas', 'unless', 'since', 'otherwise', 'throughout', 'above', 'take', 'due', 'they', 'would', 'someone', 'either', 'out', 'in', 'nine', 'meanw

In [3]:
len(nlp.Defaults.stop_words)

326

## To see if a word is a stop word

In [4]:
nlp.vocab['myself'].is_stop

True

In [5]:
nlp.vocab['mystery'].is_stop

False
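The same `is_stop` attribute is available on every token of a processed document, which is how stop words are usually filtered in practice. The following is a minimal sketch: `content_words` is a hypothetical helper name, and it assumes only that each item it receives exposes `.text` and `.is_stop`, as spaCy tokens do.

```python
def content_words(doc):
    """Return the text of every token in `doc` that is not a stop word.

    `content_words` is a hypothetical helper, not part of spaCy. `doc` is
    assumed to be a spaCy Doc, or any iterable of objects that expose
    `.text` and `.is_stop`.
    """
    return [token.text for token in doc if not token.is_stop]
```

With the `nlp` object loaded in the first cell, it could be called as `content_words(nlp("This is a mystery to myself"))`.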

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [6]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the stop_word tag on the lexeme
nlp.vocab['btw'].is_stop = True

In [7]:
len(nlp.Defaults.stop_words)

327

In [8]:
nlp.vocab['btw'].is_stop

True

When adding stop words, always use lowercase. spaCy's stop word check compares the lowercased form of a word against this set, so an entry that contains capital letters would not match.
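When several words need to be added, the two steps above (updating the set and flagging the lexeme) can be wrapped in a small function. This is a sketch under stated assumptions: `add_stop_words` is a hypothetical helper, and `nlp` is assumed to be a loaded spaCy pipeline such as the one created in the first cell.

```python
def add_stop_words(nlp, words):
    """Register each word in `words` as a stop word on the pipeline `nlp`.

    `add_stop_words` is a hypothetical helper, not part of spaCy. It repeats
    the two steps shown above for every word, lowercasing each one first.
    """
    for word in words:
        word = word.lower()                # stop words are stored in lowercase
        nlp.Defaults.stop_words.add(word)  # update the default set
        nlp.vocab[word].is_stop = True     # flag the lexeme itself
```

For example, `add_stop_words(nlp, ['btw', 'imo'])` would add both abbreviations in one call.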

In [9]:
nlp.vocab['beyond'].is_stop

True

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [10]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Remove the stop_word tag from the lexeme
nlp.vocab['beyond'].is_stop = False

In [11]:
len(nlp.Defaults.stop_words)

326
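Note that `set.remove` raises a `KeyError` if the word is not in the set, for example when the removal cell is run twice. A more forgiving variant might look like the sketch below, where `drop_stop_words` is a hypothetical helper and `nlp` is again assumed to be a loaded spaCy pipeline.

```python
def drop_stop_words(nlp, words):
    """Remove each word in `words` from the stop words of the pipeline `nlp`.

    `drop_stop_words` is a hypothetical helper, not part of spaCy. Words that
    are not currently in the set are skipped instead of raising KeyError.
    Returns the list of words that were actually removed.
    """
    removed = []
    for word in words:
        word = word.lower()
        if word in nlp.Defaults.stop_words:
            nlp.Defaults.stop_words.remove(word)
            removed.append(word)
        # Clear the flag either way so the lexeme agrees with the set.
        nlp.vocab[word].is_stop = False
    return removed
```

Calling `drop_stop_words(nlp, ['beyond'])` a second time would then return an empty list rather than fail.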