# Stop Words
Words like "a" and "the" appear so frequently that they carry little meaning on their own and don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. In the version used here, spaCy holds a built-in list of 326 English stop words.

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'anyone', 'wherein', 'fifty', 'never', 'will', 'this', 'nothing', 'its', 'same', 'whether', 'during', 'full', 'nor', 'nevertheless', 'always', 'seeming', 'whole', 'empty', 'for', 'the', 'she', 'him', 'sometime', 'first', 'few', 'go', 'below', 'unless', 'behind', 'n’t', 'often', 'throughout', 'serious', 'and', 'however', '‘re', 'all', 'mostly', 'beforehand', 'each', 'thereupon', 'up', 'five', 'they', 'thereby', 'part', 'off', 'to', 'eight', 'eleven', 'side', 'has', 'us', 'except', '’re', 'many', 'toward', 'did', 'have', 'further', 'top', 'you', 'when', 'became', 'become', 'used', 'itself', 'through', 'towards', 'done', '‘s', 'anything', 'against', 'move', 'me', 'sixty', 'already', 'how', 'only', 'themselves', 'had', 'very', 'see', '’s', 'around', 'less', 'least', 'myself', 'together', 'nowhere', 'or', 'where', 'them', 'even', 'next', 'whence', 'with', 'other', "'d", 'any', 'ever', 'namely', 'moreover', 'former', 'indeed', 'amount', 'their', 'various', 'six', 'here', 'twelve', 'her', 'o

In [3]:
len(nlp.Defaults.stop_words)

326

## To see if a word is a stop word

In [4]:
nlp.vocab['than'].is_stop

True

In [5]:
nlp.vocab['btw'].is_stop

False
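
The same flag is exposed on every token of a processed `Doc`, which is how stop words are typically filtered out of a text. The following is a minimal sketch rather than part of the original notebook. It assumes a tokenizer-only pipeline built with `spacy.blank('en')` (so no trained model is required), and the sample sentence and the names `demo_nlp` and `content_words` are illustrative.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because is_stop is a lexical attribute and does not need a trained model.
demo_nlp = spacy.blank('en')

doc = demo_nlp("This is a sentence about the stop words in a text.")

# Keep only tokens that are neither stop words nor punctuation.
content_words = [token.text for token in doc
                 if not token.is_stop and not token.is_punct]

print(content_words)
```

A separate variable name is used so that the `nlp` object loaded earlier is left untouched.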

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [6]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the stop_word tag on the lexeme
nlp.vocab['btw'].is_stop = True

In [7]:
nlp.vocab['btw'].is_stop

True

In [8]:
len(nlp.Defaults.stop_words)

327

<font color=green>When adding stop words, always use lowercase. Lexemes are converted to lowercase before being added to **vocab**.</font>
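
If several words need to be added, the two steps above (updating the default set and flagging the lexeme) can be wrapped in a small helper. This is a sketch under stated assumptions: `add_stop_words` is a hypothetical function, not part of spaCy, and `demo_nlp` is a blank English pipeline used only for illustration. The helper lowercases its input, following the advice above.

```python
import spacy

def add_stop_words(nlp, words):
    """Hypothetical helper: register each word as a stop word.

    Each word is lowercased, added to the shared default set,
    and its lexeme in nlp.vocab is flagged as a stop word.
    """
    for word in words:
        word = word.lower()
        nlp.Defaults.stop_words.add(word)
        nlp.vocab[word].is_stop = True

# Assumption: a blank pipeline is used here so the example runs on its own.
demo_nlp = spacy.blank('en')

# Adding 'BTW' stores it as 'btw'; adding a word twice is harmless
# because stop_words is a set.
add_stop_words(demo_nlp, ['BTW'])

print(demo_nlp.vocab['btw'].is_stop)
```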

## To remove a stop word
Alternatively, you may decide that `'moreover'` and `'does'` should not be considered stop words.

In [9]:
# Remove the words from the set of stop words
nlp.Defaults.stop_words.remove('moreover')
nlp.Defaults.stop_words.remove('does')

# Remove the stop_word tag
nlp.vocab['does'].is_stop = False
nlp.vocab['moreover'].is_stop = False

In [10]:
nlp.vocab['does'].is_stop

False

In [11]:
len(nlp.Defaults.stop_words)

325

In [12]:
nlp.vocab['moreover'].is_stop

False
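
Note that `set.remove()` raises a `KeyError` if the word is not in the set, for example when the removal cell is run a second time. A more forgiving variant is sketched below. It assumes a hypothetical helper named `remove_stop_words` (not a spaCy function) and a blank English pipeline `demo_nlp`, and it uses `set.discard()`, which does nothing when the word is absent.

```python
import spacy

def remove_stop_words(nlp, words):
    """Hypothetical helper: unregister each word as a stop word.

    discard() is used instead of remove() so that a word which is
    already missing from the set does not raise a KeyError.
    """
    for word in words:
        word = word.lower()
        nlp.Defaults.stop_words.discard(word)
        nlp.vocab[word].is_stop = False

# Assumption: a blank pipeline is used here so the example runs on its own.
demo_nlp = spacy.blank('en')

remove_stop_words(demo_nlp, ['moreover', 'does'])
# Calling it again is safe: the words are already gone.
remove_stop_words(demo_nlp, ['moreover', 'does'])

print(demo_nlp.vocab['does'].is_stop)
```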

In [13]:
nlp.Defaults.stop_words

{"'d",
 "'ll",
 "'m",
 "'re",
 "'s",
 "'ve",
 'a',
 'about',
 'above',
 'across',
 'after',
 'afterwards',
 'again',
 'against',
 'all',
 'almost',
 'alone',
 'along',
 'already',
 'also',
 'although',
 'always',
 'am',
 'among',
 'amongst',
 'amount',
 'an',
 'and',
 'another',
 'any',
 'anyhow',
 'anyone',
 'anything',
 'anyway',
 'anywhere',
 'are',
 'around',
 'as',
 'at',
 'back',
 'be',
 'became',
 'because',
 'become',
 'becomes',
 'becoming',
 'been',
 'before',
 'beforehand',
 'behind',
 'being',
 'below',
 'beside',
 'besides',
 'between',
 'beyond',
 'both',
 'bottom',
 'btw',
 'but',
 'by',
 'ca',
 'call',
 'can',
 'cannot',
 'could',
 'did',
 'do',
 'doing',
 'done',
 'down',
 'due',
 'during',
 'each',
 'eight',
 'either',
 'eleven',
 'else',
 'elsewhere',
 'empty',
 'enough',
 'even',
 'ever',
 'every',
 'everyone',
 'everything',
 'everywhere',
 'except',
 'few',
 'fifteen',
 'fifty',
 'first',
 'five',
 'for',
 'former',
 'formerly',
 'forty',
 'four',
 'from',
 'front