# Stop Words
Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. spaCy ships with a built-in set of English stop words (326 in the version used here).
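To make the idea of filtering concrete, here is a minimal sketch in plain Python. The tiny stop-word set and the whitespace tokenization are assumptions made for illustration only; with spaCy you would instead iterate over a `Doc` and test each token's `is_stop` attribute.

```python
# A tiny illustrative stop-word set (an assumption, not spaCy's list)
sample_stop_words = {"a", "the", "is", "to", "of"}

def remove_stop_words(text, stop_words):
    """Return the words of `text` that are not in `stop_words`."""
    # Naive whitespace tokenization; compare each word in lowercase
    return [word for word in text.split() if word.lower() not in stop_words]

filtered = remove_stop_words("The key to the mystery is a map", sample_stop_words)
print(filtered)
```

Only the content-bearing words survive the filter; the helper name `remove_stop_words` is hypothetical.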

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')


In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'via', 'towards', '‘re', 'side', 'throughout', 'above', 'then', 'after', 'rather', 'us', 'amongst', 'any', 'being', 'just', 'third', 'myself', 'together', 'his', 'anyone', "n't", 'but', 'nine', "'m", 'and', 'becoming', 'him', '‘ve', 'forty', 'around', 'been', 'front', 'full', 'over', 'give', 'my', 'name', "'ve", 'other', 'ca', 'so', 'at', 'seems', 'very', 'well', 'wherever', 'whence', 'some', '‘s', 'back', 'toward', 'also', 'this', 'whereafter', 'whereupon', 'every', 'everything', 'bottom', 'because', 'elsewhere', 'since', 'once', 'please', 'nowhere', 'namely', 'behind', 'where', 'two', 'most', 'per', 'last', 'what', 'only', 'herself', 'somewhere', 'seeming', '’ve', 'it', 'ourselves', 'while', 'get', 'further', 'perhaps', 'ten', 'your', '‘ll', 'almost', 'do', 'the', 'go', "'s", 'done', 'mine', 'of', 'another', 'made', 'six', 'we', 'her', 'too', 'hundred', 'does', 'is', 'had', 'afterwards', 'seem', 'besides', 'someone', 'whose', 'however', 'enough', 'eight', 'from', 'nobody', 'to', 'wh

In [3]:
len(nlp.Defaults.stop_words)

326

## To see if a word is a stop word

In [4]:
nlp.vocab['myself'].is_stop

True

In [5]:
nlp.vocab['mystery'].is_stop

False

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [6]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the stop_word tag on the lexeme
nlp.vocab['btw'].is_stop = True

In [7]:
len(nlp.Defaults.stop_words)

327

In [8]:
nlp.vocab['btw'].is_stop

True

<font color=green>When adding stop words, always use lowercase. The entries in the default stop-word set are lowercase, and membership in a Python set is case-sensitive.</font>
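Why lowercase matters can be sketched without spaCy. The snippet below assumes that a stop-word check lowercases the word before looking it up in the set; this is a simplified model for illustration, not spaCy's actual code, and the `is_stop` function here is hypothetical.

```python
def is_stop(word, stop_words):
    # Simplified model: lowercase the word, then look it up in the set
    return word.lower() in stop_words

good_set = {"btw"}   # entry stored in lowercase
bad_set = {"BTW"}    # entry stored in uppercase

# With a lowercase entry, both spellings are recognized
print(is_stop("btw", good_set), is_stop("BTW", good_set))

# With an uppercase entry, neither spelling is recognized
print(is_stop("btw", bad_set), is_stop("BTW", bad_set))
```

Under this model, an entry stored in uppercase can never match, which is why the word should be added in lowercase.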

## To remove a stop word
Alternatively, you may decide that `'however'` should not be considered a stop word.

In [9]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('however')

# Remove the stop_word tag from the lexeme
nlp.vocab['however'].is_stop = False

In [10]:
len(nlp.Defaults.stop_words)

326

In [11]:
nlp.vocab['however'].is_stop

False
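When several words need to change at once, the same two steps (update the set, then update the lexeme) can be wrapped in a loop. The following is one possible sketch, assuming spaCy is installed. It uses `spacy.blank('en')`, a tokenizer-only English pipeline, so that no trained model has to be downloaded; the helper name `update_stop_words` and the extra word `'imo'` are hypothetical.

```python
import spacy

# Tokenizer-only English pipeline; shares the same default stop-word set
nlp = spacy.blank('en')

def update_stop_words(nlp, add=(), remove=()):
    """Add and remove stop words, keeping the set and the lexeme flags in sync."""
    for word in add:
        nlp.Defaults.stop_words.add(word)
        nlp.vocab[word].is_stop = True
    for word in remove:
        # discard() does not raise if the word is already absent
        nlp.Defaults.stop_words.discard(word)
        nlp.vocab[word].is_stop = False

update_stop_words(nlp, add=['btw', 'imo'], remove=['however'])
print(nlp.vocab['imo'].is_stop, nlp.vocab['however'].is_stop)
```

Using `discard` instead of `remove` keeps the helper safe to call twice with the same arguments.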

Great! Now you should be able to access spaCy's default set of stop words, and add or remove stop words as needed.

## Next up: Vocabulary and Matching