# Stop Words
Words like "a" and "the" appear so frequently that they carry little information of their own and don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. spaCy ships with a built-in set of English stop words (326 in the version used here).
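
As a minimal sketch of what that filtering looks like, the snippet below drops stop words from a short sentence using each token's `is_stop` attribute. It assumes a blank English pipeline from `spacy.blank("en")` (tokenizer only) rather than the `en_core_web_sm` model loaded later in this notebook, since the stop word flag is a lexical attribute and should not need a trained model. The sample sentence is made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because is_stop is a lexical attribute and needs no trained model.
nlp = spacy.blank("en")

# Hypothetical sample text, used only for illustration.
doc = nlp("This is a sentence about the mystery of language")

# Keep only the tokens that are not flagged as stop words.
content_words = [token.text for token in doc if not token.is_stop]
print(content_words)
```

The same list comprehension works unchanged on a `Doc` produced by a full model such as `en_core_web_sm`.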

In [1]:
# Perform standard imports:
import spacy

In [2]:
print("spacy: ", spacy.__version__)

spacy:  3.5.0


In [3]:
nlp = spacy.load(name='en_core_web_sm')

#### Stop words can also be handled with the <b>NLTK</b> library

In [4]:
# Print the set of spaCy's default stop words (remember that sets are unordered):

print(nlp.Defaults.stop_words)

{'four', 'thus', 'was', 'bottom', 'whereas', 'unless', 'up', 'will', 'name', 'until', 'you', 'toward', 'further', 'say', 'everyone', 'done', 'ca', 'been', "'d", "n't", 'twenty', 'his', 'anywhere', 'where', 'them', 'or', 'every', 'enough', 'such', 'within', 'i', 'and', 'next', 'what', 'becomes', 'go', 'themselves', 'is', '’ve', '’re', 'see', 'five', 'around', 'seems', 'put', 'yours', 'her', 'otherwise', '’d', 'own', 'those', 'the', 'thereafter', 'this', 'did', 'except', 'sixty', 'about', 'eleven', 'below', 'down', 'as', 'might', 'under', 'but', 'n‘t', 'already', 'he', 'whole', "'ve", 'somehow', 'during', 'became', 'two', 'back', 'just', 'only', 're', 'seemed', 'more', 'since', 'in', 'call', 'ourselves', 'wherein', 'off', 'somewhere', 'fifteen', 'often', 'whereupon', 'may', 'hundred', 'everything', 'beyond', 'these', 'nevertheless', 'on', 'become', 'everywhere', 'anyone', 'our', 'three', 'which', 'against', 'becoming', 'could', 'while', 'that', "'ll", 'has', 'behind', 'former', 'either',

In [5]:
len(nlp.Defaults.stop_words)

326

In [6]:
type(nlp.Defaults.stop_words)

set

## To see if a word is a stop word

In [7]:
nlp.vocab['myself'].is_stop

True

In [8]:
nlp.vocab['mystery'].is_stop

False

In [9]:
'is' in nlp.Defaults.stop_words

True
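
The two kinds of lookup used above are not quite equivalent. The sketch below, which assumes a blank English pipeline from `spacy.blank("en")` instead of the loaded model, compares them on a capitalized word: the `is_stop` flag is derived from the lowercased form of the string, while a membership test against the set is an exact, case-sensitive comparison.

```python
import spacy

# Assumption: a blank English pipeline shares the same default stop word set.
nlp = spacy.blank("en")
stop_words = nlp.Defaults.stop_words

for word in ["is", "Is", "mystery"]:
    flagged = nlp.vocab[word].is_stop   # lexeme flag, based on the lowercased string
    in_set = word in stop_words         # plain set membership, case-sensitive
    print(f"{word!r}: is_stop={flagged}, in set={in_set}")
```

If you test membership in the set directly, lowercase the word first so that both checks agree.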

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [10]:
# Add the word to the set of stop words. Use lowercase!

nlp.Defaults.stop_words.add('btw')

# Set the is_stop flag on the lexeme

nlp.vocab['btw'].is_stop = True

In [11]:
len(nlp.Defaults.stop_words)

327

In [12]:
nlp.vocab['btw'].is_stop

True

<font color=green>When adding stop words, always use lowercase. spaCy lowercases a word before checking it against the stop word set, so an entry that contains uppercase letters would never match.</font>
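
When several words need to be added, the two steps can be wrapped in a small function. The following is only a sketch: `add_stop_words` is a hypothetical helper, not part of spaCy, and it assumes a blank English pipeline. It lowercases each word, adds it to the default set, and sets the flag on the corresponding lexeme.

```python
import spacy

# Assumption: a blank English pipeline; the same calls apply to a loaded model.
nlp = spacy.blank("en")

def add_stop_words(nlp, words):
    """Hypothetical helper: register each word as a stop word."""
    for word in words:
        lowered = word.lower()                  # stop words are stored in lowercase
        nlp.Defaults.stop_words.add(lowered)    # update the default set
        nlp.vocab[lowered].is_stop = True       # update the flag on the lexeme

add_stop_words(nlp, ["btw", "IMHO"])
print(nlp.vocab["btw"].is_stop, nlp.vocab["imho"].is_stop)
```

Note that `nlp.Defaults.stop_words` is shared at the class level, so the change is visible to other English pipelines created in the same session.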

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [13]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Clear the is_stop flag on the lexeme
nlp.vocab['beyond'].is_stop = False

In [14]:
len(nlp.Defaults.stop_words)

326

In [15]:
nlp.vocab['beyond'].is_stop

False
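
One caveat with `set.remove()` is that it raises a `KeyError` when the word is not in the set, which is what happens if the removal cell is run a second time. A more forgiving variant, sketched here with a hypothetical helper `remove_stop_words` and a blank English pipeline, uses `set.discard()` instead.

```python
import spacy

# Assumption: a blank English pipeline; the same calls apply to a loaded model.
nlp = spacy.blank("en")

def remove_stop_words(nlp, words):
    """Hypothetical helper: unregister each word as a stop word."""
    for word in words:
        lowered = word.lower()
        # discard() does nothing if the word is absent; remove() would raise KeyError.
        nlp.Defaults.stop_words.discard(lowered)
        nlp.vocab[lowered].is_stop = False      # clear the flag on the lexeme

# The word is listed twice on purpose to show that repeats are harmless.
remove_stop_words(nlp, ["beyond", "beyond"])
print(nlp.vocab["beyond"].is_stop)
```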

Great! Now you should be able to access spaCy's default set of stop words, and add or remove stop words as needed.
## Next up: Vocabulary and Matching