# <font color='blue'> Stop Words </font>

**A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.** 
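To make the idea concrete, here is a minimal plain-Python sketch of stop-word removal. The tiny set `MY_STOP_WORDS` and the function `remove_stop_words` are hypothetical and exist only for illustration. Words are split on whitespace, so punctuation handling is ignored.

```python
# Hypothetical, hand-picked stop-word set for illustration only.
MY_STOP_WORDS = {"the", "a", "an", "in", "is", "of"}


def remove_stop_words(text):
    """Return the words of `text` that are not stop words (case-insensitive)."""
    return [word for word in text.split() if word.lower() not in MY_STOP_WORDS]


print(remove_stop_words("The cat is in the garden"))  # ['cat', 'garden']
```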

**spaCy ships with a built-in set of English stop words (326 in the version used here)**

```python
import spacy

# Load the small English pipeline (it must be downloaded first)
nlp=spacy.load('en_core_web_sm')
```

```python
print(nlp.Defaults.stop_words)
```

```
{'has', 'keep', 'first', 'you', 'can', 'about', 'further', 'rather', 'herself', 'sixty', 'only', 'nevertheless', 'still', 'might', 'anything', 'and', 'not', 'something', 'same', 'hers', 'out', 'herein', '‘d', 'eleven', 'put', 'them', 'in', 'part', 'her', 'sometimes', 're', 'these', "'s", 'then', 'how', 'each', 'even', 'serious', 'but', 'doing', '‘ve', 'because', 'yet', 'could', 'his', 'formerly', 'other', 'beyond', 'always', 'top', 'thereby', 'anywhere', 'yours', 'what', 'cannot', 'sometime', "'re", 'yourselves', '’d', 'six', 'below', 'well', 'whither', 'this', 'regarding', 'whenever', 'empty', 'among', 'get', 'may', 'least', 'via', 'thru', 'own', 'meanwhile', 'seemed', 'beforehand', 'take', 'former', 'she', "n't", 'moreover', 'him', 'whereby', 'nobody', 'everywhere', 'wherein', 'used', 'or', 'go', 'thus', 'one', 'i', 'unless', 'besides', 'front', 'forty', 'see', 'again', 'around', 'once', 'perhaps', 'before', 'being', 'thereupon', 'becoming', 'per', 'themselves', 'would', 'n’t', 'towa
```

**These are all the default stop words. `nlp.Defaults.stop_words` is a Python set, so they print in no particular order.**

```python
len(nlp.Defaults.stop_words)
```

```
326
```
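The English stop-word set can also be imported directly, without loading a trained pipeline. In current spaCy versions, `spacy.lang.en.stop_words.STOP_WORDS` is the set English pipelines use for `nlp.Defaults.stop_words`. Its exact size depends on the installed spaCy version, so this sketch does not assume a count.

```python
from spacy.lang.en.stop_words import STOP_WORDS

# A plain Python set of strings; no model download is required
print(type(STOP_WORDS).__name__)
print(len(STOP_WORDS))
print("the" in STOP_WORDS, "data" in STOP_WORDS)
```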

###  <font color='purple'>Checking whether a word is a stop word</font>

Look the word up in `nlp.vocab` and read its `is_stop` attribute:

```python
nlp.vocab['is'].is_stop
```

```
True
```

```python
nlp.vocab['GFG'].is_stop
```

```
False
```

```python
nlp.vocab['hello'].is_stop
```

```
False
```

```python
nlp.vocab['i.e'].is_stop
```

```
False
```
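The checks above look up single words in `nlp.vocab`. In practice you usually read `is_stop` on the tokens of a processed text instead. This sketch uses `spacy.blank("en")`, a tokenizer-only English pipeline that needs no model download and should share the same English default stop words as `en_core_web_sm`.

```python
import spacy

# Tokenizer-only English pipeline; no trained model is needed for is_stop
nlp = spacy.blank("en")
doc = nlp("This is GFG")
print([(token.text, token.is_stop) for token in doc])
# [('This', True), ('is', True), ('GFG', False)]
```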

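**Adding your own stop words**

Suppose we want to treat "i.e" (short for "that is") as a stop word. This takes two steps: add it to the default set `nlp.Defaults.stop_words`, then set the `is_stop` flag on its entry in `nlp.vocab`.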

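```python
# Step 1: add "i.e" to the default stop-word set
nlp.Defaults.stop_words.add('i.e')
# Step 2: flag its vocabulary entry as a stop word
nlp.vocab['i.e'].is_stop = True
```

```python
len(nlp.Defaults.stop_words)
```

```
327
```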

The set grew from 326 to 327 entries.

```python
nlp.vocab['i.e'].is_stop
```

```
True
```

So "i.e" is now treated as a stop word.
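The two steps can be wrapped in a small helper. `add_stop_word` is a hypothetical name, not part of spaCy, and the sketch uses `spacy.blank("en")` so it runs without a model download. Because `nlp.Defaults` belongs to the English language class rather than to a single pipeline object, the change is likely visible to every English pipeline in the same Python session.

```python
import spacy


def add_stop_word(nlp, word):
    """Hypothetical helper: treat `word` as a stop word in `nlp`."""
    nlp.Defaults.stop_words.add(word)  # update the default stop-word set
    nlp.vocab[word].is_stop = True     # flag the vocabulary entry directly


nlp = spacy.blank("en")
add_stop_word(nlp, "i.e")
print("i.e" in nlp.Defaults.stop_words, nlp.vocab["i.e"].is_stop)  # True True
```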

**Let's see how to remove a stop word**

Removal mirrors addition: take the word out of the default set, then clear its `is_stop` flag.

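```python
nlp.vocab['done'].is_stop
```

```
True
```

```python
# Remove "done" from the default stop-word set
nlp.Defaults.stop_words.remove('done')
# Clear the flag on its vocabulary entry
nlp.vocab['done'].is_stop = False
```

```python
nlp.vocab['done'].is_stop
```

```
False
```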

So "done" is no longer treated as a stop word.
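Removal can be wrapped the same way. This sketch (the helper name `remove_stop_word` is hypothetical) uses `set.discard` instead of `set.remove`: `remove` raises `KeyError` when the word is not in the set, while `discard` does nothing, so the helper is safe to call twice.

```python
import spacy


def remove_stop_word(nlp, word):
    """Hypothetical helper: stop treating `word` as a stop word in `nlp`."""
    nlp.Defaults.stop_words.discard(word)  # no KeyError if already absent
    nlp.vocab[word].is_stop = False        # clear the vocabulary flag


nlp = spacy.blank("en")  # tokenizer-only English pipeline, no model download
remove_stop_word(nlp, "done")
remove_stop_word(nlp, "done")  # second call is a harmless no-op
print(nlp.vocab["done"].is_stop)  # False
```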

## <font color='blue'> Removing stop words </font>

```python
s='''
Data science is the study of data. Like biological sciences is a study of biology, physical sciences, it’s the study of physical reactions. Data is real, data has real properties, and we need to study them if we’re going to work on them. Data Science involves data and some signs. 

It is a process, not an event. It is the process of using data to understand too many different things, to understand the world. Let Suppose when you have a model or proposed explanation of a problem, and you try to validate that proposed explanation or model with your data. 

It is the skill of unfolding the insights and trends that are hiding (or abstract) behind data. It’s when you translate data into a story. So use storytelling to generate insight. And with these insights, you can make strategic choices for a company or an institution. 

We can also define data science as a field that is about processes and systems to extract data of various forms and from various resources whether the data is unstructured or structured. 
The definition and the name came up in the 1980s and 1990s when some professors, IT Professionals, scientists were looking into the statistics curriculum, and they thought it would be better to call it data science and then later on data analytics derived. 
'''

# Join the paragraphs into one line and trim the outer whitespace
s=s.replace('\n',' ')
s=s.strip()
s
```

```
'Data science is the study of data. Like biological sciences is a study of biology, physical sciences, it’s the study of physical reactions. Data is real, data has real properties, and we need to study them if we’re going to work on them. Data Science involves data and some signs.   It is a process, not an event. It is the process of using data to understand too many different things, to understand the world. Let Suppose when you have a model or proposed explanation of a problem, and you try to validate that proposed explanation or model with your data.   It is the skill of unfolding the insights and trends that are hiding (or abstract) behind data. It’s when you translate data into a story. So use storytelling to generate insight. And with these insights, you can make strategic choices for a company or an institution.   We can also define data science as a field that is about processes and systems to extract data of various forms and from various resources whether the data is unstruct
```
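Replacing each newline with a single space leaves runs of two or three spaces where the paragraph breaks were, and spaCy keeps those runs as separate whitespace tokens (they appear as blank entries among the non-stop tokens later on). A common alternative, shown here on a short hypothetical string, collapses every run of whitespace into one space.

```python
# Hypothetical multi-paragraph input with blank lines and trailing spaces
raw = "\nData science is the study of data. \n\nIt is a process, not an event.\n"

# str.split() with no argument splits on any run of whitespace and drops
# empty strings, so joining with " " leaves exactly one space between words.
text = " ".join(raw.split())
print(text)  # Data science is the study of data. It is a process, not an event.
```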

```python
# Run the pipeline: s now holds a spaCy Doc (a sequence of tokens), not a string
s=nlp(s)

# Collect every token flagged as a stop word
l=[]
for token in s:
    if token.is_stop:
        l.append(token)
print(l)
```

```
[is, the, of, is, a, of, it, ’s, the, of, is, has, and, we, to, them, if, we, ’re, to, on, them, and, some, It, is, a, not, an, It, is, the, of, using, to, too, many, to, the, when, you, have, a, or, of, a, and, you, to, that, or, with, your, It, is, the, of, the, and, that, are, or, behind, It, ’s, when, you, into, a, So, to, And, with, these, you, can, make, for, a, or, an, We, can, also, as, a, that, is, about, and, to, of, various, and, from, various, whether, the, is, or, The, and, the, name, up, in, the, and, when, some, IT, were, into, the, and, they, it, would, be, to, call, it, and, then, on]
```
**These are all the stop-word tokens in the text, in order of appearance and including repeats.**
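The loop above stores `Token` objects. If you only need the words, a list comprehension that keeps `token.text` is a more compact equivalent. This sketch runs it on a short excerpt of the text with `spacy.blank("en")`, which should flag the same stop words as the loaded model because both use the English defaults.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only English pipeline
doc = nlp("Data science is the study of data. It is a process, not an event.")

# Same filter as the loop, but collecting plain strings instead of Token objects
stop_texts = [token.text for token in doc if token.is_stop]
print(stop_texts)  # ['is', 'the', 'of', 'It', 'is', 'a', 'not', 'an']
```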

```python
len(l)
```

```
125
```

**The text contains 125 stop-word tokens.**
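The stop-word list above includes "It", "So", "And", "We" and even "IT" (from "IT Professionals"), although the default set stores lowercase forms such as "it". The `is_stop` check lowercases the word before the lookup, as this quick check with a blank English pipeline suggests. Keep this in mind when an acronym like "IT" carries meaning in your text.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only English pipeline

# Only the lowercase form is in the set, yet every casing is flagged
print("it" in nlp.Defaults.stop_words, "IT" in nlp.Defaults.stop_words)  # True False
for word in ["it", "It", "IT"]:
    print(word, nlp.vocab[word].is_stop)  # True for all three
```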

## <font color='blue'>Keeping only the tokens that are not stop words</font>

```python
# Keep every token that is not flagged as a stop word
tokens=[token for token in s if not token.is_stop]
```

```python
print(tokens)
```

```
[Data, science, study, data, ., Like, biological, sciences, study, biology, ,, physical, sciences, ,, study, physical, reactions, ., Data, real, ,, data, real, properties, ,, need, study, going, work, ., Data, Science, involves, data, signs, .,   , process, ,, event, ., process, data, understand, different, things, ,, understand, world, ., Let, Suppose, model, proposed, explanation, problem, ,, try, validate, proposed, explanation, model, data, .,   , skill, unfolding, insights, trends, hiding, (, abstract, ), data, ., translate, data, story, ., use, storytelling, generate, insight, ., insights, ,, strategic, choices, company, institution, .,   , define, data, science, field, processes, systems, extract, data, forms, resources, data, unstructured, structured, .,  , definition, came, 1980s, 1990s, professors, ,, Professionals, ,, scientists, looking, statistics, curriculum, ,, thought, better, data, science, later, data, analytics, derived, .]
```
```python
len(tokens)
```

```
129
```
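The 129 remaining tokens are not all words: the output above still contains punctuation such as "." and "," as well as whitespace tokens. Tokens also carry `is_punct` and `is_space` flags, so one way to keep only content words is to filter on all three. A minimal sketch on a short excerpt of the text, again using a blank English pipeline:

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only English pipeline
doc = nlp("Data is real, data has real properties.  It is a process.")

# Drop stop words, punctuation and whitespace tokens in one pass
content = [
    token.text
    for token in doc
    if not (token.is_stop or token.is_punct or token.is_space)
]
print(content)  # ['Data', 'real', 'data', 'real', 'properties', 'process']
```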