## **STOPWORDS**

---


### **What are stopwords?**

Stopwords are the most common words in a natural language, such as "the", "is", "in", and "and" in English. When analyzing text data and building NLP models, these words usually add little to the meaning of a document.

---


### **Why remove stopwords?**

In tasks like text classification, where text is assigned to different categories, stopwords are removed so that more weight falls on the words that define the meaning of the text.

Removing stopwords reduces the size of the dataset, which in turn reduces the time needed to train the model.
With fewer, more meaningful tokens left, performance can also improve, and classification accuracy may increase.
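
As a rough illustration of the size reduction, the sketch below filters one sentence against a tiny hand-written stopword list and reports how many tokens remain. `STOP_WORDS` and `drop_stopwords` are hypothetical names introduced here; library lists are much longer, so the exact percentage will differ in practice.

```python
# Hypothetical mini stopword list; real libraries ship far more entries.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "we"}


def drop_stopwords(tokens, stop_words=STOP_WORDS):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "We are doing research in speech and machine learning".split()
kept = drop_stopwords(tokens)

# Fraction of tokens removed by the filter.
reduction = 1 - len(kept) / len(tokens)

print(kept)
print(f"{len(tokens)} tokens -> {len(kept)} tokens ({reduction:.0%} fewer)")
```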

---

### **Methods to remove stopwords**

---

<hr>

### **1. Stopword Removal using NLTK**

<hr/>



In [None]:
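import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the stopword lists and the tokenizer models (skipped if already up to date).
# Newer NLTK releases may also need nltk.download('punkt_tab') for word_tokenize.
nltk.download('stopwords')
nltk.download('punkt')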


# sample sentence
text = "Robofied is a comprehensive Artificial Intelligence platform based in Gurugram,Haryana working towards democratizing safe artificial intelligence towards a common goal of Singularity. At Robofied, we are doing research in speech, natural language, and machine learning. We develop open-source solutions for developers which empowers them so that they can make better products for the world. We educate people about Artificial Intelligence, its scope and impact via resources and tutorials."

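# set of English stopwords (all lowercase)
stop_words = set(stopwords.words('english'))

# split the text into word and punctuation tokens
word_tokens = word_tokenize(text)

# keep tokens that are not stopwords; the comparison is case-sensitive,
# so capitalized words such as "At" and "We" are kept
new_sentence = [w for w in word_tokens if w not in stop_words]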



print("\n\nOriginal Sentence \n\n")
print(" ".join(word_tokens)) 

print("\n\nNew Sentence \n\n")
print(" ".join(new_sentence)) 


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Original Sentence 


Robofied is a comprehensive Artificial Intelligence platform based in Gurugram , Haryana working towards democratizing safe artificial intelligence towards a common goal of Singularity . At Robofied , we are doing research in speech , natural language , and machine learning . We develop open-source solutions for developers which empowers them so that they can make better products for the world . We educate people about Artificial Intelligence , its scope and impact via resources and tutorials .


New Sentence 


Robofied comprehensive Artificial Intelligence platform based Gurugram , Haryana working towards democratizing safe artificial intelligence towards common goal Singularity . At Robofied , research speech , natural language , mach
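
The filter above matches tokens case-sensitively, which is why "At" survives in the output, and it also keeps punctuation tokens. If that is not wanted, one possible variation is sketched below. `filter_tokens` is a hypothetical helper, and the token list and stopword set are small stand-ins so the example runs without NLTK data.

```python
import string


def filter_tokens(tokens, stop_words):
    """Drop stopwords case-insensitively and skip pure-punctuation tokens."""
    kept = []
    for tok in tokens:
        if tok.lower() in stop_words:
            continue  # stopword, regardless of capitalization
        if all(ch in string.punctuation for ch in tok):
            continue  # token made only of punctuation, e.g. "," or "."
        kept.append(tok)
    return kept


# Stand-ins for stopwords.words('english') and the word_tokenize output.
stop_words = {"at", "we", "are", "doing", "in", "and"}
tokens = ["At", "Robofied", ",", "we", "are", "doing",
          "research", "in", "speech", "."]

print(filter_tokens(tokens, stop_words))  # ['Robofied', 'research', 'speech']
```

Tokens that merely contain punctuation, such as "open-source", are kept because only tokens made up entirely of punctuation are skipped.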

<hr>

### **2. Stopword Removal using Gensim**

<hr/>





In [None]:
from gensim.parsing.preprocessing import remove_stopwords

# remove_stopwords splits the text on whitespace and drops stopwords (case-sensitive match)
result = remove_stopwords("Robofied is a comprehensive Artificial Intelligence platform based in Gurugram,Haryana working towards democratizing safe artificial intelligence towards a common goal of Singularity. At Robofied, we are doing research in speech, natural language, and machine learning. We develop open-source solutions for developers which empowers them so that they can make better products for the world. We educate people about Artificial Intelligence, its scope and impact via resources and tutorials.")

print('\n\n New Sentence \n\n')
print(result)  



 New Sentence 


Robofied comprehensive Artificial Intelligence platform based Gurugram,Haryana working democratizing safe artificial intelligence common goal Singularity. At Robofied, research speech, natural language, machine learning. We develop open-source solutions developers empowers better products world. We educate people Artificial Intelligence, scope impact resources tutorials.
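
The output above keeps "We" and leaves "Gurugram,Haryana" glued together, which is consistent with splitting on whitespace and matching each piece exactly. The sketch below imitates that behaviour in plain Python to show its side effects. It is not Gensim's code, and `remove_stopwords_whitespace` and the tiny stopword set are hypothetical.

```python
def remove_stopwords_whitespace(text, stop_words):
    """Split on whitespace only and drop pieces that exactly match a stopword."""
    return " ".join(w for w in text.split() if w not in stop_words)


# Hypothetical tiny stopword list, all lowercase.
stop_words = {"for", "the", "them", "we"}

sample = "We build tools for them, and for the world."
result = remove_stopwords_whitespace(sample, stop_words)
print(result)  # We build tools them, and world.
```

"We" is kept because of its capital letter, and "them," is kept because the attached comma prevents an exact match. Lowercasing the text and separating punctuation beforehand avoids both effects.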


<hr>

### **3. Stopword Removal using spaCy**

<hr/>

In [None]:
from spacy.lang.en import English

# Create a blank English pipeline (tokenizer only; no tagger, parser, NER or word vectors)
nlp = English()

text = "Robofied is a comprehensive Artificial Intelligence platform based in Gurugram,Haryana working towards democratizing safe artificial intelligence towards a common goal of Singularity. At Robofied, we are doing research in speech, natural language, and machine learning. We develop open-source solutions for developers which empowers them so that they can make better products for the world. We educate people about Artificial Intelligence, its scope and impact via resources and tutorials."



doc = nlp(text)

# Create list of word tokens
token_list = [token.text for token in doc]

# Create list of word tokens after removing stopwords.
# token.is_stop ignores case, so "At" and "We" are dropped as well.
new_sentence = [token.text for token in doc if not token.is_stop]

print(token_list)
print(new_sentence)

['Robofied', 'is', 'a', 'comprehensive', 'Artificial', 'Intelligence', 'platform', 'based', 'in', 'Gurugram', ',', 'Haryana', 'working', 'towards', 'democratizing', 'safe', 'artificial', 'intelligence', 'towards', 'a', 'common', 'goal', 'of', 'Singularity', '.', 'At', 'Robofied', ',', 'we', 'are', 'doing', 'research', 'in', 'speech', ',', 'natural', 'language', ',', 'and', 'machine', 'learning', '.', 'We', 'develop', 'open', '-', 'source', 'solutions', 'for', 'developers', 'which', 'empowers', 'them', 'so', 'that', 'they', 'can', 'make', 'better', 'products', 'for', 'the', 'world', '.', 'We', 'educate', 'people', 'about', 'Artificial', 'Intelligence', ',', 'its', 'scope', 'and', 'impact', 'via', 'resources', 'and', 'tutorials', '.']
['Robofied', 'comprehensive', 'Artificial', 'Intelligence', 'platform', 'based', 'Gurugram', ',', 'Haryana', 'working', 'democratizing', 'safe', 'artificial', 'intelligence', 'common', 'goal', 'Singularity', '.', 'Robofied', ',', 'research', 'speech', ',', 