# Stop Words

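##### Stop words are the common words in a language (like “the,” “is,” “in,” “and,” “a”) that are often filtered out before processing natural language data. These words carry little meaningful information for many NLP tasks such as search and classification. Sentiment analysis needs more care, because standard lists (NLTK's included) also contain negations like “not” and “no”, and dropping them can flip the meaning of a sentence.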
##### ✅ Why Remove Stop Words?
#####   Removing stop words:
#####   Reduces noise in the data.
#####   Decreases vocabulary size.
#####   Speeds up processing.
#####   Focuses on important words (nouns, verbs, etc.).
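
The effect on token count and vocabulary size can be seen with a few lines of plain Python. This is a minimal sketch: `TINY_STOP_WORDS` is a hand-picked set chosen for illustration, not NLTK's list, and the whitespace split stands in for a real tokenizer.

```python
# Hypothetical, hand-picked stop words for illustration only (not NLTK's list).
TINY_STOP_WORDS = {"the", "is", "in", "and", "a", "of", "to"}

text = "the cat is in the garden and the dog is in the house"
tokens = text.split()  # naive whitespace tokenizer, a stand-in for nltk.word_tokenize

# Keep only the tokens that are not in the stop list.
kept = [t for t in tokens if t not in TINY_STOP_WORDS]

print(len(tokens), "tokens before,", len(kept), "after")
print(len(set(tokens)), "distinct words before,", len(set(kept)), "after")
print(kept)
```

On this toy sentence most of the tokens are stop words, so both the token count and the vocabulary shrink sharply while the content words remain.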

In [None]:
from nltk.corpus import stopwords
stopwords.words('english')  # list of stopwords in English

['a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 "he'd",
 "he'll",
 'her',
 'here',
 'hers',
 'herself',
 "he's",
 'him',
 'himself',
 'his',
 'how',
 'i',
 "i'd",
 'if',
 "i'll",
 "i'm",
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it'd",
 "it'll",
 "it's",
 'its',
 'itself',
 "i've",
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'on
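
For tasks where negation matters, such as sentiment analysis, the stock list can be trimmed before use. The sketch below hard-codes a handful of entries from NLTK's English list (`base_stop_words`) so that it runs without downloading the corpus; with NLTK installed, the starting point would be `set(stopwords.words('english'))`. The `NEGATIONS` set and the `filter_tokens` helper are illustrative names, not part of NLTK.

```python
# A handful of entries from NLTK's English list, hard-coded so the sketch is self-contained.
base_stop_words = {"a", "an", "and", "is", "it", "no", "nor", "not", "of", "the", "this", "was"}

# Hypothetical choice: words to keep because they carry negation.
NEGATIONS = {"no", "nor", "not"}

custom_stop_words = base_stop_words - NEGATIONS


def filter_tokens(tokens, stop_words):
    """Drop tokens whose lowercase form is in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "This movie was not good".split()
print(filter_tokens(tokens, base_stop_words))    # negation is lost
print(filter_tokens(tokens, custom_stop_words))  # negation is kept
```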

In [6]:
para = """I have three visions for India. In 3000 years of our history people from all over the world have come and invaded us, captured our lands, conquered our minds. 
From Alexander onwards the Greeks, the Turks, the Moguls, the Portuguese, the British, the French, the Dutch, all of them came and looted us, took over what was ours. 
Yet we have not done this to any other nation. We have not conquered anyone. We have not grabbed their land, their culture and their history and tried to enforce our way of life on them. Why? Because we respect the freedom of others. 
That is why my FIRST VISION is that of FREEDOM. I believe that India got its first vision of this in 1857, when we started the war of Independence. 
It is this freedom that we must protect and nurture and build on. If we are not free, no one will respect us.

We have 10 percent growth rate in most areas. Our poverty levels are falling. 
Our achievements are being globally recognised today. 
Yet we lack the self-confidence to see ourselves as a developed nation, self-reliant and self-assured. 
Isn’t this incorrect? MY SECOND VISION for India is DEVELOPMENT. For fifty years we have been a developing nation. 
It is time we see ourselves as a developed nation. We are among top five nations in the world in terms of GDP.

I have a THIRD VISION. India must stand up to the world. 
Because I believe that unless India stands up to the world, no one will respect us. 
Only strength respects strength. We must be strong not only as a military power but also as an economic power. 
Both must go hand-in-hand. My good fortune was to have worked with three great minds. Dr.Vikram Sarabhai, of the Dept. of Space, Professor Satish Dhawan, who succeeded him and Dr. Brahm Prakash, father of nuclear material. 
I was lucky to have worked with all three of them closely and consider this the great opportunity of my life."""

In [23]:
import nltk
import copy
sentences = nltk.sent_tokenize(para)  # split the paragraph into sentences
sentences_2 = copy.deepcopy(sentences)  # independent copy for the SnowballStemmer run
sentences_3 = copy.deepcopy(sentences)  # independent copy for the WordNetLemmatizer run

In [13]:
len(sentences) # number of sentences in the paragraph

31

In [17]:
from nltk.stem import PorterStemmer
stem = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the set once; set membership tests are fast

for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])  # split each sentence into word tokens
    # Stem only the tokens that are not stop words. The check is case-sensitive,
    # so capitalised tokens such as "I" or "We" are not filtered out.
    words = [stem.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)
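
The loop above compares each token to the stop list exactly as written, so capitalised tokens such as "I" or "We" do not match the lowercase entries and pass through. A minimal sketch of a case-insensitive filter, with the lookup set built once outside the loop, might look like this. `STOP_WORDS` is a hard-coded stand-in for `set(stopwords.words('english'))` and `str.split` stands in for `nltk.word_tokenize`; both are assumptions made to keep the example self-contained.

```python
# Stand-in for set(stopwords.words('english')); built once and reused for every token.
STOP_WORDS = frozenset({"i", "we", "have", "for", "not", "the", "in"})


def drop_stop_words(tokens, case_sensitive=False):
    """Return the tokens that are not stop words."""
    if case_sensitive:
        return [t for t in tokens if t not in STOP_WORDS]
    # Compare the lowercase form so "I" matches the entry "i".
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = "I have three visions for India".split()
print(drop_stop_words(tokens, case_sensitive=True))  # 'I' survives
print(drop_stop_words(tokens))                       # 'I' is removed
```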

In [18]:
sentences

['i three vision india .',
 'in 3000 year histori peopl world come invad us , captur land , conquer mind .',
 'from alexand onward greek , turk , mogul , portugues , british , french , dutch , came loot us , took .',
 'yet done nation .',
 'we conquer anyon .',
 'we grab land , cultur histori tri enforc way life .',
 'whi ?',
 'becaus respect freedom other .',
 'that first vision freedom .',
 'i believ india got first vision 1857 , start war independ .',
 'it freedom must protect nurtur build .',
 'if free , one respect us .',
 'we 10 percent growth rate area .',
 'our poverti level fall .',
 'our achiev global recognis today .',
 'yet lack self-confid see develop nation , self-reli self-assur .',
 'isn ’ incorrect ?',
 'my second vision india develop .',
 'for fifti year develop nation .',
 'it time see develop nation .',
 'we among top five nation world term gdp .',
 'i third vision .',
 'india must stand world .',
 'becaus i believ unless india stand world , one respect us .',
 'onl

In [21]:
# using SnowballStemmer
from nltk.stem import SnowballStemmer
snowball_stem = SnowballStemmer("english")
stop_words = set(stopwords.words('english'))  # build the lookup set once
for i in range(len(sentences_2)):
    words = nltk.word_tokenize(sentences_2[i])  # tokenize the untouched copy, not the already-stemmed `sentences`
    words = [snowball_stem.stem(word) for word in words if word not in stop_words]
    sentences_2[i] = ' '.join(words)
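
The normalisation loops in this notebook differ only in the function applied to each token, so they can share one helper that takes that function as a parameter, which also removes the chance of reading from the wrong list. The sketch below is one way to do that. `normalize_sentences`, `toy_stem` and the tiny stop list are hypothetical names and data, and `toy_stem` is a deliberately crude stand-in, not the Porter or Snowball algorithm. With NLTK available, `PorterStemmer().stem` or `SnowballStemmer("english").stem` could be passed instead.

```python
from typing import Callable, Iterable, List


def normalize_sentences(
    sentences: Iterable[str],
    transform: Callable[[str], str],
    stop_words: frozenset,
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Tokenize each sentence, drop stop words, apply transform, and re-join.

    The input is never modified; a new list is returned.
    """
    result = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        kept = [transform(tok) for tok in tokens if tok.lower() not in stop_words]
        result.append(" ".join(kept))
    return result


def toy_stem(word: str) -> str:
    """Crude stand-in for a real stemmer: lowercase and strip a trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


stop_words = frozenset({"i", "have", "for", "we", "not"})  # illustrative subset
raw = ["I have three visions for India", "We have not conquered anyone"]

print(normalize_sentences(raw, toy_stem, stop_words))
print(raw)  # the original sentences are untouched
```

Because the helper returns a new list, the same `raw` sentences can be fed to several stemmers or lemmatizers without keeping separate copies.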

In [22]:
sentences_2

['i three vision india .',
 'in 3000 year histori peopl world come invad us , captur land , conquer mind .',
 'from alexand onward greek , turk , mogul , portugues , british , french , dutch , came loot us , took .',
 'yet done nation .',
 'we conquer anyon .',
 'we grab land , cultur histori tri enforc way life .',
 'whi ?',
 'becaus respect freedom other .',
 'that first vision freedom .',
 'i believ india got first vision 1857 , start war independ .',
 'it freedom must protect nurtur build .',
 'if free , one respect us .',
 'we 10 percent growth rate area .',
 'our poverti level fall .',
 'our achiev global recognis today .',
 'yet lack self-confid see develop nation , self-reli self-assur .',
 'isn ’ incorrect ?',
 'my second vision india develop .',
 'for fifti year develop nation .',
 'it time see develop nation .',
 'we among top five nation world term gdp .',
 'i third vision .',
 'india must stand world .',
 'becaus i believ unless india stand world , one respect us .',
 'onl

In [30]:
# Now using WordNetLemmatizer
from nltk.stem import WordNetLemmatizer
lemm = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # build the lookup set once
for i in range(len(sentences_3)):
    sentences_3[i] = sentences_3[i].lower()  # lowercase so stop words like "I" and "We" match the list and WordNet lookups succeed
    words = nltk.word_tokenize(sentences_3[i])
    # lemmatize() defaults to pos='n' (noun), so verb forms such as "invaded" are left unchanged
    words = [lemm.lemmatize(word) for word in words if word not in stop_words]
    sentences_3[i] = ' '.join(words)

In [31]:
sentences_3

['three vision india .',
 '3000 year history people world come invaded u , captured land , conquered mind .',
 'alexander onwards greek , turk , mogul , portuguese , british , french , dutch , came looted u , took .',
 'yet done nation .',
 'conquered anyone .',
 'grabbed land , culture history tried enforce way life .',
 '?',
 'respect freedom others .',
 'first vision freedom .',
 'believe india got first vision 1857 , started war independence .',
 'freedom must protect nurture build .',
 'free , one respect u .',
 '10 percent growth rate area .',
 'poverty level falling .',
 'achievement globally recognised today .',
 'yet lack self-confidence see developed nation , self-reliant self-assured .',
 '’ incorrect ?',
 'second vision india development .',
 'fifty year developing nation .',
 'time see developed nation .',
 'among top five nation world term gdp .',
 'third vision .',
 'india must stand world .',
 'believe unless india stand world , one respect u .',
 'strength respect stre
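
The difference between the two approaches is easiest to see by lining up the same sentence from the stemmed and lemmatized outputs above. The snippet below copies the second sentence from each output verbatim and prints the token pairs that differ. It is a small comparison aid, not part of the original pipeline.

```python
# Second sentence, copied from the PorterStemmer and WordNetLemmatizer outputs above.
stemmed = "in 3000 year histori peopl world come invad us , captur land , conquer mind ."
lemmatized = "3000 year history people world come invaded u , captured land , conquered mind ."

stem_tokens = stemmed.split()
lemma_tokens = lemmatized.split()

# The stemmed run kept a leading "in" (from the capitalised "In"); skip it so the
# remaining tokens line up one-to-one.
if stem_tokens[0] == "in" and len(stem_tokens) == len(lemma_tokens) + 1:
    stem_tokens = stem_tokens[1:]

# Collect the positions where the stem and the lemma disagree.
differences = [(s, l) for s, l in zip(stem_tokens, lemma_tokens) if s != l]
for s, l in differences:
    print(f"{s:10} {l}")
```

Stems such as `histori` and `peopl` are not dictionary words, while the lemmatizer returns real words. Called without a part-of-speech argument, `lemmatize` treats every token as a noun, which is consistent with verbs like `invaded` staying inflected and `us` turning into `u` in the output above.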