# Task: Remove Stop Words from Text Using NLTK

## Problem Statement:
Write a Python program that uses the NLTK library to remove stop words from a given text. Stop words are common words (such as "the", "is", and "in") that are typically filtered out during text processing.

## Steps:
1. Install NLTK by running `pip install nltk`.
2. Import necessary functions and data from the NLTK library (e.g., `stopwords`, `word_tokenize`).
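3. Download the stopwords corpus using `nltk.download('stopwords')`. Calling `nltk.download()` with no arguments opens the interactive downloader instead.
4. Tokenize the input text into words, for example with `word_tokenize()` (which needs the `punkt` tokenizer data) or with plain `str.split()`.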
5. Filter out the stop words by comparing each token against the NLTK stopwords list.
6. Print the processed text with stop words removed.
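
The steps above can be sketched as two small functions. This is a minimal sketch rather than the notebook's own code: the names `filter_tokens` and `remove_stop_words` are hypothetical, and the extra `punkt_tab` download is an assumption about what newer NLTK releases need for `word_tokenize()`.

```python
def filter_tokens(tokens, stop_words):
    """Step 5: keep tokens whose lowercase form is not a stop word."""
    stop_set = set(stop_words)  # a set makes each membership test cheap
    return [tok for tok in tokens if tok.lower() not in stop_set]


def remove_stop_words(text, language="english"):
    """Steps 2-5: download data, tokenize, and filter (hypothetical helper)."""
    # NLTK is imported here so filter_tokens stays usable without it.
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    nltk.download("stopwords", quiet=True)  # step 3: stop word lists
    nltk.download("punkt", quiet=True)      # tokenizer data for word_tokenize
    nltk.download("punkt_tab", quiet=True)  # assumption: used by newer NLTK releases

    tokens = word_tokenize(text)            # step 4: split text into tokens
    return filter_tokens(tokens, stopwords.words(language))
```

Calling `print(remove_stop_words(text))` would cover step 6. Note that `word_tokenize()` emits punctuation marks as separate tokens, and this sketch does not drop them.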

In [1]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\MeetRadadiya\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


True

In [2]:
from nltk.corpus import stopwords

In [3]:
stoplist = stopwords.words('english')

In [4]:
text = '''
In computing, stop words are words which are filtered out before or after 
processing of natural language data (text). Though "stop words" usually 
refers to the most common words in a language, there is no single universal 
list of stop words used by all natural language processing tools, and 
indeed not all tools even use such a list. Some tools specifically avoid 
removing these stop words to support phrase search.
'''

In [5]:
print("\nOriginal string:")
print(text)


Original string:

In computing, stop words are words which are filtered out before or after 
processing of natural language data (text). Though "stop words" usually 
refers to the most common words in a language, there is no single universal 
list of stop words used by all natural language processing tools, and 
indeed not all tools even use such a list. Some tools specifically avoid 
removing these stop words to support phrase search.



In [6]:
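# Note: the comparison is case-sensitive and split() leaves punctuation attached,
# so capitalized stop words such as 'In' and 'Some' are not removed.
clean_word_list = [word for word in text.split() if word not in stoplist]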
print("\nAfter removing stop words from the said text:")
print(clean_word_list)


After removing stop words from the said text:
['In', 'computing,', 'stop', 'words', 'words', 'filtered', 'processing', 'natural', 'language', 'data', '(text).', 'Though', '"stop', 'words"', 'usually', 'refers', 'common', 'words', 'language,', 'single', 'universal', 'list', 'stop', 'words', 'used', 'natural', 'language', 'processing', 'tools,', 'indeed', 'tools', 'even', 'use', 'list.', 'Some', 'tools', 'specifically', 'avoid', 'removing', 'stop', 'words', 'support', 'phrase', 'search.']
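
In the output above, 'In' and 'Some' survive because the NLTK stop words are lowercase, and tokens such as 'computing,' still carry punctuation. One possible way to handle both, sketched under some assumptions: the function name `strip_and_filter` is hypothetical, and `sample_stoplist` is a small illustrative subset used only so the example runs without the NLTK data. In the notebook, `stoplist` could be passed instead.

```python
import string


def strip_and_filter(text, stoplist):
    """Drop stop words, ignoring case and surrounding punctuation."""
    stop_set = {w.lower() for w in stoplist}
    kept = []
    for raw in text.split():
        word = raw.strip(string.punctuation)  # 'computing,' -> 'computing'
        if word and word.lower() not in stop_set:
            kept.append(word)
    return kept


# Illustrative subset only; not the full NLTK English list.
sample_stoplist = ["in", "are", "which", "some", "the", "to"]
sample = "In computing, stop words are words which are filtered out."
print(strip_and_filter(sample, sample_stoplist))
# ['computing', 'stop', 'words', 'words', 'filtered', 'out']
```

Stripping only leading and trailing punctuation keeps internal characters such as hyphens and apostrophes intact, which is a design choice rather than a requirement of the task.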
