<a href="https://colab.research.google.com/github/Ehtisham1053/Natural-Language-Processing/blob/main/Stop_word_removal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🛑 Stop Word Removal in NLP
Stop words are very common words (such as *is*, *the*, *a*, *an*, *in*, *on*, *and*) that carry little meaning on their own. They are often removed during preprocessing so that NLP models work with fewer, more informative tokens.

## 📌 Why Remove Stop Words?
* Reduces dataset size → Less processing time ⏳
* Improves model efficiency → Focuses on meaningful words 🔍
* Reduces noise → Very frequent words no longer dominate word counts
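
To make the size reduction concrete, here is a minimal sketch in plain Python. The tiny `STOP_WORDS` set and the `remove_stop_words` helper are assumptions made for illustration only; the library-based sections below use much larger lists.

```python
import re

# Small illustrative stop list (an assumption for this sketch, not a library list).
STOP_WORDS = {"is", "the", "a", "an", "in", "on", "and", "of", "this"}

def remove_stop_words(text):
    """Lowercase the text, extract alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return tokens, kept

tokens, kept = remove_stop_words("This is an example of stop word removal in NLP.")
print(f"{len(tokens)} tokens -> {len(kept)} tokens")
print(kept)
```

On this sentence, half of the tokens are stop words, which is roughly the kind of reduction the list above refers to. The exact ratio depends on the text and on the stop list used.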

## 1️⃣ Stopword Removal using NLTK

In [1]:
!pip install nltk




In [4]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')

text = "This is an example of stop word removal in NLP."

words = word_tokenize(text)

stop_words = set(stopwords.words('english'))

filtered_words = [word for word in words if word.lower() not in stop_words]

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


In [5]:
# Print results
print("Original Text:", text)
print("After Stopword Removal:", " ".join(filtered_words))

Original Text: This is an example of stop word removal in NLP.
After Stopword Removal: example stop word removal NLP .


✅ Pros: Ships stop word lists for many languages, not only English.
❌ Cons: May remove useful words (e.g., "not" in "not bad", which reverses the meaning).
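
One possible way around the negation problem is to subtract the words you want to keep from the stop list before filtering. The sketch below is plain Python: `build_stop_set`, `remove_stop_words`, and the shortened `base_words` list are hypothetical names used for illustration. With NLTK, `stopwords.words('english')` could be passed as `base_words` instead.

```python
def build_stop_set(base_words, keep=()):
    """Return a stop-word set with the words in `keep` removed from it."""
    return set(base_words) - set(keep)

def remove_stop_words(tokens, stop_set):
    """Drop tokens whose lowercase form is in `stop_set`."""
    return [t for t in tokens if t.lower() not in stop_set]

# Stand-in for stopwords.words('english'); shortened for illustration.
base_words = ["the", "is", "was", "not", "no", "a"]
tokens = ["The", "movie", "was", "not", "bad"]

default_result = remove_stop_words(tokens, build_stop_set(base_words))
negation_safe = remove_stop_words(
    tokens, build_stop_set(base_words, keep={"not", "no"})
)

print(default_result)  # negation is lost
print(negation_safe)   # negation is kept
```

Which words to keep is a judgment call that depends on the task. Sentiment analysis usually benefits from keeping negations, while topic modeling may not need them.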

## 2️⃣ Stopword Removal using spaCy

In [8]:
!pip install spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 89.8 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [9]:
import spacy

nlp = spacy.load("en_core_web_sm")

text = "This is an example of stop word removal in NLP."

doc = nlp(text)

# Remove stop words
filtered_words = [token.text for token in doc if not token.is_stop]




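✅ Pros: Stop word status is available per token (`token.is_stop`), and the same pipeline also supports lemmatization.
❌ Cons: Heavier than NLTK, since a language model must be downloaded and loaded first.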



In [10]:
# Print results
print("Original Text:", text)
print("After Stopword Removal:", " ".join(filtered_words))

Original Text: This is an example of stop word removal in NLP.
After Stopword Removal: example stop word removal NLP .
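
Both the NLTK and spaCy outputs above still contain the `.` token, because punctuation is not a stop word. A minimal sketch of one way to drop such tokens follows; `drop_non_words` is a hypothetical helper, and the token list is copied from the output above. In spaCy, checking `token.is_punct` inside the list comprehension would likely be the more natural route.

```python
def drop_non_words(tokens):
    """Keep only tokens that contain at least one letter or digit."""
    return [t for t in tokens if any(ch.isalnum() for ch in t)]

# Tokens as produced by the stop word filters above.
filtered_words = ["example", "stop", "word", "removal", "NLP", "."]

cleaned = drop_non_words(filtered_words)
print(" ".join(cleaned))
```

Checking for at least one alphanumeric character, rather than requiring the whole token to be alphabetic, keeps tokens such as `covid-19` intact while still removing pure punctuation.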


## 3️⃣ Stopword Removal using Scikit-learn

In [11]:
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Define a sample text
text = "This is an example of stop word removal in NLP."

# Tokenize by splitting on whitespace (punctuation stays attached, e.g. "NLP.")
words = text.split()

# Remove stop words using sklearn's stopword list
filtered_words = [word for word in words if word.lower() not in ENGLISH_STOP_WORDS]

# Print results
print("Original Text:", text)
print("After Stopword Removal:", " ".join(filtered_words))


Original Text: This is an example of stop word removal in NLP.
After Stopword Removal: example stop word removal NLP.


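✅ Pros: Integrates directly with scikit-learn vectorizers such as `TfidfVectorizer` (via `stop_words='english'`).
❌ Cons: Only an English list is built in. `ENGLISH_STOP_WORDS` is an immutable `frozenset`, so it cannot be edited in place, although a modified copy can be passed to a vectorizer's `stop_words` parameter.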