## 🛑 Stop Words in NLP

### What are Stop Words?
Stop words are very common words in a language (such as *the, is, in, and, to, for*) that **carry little meaning on their own**. They are often **filtered out** in NLP tasks to reduce processing cost and keep the focus on content-bearing words.

### Why Remove Stop Words?
- Stop words appear **frequently** in text but carry **little to no contextual meaning**.
- Removing them **shrinks the vocabulary and the number of tokens to process**, which improves **text processing efficiency**.
- Some NLP models and applications **retain stop words** if context matters (e.g., chatbots, language models).
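
To make the efficiency point concrete, here is a minimal sketch that filters one sentence against a small hand-written stop-word set and compares token counts. The set `SAMPLE_STOP_WORDS` and the helper `remove_stop_words` are illustrative assumptions, not part of NLTK; the NLTK list loaded in the cells below is much longer.

```python
# A tiny hand-picked stop-word set (hypothetical stand-in for a real list)
SAMPLE_STOP_WORDS = {"the", "is", "in", "and", "to", "for", "a", "of", "on"}

def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [tok for tok in tokens if tok.lower() not in stop_words]

text = "The cat is sleeping on the sofa in the living room"
tokens = text.split()
filtered = remove_stop_words(tokens, SAMPLE_STOP_WORDS)

print(len(tokens), "tokens before:", tokens)
print(len(filtered), "tokens after:", filtered)
```

In this toy sentence more than half of the tokens are stop words, which is why filtering them can noticeably reduce the amount of text a downstream step has to handle.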

In [None]:
from nltk.corpus import stopwords
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\harsu\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [6]:
# Stop words in English
stopwords.words('english')

['a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 "he'd",
 "he'll",
 'her',
 'here',
 'hers',
 'herself',
 "he's",
 'him',
 'himself',
 'his',
 'how',
 'i',
 "i'd",
 'if',
 "i'll",
 "i'm",
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it'd",
 "it'll",
 "it's",
 'its',
 'itself',
 "i've",
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'on

In [8]:
# Stop words in Spanish
stopwords.words('spanish')

['de',
 'la',
 'que',
 'el',
 'en',
 'y',
 'a',
 'los',
 'del',
 'se',
 'las',
 'por',
 'un',
 'para',
 'con',
 'no',
 'una',
 'su',
 'al',
 'lo',
 'como',
 'más',
 'pero',
 'sus',
 'le',
 'ya',
 'o',
 'este',
 'sí',
 'porque',
 'esta',
 'entre',
 'cuando',
 'muy',
 'sin',
 'sobre',
 'también',
 'me',
 'hasta',
 'hay',
 'donde',
 'quien',
 'desde',
 'todo',
 'nos',
 'durante',
 'todos',
 'uno',
 'les',
 'ni',
 'contra',
 'otros',
 'ese',
 'eso',
 'ante',
 'ellos',
 'e',
 'esto',
 'mí',
 'antes',
 'algunos',
 'qué',
 'unos',
 'yo',
 'otro',
 'otras',
 'otra',
 'él',
 'tanto',
 'esa',
 'estos',
 'mucho',
 'quienes',
 'nada',
 'muchos',
 'cual',
 'poco',
 'ella',
 'estar',
 'estas',
 'algunas',
 'algo',
 'nosotros',
 'mi',
 'mis',
 'tú',
 'te',
 'ti',
 'tu',
 'tus',
 'ellas',
 'nosotras',
 'vosotros',
 'vosotras',
 'os',
 'mío',
 'mía',
 'míos',
 'mías',
 'tuyo',
 'tuya',
 'tuyos',
 'tuyas',
 'suyo',
 'suya',
 'suyos',
 'suyas',
 'nuestro',
 'nuestra',
 'nuestros',
 'nuestras',
 'vuestro'

### 🛑 Example of Stop Words Removal & Stemming using Porter Stemmer

In [27]:
corpus = """The quick brown fox jumps over the lazy dog. It was a bright and sunny day, but she didn't feel like going outside. He couldn't understand why the weather affected his mood so much. However, he decided not to let it bother him and continued reading his favorite book."""


In [28]:
sentences=nltk.sent_tokenize(corpus)
sentences

['The quick brown fox jumps over the lazy dog.',
 "It was a bright and sunny day, but she didn't feel like going outside.",
 "He couldn't understand why the weather affected his mood so much.",
 'However, he decided not to let it bother him and continued reading his favorite book.']

In [None]:
from nltk.stem import PorterStemmer
stemmer= PorterStemmer()

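# Build the stop-word set once instead of once per word
stop_words = set(stopwords.words('english'))

# Iterate over each sentence in the list
for i in range(len(sentences)):

    # Tokenize the sentence into words
    words=nltk.word_tokenize(sentences[i])

    # Apply stemming while removing stop words
    # (compare in lowercase so capitalized stop words such as "The" are dropped too)
    words=[stemmer.stem(word) for word in words if word.lower() not in stop_words]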

    # Join the processed words back into a sentence
    sentences[i]= ' '.join(words)
sentences

['quick brown fox jump lazi dog .',
 "bright sunni day , n't feel like go outsid .",
 "could n't understand weather affect mood much .",
 'howev , decid let bother continu read favorit book .']

### 🛑 Example of Stop Words Removal & Stemming using Snowball Stemmer

In [33]:
from nltk.stem import SnowballStemmer
stemmer=SnowballStemmer('english')

sentences=nltk.sent_tokenize(corpus)

# Build the stop-word set once instead of once per word
stop_words = set(stopwords.words('english'))

# Iterate over each sentence in the list
for i in range(len(sentences)):

    # Tokenize the sentence into words
    words=nltk.word_tokenize(sentences[i])

    # Apply stemming while removing stop words
    # (compare in lowercase so capitalized stop words such as "The" are dropped too)
    words=[stemmer.stem(word) for word in words if word.lower() not in stop_words]

    # Join the processed words back into a sentence
    sentences[i]= ' '.join(words)
sentences

['quick brown fox jump lazi dog .',
 "bright sunni day , n't feel like go outsid .",
 "could n't understand weather affect mood much .",
 'howev , decid let bother continu read favorit book .']

### 🛑 Example of Stop Words Removal & Lemmatization using WordNet Lemmatizer

In [43]:
from nltk.stem import WordNetLemmatizer
lemmatizer= WordNetLemmatizer()

sentences=nltk.sent_tokenize(corpus)

# Build the stop-word set once instead of once per word
stop_words = set(stopwords.words('english'))

# Iterate over each sentence in the list
for i in range(len(sentences)):

    # Tokenize the sentence into words
    words=nltk.word_tokenize(sentences[i])

    # Apply lemmatization (treating each word as a verb) while removing stop words
    # (compare in lowercase so capitalized stop words such as "The" are dropped too)
    words=[lemmatizer.lemmatize(word.lower(), pos='v') for word in words if word.lower() not in stop_words]

    # Join the processed words back into a sentence
    sentences[i]= ' '.join(words)
sentences

['quick brown fox jump lazy dog .',
 "bright sunny day , n't feel like go outside .",
 "could n't understand weather affect mood much .",
 'however , decide let bother continue read favorite book .']

## 🛑 Custom Stop Words List

### Why Customize Stop Words?
Standard stop word lists are useful, but they can **remove words that carry meaning** for a particular task. For example, negations such as **"not"**, **"couldn't"**, and **"isn't"** are crucial for **sentiment analysis**, and removing them can invert the meaning of a sentence.
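
The effect on negation is easy to see in a small sketch. The stop-word set below is a hypothetical stand-in that, like many default lists, includes negations; `filter_tokens` is an illustrative helper, not an NLTK function.

```python
# Hypothetical stand-in for a default stop-word list that contains negations
DEFAULT_STOP_WORDS = {"the", "was", "is", "a", "not", "isn't", "couldn't"}

def filter_tokens(text, stop_words):
    """Drop whitespace-separated tokens whose lowercase form is a stop word."""
    return [tok for tok in text.split() if tok.lower() not in stop_words]

positive = filter_tokens("The movie was good", DEFAULT_STOP_WORDS)
negative = filter_tokens("The movie was not good", DEFAULT_STOP_WORDS)

print(positive)
print(negative)
# Both reviews collapse to the same tokens once "not" is filtered out
print(positive == negative)
```

After filtering, the two opposite reviews become indistinguishable, which is the failure a custom list is meant to avoid.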

### Creating a Custom Stop Words List
Instead of using the default **NLTK stop words**, you can **modify** the list by:
1. **Adding words that are not relevant** to your task.
2. **Removing words that are important** (e.g., *not, never, can't*).
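
Both modifications map onto plain set operations. The following is a minimal sketch under stated assumptions: `base_stop_words` stands in for a library's default list, and `domain_noise` is a made-up set of task-specific words for an imagined product-review dataset.

```python
# Hypothetical base list standing in for a library's default stop words
base_stop_words = {"the", "a", "is", "was", "it", "not", "no"}

# 1. Words that carry no signal for this (assumed) product-review task
domain_noise = {"product", "item", "review"}

# 2. Negations that should survive filtering
keep_words = {"not", "no"}

# Union adds the extra words, difference removes the protected ones
custom_stop_words = (base_stop_words | domain_noise) - keep_words

tokens = "the product was not a good item".split()
kept = [tok for tok in tokens if tok not in custom_stop_words]
print(kept)
```

Applying the difference last means a word listed in `keep_words` is always kept, even if it also appears in the added set.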

### Example: Custom Stop Words List in Python

In [5]:

# Load default stop words
default_stopwords = set(stopwords.words('english'))

# Define important words to keep
important_words = {"not", "couldn't", "isn't", "never", "can't", "won't"}

# Remove important words from stop words list
custom_stopwords = default_stopwords - important_words

# Example sentence
sentence = "I couldn't believe it was not working."

# Removing stop words using the custom list
filtered_words = [word for word in sentence.split() if word.lower() not in custom_stopwords]

print("Filtered Sentence:", " ".join(filtered_words))

Filtered Sentence: couldn't believe not working.
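
Splitting on whitespace leaves punctuation attached to tokens (note `working.` in the output above), so a stop word followed by a comma or period would not match the list. One possible workaround, sketched here with the standard `re` module, is to extract word tokens with a regular expression that keeps internal apostrophes. The pattern, the `tokenize` helper, and the small `custom_stop_words` set are assumptions for illustration only.

```python
import re

# Hypothetical custom list: common function words, with negations left out
custom_stop_words = {"i", "it", "was", "the", "a"}

def tokenize(text):
    """Extract lowercase word tokens, keeping internal apostrophes (e.g. couldn't)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

sentence = "I couldn't believe it was not working."
filtered = [tok for tok in tokenize(sentence) if tok not in custom_stop_words]
print("Filtered Sentence:", " ".join(filtered))
```

This simple pattern only covers unaccented ASCII letters; text in other languages, such as the Spanish list shown earlier, would need a broader pattern or a proper tokenizer.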
