In [4]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [6]:
nlp

<spacy.lang.en.English at 0x1655886d0>

In [8]:
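# Stop words: very common words (e.g. "is", "the", "of") that add little meaning on their own,
# so removing them usually keeps the gist of a text intact
# Number of entries in spaCy's default English stop-word list
len(nlp.Defaults.stop_words)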

326
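The same default English list can also be read without loading a trained pipeline. Below is a minimal sketch. It assumes spaCy v3, where the list is the module-level set `STOP_WORDS` in `spacy.lang.en.stop_words`:

```python
from spacy.lang.en.stop_words import STOP_WORDS

# STOP_WORDS is a plain Python set of strings
print(len(STOP_WORDS))

# sorted() gives a deterministic sample to inspect
print(sorted(STOP_WORDS)[:10])
```

The exact count may differ between spaCy versions, so it is safer not to hard-code the 326 shown above.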

In [11]:
# Check if the word is a stopword or not
nlp.vocab['is'].is_stop

True
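In spaCy v3, `is_stop` checks the lowercased form of a word against the list, so capitalised forms such as `'Is'` are flagged too. The short check below assumes a blank English pipeline (`spacy.blank("en")`). That pipeline needs no model download, because `is_stop` is a lexical attribute and not the output of a trained component:

```python
import spacy

# Blank English pipeline: tokenizer + vocab only, no trained components
nlp_blank = spacy.blank("en")

# The same word in different casings, plus one ordinary content word
for word in ["is", "Is", "IS", "machine"]:
    print(word, nlp_blank.vocab[word].is_stop)
```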

### 1. Adding custom words to the list of stopwords

In [15]:
# nlp.Defaults.stop_words is a Python set, so adding a word that is already present has no effect
nlp.Defaults.stop_words.add('i.e')

In [17]:
# Also flag the word's entry in the vocab, so token.is_stop reflects the change
nlp.vocab['i.e'].is_stop = True

In [18]:
len(nlp.Defaults.stop_words)

327
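The two steps above (adding the word to `nlp.Defaults.stop_words` and flagging its vocab entry) can be wrapped in one function so they are always applied together. `add_stop_word` is a hypothetical helper and is not part of spaCy. The word `'btw'` is only an illustrative choice:

```python
import spacy

def add_stop_word(nlp, word):
    """Hypothetical helper: register `word` as a stop word.

    Updates the default set and the lexeme stored in the vocab,
    mirroring the two cells above.
    """
    nlp.Defaults.stop_words.add(word)
    nlp.vocab[word].is_stop = True

nlp_blank = spacy.blank("en")
add_stop_word(nlp_blank, "btw")
print(nlp_blank.vocab["btw"].is_stop)  # True
```

`nlp.Defaults` belongs to the language class and not to a single `nlp` object. The added word is therefore shared by every English pipeline in the same Python process.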

### 2. Removing custom words from the list of stopwords

In [19]:
nlp.vocab['i.e'].is_stop 

True

In [20]:
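# Remove the word from the set (set.remove raises KeyError if it is absent) and unflag its vocab entry
nlp.Defaults.stop_words.remove('i.e')
nlp.vocab['i.e'].is_stop = False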

In [21]:
nlp.vocab['i.e'].is_stop 

False
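`set.remove()` raises `KeyError` when the word is not in the list. The sketch below defines a hypothetical `remove_stop_word` helper that uses `set.discard()` instead, so calling it twice is harmless. The word `'not'` is only an illustrative choice:

```python
import spacy

def remove_stop_word(nlp, word):
    """Hypothetical helper: stop treating `word` as a stop word.

    set.discard() does nothing if the word is absent, unlike set.remove().
    """
    nlp.Defaults.stop_words.discard(word)
    nlp.vocab[word].is_stop = False

nlp_blank = spacy.blank("en")
remove_stop_word(nlp_blank, "not")
remove_stop_word(nlp_blank, "not")  # second call is a no-op, not a KeyError
print(nlp_blank.vocab["not"].is_stop)  # False
```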

### 3. Removing stopwords from the corpus

In [5]:
txt = '''
Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to learn without being explicitly programmed. It involves developing algorithms that can identify patterns and make predictions based on data. Here's a breakdown of its key aspects:
Core principles:
Learning from data: ML algorithms are trained on large datasets, iteratively improving their ability to recognize patterns and make accurate predictions without needing explicit instructions.
No explicit programming: Unlike traditional programming where every step is coded manually, ML algorithms learn on their own based on the data they're exposed to.

Different types of learning: There are various types of ML, each tackling different problems. Some common examples include:
Supervised learning: The model learns from labeled data, where each data point has a specific output associated with it. For example, an image classification model learns to identify objects in pictures by being trained on labeled images (like a picture of a cat labeled as "cat").
Unsupervised learning: The model identifies patterns in unlabeled data, where no specific output is provided. For example, a recommendation system might analyze your past purchases to suggest similar items without needing explicit labels.
Reinforcement learning: The model learns through trial and error, interacting with an environment and receiving rewards for positive actions. For example, an AI playing a game like chess learns by making moves, receiving rewards for winning, and refining its strategy based on these experiences.

Applications of Machine Learning:

ML is used in diverse fields, including:

Image recognition: Identifying objects, faces, and scenes in images and videos (e.g., facial recognition, self-driving cars)
Natural language processing: Understanding and generating human language (e.g., chatbots, machine translation)
Recommendation systems: Suggesting products, movies, or music you might enjoy
Fraud detection: Identifying suspicious financial activity
Medical diagnosis: Assisting doctors in analyzing medical images and data'''

In [23]:
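# Remove the line breaks and surrounding whitespace, then display the result.
# Note: replacing '\n' with '' joins the end of one line directly to the start of the next
txt = txt.replace('\n', '').strip()
txt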

'Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to learn without being explicitly programmed. It involves developing algorithms that can identify patterns and make predictions based on data. Here\'s a breakdown of its key aspects:Core principles:Learning from data: ML algorithms are trained on large datasets, iteratively improving their ability to recognize patterns and make accurate predictions without needing explicit instructions.No explicit programming: Unlike traditional programming where every step is coded manually, ML algorithms learn on their own based on the data they\'re exposed to.Different types of learning: There are various types of ML, each tackling different problems. Some common examples include:Supervised learning: The model learns from labeled data, where each data point has a specific output associated with it. For example, an image classification model learns to identify objects in pictures by being trained on labeled imag
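Replacing `'\n'` with an empty string fuses the last word of one line to the first word of the next, as in `aspects:Core principles:` in the output above. The plain-Python sketch below shows an alternative on a small made-up string. It splits on any run of whitespace and re-joins the pieces with single spaces:

```python
raw = "\nkey aspects:\nCore principles:\nLearning from data\n"

# Deleting the line breaks glues neighbouring lines together
glued = raw.replace("\n", "").strip()

# str.split() with no argument splits on any whitespace run and drops empty pieces
spaced = " ".join(raw.split())

print(repr(glued))   # 'key aspects:Core principles:Learning from data'
print(repr(spaced))  # 'key aspects: Core principles: Learning from data'
```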

In [24]:
# Run the pipeline on the text; the result is a Doc, a sequence of Token objects
corpus = nlp(txt)

In [25]:
corpus

Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to learn without being explicitly programmed. It involves developing algorithms that can identify patterns and make predictions based on data. Here's a breakdown of its key aspects:Core principles:Learning from data: ML algorithms are trained on large datasets, iteratively improving their ability to recognize patterns and make accurate predictions without needing explicit instructions.No explicit programming: Unlike traditional programming where every step is coded manually, ML algorithms learn on their own based on the data they're exposed to.Different types of learning: There are various types of ML, each tackling different problems. Some common examples include:Supervised learning: The model learns from labeled data, where each data point has a specific output associated with it. For example, an image classification model learns to identify objects in pictures by being trained on labeled images 
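Printing each token next to its `is_stop` flag is a quick way to see which tokens the next steps will treat as stop words. The sketch below runs on a short sentence adapted from the text. It uses a blank English pipeline (`spacy.blank("en")`), which tokenizes text but has no trained components:

```python
import spacy

nlp_blank = spacy.blank("en")
doc = nlp_blank("ML is a branch of artificial intelligence.")

# Each Token exposes the stop-word flag of its lexeme
for token in doc:
    print(f"{token.text:<13} {token.is_stop}")
```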

### 3.1 Finding the stopwords in the corpus

In [37]:
# Collect the distinct stop words that occur in the corpus (token.text keeps the original casing)
stop_word = set()

for token in corpus:
    if token.is_stop:
        stop_word.add(token.text)
print(stop_word)
print(len(stop_word))

{'every', 'own', 'various', 'each', 'For', 'has', 'used', 'you', 'these', 'as', 'being', 'through', 'on', 'might', 'a', 'and', 'is', 'with', 'No', "'re", 'Some', 'an', 'or', 'that', 'for', 'to', 'The', 'it', 'make', 'no', 'from', 'your', 'their', 'Here', 'are', 'the', "'s", 'It', 'its', 'where', 'they', 'of', 'without', 'can', 'by', 'There', 'in'}
47
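The set above counts case variants separately (`'For'` and `'for'`, `'The'` and `'the'`). This happens because `is_stop` matches regardless of case, while `token.text` keeps the original casing. The sketch below collects `token.lower_` instead. It uses a blank English pipeline and an illustrative sentence:

```python
import spacy

nlp_blank = spacy.blank("en")
doc = nlp_blank("The model learns. For example, the model learns for you.")

# token.text keeps case variants apart ("The" and "the")
by_text = {token.text for token in doc if token.is_stop}
# token.lower_ merges them into one entry
by_lower = {token.lower_ for token in doc if token.is_stop}

print(sorted(by_text))   # ['For', 'The', 'for', 'the', 'you']
print(sorted(by_lower))  # ['for', 'the', 'you']
```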


### 3.2 Finding the words that are not stopwords

In [43]:
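# Keep only the tokens that are not stop words (punctuation tokens are kept) and join them with spaces
' '.join([token.text for token in corpus if not token.is_stop])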

'Machine learning ( ML ) branch artificial intelligence ( AI ) enables computers learn explicitly programmed . involves developing algorithms identify patterns predictions based data . breakdown key aspects : Core principles : Learning data : ML algorithms trained large datasets , iteratively improving ability recognize patterns accurate predictions needing explicit instructions . explicit programming : Unlike traditional programming step coded manually , ML algorithms learn based data exposed . Different types learning : types ML , tackling different problems . common examples include : Supervised learning : model learns labeled data , data point specific output associated . example , image classification model learns identify objects pictures trained labeled images ( like picture cat labeled " cat").Unsupervised learning : model identifies patterns unlabeled data , specific output provided . example , recommendation system analyze past purchases suggest similar items needing explicit
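The output above still contains punctuation tokens such as `(` and `)`. The sketch below wraps the filtering in a reusable function with an option to drop punctuation through `token.is_punct`. `remove_stopwords` is a hypothetical helper, and a blank English pipeline is assumed so the example runs without a model download:

```python
import spacy

def remove_stopwords(nlp, text, drop_punct=True):
    """Hypothetical helper: return `text` without stop words
    (and, optionally, punctuation), joined by single spaces."""
    doc = nlp(text)
    kept = [
        token.text
        for token in doc
        if not token.is_stop and not (drop_punct and token.is_punct)
    ]
    return " ".join(kept)

nlp_blank = spacy.blank("en")
sentence = "Machine learning (ML) is a branch of artificial intelligence (AI)."
print(remove_stopwords(nlp_blank, sentence))
# → 'Machine learning ML branch artificial intelligence AI'
print(remove_stopwords(nlp_blank, sentence, drop_punct=False))
```

With `drop_punct=False` the function reproduces the behaviour of the cell above.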