# Code Demonstrating Bag of Words 

## Import the **`CountVectorizer`** class from the **`sklearn.feature_extraction.text`** module. This class helps us create the bag of words representation.

In [3]:
from sklearn.feature_extraction.text import CountVectorizer

## Define a list named **`sentences`** containing three sample sentences that we want to analyze using the bag of words approach.

In [4]:
sentences = [
    "I love to play cricket",
    "cricket is exciting sport",
    "playing cricket is a great habit"
]

In [5]:
# Initialize the CountVectorizer
vectorizer = CountVectorizer()

## Create an object of the **`CountVectorizer`** class. This object converts text data into a bag of words representation.

In [6]:
# Fit and transform the sentences to create the bag of words representation
bag_of_words = vectorizer.fit_transform(sentences)

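## The **`fit_transform`** method converts the list of sentences into a bag of words representation. It first fits the vectorizer to the text data (learning the vocabulary) and then transforms the sentences into a sparse matrix of word counts, with one row per sentence and one column per vocabulary word.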
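To make the fit and transform steps concrete, here is a minimal pure-Python sketch of the same idea. It is not scikit-learn's implementation; it only assumes the vectorizer's default behaviour of lowercasing the text and keeping tokens of two or more word characters. The names `TOKEN_PATTERN`, `tokenize`, and `matrix` are illustrative.

```python
import re
from collections import Counter

sentences = [
    "I love to play cricket",
    "cricket is exciting sport",
    "playing cricket is a great habit",
]

# Assumption: mimic CountVectorizer's defaults, i.e. lowercase the text
# and keep only tokens made of two or more word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Split a sentence into lowercase tokens of length >= 2."""
    return TOKEN_PATTERN.findall(text.lower())


# "Fit": collect the unique tokens and sort them to fix the column order.
tokenized = [tokenize(sentence) for sentence in sentences]
vocab = sorted({token for tokens in tokenized for token in tokens})

# "Transform": count how often each vocabulary word occurs in each sentence.
matrix = []
for tokens in tokenized:
    counts = Counter(tokens)
    matrix.append([counts[word] for word in vocab])

print("Vocabulary:", vocab)
for row in matrix:
    print(row)
```

Because single-character tokens are dropped, "I" and "a" never enter the vocabulary, so this sketch produces ten columns for the three sentences.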

In [12]:
# Get the vocabulary: the unique words found in the sentences, in column order.
# By default, CountVectorizer lowercases the text and ignores single-character
# tokens, which is why "I" and "a" do not appear in the output.
vocab = vectorizer.get_feature_names_out()
print("Vocabulary:", vocab)

Vocabulary: ['cricket' 'exciting' 'great' 'habit' 'is' 'love' 'play' 'playing' 'sport'
 'to']


In [13]:
# Convert the sparse bag_of_words matrix to a dense NumPy array
bag_of_words_array = bag_of_words.toarray()
print("Bag of Words Representation:\n", bag_of_words_array)

Bag of Words Representation:
 [[1 0 0 0 0 1 1 0 0 1]
 [1 1 0 0 1 0 0 0 1 0]
 [1 0 1 1 1 0 0 1 0 0]]


## Use the **`toarray()`** method to convert the bag of words matrix into a 2D NumPy array named **`bag_of_words_array`**. Each row of this array corresponds to a sentence, and each column corresponds to a word in the vocabulary. Each entry is the number of times that word occurs in that sentence; for example, the first row has a 1 in the columns for `cricket`, `love`, `play`, and `to`.
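The raw array is easier to read when each column is paired with its word. The sketch below does this in plain Python, using the vocabulary and counts copied from the outputs above; the helper `nonzero_counts` is a hypothetical name introduced only for illustration.

```python
# Values copied from the printed vocabulary and bag of words array.
vocab = ["cricket", "exciting", "great", "habit", "is",
         "love", "play", "playing", "sport", "to"]
bag_of_words_array = [
    [1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 0, 1, 0, 0],
]


def nonzero_counts(row, words):
    """Return {word: count} for the words that occur in this row."""
    return {word: count for word, count in zip(words, row) if count > 0}


# One dictionary per sentence, listing only the words that are present.
labelled = [nonzero_counts(row, vocab) for row in bag_of_words_array]
for index, counts in enumerate(labelled):
    print(f"Sentence {index}: {counts}")
```

In the notebook itself, the same helper should also work when given the `vocab` and `bag_of_words_array` variables directly, since a 2D NumPy array iterates row by row. Note that `play` and `playing` end up in separate columns, because the vectorizer treats them as different words.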