
# Continuous Bag of Words Model

In this example, we define a toy training dataset of three sentences: "i love you", "you are amazing", and "i hate you". The vocabulary size is 6 (the number of distinct words), the embedding dimension is 3, and the context window size is 1, so each center word is predicted from at most one neighbor on each side.
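To make the window concrete, the sketch below lists every (context, center) pair that a window size of 1 produces on the toy data. The helper name `context_pairs` is hypothetical and is not part of the notebook; it only mirrors the windowing logic used later in `cbow`.

```python
sentences = [['i', 'love', 'you'], ['you', 'are', 'amazing'], ['i', 'hate', 'you']]
window_size = 1

def context_pairs(sentences, window_size):
    """Return a list of (context_words, center_word) tuples (illustrative helper)."""
    pairs = []
    for sentence in sentences:
        for i, center in enumerate(sentence):
            lo = max(0, i - window_size)
            hi = min(len(sentence), i + window_size + 1)
            # Every word in the window except the center word itself
            context = [sentence[j] for j in range(lo, hi) if j != i]
            pairs.append((context, center))
    return pairs

pairs = context_pairs(sentences, window_size)
for context, center in pairs:
    print(context, "->", center)
```

Words at the start or end of a sentence have a single context word, while words in the middle have two.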

In [5]:
import numpy as np

# Define the training data
sentences = [['i', 'love', 'you'], ['you', 'are', 'amazing'], ['i', 'hate', 'you']]

# Define the vocabulary size and embedding dimension
vocab_size = 6
embedding_dim = 3

# Define the context window size
window_size = 1

# Vocabulary
vocab = []
for sentence in sentences:
    for word in sentence:
        if word not in vocab:
            vocab.append(word)


# Initialize the weight matrices
W_in = np.random.randn(vocab_size, embedding_dim) * 0.01
W_out = np.random.randn(embedding_dim, vocab_size) * 0.01

def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)


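The cbow function runs one training pass over the sentences. For each center word, it builds a context vector by summing the one-hot vectors of the neighboring words, projects it through W_in and W_out, and applies the softmax function to obtain a probability distribution over the vocabulary. It then adds the cross-entropy loss for the true center word, computes the gradients with respect to W_in and W_out using backpropagation, and updates both matrices in place by gradient descent. The function returns the total loss for the pass.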
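The forward and backward passes described above can be written out as follows. The notation is an assumption introduced here for illustration: $V$ is the vocabulary size, $x \in \mathbb{R}^{1 \times V}$ is the summed one-hot context vector, $e_c$ is the one-hot row vector of the center word, and $\eta$ is the learning rate.

$$
\begin{aligned}
h &= x\,W_{\text{in}}, \qquad y = h\,W_{\text{out}}, \qquad p = \operatorname{softmax}(y), \qquad L = -\log p_c \\
\frac{\partial L}{\partial y} &= p - e_c \\
\frac{\partial L}{\partial W_{\text{out}}} &= h^{\top}\,(p - e_c) \\
\frac{\partial L}{\partial h} &= (p - e_c)\,W_{\text{out}}^{\top} \\
\frac{\partial L}{\partial W_{\text{in}}} &= x^{\top}\,\frac{\partial L}{\partial h} \\
W_{\text{out}} &\leftarrow W_{\text{out}} - \eta\,\frac{\partial L}{\partial W_{\text{out}}}, \qquad
W_{\text{in}} \leftarrow W_{\text{in}} - \eta\,\frac{\partial L}{\partial W_{\text{in}}}
\end{aligned}
$$

Note that this notebook sums the context vectors; many CBOW formulations average them instead, which only rescales $x$ for two-word contexts.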

In [6]:
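# Define the Continuous Bag of Words model (one training pass over the data).
# Note: vocab, vocab_size and learning_rate are read from the notebook's global
# scope, so learning_rate must be defined before cbow is called.
def cbow(sentences, W_in, W_out, window_size):
    loss = 0.0
    for sentence in sentences:
        sentence_length = len(sentence)
        for i in range(sentence_length):
            # Collect the words within window_size positions of the center word
            context_words = []
            center_word = sentence[i]
            for j in range(max(0, i - window_size), min(sentence_length, i + window_size + 1)):
                if j != i:
                    context_words.append(sentence[j])
            # Bag-of-words input: sum of the one-hot vectors of the context words
            x = np.zeros((1, vocab_size))
            for context_word in context_words:
                x[0][vocab.index(context_word)] += 1
            # Forward pass: hidden layer, output scores, probabilities
            h = np.dot(x, W_in)
            y = np.dot(h, W_out)
            p = softmax(y)
            # Cross-entropy loss for the true center word
            loss += -np.log(p[0][vocab.index(center_word)])
            # Backward pass: gradient of the loss w.r.t. the scores is p - one_hot(center)
            dy = p.copy()
            dy[0][vocab.index(center_word)] -= 1
            dh = np.dot(dy, W_out.T)
            dW_out = np.dot(h.T, dy)
            dW_in = np.dot(x.T, dh)
            # Gradient descent step, applied in place to both weight matrices
            W_out -= learning_rate * dW_out
            W_in -= learning_rate * dW_in
    return loss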

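The first part of the output shows the loss every 100 iterations of the training loop. Each value is the cross-entropy summed over all nine center-word positions in the three sentences. The loss falls from about 16.1 to about 5.8 over 1000 iterations, indicating that the model is learning to predict the center word given its context. It does not approach zero because several contexts are ambiguous: for example, the context "i ... you" is completed by "love" in one sentence and by "hate" in another.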

In [7]:
# Train cbow
learning_rate = 0.01
num_iterations = 1000
for i in range(num_iterations):
    loss = cbow(sentences, W_in, W_out, window_size)
    if i % 100 == 0:
        print("Iteration:", i, "Loss:", loss)

Iteration: 0 Loss: 16.125790362523496
Iteration: 100 Loss: 16.111972333368914
Iteration: 200 Loss: 15.37505074664685
Iteration: 300 Loss: 10.512859285118665
Iteration: 400 Loss: 8.566014166329797
Iteration: 500 Loss: 6.7840958880904765
Iteration: 600 Loss: 6.158123052838845
Iteration: 700 Loss: 5.962475926725965
Iteration: 800 Loss: 5.879701123506617
Iteration: 900 Loss: 5.836980838732513
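As a sanity check on the first printed value, the sketch below estimates the starting loss under the assumption that the tiny initial weights make the softmax output nearly uniform. In that case each of the nine positions contributes about ln(6), which is close to the loss reported at iteration 0.

```python
import numpy as np

sentences = [['i', 'love', 'you'], ['you', 'are', 'amazing'], ['i', 'hate', 'you']]
vocab_size = 6

# One prediction is made per word position in the training data
num_positions = sum(len(sentence) for sentence in sentences)

# With a uniform distribution, each prediction costs -log(1 / vocab_size)
initial_loss = num_positions * np.log(vocab_size)
print(num_positions, initial_loss)
```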


The second part of the output shows the learned embedding for each word in the vocabulary. Each embedding is the corresponding row of the input matrix W_in, and it reflects how the word is used in the training sentences rather than what it means. "love" and "hate" end up with almost identical vectors (cosine similarity of about 1.00) even though they are opposites, because both occur in the same position, between "i" and "you". Other pairs are much less aligned: the cosine similarity is about 0.44 for "i" and "you", and about 0.28 for "amazing" and "hate". With only three sentences, these values illustrate the mechanics of CBOW and say little about word meaning in general.

In [8]:
print("Learned Embeddings:")
for word in ['i', 'love', 'you', 'are', 'amazing', 'hate']:
    x = np.zeros((1, vocab_size))
    x[0][vocab.index(word)] = 1
    h = np.dot(x, W_in)
    print(word + ":", h[0])

Learned Embeddings:
i: [-1.73109312  0.17126492 -0.435265  ]
love: [ 2.07548366  0.49181949 -0.10857048]
you: [-0.9239798  -0.94272533  0.62611378]
are: [ 0.24049492  1.89290048 -1.28569674]
amazing: [ 0.83711837 -1.12707923  1.06111104]
hate: [ 2.07798365  0.47997593 -0.12594603]
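One way to compare the embeddings is cosine similarity. The sketch below copies the vectors from the output above into a dictionary, since a fresh training run with different random weights would give different numbers. The `cosine_similarity` helper is defined here for illustration and is not part of the notebook.

```python
import numpy as np

# Embeddings copied from the printed output above
embeddings = {
    'i': np.array([-1.73109312, 0.17126492, -0.435265]),
    'love': np.array([2.07548366, 0.49181949, -0.10857048]),
    'you': np.array([-0.9239798, -0.94272533, 0.62611378]),
    'are': np.array([0.24049492, 1.89290048, -1.28569674]),
    'amazing': np.array([0.83711837, -1.12707923, 1.06111104]),
    'hate': np.array([2.07798365, 0.47997593, -0.12594603]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (illustrative helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for w1, w2 in [('love', 'hate'), ('i', 'you'), ('amazing', 'hate')]:
    sim = cosine_similarity(embeddings[w1], embeddings[w2])
    print(w1, w2, round(sim, 2))
```

Cosine similarity ignores vector length, so it compares direction only; a value near 1 means the two words were pushed the same way during training.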
