## Continuous Bag of Words Implementation

__*The Continuous Bag-of-Words model (CBOW) is frequently used in NLP deep learning. It is a model that tries to predict words given the context of a few words before and a few words after the target word.
The CBOW model is as follows. Given a target word $w_i$ and an $N$ context window on each side,$w_{i-1}, \dots, w_{i-N}$ and $w_{i+1}, \dots, w_{i+N}$, referring to all context words collectively as $C$, CBOW tries to minimize
 $  -\log p(w_i | C) = -\log \text{Softmax}(A(\sum_{w \in C} q_w) + b)$where $q_w$ is the embedding for word $w$.*__


__*This is a solution for the exercise problem of word embedding tutorial https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html
<br>In this problem given two context word to the left and two context word to the right, we have to predict the middle target word.*__

In [1]:
import torch
import torch.nn as nn
import numpy as np
import torch.optim as optim

In [2]:
corpus = "Once Narada met Sanath kumara and asked for enlightenment. \
Sanath kumara asked Narada about the special power which Narada had acquired because of his learning.\
To this Narada replied that he knows all that is contained in the four Vedas and the six Shastras.\
Sanath kumara smiled at this reply and said that while it is a matter of great satisfaction that\
Narada had learnt the Vedas and the Shastras but he would like to ask whether he had learnt anything\
of the self and whether he had understood himself. Sanath kumara then told Narada that so long as one\
does not understand one’s self, the knowledge of all the Shastras, all the Vedas, of the Gita and the \
Upanishads becomes quite useless. Your knowledge will become useful only when you are able to realise the\
nature of the self. What is important is the Adwaitha darshana. You should be able to realise and understand\
the non-dual aspect that is pervading the entire universe. Today in the world, without making an effort to \
understand one’s own self, people are imagining that they are achieving many great things with the help of \
modern science, and in the process they are putting their feet into many difficult situations. By saying that \
they are able to travel far into the sky, see the stars, go to the moon and set up camps there, they are only \
building castles in the air. They may partially succeed in doing such things, but if in the process they do not \
understand the Self and if they do not have peace of mind for themselves, they are very foolish indeed. "

In [3]:
CONTEXT_SIZE = 2
EMBEDDING_DIM = 300
EPOCH = 40
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [4]:
# creating Model
class CBOW(nn.Module):
    
    def __init__(self, vocabulary_size,embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocabulary_size,embedding_dim)
        self.linear1 = nn.Linear(embedding_dim,512)
        self.act1 = nn.ReLU()
        self.linear2 = nn.Linear(512,vocabulary_size)
        self.act2 = nn.LogSoftmax()
        
    def forward(self,input_word_index):
        embeddings = sum(self.embedding(input_word_index)) # summing up the embedding of each context word
        flatten = embeddings.view(1,-1) 
        result = self.linear1(flatten)
        result = self.act1(result)
        result = self.linear2(result)
        result = self.act2(result)
        return result


In [5]:
def train(data,vocabulary_size,word2index):
    
    model = CBOW(vocabulary_size,EMBEDDING_DIM).to(device)
    loss_function = nn.NLLLoss()
    optimizer = optim.SGD(model.parameters(),lr = 0.001)
    
    for epoch in range(EPOCH):
        total_loss = 0
        for context, target in data:
            #converting words to their respective index
            context_tensor = torch.cuda.LongTensor([word2index[word] for word in context])
            target_tensor = torch.cuda.LongTensor([word2index[target]])
            
            model.zero_grad()
            
            log_probability = model(context_tensor)
            loss = loss_function(log_probability,target_tensor)
            
            loss.backward()
            optimizer.step()
            
            total_loss+=loss.data
            
        avg_loss = float(total_loss / len(data))
        print("Epoch: {}/{} \t Avg_Loss {:.4f}".format(epoch+1, EPOCH, avg_loss))
            
    return model
        

In [6]:
    Token = corpus.split()
    vocabulary = set(Token) #Removed Duplicates
    vocabulary_size = len(vocabulary)

    #creating word to index dictionary
    word2index = {word:i for i,word in enumerate(vocabulary)}
    index2word = {word:i for word,i in enumerate(vocabulary)}

    #storing context and target pair
    data = []
    for i in range(CONTEXT_SIZE,len(Token)-CONTEXT_SIZE):
        context = [Token[i-2],Token[i-1],Token[i+1],Token[i+2]]
        target = Token[i]
        data.append((context,target))

    # Training the model
    cbow = train(data,vocabulary_size,word2index)
    
    # Saving the model
    torch.save(cbow.state_dict(),"./model_save.pth")     



Epoch: 1/40 	 Avg_Loss 5.02
Epoch: 2/40 	 Avg_Loss 4.37
Epoch: 3/40 	 Avg_Loss 3.90
Epoch: 4/40 	 Avg_Loss 3.47
Epoch: 5/40 	 Avg_Loss 3.08
Epoch: 6/40 	 Avg_Loss 2.72
Epoch: 7/40 	 Avg_Loss 2.37
Epoch: 8/40 	 Avg_Loss 2.05
Epoch: 9/40 	 Avg_Loss 1.76
Epoch: 10/40 	 Avg_Loss 1.49
Epoch: 11/40 	 Avg_Loss 1.25
Epoch: 12/40 	 Avg_Loss 1.04
Epoch: 13/40 	 Avg_Loss 0.86
Epoch: 14/40 	 Avg_Loss 0.71
Epoch: 15/40 	 Avg_Loss 0.59
Epoch: 16/40 	 Avg_Loss 0.49
Epoch: 17/40 	 Avg_Loss 0.42
Epoch: 18/40 	 Avg_Loss 0.36
Epoch: 19/40 	 Avg_Loss 0.31
Epoch: 20/40 	 Avg_Loss 0.28
Epoch: 21/40 	 Avg_Loss 0.25
Epoch: 22/40 	 Avg_Loss 0.22
Epoch: 23/40 	 Avg_Loss 0.20
Epoch: 24/40 	 Avg_Loss 0.18
Epoch: 25/40 	 Avg_Loss 0.17
Epoch: 26/40 	 Avg_Loss 0.16
Epoch: 27/40 	 Avg_Loss 0.15
Epoch: 28/40 	 Avg_Loss 0.14
Epoch: 29/40 	 Avg_Loss 0.13
Epoch: 30/40 	 Avg_Loss 0.12
Epoch: 31/40 	 Avg_Loss 0.11
Epoch: 32/40 	 Avg_Loss 0.11
Epoch: 33/40 	 Avg_Loss 0.10
Epoch: 34/40 	 Avg_Loss 0.10
Epoch: 35/40 	 Avg_Loss

In [7]:
def test(cbow, context_tensor, index2word):    
    target = cbow(context_tensor).cpu().data.numpy() 
    target_index = np.argmax(target) 
    prediction = index2word[target_index]
    print("\nPredicted: {}\n\n".format(prediction))

In [8]:
# Testing the Model

# Loading the saved model
cbow = CBOW(vocabulary_size,EMBEDDING_DIM)
cbow.load_state_dict(torch.load("./model_save.pth"))
cbow.to(device)

print("\n please enter continuous indices, two to the left and right of the word to be predicted respectively \n")
print("\n Input seperated by whitespace\n")

indices = list(map(int,input().split()))

context = [Token[indices[0]], Token[indices[1]], Token[indices[2]], Token[indices[3]]]

print("\nContext: {}".format(context))

context_tensor = torch.LongTensor([word2index[word] for word in context]).to(device)   

test(cbow, context_tensor, index2word)



 please enter continuous indices, two to the left and right of the word to be predicted respectively 


 Input seperated by whitespace

4 5 7 8

Context: ['kumara', 'and', 'for', 'enlightenment.']

Predicted: asked






In [19]:
# Calculating Train Accuracy

score = 0
for context, target in data:
    #converting words to their respective index
    context_tensor = torch.cuda.LongTensor([word2index[word] for word in context])
    target_tensor = torch.cuda.LongTensor([word2index[target]])
    target_pred = cbow(context_tensor).cpu().data.numpy() 
    target_index = np.argmax(target_pred) 
    prediction = index2word[target_index]
    if(prediction == target):
        score+=1

print("Train Accuracy: {:.4f}\n\n".format((score*100)/len(data)))



Train Accuracy: 99.6183


