# CS549 Machine Learning 
# Assignment 8: Word embeddings

**Total points: 10**

In this assignment, you will implement a simple continuous bag-of-words (CBOW) model that uses surrounding context words to predict the target word in the middle.

In [175]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from a8_utils import build_vocab

## Task 1. Prepare data
**3 points**

In this task, you should prepare your data, which is a list of tuples. Each tuple has two elements: a list of context words, and the target word.
Here both context words and target word are represented by word indices.

In [176]:
CONTEXT_SIZE = 3  # Define the context size. Default value 3, which means the context includes 3 words to the left, 3 to the right

raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

vocab = build_vocab(raw_text)
vocab_size = len(vocab)
print(vocab_size)
word_to_idx = {word: i for i, word in enumerate(vocab)}
idx_to_word = list(vocab)
word_indices = [word_to_idx[w] for w in raw_text]

def prepare_data(word_indices):
    data = []
    for i in range(CONTEXT_SIZE, len(word_indices) - CONTEXT_SIZE):

        #### START YOUR CODE ####
        # Hint: You can intialize context to an empty list
        # and then use a for loop to append elements to context propoerly.
        context = []
        target = i
        for j in range(i-CONTEXT_SIZE,i):
          context.append(j)
        for j in range(i+1,i+CONTEXT_SIZE+1):
          context.append(j)
        #### END YOUR CODE ####

        data.append((context, target))

    return data

49


In [177]:
# Test Task 1. Do not change the code below.
data = prepare_data(word_indices)
print('data[0]:', data[0])
ctx, tgt = data[0]
print('context words:', [idx_to_word[c] for c in ctx])
print('target word:', idx_to_word[tgt])

data[0]: ([0, 1, 2, 4, 5, 6], 3)
context words: ['We', 'are', 'about', 'study', 'the', 'idea']
target word: to


## Expected output

|&nbsp;|&nbsp;|
|--|--|
|data\[0\]: |(\[0, 1, 2, 4, 5, 6\], 3)|
|context words: |\['We', 'are', 'about', 'study', 'the', 'idea'\]|
|target word: | to|

---



## Task 2: Implement a CBOW model

**4 points**

In this task, you will implement a CBOW model. In the `__init__()` method, define the size of `self.embeddings` and `self.linear` properly.

The `self.linear` takes the average embeddings of all context words as input, and the output size is `vocab_size`.
It is followed by a softmax activation (`nn.LogSoftmax`).

The `forward()` method has a input argument `inputs`, which is the context word indices (in a `torch.long` tensor).
You should get the embeddings of all context words, and compute the average emebdding (into the `embeds` variable).

In [178]:
class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super(CBOW, self).__init__()

        #### START YOUR CODE ####
        self.embeddings = nn.Embedding(vocab_size,embedding_dim)
        self.linear = nn.Linear(embedding_dim,vocab_size)
        #### END YOUR CODE ####
        
        self.act = nn.LogSoftmax(dim=-1)

    def forward(self, inputs):
        #### START YOUR CODE ####
        embeds = torch.sum(self.embeddings(inputs),dim=0).view(1,-1)
        #### END YOUR CODE ####
        
        out = self.linear(embeds)
        out = self.act(out)
        return out

In [179]:
# Test Task 2. Do not change the code blow
torch.manual_seed(0)

m = CBOW(10, 20)
test_input = torch.tensor([1,2,3], dtype=torch.long)
test_output = m(test_input)

print('test_output.shape', test_output.shape)
print('test_output', test_output.data)

test_output.shape torch.Size([1, 10])
test_output tensor([[-1.6878, -4.2108, -5.0252, -2.9802, -3.1362, -1.5436, -1.4120, -3.2485,
         -1.6490, -4.5009]])


### Expected output
|&nbsp;|&nbsp;|
|--|--|
|test_output.shape| torch.Size(\[1, 10\])|
|test_output|tensor(\[\[-1.6878, -4.2108, -5.0252, -2.9802, -3.1362, -1.5436, -1.4120, -3.2485, -1.6490, -4.5009\]\])|

---

## Task 3. Training loop
**2 points**

In this task, you will complete the training loop. 

You should create `ctx_tensor` and `tgt_tensor` out of `ctx` and `tgt`, respectively. *Hint*: you need to put `tgt` to a list before craeting the `tgt_tensor`, so that the resulting tensor is of the correct dimension that is acceptable to `nn.NLLLoss()`.

`ctx_tensor` is used to compute `output`. `loss_function()` is called upon `output` and `tgt_tensor` to compute the loss.

In [180]:
torch.manual_seed(0)
EMDEDDING_DIM = 100
model = CBOW(65, EMDEDDING_DIM)
loss_function = nn.NLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# Training
for epoch in range(100):
    total_loss = 0

    for ctx, tgt in data:
        #### START YOUR CODE ####
        #print(ctx,tgt)
        ctx_tensor = torch.tensor(ctx,dtype=torch.long) # Hint: the tensor type should be torch.long
        tgt_tensor = torch.tensor([tgt])
        output = model(ctx_tensor)
        # The try...except code is to help you debug. You can leave them unchanged. 
        try:
          #print(loss_function(output,tgt_tensor))
          total_loss += loss_function(output,tgt_tensor)
        except Exception:
            print(ctx_tensor)
            print(tgt_tensor)
            raise
        #### END YOUR CODE ####

    #optimize at the end of each epoch
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # print training information
    if epoch % 5 == 0 and epoch > 0:
        print(f'Loss within epoch {epoch}: ', total_loss.item())

Loss within epoch 5:  128.23919677734375
Loss within epoch 10:  58.24411392211914
Loss within epoch 15:  31.39347267150879
Loss within epoch 20:  20.52120590209961
Loss within epoch 25:  15.03664779663086
Loss within epoch 30:  11.805497169494629
Loss within epoch 35:  9.696945190429688
Loss within epoch 40:  8.219900131225586
Loss within epoch 45:  7.130527496337891
Loss within epoch 50:  6.295182704925537
Loss within epoch 55:  5.634889125823975
Loss within epoch 60:  5.100137710571289
Loss within epoch 65:  4.658383369445801
Loss within epoch 70:  4.287387847900391
Loss within epoch 75:  3.9714505672454834
Loss within epoch 80:  3.69918155670166
Loss within epoch 85:  3.462122678756714
Loss within epoch 90:  3.2538628578186035
Loss within epoch 95:  3.0694527626037598


### Expected output:

You should obeserve the loss decreasing from 100+ (at epoch 5) to around 3.x (at epoch 95).
The absolute values do not matter.

<!-- |&nbsp;|&nbsp;|
|--|--|
|Loss within epoch 5: | 138.33709716796875|
|Loss within epoch 10: | 70.50218963623047|
|Loss within epoch 15: | 38.877227783203125|
|Loss within epoch 20: | 25.064970016479492|
|Loss within epoch 25: | 18.110904693603516|
|Loss within epoch 30: | 14.05634880065918|
|Loss within epoch 35: | 11.44089126586914|
|Loss within epoch 40: | 9.62782096862793|
|Loss within epoch 45: | 8.302525520324707|
|Loss within epoch 50: | 7.293947219848633|
|Loss within epoch 55: | 6.501856327056885|
|Loss within epoch 60: | 5.863919734954834|
|Loss within epoch 65: | 5.339460372924805|
|Loss within epoch 70: | 4.900869846343994|
|Loss within epoch 75: | 4.528764247894287|
|Loss within epoch 80: | 4.209164619445801|
|Loss within epoch 85: | 3.9317333698272705|
|Loss within epoch 90: | 3.688669443130493|
|Loss within epoch 95: | 3.473975896835327| -->

---

## Task 4
**1 point**

In this final task, you will need to find the maximum index among the model output. *Hint*: use `torch.argmax()`.

In [181]:
def get_predicted_word(model_output, idx_to_word):
    #### START YOUR CODE ####
    idx = torch.argmax(model_output)
    #### END YOUR CODE ####

    return idx_to_word[idx]

In [182]:
# Test Task 4. Do not change the code blow
ctx_words = 'evolution of a is directed by'.split()
ctx_indices = [word_to_idx[w] for w in ctx_words]
ctx_tensor = torch.tensor(ctx_indices, dtype=torch.long)

out = model(ctx_tensor)
pred = get_predicted_word(out, idx_to_word)
print(f'The predicted word is: \"{pred}\"')

The predicted word is: "process"


### Expected output

|&nbsp;|&nbsp;|
|--|--|
|The predicted word is: |"process"|

## Congratulation!
Congratulations! You have now understood how to use word embeddings for some basic NLP tasks!