
# Read from here:

### https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html

What we really want is a notion of similarity between words.

Word embeddings are a technique to combat the sparsity of linguistic data by connecting the dots between what we have seen and what we haven't. These similarities should be learned by the network itself rather than hand-designed by the programmer.

So why not let the word embeddings be parameters of our model, which are then updated during training?

Word embeddings will probably not be interpretable. 
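
One common way to make "similarity between words" concrete is the cosine similarity of their embedding vectors. The sketch below is only an illustration: the three vectors in `toy_embeddings` are invented for this example and are not learned from any data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "physicist": [2.0, 1.0, 0.1],
    "mathematician": [1.8, 1.2, 0.0],
    "banana": [-0.5, 0.1, 2.0],
}

sim_close = cosine_similarity(toy_embeddings["physicist"],
                              toy_embeddings["mathematician"])
sim_far = cosine_similarity(toy_embeddings["physicist"],
                            toy_embeddings["banana"])

# Words we consider related should score higher than unrelated ones.
print(sim_close > sim_far)
```

With learned embeddings the individual dimensions usually have no obvious meaning, but distances and angles between vectors can still be compared in this way.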

# Word Embeddings: Encoding Lexical Semantics


In [1]:
"""
Word embeddings in PyTorch

We need to define an index for each word when using
embeddings. These indices are the keys into a lookup table.

The embeddings are stored as a |V| x D matrix, where |V| is the
vocabulary size and D is the dimensionality of the embeddings.

word i --> ith row of the matrix

The mapping from words to indices is a dictionary
named word_to_ix.

"""

# Author: Robert Guthrie

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

word_to_ix = {"hello": 0, "world": 1}

print(word_to_ix)

embeds = nn.Embedding(2, 5)  # 2 words in vocab, 5 dimensional embeddings

# lookup_tensor holds the indices of the words to look up,
# stored as a long (int64) tensor
lookup_tensor = torch.tensor([word_to_ix["hello"],word_to_ix["world"]], dtype=torch.long)

# returns one row of the embedding matrix per index
# (here the rows for both "hello" and "world", despite the variable name)
hello_embed = embeds(lookup_tensor)

print(hello_embed)
print(hello_embed.shape)

{'hello': 0, 'world': 1}
tensor([[ 0.6614,  0.2669,  0.0617,  0.6213, -0.4519],
        [-0.1661, -1.5228,  0.3817, -1.0276, -0.5631]],
       grad_fn=<EmbeddingBackward>)
torch.Size([2, 5])
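
The "lookup table" view can be checked without PyTorch: selecting row i of a |V| x D matrix gives the same result as multiplying a one-hot vector by that matrix. The following is a minimal sketch in plain Python; the matrix values and the helper names `one_hot` and `vec_mat` are made up for the illustration.

```python
# Toy |V| x D embedding matrix (values invented for illustration).
embedding_matrix = [
    [0.1, 0.2, 0.3],  # row 0 -> "hello"
    [0.4, 0.5, 0.6],  # row 1 -> "world"
]
word_to_ix = {"hello": 0, "world": 1}


def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_mat(vec, mat):
    """Multiply a row vector of length |V| by a |V| x D matrix."""
    n_cols = len(mat[0])
    return [sum(vec[r] * mat[r][c] for r in range(len(mat)))
            for c in range(n_cols)]


ix = word_to_ix["world"]
by_lookup = embedding_matrix[ix]
by_matmul = vec_mat(one_hot(ix, len(embedding_matrix)), embedding_matrix)

# Both routes select the same row of the matrix.
print(by_lookup == by_matmul)
```

An embedding layer does the row selection directly, which avoids building the one-hot vector at all.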


In [2]:
"""
A short description of the embedding layer.

The constructor arguments are (num_embeddings, embedding_dim,
               padding_idx=None, max_norm=None,
               norm_type=2, scale_grad_by_freq=False,
               sparse=False, _weight=None)

Read more here:
https://pytorch.org/docs/stable/nn.html#torch.nn.Embedding

num_embeddings = size of the dictionary (vocabulary)
embedding_dim = dimensionality of each embedding vector

You can also load a pretrained embedding:
https://pytorch.org/docs/stable/nn.html#torch.nn.Embedding.from_pretrained

"""
print(embeds)

Embedding(2, 5)
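
As a rough sketch of loading existing vectors, `nn.Embedding.from_pretrained` can wrap a tensor of embeddings. The `pretrained` tensor below is invented for the example; in practice it would come from a file of pretrained vectors.

```python
import torch
import torch.nn as nn

# Hypothetical "pretrained" vectors: 3 words, 4 dimensions, values invented.
pretrained = torch.tensor([
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
])

# freeze=True (the default) keeps the vectors fixed during training;
# freeze=False lets the optimizer fine-tune them.
frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)
tunable = nn.Embedding.from_pretrained(pretrained, freeze=False)

# Looking up index 1 returns the second row of the pretrained matrix.
row = frozen(torch.tensor([1], dtype=torch.long))
print(row.shape)
print(frozen.weight.requires_grad, tunable.weight.requires_grad)
```

Whether to freeze the vectors is a judgment call: frozen vectors keep whatever structure they were trained with, while tunable ones can adapt to the task at the risk of overfitting on small data.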


# N-Gram Language Modeling

Here, given a sequence of words w, we want to compute P(w_i | w_{i-1}, ..., w_{i-n+1}), where w_i is the ith word of the sequence.
<br/>
We compute the loss function on some training examples and then update the parameters with backpropagation.
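
For comparison, the same conditional probability can be estimated by simply counting. The sketch below is a count-based baseline, not the neural model used in this notebook; the function name `ngram_mle` and the toy sentence are assumptions made for the illustration.

```python
from collections import Counter, defaultdict


def ngram_mle(tokens, n):
    """Return a function estimating P(word | previous n-1 words) from counts."""
    context_counts = defaultdict(Counter)
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        context_counts[context][tokens[i]] += 1

    def prob(word, context):
        counts = context_counts.get(tuple(context))
        if not counts:
            # Context never seen in training: the count-based estimate
            # has nothing to say.
            return 0.0
        return counts[word] / sum(counts.values())

    return prob


toy_tokens = "the cat sat on the mat the cat ran".split()
p = ngram_mle(toy_tokens, n=2)

print(p("cat", ["the"]))     # "the" is followed by cat, mat, cat
print(p("cat", ["unseen"]))  # unseen context
```

The zero for an unseen context is exactly the sparsity problem described at the top: counts cannot generalize to contexts that never occurred, whereas learned embeddings can place similar words near each other.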
<br/>

In [3]:
CONTEXT_SIZE = 2
EMBEDDING_DIM = 10

# We will use Shakespeare Sonnet 2

test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()

# We should tokenize the input properly, but we will ignore that for now.
# Build a list of tuples.
# Each tuple is ([word_i-2, word_i-1], target word).
# This is what is meant by "context": we consider only the two
# previous words, hence CONTEXT_SIZE = 2.

trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2])
            for i in range(len(test_sentence) - 2)]

# print the first 3, just so you can see what they look like
print(trigrams[:3])

# get the unique words 
vocab = set(test_sentence)

word_to_ix = {word: i for i, word in enumerate(vocab)}

[(['When', 'forty'], 'winters'), (['forty', 'winters'], 'shall'), (['winters', 'shall'], 'besiege')]
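
The trigram construction can be generalized to any context size, and the vocabulary can be indexed in a reproducible order. This is a sketch under assumptions: `build_ngrams` and `build_vocab` are hypothetical helper names, not part of the tutorial. Sorting matters because the iteration order of a set of strings can change between Python runs, which would change the word indices.

```python
def build_ngrams(tokens, context_size):
    """Pair each run of `context_size` tokens with the token that follows it."""
    return [
        (tokens[i:i + context_size], tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]


def build_vocab(tokens):
    """Map each distinct token to an index, in sorted (reproducible) order."""
    return {word: i for i, word in enumerate(sorted(set(tokens)))}


sample = "When forty winters shall besiege thy brow,".split()
pairs = build_ngrams(sample, context_size=2)
sample_vocab = build_vocab(sample)

print(pairs[:2])
print(sample_vocab)
```

With `context_size=2` this reproduces the trigram pairs built above; a larger value gives longer contexts at the cost of fewer, sparser training examples.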


In [4]:
# total number of tokens
print(len(test_sentence))

# number of distinct words in the vocabulary
print(len(set(test_sentence)))

print(type(trigrams))
print("length of trigrams: {}".format(len(trigrams)))

for i in range(len(trigrams)):
    print(trigrams[i])
    if i+1==3:
        break

print('the length of vocab')
print(len(vocab))

"""
for i, word in enumerate(vocab):
    print(i, word)
"""

print(word_to_ix)

115
97
<class 'list'>
length of trigrams: 113
(['When', 'forty'], 'winters')
(['forty', 'winters'], 'shall')
(['winters', 'shall'], 'besiege')
the length of vocab
97
{"'This": 52, 'all-eating': 32, 'winters': 53, 'say,': 54, 'small': 0, 'How': 55, 'deep': 1, 'sum': 56, 'praise.': 2, 'an': 57, 'shame,': 3, "feel'st": 4, 'fair': 58, 'worth': 60, 'where': 5, 'old': 6, 'blood': 61, 'being': 8, 'forty': 24, 'a': 9, 'use,': 63, 'beauty': 10, 'answer': 62, 'on': 7, 'count,': 25, 'shall': 11, 'thine!': 64, 'livery': 12, 'Then': 13, 'brow,': 14, 'mine': 15, 'art': 26, 'his': 66, 'cold.': 16, 'within': 67, 'Where': 48, 'praise': 68, 'Thy': 18, 'To': 19, 'days;': 70, 'lies,': 20, 'were': 71, 'When': 72, 'made': 89, 'Were': 21, 'weed': 73, 'be': 69, 'warm': 74, 'by': 75, 'see': 76, 'Proving': 22, "totter'd": 23, "deserv'd": 77, 'my': 78, 'This': 79, 'thriftless': 27, 'dig': 80, 'couldst': 81, 'thy': 28, 'it': 29, 'and': 30, 'besiege': 31, 'much': 33, 'proud': 34, 'thine': 35, 'sunken': 36, 'thou':

In [5]:
"""
A little explanation:
the embedding layer converts the input indices into vectors,
which are then passed through linear1 --> relu
--> linear2 --> log_softmax.

Observe that linear1 maps (context_size * embedding_dim) --> 128,
which means it takes the embeddings of the two context words,
concatenated, as input and converts them to a 128-dim vector.

In this example the context is two words, which are used to
predict the third word.
"""

class NGramLanguageModeler(nn.Module):

    def __init__(self, vocab_size, embedding_dim, context_size):
        super(NGramLanguageModeler, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        print("self.embeddings.shape Embedding(vocab size,dimension):",self.embeddings)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        print("self.linear1.shape: ",self.linear1)
        self.linear2 = nn.Linear(128, vocab_size)
        print("self.linear2.shape :",self.linear2)
        
    def forward(self, inputs):
        dummy_out = self.embeddings(inputs)
        print("dummy_out shape: ",dummy_out)
        embeds = self.embeddings(inputs).view((1, -1))
        print("enmbed")
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs, dummy_out


In [6]:
losses = []
loss_function = nn.NLLLoss()

model = NGramLanguageModeler(
    vocab_size=len(vocab),
    embedding_dim= EMBEDDING_DIM,
    context_size= CONTEXT_SIZE)

optimizer = optim.SGD(model.parameters(), lr=0.001)

self.embeddings.shape Embedding(vocab size,dimension): Embedding(97, 10)
self.linear1.shape:  Linear(in_features=20, out_features=128, bias=True)
self.linear2.shape : Linear(in_features=128, out_features=97, bias=True)
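
The model ends in `log_softmax` and is trained with `NLLLoss`. For a single example, that combination amounts to taking the negative log-probability assigned to the target word. The sketch below reimplements both steps in plain Python to show the arithmetic; the `scores` values are invented, and this is not how PyTorch implements them internally.

```python
import math


def log_softmax(scores):
    """Log of the softmax, computed stably by subtracting the maximum."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, target_index):
    """Negative log-likelihood of the target class for one example."""
    return -log_probs[target_index]


# Hypothetical raw scores for a 3-word vocabulary.
scores = [2.0, 1.0, 0.1]
log_probs = log_softmax(scores)
loss = nll_loss(log_probs, target_index=0)

# The exponentiated log-probabilities form a distribution that sums to 1.
print(sum(math.exp(lp) for lp in log_probs))
print(loss)
```

The loss is small when the target already has a high probability and grows as the probability of the target shrinks.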


In [7]:
print(model)

for params in model.named_parameters():
    print(params[0])

NGramLanguageModeler(
  (embeddings): Embedding(97, 10)
  (linear1): Linear(in_features=20, out_features=128, bias=True)
  (linear2): Linear(in_features=128, out_features=97, bias=True)
)
embeddings.weight
linear1.weight
linear1.bias
linear2.weight
linear2.bias
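
The sizes of these five parameter tensors follow directly from the layer shapes printed above. A small sketch of the arithmetic, with the numbers taken from the printed model (97 words, 10 dimensions, context of 2, hidden size 128):

```python
vocab_size, embedding_dim, context_size, hidden = 97, 10, 2, 128

# Number of scalar parameters in each tensor listed by named_parameters().
param_counts = {
    "embeddings.weight": vocab_size * embedding_dim,
    "linear1.weight": hidden * context_size * embedding_dim,
    "linear1.bias": hidden,
    "linear2.weight": vocab_size * hidden,
    "linear2.bias": vocab_size,
}

for name, count in param_counts.items():
    print(name, count)

total = sum(param_counts.values())
print("total", total)  # 16171
```

Most of the parameters sit in `linear2`, because its output size equals the vocabulary size.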


In [9]:
import pdb

for epoch in range(10):
    
    total_loss = 0
    
    for context, target in trigrams:

        # Step 1. Prepare the inputs to be passed to the model 
        # (i.e, turn the words into integer indices and wrap 
        # them in tensors)
        print(context, target)
        
        context_idxs = torch.tensor([word_to_ix[w] for w in context], 
                                    dtype=torch.long)
        
        # as context size is "2"
        print(context_idxs.shape)
        
        # Step 2. Recall that torch *accumulates* gradients. Before passing in a
        # new instance, you need to zero out the gradients from the old
        # instance
        model.zero_grad()

        # Step 3. Run the forward pass, getting log probabilities over next
        # words
        
        """
        Let us study this part a bit more.
        dummy_output is a tensor of shape
        context_size x embedding_dim,

        which is reshaped into a tensor of shape
        1 x (context_size * embedding_dim)
        before it goes to the linear1 layer.
        """
        _, dummy_output = model(context_idxs)
        print("dummy_output shape: {}".format(dummy_output.shape))
        #pdb.set_trace()
        dummy_output = dummy_output.view((1, -1))
        print("dummy_output shape: {}".format(dummy_output.shape))
        #pdb.set_trace()
        
        log_probs, _ = model(context_idxs)

        # Step 4. Compute your loss function. (Again, Torch wants the target
        # word wrapped in a tensor)
        loss = loss_function(log_probs, torch.tensor(
            [word_to_ix[target]], dtype=torch.long))

        # Step 5. Do the backward pass and update the parameters
        loss.backward()
        optimizer.step()

        # Get the Python number from a 1-element Tensor by calling tensor.item()
        total_loss += loss.item()
        
    losses.append(total_loss)
    
print(losses)  # The loss decreased every iteration over the training data!


['When', 'forty'] winters
torch.Size([2])
dummy_out shape:  tensor([[ 0.0713, -0.5038,  0.8035,  1.1834,  0.0237,  1.1120, -0.9464,  1.6344,
         -0.3461,  0.2707],
        [ 2.4143,  1.0206, -0.4405, -1.7342, -1.0256,  0.5212, -0.4530, -0.1260,
         -0.5882,  2.1189]], grad_fn=<EmbeddingBackward>)
enmbed
dummy_output shape: torch.Size([2, 10])
dummy_output shape: torch.Size([1, 20])
dummy_out shape:  tensor([[ 0.0713, -0.5038,  0.8035,  1.1834,  0.0237,  1.1120, -0.9464,  1.6344,
         -0.3461,  0.2707],
        [ 2.4143,  1.0206, -0.4405, -1.7342, -1.0256,  0.5212, -0.4530, -0.1260,
         -0.5882,  2.1189]], grad_fn=<EmbeddingBackward>)
enmbed
... (remaining output omitted: the same pattern repeats for every trigram in each of the 10 epochs)

# Computing Word Embeddings: Continuous Bag-of-Words

In [13]:

CONTEXT_SIZE = 2  # 2 words to the left, 2 to the right
raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

# By deriving a set from `raw_text`, we deduplicate the array
vocab = set(raw_text)
vocab_size = len(vocab)

word_to_ix = {word: i for i, word in enumerate(vocab)}
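
Python does not guarantee the iteration order of a `set` of strings across interpreter runs, so the indices assigned by `enumerate(vocab)` can change from one run to the next. A minimal sketch of one way to get stable indices is to sort the vocabulary first. The names `demo_tokens` and `stable_word_to_ix` are illustrative and not part of the original notebook.

```python
# Toy token list standing in for `raw_text`; the words are made up.
demo_tokens = "we conjure the spirits of the computer".split()

# Sorting the deduplicated words fixes the order, so each word always
# receives the same index no matter how the set happens to be ordered.
stable_word_to_ix = {word: i for i, word in enumerate(sorted(set(demo_tokens)))}

print(stable_word_to_ix)
```

With stable indices, a saved embedding matrix can be reloaded later and row `i` still refers to the same word.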


In [14]:

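"""
Collect the training data as (context, target) pairs.
The context is the two words on each side of position i,
and the target is the word at position i.
"""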

data = []
for i in range(2, len(raw_text) - 2):
    context = [raw_text[i - 2], raw_text[i - 1],
               raw_text[i + 1], raw_text[i + 2]]
    target = raw_text[i]
    data.append((context, target))
    
print(data[:5])

for i in range(len(data)):
    if i==5:
        break
    print(data[i])


[(['We', 'are', 'to', 'study'], 'about'), (['are', 'about', 'study', 'the'], 'to'), (['about', 'to', 'the', 'idea'], 'study'), (['to', 'study', 'idea', 'of'], 'the'), (['study', 'the', 'of', 'a'], 'idea')]
(['We', 'are', 'to', 'study'], 'about')
(['are', 'about', 'study', 'the'], 'to')
(['about', 'to', 'the', 'idea'], 'study')
(['to', 'study', 'idea', 'of'], 'the')
(['study', 'the', 'of', 'a'], 'idea')
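
The loop above hardcodes a window of two words on each side. As a sketch, the same pairs can be built for any window size with slicing. The helper name `build_cbow_pairs` and the variable `demo_tokens` are hypothetical and do not appear in the original notebook.

```python
def build_cbow_pairs(tokens, context_size=2):
    """Return (context, target) pairs with `context_size` words on each side."""
    pairs = []
    # Skip positions that do not have a full window on both sides.
    for i in range(context_size, len(tokens) - context_size):
        left = tokens[i - context_size:i]
        right = tokens[i + 1:i + 1 + context_size]
        pairs.append((left + right, tokens[i]))
    return pairs


# First sentence of the notebook's raw text, tokenized the same way.
demo_tokens = "We are about to study the idea of a computational process.".split()
demo_pairs = build_cbow_pairs(demo_tokens, context_size=2)

print(demo_pairs[0])
print(len(demo_pairs))
```

With `context_size=2` the first pair matches the first entry printed above, and a sequence of `n` tokens yields `n - 2 * context_size` pairs.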


In [15]:

class CBOW(nn.Module):

    def __init__(self, vocab_size, embedding_dim, context_size):
        super(CBOW, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

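    def forward(self, inputs):
        # inputs: 1-D LongTensor holding the context word indices.
        # Look up each embedding and flatten them into one row vector of
        # shape (1, context_size * embedding_dim), i.e. concatenate them.
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))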
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs
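
The class above concatenates the context embeddings before the first linear layer. CBOW is usually described as summing (or averaging) the context embeddings instead, which makes the model independent of word order and of the number of context words. The following is a minimal sketch of that variant, not the notebook's own model. The class name `SumCBOW`, the vocabulary size of 50, and the embedding dimension of 10 are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SumCBOW(nn.Module):
    """Hypothetical CBOW variant that sums the context embeddings."""

    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(embedding_dim, vocab_size)

    def forward(self, inputs):
        # inputs: 1-D LongTensor of context word indices.
        # Summing over the context dimension gives shape (1, embedding_dim).
        summed = self.embeddings(inputs).sum(dim=0, keepdim=True)
        return F.log_softmax(self.linear(summed), dim=1)


torch.manual_seed(1)
sketch_model = SumCBOW(vocab_size=50, embedding_dim=10)  # illustrative sizes
sketch_log_probs = sketch_model(torch.tensor([0, 1, 2, 3], dtype=torch.long))
print(sketch_log_probs.shape)
```

Because the embeddings are summed, the linear layer's input size no longer depends on the context size, so the same model accepts contexts of any length.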


In [17]:
losses = []
loss_function = nn.NLLLoss()
model = CBOW(len(vocab), EMBEDDING_DIM, 2 * CONTEXT_SIZE)  # 2 context words on each side
optimizer = optim.SGD(model.parameters(), lr=0.001)
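
`nn.NLLLoss` expects log-probabilities, which is why the model's `forward` ends with `F.log_softmax`. As a small check under illustrative assumptions (random logits for a 5-class toy problem, not the notebook's data), the sketch below compares that pairing with `nn.CrossEntropyLoss` applied directly to raw scores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1)
demo_logits = torch.randn(1, 5)                       # raw scores for 5 made-up classes
demo_target = torch.tensor([2], dtype=torch.long)     # index of the correct class

# Route 1: log_softmax followed by NLLLoss, as in the notebook.
nll = nn.NLLLoss()(F.log_softmax(demo_logits, dim=1), demo_target)

# Route 2: CrossEntropyLoss on the raw scores.
ce = nn.CrossEntropyLoss()(demo_logits, demo_target)

print(torch.allclose(nll, ce))
```

Either route works for training. If `CrossEntropyLoss` is used, the model should return the raw output of its last linear layer rather than log-probabilities.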

In [16]:

# Create your model and train it. Here is a helper function that makes
# the data ready for use by your module.


def make_context_vector(context, word_to_ix):
    idxs = [word_to_ix[w] for w in context]
    return torch.tensor(idxs, dtype=torch.long)


print(data[0][0], data[0][1])

# data[0][0] is the input 
# data[0][1] is the target

make_context_vector(data[0][0], word_to_ix)  # example


['We', 'are', 'to', 'study'] about


tensor([11, 30, 10, 25])

In [19]:
for x,y in data:
    print(x,y)

['We', 'are', 'to', 'study'] about
['are', 'about', 'study', 'the'] to
['about', 'to', 'the', 'idea'] study
['to', 'study', 'idea', 'of'] the
['study', 'the', 'of', 'a'] idea
['the', 'idea', 'a', 'computational'] of
['idea', 'of', 'computational', 'process.'] a
['of', 'a', 'process.', 'Computational'] computational
['a', 'computational', 'Computational', 'processes'] process.
['computational', 'process.', 'processes', 'are'] Computational
['process.', 'Computational', 'are', 'abstract'] processes
['Computational', 'processes', 'abstract', 'beings'] are
['processes', 'are', 'beings', 'that'] abstract
['are', 'abstract', 'that', 'inhabit'] beings
['abstract', 'beings', 'inhabit', 'computers.'] that
['beings', 'that', 'computers.', 'As'] inhabit
['that', 'inhabit', 'As', 'they'] computers.
['inhabit', 'computers.', 'they', 'evolve,'] As
['computers.', 'As', 'evolve,', 'processes'] they
['As', 'they', 'processes', 'manipulate'] evolve,
['they', 'evolve,', 'manipulate', 'other'] processes
[

In [23]:
for epoch in range(10):
    total_loss = 0
    for context, target in data:
        context_idxs = torch.tensor([word_to_ix[w] for w in context], dtype=torch.long)
        model.zero_grad()
        log_probs = model(context_idxs)
        loss = loss_function(log_probs, torch.tensor([word_to_ix[target]], dtype=torch.long))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        
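    # `losses` is created in an earlier cell, so rerunning this cell
    # appends 10 more values each time instead of starting over.
    losses.append(total_loss)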
print(losses)
print(len(losses))

[227.88634824752808, 226.57188200950623, 225.26442575454712, 223.96313405036926, 222.66873049736023, 221.38079118728638, 220.09778118133545, 218.8197877407074, 217.54849791526794, 216.28117442131042, 215.01820874214172, 213.76100087165833, 212.50572800636292, 211.25394320487976, 210.0043339729309, 208.75656127929688, 207.5122447013855, 206.26957321166992, 205.02914881706238, 203.79075407981873, 202.5532431602478, 201.31683111190796, 200.08219170570374, 198.8463978767395, 197.61048579216003, 196.372220993042, 195.1356704235077, 193.8974690437317, 192.65877747535706, 191.4197211265564, 190.1784746646881, 188.9348702430725, 187.6915044784546, 186.4464874267578, 185.20132541656494, 183.95460867881775, 182.705335855484, 181.45462131500244, 180.19985055923462, 178.94440937042236]
40
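
Once training has run, the rows of the embedding matrix can be compared to get a rough notion of similarity between words. The sketch below uses cosine similarity. It is self-contained, so it builds a small untrained `nn.Embedding` as a stand-in for `model.embeddings`; the names `demo_word_to_ix`, `demo_embeddings`, and `cosine_between` are hypothetical, and the similarity values from an untrained layer carry no meaning.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1)
demo_word_to_ix = {"process": 0, "program": 1, "spells": 2}
# Stand-in for a trained `model.embeddings`; here it is randomly initialized.
demo_embeddings = nn.Embedding(len(demo_word_to_ix), 10)


def cosine_between(word_a, word_b, embeddings, word_to_ix):
    """Cosine similarity between the embedding rows of two words."""
    idx = torch.tensor([word_to_ix[word_a], word_to_ix[word_b]], dtype=torch.long)
    with torch.no_grad():  # no gradients needed for inspection
        vecs = embeddings(idx)
    return F.cosine_similarity(vecs[0], vecs[1], dim=0).item()


sim = cosine_between("process", "program", demo_embeddings, demo_word_to_ix)
self_sim = cosine_between("process", "process", demo_embeddings, demo_word_to_ix)
print(sim, self_sim)
```

To inspect the trained model instead, pass `model.embeddings` and the notebook's `word_to_ix` to the same function.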
