In [1]:
%matplotlib inline


Word Embeddings: Encoding Lexical Semantics
===========================================

Word embeddings are dense vectors of real numbers, one per word in your
vocabulary. In NLP, it is almost always the case that your features are
words! But how should you represent a word in a computer? You could
store its ASCII character representation, but that only tells you what
the word *is*, it doesn't say much about what it *means* (you might be
able to derive its part of speech from its affixes, or properties from
its capitalization, but not much). Even more, in what sense could you
combine these representations? We often want dense outputs from our
neural networks: the inputs are $|V|$-dimensional, where $V$ is our
vocabulary, but the outputs often have only a few dimensions (if we are
only predicting a handful of labels, for instance). How do we get from
such a high-dimensional space to a much lower-dimensional one?

Instead of ASCII representations, how about we use a one-hot encoding?
That is, we represent the word $w$ by

\begin{align}\overbrace{\left[ 0, 0, \dots, 1, \dots, 0, 0 \right]}^\text{|V| elements}\end{align}

where the 1 is in a location unique to $w$. Any other word will
have a 1 in some other location, and a 0 everywhere else.
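
As a minimal sketch of this encoding in plain Python (the three-word vocabulary and the helper name `one_hot` are made up for illustration and are not part of the tutorial's code):

```python
# Toy vocabulary; the words and their order are illustrative assumptions.
vocab = ["mathematician", "physicist", "store"]
word_to_ix = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a |V|-element list with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[word_to_ix[word]] = 1
    return vec


print(one_hot("physicist"))  # [0, 1, 0]
```

Every vector has as many entries as the vocabulary has words, which is why this representation grows so large in practice.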

There is an enormous drawback to this representation, besides just how
huge it is. It basically treats all words as independent entities with
no relation to each other. What we really want is some notion of
*similarity* between words. Why? Let's see an example.

Suppose we are building a language model. Suppose we have seen the
sentences

* The mathematician ran to the store.
* The physicist ran to the store.
* The mathematician solved the open problem.

in our training data. Now suppose we get a new sentence never before
seen in our training data:

* The physicist solved the open problem.

Our language model might do OK on this sentence, but wouldn't it be much
better if we could use the following two facts:

* We have seen mathematician and physicist in the same role in a sentence. Somehow they
  have a semantic relation.
* We have seen mathematician in the same role in this new unseen sentence
  as we are now seeing physicist.

and then infer that physicist is actually a good fit in the new unseen
sentence? This is what we mean by a notion of similarity: we mean
*semantic similarity*, not simply having similar orthographic
representations. It is a technique to combat the sparsity of linguistic
data, by connecting the dots between what we have seen and what we
haven't. This example of course relies on a fundamental linguistic
assumption: that words appearing in similar contexts are related to each
other semantically. This is called the `distributional
hypothesis <https://en.wikipedia.org/wiki/Distributional_semantics>`__.
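
To make the idea of "appearing in similar contexts" concrete, here is a rough sketch that collects the immediate left and right neighbors of each word in the three training sentences. Treating a context as a (previous word, next word) pair, and lowercasing and dropping punctuation, are simplifying assumptions made only for this illustration.

```python
from collections import defaultdict

# The three training sentences from the example, lowercased and stripped
# of punctuation (a simplifying assumption).
sentences = [
    "the mathematician ran to the store",
    "the physicist ran to the store",
    "the mathematician solved the open problem",
]

# Map each word to the set of (previous word, next word) pairs around it.
contexts = defaultdict(set)
for sentence in sentences:
    words = sentence.split()
    for i in range(1, len(words) - 1):
        contexts[words[i]].add((words[i - 1], words[i + 1]))

# Contexts that both words have been seen in.
shared = contexts["mathematician"] & contexts["physicist"]
print(shared)  # {('the', 'ran')}
```

The shared context is the evidence that the two words play a similar role, which is what lets a model transfer what it learned about one word to the other.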


Getting Dense Word Embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

How can we solve this problem? That is, how could we actually encode
semantic similarity in words? Maybe we think up some semantic
attributes. For example, we see that both mathematicians and physicists
can run, so maybe we give these words a high score for the "is able to
run" semantic attribute. Think of some other attributes, and imagine
what you might score some common words on those attributes.

If each attribute is a dimension, then we might give each word a vector,
like this:

\begin{align}q_\text{mathematician} = \left[ \overbrace{2.3}^\text{can run},
   \overbrace{9.4}^\text{likes coffee}, \overbrace{-5.5}^\text{majored in Physics}, \dots \right]\end{align}

\begin{align}q_\text{physicist} = \left[ \overbrace{2.5}^\text{can run},
   \overbrace{9.1}^\text{likes coffee}, \overbrace{6.4}^\text{majored in Physics}, \dots \right]\end{align}

Then we can get a measure of similarity between these words by doing:

\begin{align}\text{Similarity}(\text{physicist}, \text{mathematician}) = q_\text{physicist} \cdot q_\text{mathematician}\end{align}

Although it is more common to normalize by the lengths:

\begin{align}\text{Similarity}(\text{physicist}, \text{mathematician}) = \frac{q_\text{physicist} \cdot q_\text{mathematician}}
   {\| q_\text{physicist} \| \| q_\text{mathematician} \|} = \cos (\phi)\end{align}

Where $\phi$ is the angle between the two vectors. That way,
extremely similar words (words whose embeddings point in the same
direction) will have similarity 1. Extremely dissimilar words should
have similarity -1.
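
The normalized similarity can be sketched in plain Python as follows. The helper name `cosine_similarity` is an illustrative choice, and the vectors use only the three attribute values written out above; the elided attributes are ignored.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Only the three attributes shown in the text: can run, likes coffee,
# majored in Physics.
q_mathematician = [2.3, 9.4, -5.5]
q_physicist = [2.5, 9.1, 6.4]

similarity = cosine_similarity(q_physicist, q_mathematician)
print(round(similarity, 3))
```

With these three attributes the similarity is positive but well below 1: the two words agree on running and coffee, but the "majored in Physics" attribute pulls the vectors apart.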


You can think of the sparse one-hot vectors from the beginning of this
section as a special case of these new vectors we have defined, where
each word has its own unique semantic attribute, so any two distinct
words have similarity 0. These new vectors are *dense*, which is to say
their entries are (typically) non-zero.
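
A quick check of the one-hot special case, using two hand-written vectors over an assumed four-word vocabulary:

```python
# Two distinct one-hot vectors over an assumed 4-word vocabulary.
hello = [1, 0, 0, 0]
world = [0, 0, 1, 0]


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


print(dot(hello, world))  # 0: distinct words share no attribute
print(dot(hello, hello))  # 1: a word is only similar to itself
```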

But these new vectors are a big pain: you could think of thousands of
different semantic attributes that might be relevant to determining
similarity, and how on earth would you set the values of the different
attributes? Central to the idea of deep learning is that the neural
network learns representations of the features, rather than requiring
the programmer to design them herself. So why not just let the word
embeddings be parameters in our model, and then be updated during
training? This is exactly what we will do. We will have some *latent
semantic attributes* that the network can, in principle, learn. Note
that the word embeddings will probably not be interpretable. That is,
although with our hand-crafted vectors above we can see that
mathematicians and physicists are similar in that they both like coffee,
if we allow a neural network to learn the embeddings and see that both
mathematicians and physicists have a large value in the second
dimension, it is not clear what that means. They are similar in some
latent semantic dimension, but this probably has no interpretation to
us.


In summary, **word embeddings are a representation of the *semantics* of
a word, efficiently encoding semantic information that might be relevant
to the task at hand**. You can embed other things too: part of speech
tags, parse trees, anything! The idea of feature embeddings is central
to the field.


Word Embeddings in PyTorch
~~~~~~~~~~~~~~~~~~~~~~~~~~

Before we get to a worked example and an exercise, a few quick notes
about how to use embeddings in PyTorch and in deep learning programming
in general. Similar to how we defined a unique index for each word when
making one-hot vectors, we also need to define an index for each word
when using embeddings. These will be keys into a lookup table. That is,
embeddings are stored as a $|V| \times D$ matrix, where $D$
is the dimensionality of the embeddings, such that the word assigned
index $i$ has its embedding stored in the $i$'th row of the
matrix. In all of my code, the mapping from words to indices is a
dictionary named word\_to\_ix.

The module that allows you to use embeddings is torch.nn.Embedding,
which takes two required arguments: the vocabulary size, and the
dimensionality of the embeddings.

To index into this table, you must use torch.LongTensor (since the
indices are integers, not floats).




In [2]:
# Author: Robert Guthrie

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

<torch._C.Generator at 0x11afcaee8>

In [3]:
word_to_ix = {"hello": 0, "world": 1}
embeds = nn.Embedding(2, 5)  # 2 words in vocab, 5 dimensional embeddings
lookup_tensor = torch.LongTensor([word_to_ix["hello"]])
hello_embed = embeds(autograd.Variable(lookup_tensor))
print(hello_embed)

Variable containing:
-2.9718  1.7070 -0.4305 -2.2820  0.5237
[torch.FloatTensor of size 1x5]
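
To see that the embedding module really is a $|V| \times D$ lookup table, the sketch below compares a looked-up embedding with the corresponding row of the module's `weight` matrix. It assumes PyTorch 0.4 or later, where index tensors can be passed directly without wrapping them in a `Variable`.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

word_to_ix = {"hello": 0, "world": 1}
embeds = nn.Embedding(len(word_to_ix), 5)  # |V| x D lookup table

# The table itself is stored as the module's `weight` parameter.
print(embeds.weight.shape)  # torch.Size([2, 5])

# Looking up an index returns the corresponding row of the table.
ix = word_to_ix["world"]
world_embed = embeds(torch.LongTensor([ix]))
print(torch.equal(world_embed[0], embeds.weight[ix]))  # True

# Several indices can be looked up at once, one row per index.
both = embeds(torch.LongTensor([0, 1]))
print(both.shape)  # torch.Size([2, 5])
```

Because the rows are ordinary parameters, they receive gradients and are updated by the optimizer like any other weight in the model.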



An Example: N-Gram Language Modeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Recall that in an n-gram language model, given a sequence of words
$w$, we want to compute

\begin{align}P(w_i | w_{i-1}, w_{i-2}, \dots, w_{i-n+1} )\end{align}

where $w_i$ is the $i$-th word of the sequence.

In this example, we will compute the loss function on some training
examples and update the parameters with backpropagation.
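
For comparison, the same conditional probability can be estimated directly from counts. The sketch below is a plain relative-frequency trigram estimate on a made-up toy corpus, not the neural model trained in this section; the corpus and the helper name `trigram_prob` are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, assumed here only for illustration.
words = "the cat sat on the mat and the cat sat down".split()

# Count how often each word follows each two-word context.
counts = defaultdict(Counter)
for i in range(len(words) - 2):
    context = (words[i], words[i + 1])
    counts[context][words[i + 2]] += 1


def trigram_prob(target, context):
    """Relative-frequency estimate of P(target | context)."""
    total = sum(counts[context].values())
    return counts[context][target] / total if total else 0.0


print(trigram_prob("on", ("cat", "sat")))    # 0.5
print(trigram_prob("down", ("cat", "sat")))  # 0.5
```

A count-based estimate assigns zero probability to any context it has never seen, which is the sparsity problem that learned embeddings are meant to soften.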




In [6]:
CONTEXT_SIZE = 2
EMBEDDING_DIM = 10
# We will use Shakespeare Sonnet 2
test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()
# we should tokenize the input, but we will ignore that for now
# build a list of tuples.  Each tuple is ([ word_i-2, word_i-1 ], target word)
trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2])
            for i in range(len(test_sentence) - 2)]
# print the first 3, just so you can see what they look like
print(trigrams[:3])

vocab = set(test_sentence)
word_to_ix = {word: i for i, word in enumerate(vocab)}


class NGramLanguageModeler(nn.Module):

    def __init__(self, vocab_size, embedding_dim, context_size):
        super(NGramLanguageModeler, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)  # normalize over the vocabulary
        return log_probs


losses = []
loss_function = nn.NLLLoss()
model = NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)
optimizer = optim.SGD(model.parameters(), lr=0.001)

for epoch in range(10):
    total_loss = torch.Tensor([0])
    for context, target in trigrams:

        # Step 1. Prepare the inputs to be passed to the model (i.e, turn the words
        # into integer indices and wrap them in variables)
        context_idxs = [word_to_ix[w] for w in context]
        context_var = autograd.Variable(torch.LongTensor(context_idxs))

        # Step 2. Recall that torch *accumulates* gradients. Before passing in a
        # new instance, you need to zero out the gradients from the old
        # instance
        model.zero_grad()

        # Step 3. Run the forward pass, getting log probabilities over next
        # words
        log_probs = model(context_var)

        # Step 4. Compute your loss function. (Again, Torch wants the target
        # word wrapped in a variable)
        loss = loss_function(log_probs, autograd.Variable(
            torch.LongTensor([word_to_ix[target]])))

        # Step 5. Do the backward pass and update the parameters
        loss.backward()
        optimizer.step()

        total_loss += loss.data
    losses.append(total_loss)
print(losses)  # The loss decreased every iteration over the training data!

[(['When', 'forty'], 'winters'), (['forty', 'winters'], 'shall'), (['winters', 'shall'], 'besiege')]

Exercise: Computing Word Embeddings: Continuous Bag-of-Words
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Continuous Bag-of-Words model (CBOW) is frequently used in NLP deep
learning. It is a model that tries to predict words given the context of
a few words before and a few words after the target word. This is
distinct from language modeling, since CBOW is not sequential and does
not have to be probabilistic. Typically, CBOW is used to quickly train
word embeddings, and these embeddings are used to initialize the
embeddings of some more complicated model. Usually, this is referred to
as *pretraining embeddings*. It almost always improves performance by a
couple of percent.

The CBOW model is as follows. Given a target word $w_i$ and a
context window of $N$ words on each side, $w_{i-1}, \dots, w_{i-N}$
and $w_{i+1}, \dots, w_{i+N}$, referring to all context words
collectively as $C$, CBOW tries to minimize

\begin{align}-\log p(w_i | C) = -\log \text{Softmax}(A(\sum_{w \in C} q_w) + b)\end{align}

where $q_w$ is the embedding for word $w$.

Implement this model in PyTorch by filling in the class below. Some
tips:

* Think about which parameters you need to define.
* Make sure you know what shape each operation expects. Use .view() if you need to
  reshape.




In [None]:
CONTEXT_SIZE = 2  # 2 words to the left, 2 to the right
raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

# By deriving a set from `raw_text`, we deduplicate the words
vocab = set(raw_text)
vocab_size = len(vocab)

word_to_ix = {word: i for i, word in enumerate(vocab)}
data = []
for i in range(2, len(raw_text) - 2):
    context = [raw_text[i - 2], raw_text[i - 1],
               raw_text[i + 1], raw_text[i + 2]]
    target = raw_text[i]
    data.append((context, target))
print(data[:5])


class CBOW(nn.Module):

    def __init__(self):
        pass

    def forward(self, inputs):
        pass

# create your model and train.  here are some functions to help you make
# the data ready for use by your module


def make_context_vector(context, word_to_ix):
    idxs = [word_to_ix[w] for w in context]
    tensor = torch.LongTensor(idxs)
    return autograd.Variable(tensor)


make_context_vector(data[0][0], word_to_ix)  # example
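
If you want something to compare your attempt against, the following is one possible way to fill in the class. It is a sketch rather than a reference solution. It assumes PyTorch 0.4 or later (no `Variable` wrapping), and the shortened text, the embedding size of 10, the learning rate of 0.1, and the 20 epochs are all arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1)

# Shortened text so the sketch runs quickly (an assumption of this example).
raw_text = "we conjure the spirits of the computer with our spells".split()
vocab = sorted(set(raw_text))
word_to_ix = {word: i for i, word in enumerate(vocab)}

# (context, target) pairs with 2 words on each side of the target.
data = [([raw_text[i - 2], raw_text[i - 1], raw_text[i + 1], raw_text[i + 2]],
         raw_text[i]) for i in range(2, len(raw_text) - 2)]


class CBOW(nn.Module):

    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)  # q_w
        self.linear = nn.Linear(embedding_dim, vocab_size)         # A and b

    def forward(self, inputs):
        # Sum the context embeddings, apply the affine map, then log-softmax.
        summed = self.embeddings(inputs).sum(dim=0).view(1, -1)
        return F.log_softmax(self.linear(summed), dim=1)


model = CBOW(len(vocab), embedding_dim=10)
loss_function = nn.NLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(20):
    total_loss = 0.0
    for context, target in data:
        context_idxs = torch.LongTensor([word_to_ix[w] for w in context])
        target_idx = torch.LongTensor([word_to_ix[target]])
        model.zero_grad()
        loss = loss_function(model(context_idxs), target_idx)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    losses.append(total_loss)

print(losses[0], losses[-1])
```

Summing the embeddings makes the prediction independent of the order of the context words, which is the "bag-of-words" part of the name. Using `NLLLoss` on log-softmax outputs corresponds to minimizing $-\log p(w_i | C)$.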