# Word Embeddings in PyTorch

In PyTorch, word embeddings are implemented as a lookup table (see [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html)). The table has as many rows as there are tokens in the vocabulary, and the number of columns corresponds to the dimensionality of the embedding vector.

In [1]:
import torch
import torch.nn as nn

In our example, we assume a vocabulary of 10 tokens and create embeddings of length 4.

In [2]:
VOCABULARY_SIZE=10
EMBEDDING_DIM=4

We create a sequence that simulates a batch of 5 sentences, each consisting of 3 tokens. We initialize the sequence randomly with integer token indices between 0 and 9 (the `high` bound of `torch.randint` is exclusive).

In [3]:
BATCH_SIZE=5
SEQ_LEN=3
sequence = torch.randint(low=0, high=VOCABULARY_SIZE, size=(BATCH_SIZE, SEQ_LEN))

The sequence is essentially a set of keys that will be used to look up values in the embedding table.

In [4]:
sequence

tensor([[0, 6, 7],
        [3, 2, 2],
        [2, 5, 2],
        [0, 8, 3],
        [4, 6, 3]])
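
Because the sequence is drawn at random, the values shown above will differ from run to run. The following is a minimal sketch, assuming we want repeatable indices, that fixes the seed with `torch.manual_seed` (the seed value 42 is an arbitrary choice, not part of the original example) and checks that the drawn indices stay inside the vocabulary range.

```python
import torch

VOCABULARY_SIZE = 10
BATCH_SIZE = 5
SEQ_LEN = 3

# Seeding the generator makes the random draw repeatable (42 is arbitrary).
torch.manual_seed(42)
first = torch.randint(low=0, high=VOCABULARY_SIZE, size=(BATCH_SIZE, SEQ_LEN))

# Re-seeding with the same value reproduces the same indices.
torch.manual_seed(42)
second = torch.randint(low=0, high=VOCABULARY_SIZE, size=(BATCH_SIZE, SEQ_LEN))

is_reproducible = torch.equal(first, second)

# `high` is exclusive, so every index is a valid row of a 10-row table.
in_range = bool(((first >= 0) & (first < VOCABULARY_SIZE)).all())

print(is_reproducible, in_range)
```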

The embedding itself is a PyTorch module. The layer `nn.Embedding` requires two arguments:

- `num_embeddings`: This value corresponds to the number of tokens and is the number of rows in the embedding matrix.
- `embedding_dim`: This is the vector length of the word embeddings and is the number of columns in the embedding matrix.

In [5]:
embedding = nn.Embedding(num_embeddings=VOCABULARY_SIZE, embedding_dim=EMBEDDING_DIM)

The embedding table is not a fixed mapping. It is a trainable weight matrix (note `requires_grad=True` in the output below) that is initialized randomly and learned jointly with the other weights of the model through backpropagation.

In [6]:
print(embedding.weight)

Parameter containing:
tensor([[ 1.4610, -0.8082, -0.7816, -0.3135],
        [ 0.1052, -0.0626, -0.7115, -1.1456],
        [-0.3115,  1.5333,  0.6740, -1.8035],
        [-0.1831, -2.3549,  1.2970, -0.4006],
        [ 1.8532,  0.2546,  0.4889,  0.4106],
        [ 1.0876,  1.1219, -0.5330,  0.1026],
        [-0.4453, -0.8550, -0.0360,  2.3146],
        [ 0.7443, -0.3514,  0.6963, -0.9283],
        [-1.3944,  0.0908,  0.6961,  1.0211],
        [-0.2815,  2.1397, -2.8740, -0.4085]], requires_grad=True)
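
To make the training aspect concrete, here is a minimal sketch of a single gradient step. The loss (the squared norm of the looked-up vectors), the toy batch, and the plain SGD optimizer are assumptions chosen only for illustration. The point is that only the rows that were actually looked up receive a nonzero gradient and change.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the sketch is repeatable

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)

# Toy batch: only the tokens 2 and 5 occur in it.
tokens = torch.tensor([[2, 5, 2]])
before = embedding.weight.detach().clone()

# Hypothetical objective: shrink the squared norm of the looked-up vectors.
loss = embedding(tokens).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Indices of the rows whose weights differ after the update.
changed = (embedding.weight.detach() != before).any(dim=1)
changed_rows = changed.nonzero().flatten().tolist()
print(changed_rows)  # prints [2, 5]
```

With plain SGD (no momentum or weight decay), rows with a zero gradient stay exactly as they were. Optimizers that keep state or apply weight decay may behave differently.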


When we pass an index x to the embedding layer, we get back the row of the embedding matrix at index x (rows are zero-indexed).

In [7]:
with torch.inference_mode():
    print(embedding(torch.tensor(0)))

tensor([ 1.4610, -0.8082, -0.7816, -0.3135])
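
As a quick check of this behavior, the sketch below compares the layer's output with the corresponding row of `embedding.weight`, and also shows what happens for an index outside the table. It assumes the layer lives on the CPU, where an out-of-range index raises an `IndexError`. The index values 3 and 10 are arbitrary.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

with torch.inference_mode():
    # Looking up index 3 returns row 3 of the weight matrix.
    row = embedding(torch.tensor(3))
    same_row = torch.equal(row, embedding.weight[3])

    # Valid indices are 0..9, so index 10 has no row in the table.
    try:
        embedding(torch.tensor(10))
        out_of_range_failed = False
    except IndexError:
        out_of_range_failed = True

print(same_row, out_of_range_failed)
```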


When we pass a tensor of indices, we get back the embedding of each token index. The output has one additional dimension of size `EMBEDDING_DIM`, so for our sequence its shape is (5, 3, 4).

In [8]:
with torch.inference_mode():
    embeddings = embedding(sequence)
    print(embeddings)

tensor([[[ 1.4610, -0.8082, -0.7816, -0.3135],
         [-0.4453, -0.8550, -0.0360,  2.3146],
         [ 0.7443, -0.3514,  0.6963, -0.9283]],

        [[-0.1831, -2.3549,  1.2970, -0.4006],
         [-0.3115,  1.5333,  0.6740, -1.8035],
         [-0.3115,  1.5333,  0.6740, -1.8035]],

        [[-0.3115,  1.5333,  0.6740, -1.8035],
         [ 1.0876,  1.1219, -0.5330,  0.1026],
         [-0.3115,  1.5333,  0.6740, -1.8035]],

        [[ 1.4610, -0.8082, -0.7816, -0.3135],
         [-1.3944,  0.0908,  0.6961,  1.0211],
         [-0.1831, -2.3549,  1.2970, -0.4006]],

        [[ 1.8532,  0.2546,  0.4889,  0.4106],
         [-0.4453, -0.8550, -0.0360,  2.3146],
         [-0.1831, -2.3549,  1.2970, -0.4006]]])
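
The lookup-table view can be verified directly. The following sketch, which rebuilds the layer and the sequence so that it runs on its own, checks that calling the layer gives the same result as indexing the weight matrix, and that it also matches multiplying one-hot vectors with the weight matrix. The one-hot comparison is added here only as an illustration of the equivalence. It is not a claim about how PyTorch computes the lookup internally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCABULARY_SIZE, EMBEDDING_DIM = 10, 4
BATCH_SIZE, SEQ_LEN = 5, 3

embedding = nn.Embedding(num_embeddings=VOCABULARY_SIZE, embedding_dim=EMBEDDING_DIM)
sequence = torch.randint(low=0, high=VOCABULARY_SIZE, size=(BATCH_SIZE, SEQ_LEN))

with torch.inference_mode():
    # Regular forward pass through the layer.
    looked_up = embedding(sequence)

    # Plain tensor indexing into the weight matrix.
    indexed = embedding.weight[sequence]

    # One-hot encode the indices and multiply with the weight matrix.
    one_hot = F.one_hot(sequence, num_classes=VOCABULARY_SIZE).float()
    multiplied = one_hot @ embedding.weight

print(looked_up.shape)                        # torch.Size([5, 3, 4])
print(torch.equal(looked_up, indexed))        # True
print(torch.allclose(looked_up, multiplied))  # True
```

The direct lookup avoids materializing the one-hot tensor of shape (5, 3, 10), which matters once the vocabulary is large.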
