## PyTorch's nn.Embedding

In [31]:
import torch
from torch import nn
import numpy as np

In [22]:
# define the dictionary
word_to_ix = {"p22": 0, "p23": 1, "p42": 2, "endp22": 3, "wait2": 4, "wait3": 5}

In [23]:
# set parameters
vocab_size = len(word_to_ix)
embedding_dim = 5

In [24]:
# example of word to integer
word_to_ix["p42"]

2

In [25]:
# create embedding layer
embeds = nn.Embedding(vocab_size, embedding_dim)  # 6 words in vocab, 5 dimensional embeddings

In [26]:
# convert text -> integer -> 1d-tensor
example_tensor = torch.tensor([word_to_ix["p42"]], dtype=torch.long)
print(example_tensor)

tensor([2])


In [27]:
# embed 1d-tensor into 5-dim vector
example_embed = embeds(example_tensor)
print(example_embed)

tensor([[-0.5775, -1.7360,  0.2905, -1.3652,  1.5233]],
       grad_fn=<EmbeddingBackward>)
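
The lookup above is, in effect, row indexing into the layer's weight matrix. A minimal sketch, assuming the same sizes as above (6 tokens, 5 dimensions); the seed and the names `idx`, `looked_up` and `row` are only for illustration and are not part of the original notebook:

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed only so repeated runs give the same weights

embeds = nn.Embedding(6, 5)   # 6 tokens in vocab, 5-dimensional embeddings
idx = torch.tensor([2])       # index of "p42" in word_to_ix

looked_up = embeds(idx)       # shape (1, 5): one row per input index
row = embeds.weight[2]        # shape (5,): row 2 of the weight matrix

print(looked_up.shape)
print(torch.equal(looked_up[0], row))  # the embedding is exactly that row
```

The weight matrix is a trainable parameter, which is why the printed embedding carries a grad_fn.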


In [29]:
# example of embedding a batch of 5 words for a vocab size of 149

test = torch.tensor([22, 23, 46, 52, 72])

embeds = nn.Embedding(149, 5)  # 149 words in vocab, 5 dimensional embeddings

test_embed = embeds(test)

print(test_embed)

tensor([[-0.0394, -1.2079, -0.5101,  0.2840, -0.2728],
        [ 1.6345, -1.4403, -1.1477,  0.0996, -0.9257],
        [-0.6820,  0.7039, -0.3863,  1.0630,  1.0672],
        [ 0.4961, -1.4209,  1.1120, -0.8681, -0.3484],
        [ 0.4165,  0.9182,  0.9482,  1.0565, -1.2355]],
       grad_fn=<EmbeddingBackward>)


In [34]:
x = np.array([[-0.0394, -1.2079, -0.5101,  0.2840, -0.2728],
        [ 1.6345, -1.4403, -1.1477,  0.0996, -0.9257],
        [-0.6820,  0.7039, -0.3863,  1.0630,  1.0672],
        [ 0.4961, -1.4209,  1.1120, -0.8681, -0.3484],
        [ 0.4165,  0.9182,  0.9482,  1.0565, -1.2355]])

In [35]:
x.T

array([[-0.0394,  1.6345, -0.682 ,  0.4961,  0.4165],
       [-1.2079, -1.4403,  0.7039, -1.4209,  0.9182],
       [-0.5101, -1.1477, -0.3863,  1.112 ,  0.9482],
       [ 0.284 ,  0.0996,  1.063 , -0.8681,  1.0565],
       [-0.2728, -0.9257,  1.0672, -0.3484, -1.2355]])

In [36]:
x.transpose(1,0)

array([[-0.0394,  1.6345, -0.682 ,  0.4961,  0.4165],
       [-1.2079, -1.4403,  0.7039, -1.4209,  0.9182],
       [-0.5101, -1.1477, -0.3863,  1.112 ,  0.9482],
       [ 0.284 ,  0.0996,  1.063 , -0.8681,  1.0565],
       [-0.2728, -0.9257,  1.0672, -0.3484, -1.2355]])
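
The two cells above give the same result: for a 2-D array, x.T and x.transpose(1, 0) both swap the two axes. A small check, using a stand-in (5, 2) array (an assumption, chosen so the shape change is visible) and the PyTorch equivalent:

```python
import numpy as np
import torch

# Stand-in array with 5 rows and 2 columns (not the values printed above)
x = np.arange(10, dtype=np.float32).reshape(5, 2)

# For 2-D input, .T and .transpose(1, 0) are the same operation
same = np.array_equal(x.T, x.transpose(1, 0))

# torch.Tensor.transpose takes the two dimensions to swap
t = torch.from_numpy(x)
t_swapped = t.transpose(0, 1)

print(same, x.T.shape, tuple(t_swapped.shape))
```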

## nn.Embedding

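1. define the vocab length (num_embeddings) and the embedding dimension
2. takes in a tensor of integer token indices, e.g. tensor([49, 23, 34]); every index must be smaller than the vocab length
3. no need for one-hot encoding
4. if the input is tensor([49, 23, 34])
    - will output a tensor of shape (3, 5)
    - in general, will output a tensor of shape (input_length, embedding_dim)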
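
The shape rule, and the reason one-hot encoding is unnecessary, can be checked with a short sketch. It assumes the 149-token, 5-dimensional layer from the batch example; `batch`, `one_hot` and `via_matmul` are illustrative names:

```python
import torch
from torch import nn
import torch.nn.functional as F

vocab_size, embedding_dim = 149, 5
embeds = nn.Embedding(vocab_size, embedding_dim)

# 1-D input of 3 indices -> output of shape (3, 5)
ids = torch.tensor([49, 23, 34])
out = embeds(ids)

# The same rule extends to batched input: (2, 3) -> (2, 3, 5)
batch = torch.tensor([[49, 23, 34], [1, 2, 3]])
out_batch = embeds(batch)

# A one-hot matrix times the weight matrix selects the same rows,
# so the lookup replaces the explicit one-hot step
one_hot = F.one_hot(ids, num_classes=vocab_size).float()  # shape (3, 149)
via_matmul = one_hot @ embeds.weight                      # shape (3, 5)

print(out.shape, out_batch.shape)
print(torch.allclose(out, via_matmul))
```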

## to do
1. remove one-hot encoding
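2. input should be a 1d tensor of token indices
3. is the output still the same?
    - usually the output is a vector as well: one score per vocabulary entry, rather than an embedding
    - then use argmax over those scores to get the token_id
4. use a Jupyter notebook
5. remove Keras's categorical (one-hot) encoding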
## questions

1. should the output still be the same?
    - usually the output is a vector as well: one score per vocabulary entry, rather than an embedding
    - then use argmax over those scores to get the token_id
2. what should the embedding dimension be?
    - use hyperparameter optimization to find the best number, e.g. ranging from 50 to 1000
    - A good rule of thumb is the 4th root of the vocab_length, e.g. 149^(1/4) ≈ 3.5
    - For large natural-language vocabularies, the typical number of dimensions is between 200 and 300.
    - The number of dimensions does not greatly impact how distances in the word embedding space encode semantic relationships. You can pick a power of 2 (64, 128, 256) to speed up model training.
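
The arithmetic behind the 4th-root rule of thumb, plus the power-of-2 candidates, as a quick sketch. The variable names are illustrative, and rounding the root up is an assumption about how the rule would be applied:

```python
import math

vocab_size = 149

# 4th root of the vocab length
fourth_root = vocab_size ** 0.25
print(round(fourth_root, 2))   # 3.49

# Rounded up to a whole number of dimensions (assumed convention)
rounded_up = math.ceil(fourth_root)
print(rounded_up)              # 4

# Power-of-2 sizes that could be tried in a hyperparameter search
power_of_two_sizes = [2 ** k for k in range(6, 9)]
print(power_of_two_sizes)      # [64, 128, 256]
```

For a vocabulary this small, the rule of thumb lands close to the embedding_dim = 5 used in the cells above, well below the 200 to 300 range quoted for large vocabularies.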