## Creating token embeddings

In [1]:
import torch

In [2]:
input_ids = torch.tensor([2, 3, 5, 1])

In [3]:
vocab_size = 6
output_dim = 3

torch.manual_seed(123)
embedding_layer = torch.nn.Embedding(vocab_size, output_dim)


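Printing the layer's `weight` attribute shows its underlying weight matrix: one row for each of the 6 token IDs in the vocabulary and one column for each of the 3 embedding dimensions. The values are randomly initialized and are optimized during training, hence `requires_grad=True`.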

In [4]:
print(embedding_layer.weight)

Parameter containing:
tensor([[ 0.3374, -0.1778, -0.1690],
        [ 0.9178,  1.5810,  1.3010],
        [ 1.2753, -0.2010, -0.1606],
        [-0.4015,  0.9666, -1.1481],
        [-1.1589,  0.3255, -0.6315],
        [-2.8400, -0.7849, -1.4096]], requires_grad=True)


In [5]:
print(embedding_layer(torch.tensor([3])))

tensor([[-0.4015,  0.9666, -1.1481]], grad_fn=<EmbeddingBackward0>)
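The lookup is plain row indexing. The vector returned for token ID 3 is row 3 (the fourth row, counting from 0) of the weight matrix printed above. The sketch below re-creates a layer with the same sizes and checks this equivalence. It only compares the two results with each other, so it does not depend on the exact random values.

```python
import torch

torch.manual_seed(123)
embedding_layer = torch.nn.Embedding(6, 3)  # vocab_size=6, output_dim=3, as in the notebook

# Looking up token ID 3 through the layer...
via_layer = embedding_layer(torch.tensor([3]))
# ...gives the same values as selecting row 3 of the weight matrix directly.
via_index = embedding_layer.weight[3]

print(torch.equal(via_layer[0], via_index))  # True
```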


We have now seen how to convert a single token ID into a three-dimensional embedding vector. Next, let's embed all four input IDs, `torch.tensor([2, 3, 5, 1])`:

In [6]:
print(embedding_layer(input_ids))

tensor([[ 1.2753, -0.2010, -0.1606],
        [-0.4015,  0.9666, -1.1481],
        [-2.8400, -0.7849, -1.4096],
        [ 0.9178,  1.5810,  1.3010]], grad_fn=<EmbeddingBackward0>)
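Each output row is the weight-matrix row selected by the corresponding ID (2, 3, 5, 1). The same operation can also be viewed as a matrix product: one-hot encode the IDs and multiply by the weight matrix. This view is less efficient, but it explains why an embedding layer is often described as a bias-free linear layer applied to one-hot inputs. The sketch below checks this equivalence with `torch.nn.functional.one_hot`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)
vocab_size, output_dim = 6, 3
embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
input_ids = torch.tensor([2, 3, 5, 1])

# Embedding lookup: selects rows 2, 3, 5 and 1 of the weight matrix.
lookup = embedding_layer(input_ids)

# One-hot route: a (4, 6) one-hot matrix times the (6, 3) weight matrix gives (4, 3).
one_hot = F.one_hot(input_ids, num_classes=vocab_size).float()
matmul = one_hot @ embedding_layer.weight

print(torch.allclose(lookup, matmul))  # True
```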


## Positional embeddings (encoding word positions)

In [7]:
vocab_size = 50257
output_dim = 256

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
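This layer is much larger than the toy one above. It has 50,257 rows (the vocabulary size of the GPT-2 BPE tokenizer) and 256 columns, so it holds 50,257 × 256 = 12,865,792 trainable weights. A quick check of that arithmetic:

```python
import torch

vocab_size = 50257
output_dim = 256
token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)

# One output_dim-sized row per vocabulary entry; the weight matrix is the only parameter.
num_params = sum(p.numel() for p in token_embedding_layer.parameters())
print(token_embedding_layer.weight.shape)  # torch.Size([50257, 256])
print(num_params)  # 12865792
```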

In [8]:
from main import create_dataloader_v1, GPTDatasetV1  # helpers defined in a local main.py (not shown here)

In [9]:
with open("the-verdict.txt", "r", encoding="utf-8") as f:
    raw_text = f.read()

In [10]:
max_length = 4
dataloader = create_dataloader_v1(raw_text, batch_size=8, max_length=max_length,
                                  stride=max_length, shuffle=True)
data_iter = iter(dataloader)
inputs, targets = next(data_iter)
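`create_dataloader_v1` comes from a local `main.py` that is not shown here. Its arguments and the `inputs`/`targets` pair it yields suggest that it presumably works as follows. It cuts the tokenized text into windows of `max_length` token IDs, advances the start position by `stride` tokens each time, and makes each target window the input window shifted one position to the right, which sets up next-token prediction. The sketch below illustrates that sliding-window idea on a toy list of IDs. The function `sliding_windows` is hypothetical and is not the implementation in `main.py`.

```python
import torch

def sliding_windows(token_ids, max_length, stride):
    """Hypothetical helper: split a list of token IDs into (input, target) windows.

    Each target window is the input window shifted one token to the right,
    so the model learns to predict the next token at every position.
    """
    inputs, targets = [], []
    # Stop early enough that the shifted target window still fits in the list.
    for start in range(0, len(token_ids) - max_length, stride):
        inputs.append(token_ids[start:start + max_length])
        targets.append(token_ids[start + 1:start + max_length + 1])
    return torch.tensor(inputs), torch.tensor(targets)

toy_ids = list(range(10, 23))  # 13 made-up token IDs: 10, 11, ..., 22
x, y = sliding_windows(toy_ids, max_length=4, stride=4)
print(x)
print(y)
```

For the 13 toy IDs, this yields three non-overlapping windows starting at 10, 14 and 18. Each target row starts one ID later than its input row. Because `stride` equals `max_length` here, as in the notebook, consecutive windows do not overlap. A smaller stride would make them overlap.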

In [11]:
print("Token ID: \n", inputs)
print("\nInputs shape:\n", inputs.shape)

Token ID: 
 tensor([[24818,   417,    12, 12239],
        [  314,  3114,   379,   262],
        [ 2156,   286,  4116,    13],
        [  866,   262,  2119,    11],
        [ 3363,    11,   340,   373],
        [  198,     1,    40,  2900],
        [  465, 14475,    13,   198],
        [ 3081,   286,  2045,  1190]])

Inputs shape:
 torch.Size([8, 4])


In [12]:
token_embeddings = token_embedding_layer(inputs)
print(token_embeddings.shape)

torch.Size([8, 4, 256])


As the `[8, 4, 256]` output shape shows, each of the 8 × 4 token IDs in the batch is now embedded as a 256-dimensional vector.

In [13]:
context_length = max_length  # one learnable positional vector per position in the input window
pos_embedding_layer = torch.nn.Embedding(context_length, output_dim)


In [15]:
pos_embeddings = pos_embedding_layer(torch.arange(max_length))
print(pos_embeddings.shape)

torch.Size([4, 256])
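The positional embedding layer works like the token embedding layer. The difference is that it is indexed by position (0, 1, 2, 3) rather than by token ID. `torch.arange(max_length)` produces those positions, so the result holds one 256-dimensional vector per position. These are learned absolute positional embeddings: the vector for position 2 is the same no matter which token occupies that position. A small check:

```python
import torch

context_length, output_dim = 4, 256
pos_embedding_layer = torch.nn.Embedding(context_length, output_dim)

positions = torch.arange(context_length)  # tensor([0, 1, 2, 3])
pos_embeddings = pos_embedding_layer(positions)

# Row i of the result is simply the learned vector for position i,
# so looking up every position returns the whole weight matrix.
print(pos_embeddings.shape)  # torch.Size([4, 256])
print(torch.equal(pos_embeddings, pos_embedding_layer.weight))  # True
```

The table has only `context_length` rows, so this layer cannot embed position 4 or higher. Longer sequences would need a larger table.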


In [17]:
input_embeddings = token_embeddings + pos_embeddings
print(input_embeddings.shape)

torch.Size([8, 4, 256])
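The addition works through broadcasting. `token_embeddings` has shape `[8, 4, 256]` and `pos_embeddings` has shape `[4, 256]`, so PyTorch treats the latter as `[1, 4, 256]` and adds the same four positional vectors to each of the 8 sequences in the batch. The sketch below uses random stand-in tensors of the same shapes to confirm that this matches an explicit loop over the batch.

```python
import torch

torch.manual_seed(0)
token_embeddings = torch.randn(8, 4, 256)  # stand-in for the batch of token embeddings
pos_embeddings = torch.randn(4, 256)       # stand-in for the 4 positional vectors

# Broadcast addition: pos_embeddings is implicitly expanded to [8, 4, 256].
broadcast = token_embeddings + pos_embeddings

# Explicit version: add the same positional vectors to each sequence in turn.
looped = torch.stack([seq + pos_embeddings for seq in token_embeddings])

print(broadcast.shape)                 # torch.Size([8, 4, 256])
print(torch.equal(broadcast, looped))  # True
```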


The resulting `input_embeddings` tensor holds the embedded input examples, which can now be processed by the main LLM modules.
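In a full model, the two lookups and the addition are often bundled into a single module that the rest of the network calls. Below is a minimal sketch under that assumption. The class name `InputEmbedding` is hypothetical, and the sizes mirror the ones used above. Unlike the cells above, it derives the positions from the actual sequence length, so inputs shorter than `context_length` also work.

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Hypothetical module: token embeddings plus learned absolute positional embeddings."""

    def __init__(self, vocab_size, context_length, output_dim):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, output_dim)
        self.pos_emb = nn.Embedding(context_length, output_dim)

    def forward(self, token_ids):
        # token_ids: [batch_size, seq_len], with seq_len <= context_length
        seq_len = token_ids.shape[1]
        positions = torch.arange(seq_len, device=token_ids.device)
        # [batch_size, seq_len, dim] + [seq_len, dim] broadcasts over the batch
        return self.tok_emb(token_ids) + self.pos_emb(positions)

torch.manual_seed(123)
embed = InputEmbedding(vocab_size=50257, context_length=4, output_dim=256)
batch = torch.randint(0, 50257, (8, 4))  # random stand-in for a batch of token IDs
input_embeddings = embed(batch)
print(input_embeddings.shape)  # torch.Size([8, 4, 256])
```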