
# **POSITIONAL EMBEDDINGS (ENCODING WORD POSITIONS)**

Earlier in this chapter, we used very small embedding sizes for illustration purposes.

We now consider more realistic and useful embedding sizes and encode the input tokens into a 256-dimensional vector representation.

This is smaller than what the original GPT-3 model used (in GPT-3, the embedding size is 12,288 dimensions) but still reasonable for experimentation.

Furthermore, we assume that the token IDs were created by the BPE tokenizer that we implemented earlier, which has a vocabulary size of 50,257:

In [3]:
import torch

In [4]:
vocab_size = 50257
output_dim = 256

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)

Using the token_embedding_layer above, if we sample data from the data loader, we embed each token in each batch into a 256-dimensional vector. If we have a batch size of 8 with four tokens each, the result will be an 8 x 4 x 256 tensor.
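An embedding layer is essentially a lookup table: each token ID selects one row of the layer's weight matrix. The following sketch checks this on a deliberately tiny layer. The sizes `small_vocab` and `small_dim` and the example IDs are illustrative assumptions, not values from this chapter.

```python
import torch

torch.manual_seed(123)  # reproducible random weights

# Hypothetical toy sizes, chosen only to keep the example readable
small_vocab, small_dim = 6, 3
layer = torch.nn.Embedding(small_vocab, small_dim)

# A batch of 2 samples with 4 token IDs each
ids = torch.tensor([[2, 3, 5, 1],
                    [0, 0, 4, 2]])

embedded = layer(ids)          # shape: batch x tokens x embedding dim
looked_up = layer.weight[ids]  # direct row indexing into the weight matrix

print(embedded.shape)                    # torch.Size([2, 4, 3])
print(torch.equal(embedded, looked_up))  # True
```

The same reasoning applies to the full-size layer: its weight matrix has `vocab_size` rows and `output_dim` columns, and the lookup adds one trailing dimension of size `output_dim` to the shape of the ID tensor.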

Let's first instantiate the data loader (see "Data sampling with a sliding window"):

In [7]:
import torch

def create_dataloader_v1(raw_text, batch_size, max_length, stride, shuffle):
    """
    Placeholder for the sliding-window data loader.

    Ignores raw_text, stride, and shuffle and yields a single batch of
    random token IDs so that the embedding code below can be run.
    Replace it with a real data loader for actual training.
    """
    vocab_size = 50257  # BPE vocabulary size used in this chapter
    # Random stand-ins with shape (batch_size, max_length)
    inputs = torch.randint(0, vocab_size, (batch_size, max_length))
    targets = torch.randint(0, vocab_size, (batch_size, max_length))
    # yield turns this function into a generator, so calling it returns an iterator
    yield inputs, targets

max_length = 4
dataloader = create_dataloader_v1(
    "sample text", batch_size=8, max_length=max_length,
    stride=max_length, shuffle=False
)
data_iter = iter(dataloader)  # A generator is its own iterator, so iter() returns it unchanged
inputs, targets = next(data_iter)
print(inputs, targets)

tensor([[24563, 29632, 29552,  4034],
        [33710, 19062, 39179,  8046],
        [34677, 31519, 14879, 25354],
        [16692, 24895, 39011, 37020],
        [ 2768, 47948, 46043, 41449],
        [34554, 14280, 46822, 43036],
        [22652, 16325, 25595, 21150],
        [ 9183, 28725, 46603, 44440]]) tensor([[23612,  7600, 45678, 34110],
        [14648, 12494,  5021, 45063],
        [10232, 37834, 46608, 11488],
        [32077, 39847, 33408,  2715],
        [33998, 20342, 16349,  6792],
        [11609, 15722,   988, 48385],
        [33751,  9847, 49856, 37880],
        [40308, 36629,  1837, 25360]])


In [8]:
print("Token IDs:\n", inputs)
print("\nInputs shape:\n", inputs.shape)

Token IDs:
 tensor([[24563, 29632, 29552,  4034],
        [33710, 19062, 39179,  8046],
        [34677, 31519, 14879, 25354],
        [16692, 24895, 39011, 37020],
        [ 2768, 47948, 46043, 41449],
        [34554, 14280, 46822, 43036],
        [22652, 16325, 25595, 21150],
        [ 9183, 28725, 46603, 44440]])

Inputs shape:
 torch.Size([8, 4])


As we can see, the token ID tensor has shape 8x4, meaning that the batch consists of 8 text samples with 4 tokens each.
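The `create_dataloader_v1` stub above returns random IDs. For reference, the sliding-window sampling it stands in for can be sketched roughly as follows, assuming the text has already been tokenized into a flat list of IDs. The function name `sliding_window_pairs` and the toy ID sequence are hypothetical, and the sketch assumes the usual next-token setup in which each target chunk is the input chunk shifted by one token.

```python
import torch

def sliding_window_pairs(token_ids, max_length, stride):
    """Split a flat list of token IDs into (input, target) chunks.

    Each target chunk is the input chunk shifted one position to the right.
    """
    inputs, targets = [], []
    for start in range(0, len(token_ids) - max_length, stride):
        inputs.append(token_ids[start:start + max_length])
        targets.append(token_ids[start + 1:start + max_length + 1])
    return torch.tensor(inputs), torch.tensor(targets)

# Toy "token IDs" 0..9 stand in for real tokenizer output
x, y = sliding_window_pairs(list(range(10)), max_length=4, stride=4)
print(x)  # rows: [0, 1, 2, 3] and [4, 5, 6, 7]
print(y)  # rows: [1, 2, 3, 4] and [5, 6, 7, 8]
```

With `stride` equal to `max_length`, as in this section, consecutive input chunks do not overlap.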

Let's now use the embedding layer to embed these token IDs into 256-dimensional vectors:

In [9]:
token_embeddings = token_embedding_layer(inputs)
print(token_embeddings.shape)

torch.Size([8, 4, 256])


As the 8x4x256 output shape shows, each token ID is now embedded as a 256-dimensional vector.

For the absolute positional embedding approach used by GPT models, we just need to create another embedding layer with the same embedding dimension as the token_embedding_layer:

In [10]:
context_length = max_length
pos_embedding_layer = torch.nn.Embedding(context_length, output_dim)

In [11]:
pos_embeddings = pos_embedding_layer(torch.arange(max_length))
print(pos_embeddings.shape)

torch.Size([4, 256])


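As shown in the preceding code example, the input to the pos_embedding_layer is usually a placeholder vector torch.arange(context_length), which contains the sequence of numbers 0, 1, ..., up to the maximum input length − 1. (In the code above, max_length and context_length are equal, so torch.arange(max_length) yields the same vector.)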

The context_length is a variable that represents the supported input size of the LLM.

Here, we set it equal to max_length, the maximum length of the input text.

In practice, input text can be longer than the supported context length, in which case we have to truncate the text.
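Truncation can be as simple as slicing the token ID tensor along the sequence dimension before the embedding lookup. A minimal sketch, assuming a hypothetical batch `long_inputs` that is longer than the supported context length:

```python
import torch

context_length = 4  # supported input size, as in this section

# Hypothetical batch: 2 samples with 6 token IDs each (too long)
long_inputs = torch.tensor([[10, 11, 12, 13, 14, 15],
                            [20, 21, 22, 23, 24, 25]])

# Keep only the first context_length tokens of every sample
truncated = long_inputs[:, :context_length]
print(truncated.shape)  # torch.Size([2, 4])
```

Keeping the first tokens is only one option. Depending on the use case, keeping the most recent tokens with `long_inputs[:, -context_length:]` may be more appropriate.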

As we can see, the positional embedding tensor consists of four 256-dimensional vectors. We can now add these directly to the token embeddings, where PyTorch will add the 4x256-dimensional pos_embeddings tensor to the 4x256-dimensional token embedding tensor of each of the 8 samples in the batch:

In [12]:
input_embeddings = token_embeddings + pos_embeddings
print(input_embeddings.shape)

torch.Size([8, 4, 256])
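The addition works because of PyTorch's broadcasting rules: the 4x256 tensor is treated as if it were repeated along the batch dimension. The sketch below verifies this with random stand-in tensors. The names `tok` and `pos` are illustrative and replace the real embeddings.

```python
import torch

torch.manual_seed(123)

tok = torch.randn(8, 4, 256)  # stand-in for token_embeddings
pos = torch.randn(4, 256)     # stand-in for pos_embeddings

# Broadcasting: pos is implicitly expanded to shape (8, 4, 256)
broadcast_sum = tok + pos

# Explicit version: add the same positional tensor to every sample
explicit_sum = torch.stack([sample + pos for sample in tok])

print(broadcast_sum.shape)
print(torch.allclose(broadcast_sum, explicit_sum))
```

Both versions should produce the same 8x4x256 result. The broadcast form simply avoids the Python loop and the extra copies.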


The input_embeddings we created are the embedded input examples that can now be processed by the main LLM modules.
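The steps in this section can be bundled into a single module. The following is a minimal sketch under the assumptions of this section (learned absolute positional embeddings added to token embeddings), not a reference implementation. The class name `InputEmbedding` and its attribute names are hypothetical.

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Token embeddings plus learned absolute positional embeddings."""

    def __init__(self, vocab_size, context_length, output_dim):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, output_dim)
        self.pos_emb = nn.Embedding(context_length, output_dim)

    def forward(self, token_ids):
        # token_ids: (batch_size, num_tokens), with num_tokens <= context_length
        num_tokens = token_ids.shape[1]
        positions = torch.arange(num_tokens, device=token_ids.device)
        # (batch, tokens, dim) + (tokens, dim) broadcasts over the batch
        return self.tok_emb(token_ids) + self.pos_emb(positions)

embedder = InputEmbedding(vocab_size=50257, context_length=4, output_dim=256)
batch = torch.randint(0, 50257, (8, 4))  # random stand-in for real token IDs
out = embedder(batch)
print(out.shape)  # torch.Size([8, 4, 256])
```

Building the position indices from the actual input length, rather than always from `context_length`, lets the same module handle inputs shorter than the supported context.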