
# Introduction

We use positional embeddings to account for the position of a token in a given sentence.

**Absolute**: Each position in the sequence has its own embedding vector, which is added to the embedding of the token at that position. That is,

Token embeddings + Positional embeddings = Input embeddings

For example:

For the first token:
$[1,2,3]+[1.1,1.2,1.3]=[2.1,3.2,4.3]$

For the second token:
$[4,5,6]+[2.1,2.2,2.3]=[6.1,7.2,8.3]$

The positional embedding that is added depends only on the position of the token (first, second, etc.). However, this method does not directly encode the distance between tokens, i.e., their relative position, which can be important.
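The addition above can be reproduced with a few lines of PyTorch. This is only a minimal sketch using the toy vectors from the example; the variable names are chosen here for illustration.

```python
import torch

# Toy values from the example above: two tokens, three dimensions each
token_embeddings = torch.tensor([[1.0, 2.0, 3.0],
                                 [4.0, 5.0, 6.0]])
positional_embeddings = torch.tensor([[1.1, 1.2, 1.3],
                                      [2.1, 2.2, 2.3]])

# Row i of the result is the embedding of token i plus the embedding of position i
input_embeddings = token_embeddings + positional_embeddings
print(input_embeddings)
```

The sum is element-wise, so the token and positional embeddings must have the same dimension.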

**Relative**: In this case, the model learns the relationship between tokens by taking into account their relative position in the sentence, that is, how far apart they are. The model generalizes better to sequences of varying lengths.
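To make "relative position" concrete, the sketch below builds the matrix of offsets between every pair of positions in a short sequence. It is not a full relative-embedding scheme, only an illustration of the quantity such schemes typically depend on; the sequence length of 4 is an assumption for the example.

```python
import torch

sequence_length = 4  # hypothetical toy length
positions = torch.arange(sequence_length)

# relative_offsets[i, j] = j - i, i.e., how far token j is from token i
relative_offsets = positions[None, :] - positions[:, None]
print(relative_offsets)
```

Each diagonal of the matrix holds a single value, so a pair of tokens that are two positions apart gets the same offset no matter where the pair appears in the sequence.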

Both methods are good, and the choice of one over the other depends on the problem and the nature of the data being processed.

**Absolute** is good when a fixed order of tokens is important, for instance, when generating a sequence (time series, physics, etc.). GPT was trained using this, as was the original transformer.

**Relative** is good for language modeling over long sequences, where the same phrase can appear in different parts of the sequence.

These positional embeddings can also be optimized during training, as in GPT. The original transformer instead used fixed sinusoidal encodings.
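When positional embeddings are stored in a `torch.nn.Embedding` layer, its weights are ordinary trainable parameters. The sketch below illustrates this with one optimizer step; the toy sizes, the dummy loss, and the learning rate are assumptions made only for the example.

```python
import torch

# Hypothetical toy sizes: 4 positions, 3 dimensions
toy_positional_layer = torch.nn.Embedding(4, 3)
print(toy_positional_layer.weight.requires_grad)

weights_before = toy_positional_layer.weight.detach().clone()
optimizer = torch.optim.SGD(toy_positional_layer.parameters(), lr=0.1)

# Dummy loss, used only to produce gradients for the demonstration
loss = toy_positional_layer(torch.arange(4)).sum()
loss.backward()
optimizer.step()

# The weights have moved, so the two tensors are no longer equal
print(torch.equal(weights_before, toy_positional_layer.weight))
```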

Now, let's look at a simple example.

In [7]:
import torch

# Using Data_loader notebook as a .py file in order to import the classes
# This file is uploaded to the Colab workspace

from data_loader import create_dataloader_v1

# Import the text
with open('/content/the-veredict.txt', 'r', encoding='utf-8') as f:
    raw_text = f.read()

In [8]:
# Parameters for the embedding layer
# In this case, we're going to map each token to a 256-dimensional space
vocabulary_size = 50257
output_dimension = 256

# Create an embedding layer
embedding_layer = torch.nn.Embedding(vocabulary_size, output_dimension)

In [11]:
# Create the dataloader to tokenize the text
dataloader = create_dataloader_v1(
    text=raw_text, batch_size=8, context_size=4, stride=4, shuffle=False)

In [18]:
# Create the iterator
iterator = iter(dataloader)

# Check the first batch of tokens
inputs, targets = next(iterator)

print(f'Input tokens:\n {inputs}')
print(f'Target tokens:\n {targets}')
print()
print(f'Shape of inputs: {inputs.shape}')
print(f'Shape of targets: {targets.shape}')

Input tokens:
 tensor([[   40,   367,  2885,  1464],
        [ 1807,  3619,   402,   271],
        [10899,  2138,   257,  7026],
        [15632,   438,  2016,   257],
        [  922,  5891,  1576,   438],
        [  568,   340,   373,   645],
        [ 1049,  5975,   284,   502],
        [  284,  3285,   326,    11]])
Target tokens:
 tensor([[  367,  2885,  1464,  1807],
        [ 3619,   402,   271, 10899],
        [ 2138,   257,  7026, 15632],
        [  438,  2016,   257,   922],
        [ 5891,  1576,   438,   568],
        [  340,   373,   645,  1049],
        [ 5975,   284,   502,   284],
        [ 3285,   326,    11,   287]])

Shape of inputs: torch.Size([8, 4])
Shape of targets: torch.Size([8, 4])
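The batch above is consistent with a sliding window over the token IDs: each input row holds `context_size` consecutive tokens, each target row is the same window shifted by one token, and consecutive rows start `stride` tokens apart. The sketch below reproduces that pattern on a toy sequence. It is an assumption about how `create_dataloader_v1` behaves, not its actual implementation, and all names in it are chosen for illustration.

```python
import torch

toy_token_ids = list(range(10))  # hypothetical stand-in for the tokenized text
window_size = 4                  # plays the role of context_size
step = 4                         # plays the role of stride

toy_inputs, toy_targets = [], []
for start in range(0, len(toy_token_ids) - window_size, step):
    # Input window and the same window shifted one token to the right
    toy_inputs.append(toy_token_ids[start:start + window_size])
    toy_targets.append(toy_token_ids[start + 1:start + window_size + 1])

toy_inputs = torch.tensor(toy_inputs)
toy_targets = torch.tensor(toy_targets)
print(toy_inputs)
print(toy_targets)
```

With a step equal to the window size, the input windows do not overlap, which matches the batch printed above, where each row starts with the token that ended the previous target row.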


Now, it is time to use the lookup table, i.e., the embedding matrix created by the embedding layer. This will map each token to the 256-dimensional space.

In [19]:
token_vectors = embedding_layer(inputs)

print(f'Shape of token vectors: {token_vectors.shape}')

Shape of token vectors: torch.Size([8, 4, 256])


So we have mapped 8 examples of 4 tokens each into a 256-dimensional space: every token ID has been replaced by a 256-dimensional vector.
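The "lookup table" description can be checked directly: calling an embedding layer on a tensor of token IDs returns the same values as indexing the rows of its weight matrix. The sketch below uses small toy sizes, which are assumptions for illustration, rather than the 50257 by 256 layer from this notebook.

```python
import torch

# Hypothetical toy sizes: vocabulary of 10 tokens, 3 dimensions
lookup_layer = torch.nn.Embedding(10, 3)
sample_ids = torch.tensor([[1, 5], [5, 9]])  # 2 examples of 2 tokens each

via_layer = lookup_layer(sample_ids)
via_indexing = lookup_layer.weight[sample_ids]

print(via_layer.shape)
print(torch.equal(via_layer, via_indexing))
```

The output shape is the input shape with one extra trailing axis for the embedding dimension, which is why inputs of shape `[8, 4]` become vectors of shape `[8, 4, 256]`.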

# Creating positional embeddings

We need to create another embedding layer for the positional encoding, that is, another matrix of randomly initialized numbers. Its vectors must have the same dimension as the token vectors, in this case 256, so that the two can be added. The number of rows must equal the number of positions in each input, in this case 4, which is the context size.

In [None]:
# One row per position in the context, each with the same dimension as the token vectors
context_size = 4
output_dimension = 256
positional_embedding_layer = torch.nn.Embedding(context_size, output_dimension)
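A natural next step is to look up one positional vector per position and add it to the token vectors, following the formula from the introduction. The sketch below is self-contained, so it uses a random tensor as a stand-in for the token vectors computed earlier; the `example_` names are introduced here for illustration only.

```python
import torch

batch_size, context_size, output_dimension = 8, 4, 256

# Random stand-in for the token vectors produced by the token embedding layer
example_token_vectors = torch.randn(batch_size, context_size, output_dimension)
example_positional_layer = torch.nn.Embedding(context_size, output_dimension)

# Positions 0, 1, 2, 3 are looked up just like token IDs
positional_vectors = example_positional_layer(torch.arange(context_size))

# Broadcasting adds the same [4, 256] positional matrix to each of the 8 examples
input_embeddings = example_token_vectors + positional_vectors

print(positional_vectors.shape)
print(input_embeddings.shape)
```

Because the positional matrix has no batch axis, every example in the batch receives the same positional vectors, which is what an absolute scheme requires.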