# Positional embeddings

## Why?
- Self-attention has no inherent notion of token position, unlike RNNs, which process tokens one step at a time in order.
- Word order matters in language: the same words in a different order can mean different things (e.g. "the dog bit the man" vs. "the man bit the dog").
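A minimal sketch of the first point. It assumes a single attention head with random, untrained weights and no causal mask. A causal mask leaks some order information, so this check would not hold with one. Shuffling the input tokens only shuffles the outputs in the same way, so each token's output is identical whatever order the sequence is in. The sizes and the permutation are arbitrary illustrative choices.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes: 4 tokens with 8-dimensional embeddings, one attention head
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)  # token embeddings, no position info
w_q = torch.randn(d_model, d_model) / d_model ** 0.5
w_k = torch.randn(d_model, d_model) / d_model ** 0.5
w_v = torch.randn(d_model, d_model) / d_model ** 0.5


def self_attention(x: torch.Tensor) -> torch.Tensor:
    """Single-head scaled dot-product self-attention, no mask, no positional information."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / d_model ** 0.5
    return torch.softmax(scores, dim=-1) @ v


perm = torch.tensor([2, 0, 3, 1])  # shuffle the token order
out = self_attention(x)
out_shuffled = self_attention(x[perm])

# Shuffled input -> identically shuffled output: each token's result ignores word order
print(torch.allclose(out_shuffled, out[perm], atol=1e-5))  # True
```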

## Types of positional embeddings
### Absolute positional embedding
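- Positional information is encoded through a trainable embedding matrix, with one row per position up to a fixed maximum length, that maps integer positions to vectors. These vectors are added to the token embeddings.
- Easy to implement with a standard embedding layer. It extrapolates poorly to longer sequences: positions beyond the maximum length seen in training have no embedding at all, and the model gets no direct signal about the relative distance between tokens.

### Fixed sinusoidal embedding
- No learned parameters: each position is mapped to a fixed vector of sine and cosine values.
- Different dimensions encode position at different frequencies.
- Because the values come from a formula, they are defined for any position, including positions beyond the training length (though in practice models still tend to degrade on sequences much longer than those seen in training).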
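A sketch of the fixed sinusoidal scheme. It assumes the formulation from the original Transformer paper, where dimension pair $i$ at position $pos$ of a $d$-dimensional embedding is $PE(pos, 2i) = \sin(pos / 10000^{2i/d})$ and $PE(pos, 2i+1) = \cos(pos / 10000^{2i/d})$. Low dimensions oscillate quickly with position and high dimensions slowly. The function name and the `base` parameter are illustrative, not taken from a particular library.

```python
import math
import torch


def sinusoidal_position_embedding(seq_len: int, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Fixed sin/cos position embeddings of shape (seq_len, dim); dim must be even."""
    assert dim % 2 == 0, "dim must be even"
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # One frequency per sin/cos pair: 1 / base^(2i/dim) for i = 0 .. dim/2 - 1
    inv_freq = torch.exp(-math.log(base) * torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = positions * inv_freq  # (seq_len, dim/2)
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(angles)  # even dimensions
    pe[:, 1::2] = torch.cos(angles)  # odd dimensions
    return pe


pe = sinusoidal_position_embedding(seq_len=5, dim=4)
print(pe.shape)  # torch.Size([5, 4])
print(pe[0])     # position 0: sin(0) = 0, cos(0) = 1 -> tensor([0., 1., 0., 1.])

# No lookup table, so any length can be computed, e.g. far beyond 5 positions
print(sinusoidal_position_embedding(seq_len=1000, dim=4).shape)  # torch.Size([1000, 4])
```

Being computed rather than looked up is what makes the embedding defined at every position. It does not by itself guarantee the model generalizes well there.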

### Relative positional embedding 
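- Encodes the relative distance between tokens (the offset between a query position and a key position) rather than their absolute positions, typically by modifying the attention computation, e.g. adding a learned bias for each offset to the attention scores.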
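One common way to realize this is a learned bias for each relative offset, added to the attention scores before the softmax. The sketch below is a simplified, hypothetical version: the class name `RelativePositionBias` and the clipping at `max_distance` are assumptions for illustration. T5's relative attention bias, for instance, instead groups offsets into logarithmically sized buckets and learns one bias per head.

```python
import torch
import torch.nn as nn


class RelativePositionBias(nn.Module):
    """Learned scalar bias per relative offset j - i, clipped to [-max_distance, max_distance]."""

    def __init__(self, max_distance: int):
        super().__init__()
        self.max_distance = max_distance
        self.bias = nn.Embedding(2 * max_distance + 1, 1)  # one scalar per clipped offset

    def forward(self, seq_len: int) -> torch.Tensor:
        pos = torch.arange(seq_len)
        offsets = pos[None, :] - pos[:, None]  # entry [i, j] = j - i
        offsets = offsets.clamp(-self.max_distance, self.max_distance)
        # Shift offsets to valid row indices 0 .. 2 * max_distance, then drop the size-1 dim
        return self.bias(offsets + self.max_distance).squeeze(-1)  # (seq_len, seq_len)


torch.manual_seed(0)
rel_bias = RelativePositionBias(max_distance=3)
b = rel_bias(seq_len=6)
print(b.shape)  # torch.Size([6, 6])

# Same offset -> same bias, regardless of absolute position
print(torch.equal(b[0, 1], b[4, 5]))  # True

# Usage (sketch): add to the raw scores before the softmax, e.g.
# weights = torch.softmax(q @ k.T / d ** 0.5 + b, dim=-1)
```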

### Rotary positional embedding
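- Combines both: each query and key vector is rotated by an angle proportional to its absolute position, so the dot product between a query and a key depends only on their relative offset.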
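A minimal RoPE sketch. It assumes the interleaved formulation, in which each pair of dimensions $(2i, 2i+1)$ is rotated by the angle $pos \cdot \theta_i$ with $\theta_i = 10000^{-2i/d}$. Some implementations rotate the first and second halves of the vector instead. Rotating the query by $m\theta_i$ and the key by $n\theta_i$ leaves a dot product that depends only on $(n - m)\theta_i$, so the attention score between positions $m$ and $n$depends on their offset, not on $m$ and $n$ separately. `apply_rope` and `score` are hypothetical helper names.

```python
import torch


def apply_rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate each (even, odd) pair of dims of x, shape (seq_len, dim), by position-dependent angles."""
    dim = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)  # theta_i, (dim/2,)
    angles = positions.float()[:, None] * inv_freq                           # (seq_len, dim/2)
    cos, sin = torch.cos(angles), torch.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin  # standard 2D rotation of each pair
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


torch.manual_seed(0)
q = torch.randn(1, 8)  # one query vector
k = torch.randn(1, 8)  # one key vector


def score(m: int, n: int) -> float:
    """Dot product of q rotated for position m and k rotated for position n."""
    q_rot = apply_rope(q, torch.tensor([m]))
    k_rot = apply_rope(k, torch.tensor([n]))
    return (q_rot @ k_rot.T).item()


# Same offset (n - m = 2) at different absolute positions: same score up to float error
print(score(3, 5), score(10, 12))
```

Unlike the absolute embeddings in this notebook, RoPE is applied to the queries and keys inside attention rather than added to the token embeddings.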


## Resources
- Go through https://huggingface.co/blog/designing-positional-encoding


In [5]:
import torch
import torch.nn as nn
import tiktoken

In [None]:
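vocab_size = 50257       # GPT-2 BPE vocabulary size (equal to tokenizer.n_vocab)
embedding_dim = 2        # tiny dimension so the vectors are easy to print
max_context_length = 5   # maximum number of tokens the model sees

input_text = "Hello my name is Ajay"
tokenizer = tiktoken.get_encoding("gpt2")

print(input_text)

input_tokens = tokenizer.encode(input_text)
print(input_tokens)

# Truncate to the context length and add a batch dimension: shape (1, seq_len).
# The text encodes to 6 tokens (see the output below), so the last one is dropped.
input_tensor = torch.tensor(input_tokens[:max_context_length]).unsqueeze(0)
print(input_tensor.shape)

# Token embeddings: one trainable vector per vocabulary entry
token_embedding_layer = nn.Embedding(vocab_size, embedding_dim)
input_embedding = token_embedding_layer(input_tensor)  # shape (1, seq_len, embedding_dim)

print(input_embedding)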


Hello my name is Ajay
[15496, 616, 1438, 318, 22028, 323]
torch.Size([1, 5])
tensor([[[-0.7183,  0.0627],
         [-0.6489,  1.2228],
         [ 0.7233, -0.0964],
         [-0.4918,  0.6275],
         [-0.1155, -0.1441]]], grad_fn=<EmbeddingBackward0>)
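The token embeddings above depend only on the token ID, so a repeated token gets the identical vector wherever it appears. A small sketch, using an illustrative sequence in which token ID 318 appears twice (tokenization is skipped and the weights are random), shows that adding a per-position vector is what makes the two occurrences distinguishable:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embedding_dim, max_context_length = 50257, 2, 5

token_embedding_layer = nn.Embedding(vocab_size, embedding_dim)
position_embedding_layer = nn.Embedding(max_context_length, embedding_dim)

# Illustrative sequence: the same token ID (318) at positions 0 and 2
tokens = torch.tensor([[318, 616, 318]])
tok = token_embedding_layer(tokens)                            # (1, 3, 2)
pos = position_embedding_layer(torch.arange(tokens.shape[1]))  # (3, 2), broadcasts over the batch

print(torch.equal(tok[0, 0], tok[0, 2]))                  # True: same token -> same vector
print(torch.equal((tok + pos)[0, 0], (tok + pos)[0, 2]))  # False: position now distinguishes them
```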


### Absolute positional embedding

In [19]:
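# Learned absolute position embeddings: one trainable vector per position 0 .. max_context_length - 1
absolute_position_embedding_layer = nn.Embedding(max_context_length, embedding_dim)

# Look up positions 0 .. seq_len - 1 (seq_len rather than max_context_length, so shorter inputs also work)
seq_len = input_tensor.shape[1]
absolute_position_embedding = absolute_position_embedding_layer(torch.arange(seq_len).unsqueeze(0))

print(f"Shape of absolute position embedding vector: {absolute_position_embedding.shape}")

print(absolute_position_embedding)

# Element-wise sum of token and position embeddings, both of shape (1, seq_len, embedding_dim)
input_plus_absolute_position_embedding = input_embedding + absolute_position_embedding

print(f"Shape of input plus absolute position embedding: {input_plus_absolute_position_embedding.shape}")

print(input_plus_absolute_position_embedding)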



Shape of absolute position embedding vector: torch.Size([1, 5, 2])
tensor([[[ 0.7474, -0.4037],
         [-0.2403, -1.0787],
         [ 0.7312,  0.1028],
         [ 0.0545,  0.0521],
         [ 0.6613,  0.8048]]], grad_fn=<EmbeddingBackward0>)
Shape of input plus absolute position embedding: torch.Size([1, 5, 2])
tensor([[[ 0.0291, -0.3410],
         [-0.8892,  0.1441],
         [ 1.4545,  0.0065],
         [-0.4373,  0.6796],
         [ 0.5458,  0.6607]]], grad_fn=<AddBackward0>)
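The learned table above has exactly `max_context_length` rows, which is the concrete reason this scheme cannot extrapolate: there is no vector for a position the table was not built with. The example text encodes to 6 tokens, which is why the notebook truncates it to 5. A minimal sketch (fresh random weights, same sizes as above) of what happens without truncation:

```python
import torch
import torch.nn as nn

max_context_length, embedding_dim = 5, 2
absolute_position_embedding_layer = nn.Embedding(max_context_length, embedding_dim)

# Positions 0..4 each have a row in the table
ok = absolute_position_embedding_layer(torch.arange(max_context_length))
print(ok.shape)  # torch.Size([5, 2])

# A 6-token input would need position 5, which has no row, so the lookup fails
failed = False
try:
    absolute_position_embedding_layer(torch.arange(6))
except IndexError as err:
    failed = True
    print(f"IndexError: {err}")
print(failed)  # True
```

Models with learned absolute embeddings (GPT-2, for example, with 1024 positions) handle this by truncating or chunking inputs to the trained maximum length.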
