![positional_encoding-2.png](attachment:positional_encoding-2.png)

^ Location of positional encoding in our process ^

**Process up to this point:**

1. Start with the sentence "My name is Grant".
2. Pad the sentence up to the maximum sequence length (the maximum number of words allowed in our input) with a dummy padding character or token.
3. The vocabulary size is the number of distinct words we can use in our input.
4. Pass each word vector in the sequence through a feed-forward network.
5. The output is one 512-dimensional vector for each input in the sequence (each word).
6. For each word vector, generate a query, a key, and a value vector, each 512-dimensional.
7. Pass the query, key, and value vectors through learnable weights that are adjusted during training.
8. Pass each resulting query, key, and value vector into an attention unit.
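A minimal PyTorch sketch of these steps, assuming a toy vocabulary of 20 words, made-up token ids for "My name is Grant", padding id 0, and a single attention head with no padding mask. `vocab_size`, `PAD_ID`, and the token ids are hypothetical values chosen only for illustration; one-hot vectors fed through a bias-free linear layer stand in for the feed-forward step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy settings (hypothetical values, chosen only for illustration)
vocab_size = 20            # number of distinct words we can use
max_sequence_length = 10   # maximum number of words per input
d_model = 512              # size of each word vector
PAD_ID = 0                 # hypothetical id of the dummy padding token

# "My name is Grant" as hypothetical token ids, padded to max_sequence_length
token_ids = torch.tensor([5, 7, 3, 11])
padded = torch.full((max_sequence_length,), PAD_ID, dtype=torch.long)
padded[: len(token_ids)] = token_ids

# One vocabulary-sized one-hot vector per position: (10, 20)
one_hot = F.one_hot(padded, num_classes=vocab_size).float()

# Feed-forward layer mapping each word to a 512-dimensional vector: (10, 512)
embed = nn.Linear(vocab_size, d_model, bias=False)
x = embed(one_hot)

# In the standard Transformer, the positional encoding is added to x here.

# Learnable weights producing a query, key, and value for every word
to_q = nn.Linear(d_model, d_model)
to_k = nn.Linear(d_model, d_model)
to_v = nn.Linear(d_model, d_model)
q, k, v = to_q(x), to_k(x), to_v(x)          # each (10, 512)

# Single-head scaled dot-product attention
scores = q @ k.T / d_model ** 0.5            # (10, 10)
weights = torch.softmax(scores, dim=-1)      # each row sums to 1
out = weights @ v                            # (10, 512)
print(out.shape)  # torch.Size([10, 512])
```

A real model would also mask the padded positions so that no word attends to them; that is left out here to keep the sketch short.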

**Why do we care about positional encoding?**

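Self-attention on its own treats the input as an unordered set, so without extra information the model cannot tell which word comes first. Positional encoding injects each word's position into its vector. The sinusoidal form also lets inputs in the sequence attend to other inputs that are farther away in a much more tractable way.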
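One standard way to make the "more tractable" claim concrete: for each frequency w, the (sin, cos) pair at position p + k is a fixed 2x2 rotation of the pair at position p, and that rotation depends only on the offset k. The sketch below checks this numerically for d_model = 6; `pe_pair` and `shift` are hypothetical helper names.

```python
import math
import torch

d_model = 6
# One frequency per (sin, cos) pair: w_i = 1 / 10000^(2i / d_model)
freqs = [1.0 / 10000 ** (2 * i / d_model) for i in range(d_model // 2)]

def pe_pair(pos, w):
    """The (sin, cos) pair that frequency w contributes at position pos."""
    return torch.tensor([math.sin(w * pos), math.cos(w * pos)], dtype=torch.float64)

def shift(k, w):
    """2x2 rotation that maps the pair at any position p to the pair at p + k."""
    c, s = math.cos(w * k), math.sin(w * k)
    return torch.tensor([[c, s], [-s, c]], dtype=torch.float64)

# The same matrix works for every starting position p: it depends only on k
k = 3
for w in freqs:
    M = shift(k, w)
    for p in range(10):
        assert torch.allclose(M @ pe_pair(p, w), pe_pair(p + k, w))
print("PE(p + k) is a fixed linear function of PE(p)")
```

Because moving k positions is the same linear map wherever you start, attention can in principle learn to look a fixed distance away using the same weights at every position.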

![pe_formula.png](attachment:pe_formula.png)
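Written out in LaTeX, this is the standard sinusoidal encoding that the cells below compute, where $pos$ is the word's position in the sequence, $i$ indexes a pair of dimensions, and $d_{model}$ is the size of each word vector:

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)
$$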

```python
import torch
import torch.nn as nn

max_sequence_length = 10  # Maximum number of words we are passing
d_model = 6               # Typically 512; 6 for this demo
```

```python
even_i = torch.arange(0, d_model, 2).float()  # From 0 up to d_model (6), step 2
even_i                                        # Gives us the even indices
```

```
tensor([0., 2., 4.])
```

```python
even_denominator = torch.pow(10000, even_i / d_model)
even_denominator
```

```
tensor([  1.0000,  21.5443, 464.1590])
```
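As a sanity check on these denominators: with $d_{model} = 6$ the exponents $2i/d_{model}$ are $0$, $1/3$, and $2/3$:

$$
10000^{0/6} = 1, \qquad
10000^{2/6} = \sqrt[3]{10000} \approx 21.5443, \qquad
10000^{4/6} = \left(\sqrt[3]{10000}\right)^{2} \approx 464.1589
$$

The printed 464.1590 differs from this in the last digit, which is consistent with float32 rounding in the computation.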

```python
odd_i = torch.arange(1, d_model, 2).float()  # From 1 up to d_model (6), step 2
odd_i                                        # Gives us the odd indices
```

```
tensor([1., 3., 5.])
```

```python
# Odd index 2i+1 uses the same exponent 2i/d_model, hence (odd_i - 1)
odd_denominator = torch.pow(10000, (odd_i - 1) / d_model)
odd_denominator
```

```
tensor([  1.0000,  21.5443, 464.1590])
```

```python
# Even and odd indices end up with the same denominator, so one tensor serves both
denominator = even_denominator
```
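Another way to see why even and odd columns share a denominator: column j belongs to pair i = floor(j / 2), and both columns of a pair use the exponent 2i / d_model. A minimal sketch computing the denominator for every column at once (`per_column` is a hypothetical name):

```python
import torch

d_model = 6
j = torch.arange(d_model)                          # every column index 0..5
i = torch.div(j, 2, rounding_mode="floor")         # pair index: 0, 0, 1, 1, 2, 2
per_column = torch.pow(10000, (2 * i).float() / d_model)
print(per_column)
```

The printed denominators come out in matching pairs (1, 1, about 21.54, 21.54, about 464.16, 464.16), which is why computing only the three even-index values is enough.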

```python
position = torch.arange(max_sequence_length, dtype=torch.float).reshape(max_sequence_length, 1)
position
```

```
tensor([[0.],
        [1.],
        [2.],
        [3.],
        [4.],
        [5.],
        [6.],
        [7.],
        [8.],
        [9.]])
```

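```python
# position is (10, 1) and denominator is (3,), so broadcasting yields (10, 3) tensors
even_PE = torch.sin(position / denominator)  # sin for the even dimensions
odd_PE = torch.cos(position / denominator)   # cos for the odd dimensions
```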
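The division above relies on broadcasting: a (10, 1) column of positions divided by a (3,) row of denominators gives a (10, 3) grid whose entry [p, i] is p / 10000^(2i/d_model). A small self-contained check (`angles` is a hypothetical name):

```python
import torch

max_sequence_length, d_model = 10, 6
position = torch.arange(max_sequence_length, dtype=torch.float).reshape(max_sequence_length, 1)  # (10, 1)
denominator = torch.pow(10000, torch.arange(0, d_model, 2).float() / d_model)                 # (3,)

# Broadcasting treats denominator as (1, 3), so the result is a (10, 3) grid
angles = position / denominator
print(angles.shape)  # torch.Size([10, 3])

# Row p, column i holds p / 10000^(2i/d_model); spot-check one entry
p, i = 4, 1
assert torch.isclose(angles[p, i], torch.tensor(p / 10000 ** (2 * i / d_model)))
```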

```python
even_PE
```

```
tensor([[ 0.0000,  0.0000,  0.0000],
        [ 0.8415,  0.0464,  0.0022],
        [ 0.9093,  0.0927,  0.0043],
        [ 0.1411,  0.1388,  0.0065],
        [-0.7568,  0.1846,  0.0086],
        [-0.9589,  0.2300,  0.0108],
        [-0.2794,  0.2749,  0.0129],
        [ 0.6570,  0.3192,  0.0151],
        [ 0.9894,  0.3629,  0.0172],
        [ 0.4121,  0.4057,  0.0194]])
```

```python
odd_PE
```

```
tensor([[ 1.0000,  1.0000,  1.0000],
        [ 0.5403,  0.9989,  1.0000],
        [-0.4161,  0.9957,  1.0000],
        [-0.9900,  0.9903,  1.0000],
        [-0.6536,  0.9828,  1.0000],
        [ 0.2837,  0.9732,  0.9999],
        [ 0.9602,  0.9615,  0.9999],
        [ 0.7539,  0.9477,  0.9999],
        [-0.1455,  0.9318,  0.9999],
        [-0.9111,  0.9140,  0.9998]])
```
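These two (10, 3) tables still need to be interleaved into one (10, 6) encoding, with sin in the even columns and cos in the odd ones. The sketch below compares two ways to do that: writing into a zero tensor by slice assignment, and stacking then flattening. The helper names `pe_slice_assign` and `pe_stack_flatten` are hypothetical, and the sketch assumes an even `d_model`.

```python
import torch

def pe_slice_assign(max_sequence_length, d_model):
    """Write sin into the even columns and cos into the odd columns."""
    position = torch.arange(max_sequence_length, dtype=torch.float).unsqueeze(1)   # (L, 1)
    denominator = torch.pow(10000, torch.arange(0, d_model, 2).float() / d_model)  # (d_model/2,)
    pe = torch.zeros(max_sequence_length, d_model)
    pe[:, 0::2] = torch.sin(position / denominator)
    pe[:, 1::2] = torch.cos(position / denominator)
    return pe

def pe_stack_flatten(max_sequence_length, d_model):
    """Pair sin/cos on a new last axis, then flatten so columns alternate."""
    position = torch.arange(max_sequence_length, dtype=torch.float).unsqueeze(1)
    denominator = torch.pow(10000, torch.arange(0, d_model, 2).float() / d_model)
    stacked = torch.stack([torch.sin(position / denominator),
                           torch.cos(position / denominator)], dim=2)   # (L, d_model/2, 2)
    return torch.flatten(stacked, start_dim=1, end_dim=2)               # (L, d_model)

a = pe_slice_assign(10, 6)
b = pe_stack_flatten(10, 6)
print(torch.allclose(a, b))  # True
```

Both produce the same interleaved layout. Slice assignment writes in place into a preallocated tensor, while stack-then-flatten builds the result without in-place writes.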

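```python
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Builds the (max_sequence_length, d_model) sinusoidal positional-encoding table."""

    def __init__(self, d_model, max_sequence_length):
        super().__init__()
        self.max_sequence_length = max_sequence_length
        self.d_model = d_model

    def forward(self):
        even_i = torch.arange(0, self.d_model, 2).float()
        denominator = torch.pow(10000, even_i / self.d_model)
        position = (torch.arange(self.max_sequence_length, dtype=torch.float)
                    .reshape(self.max_sequence_length, 1))
        even_PE = torch.sin(position / denominator)   # (max_sequence_length, d_model/2)
        odd_PE = torch.cos(position / denominator)    # (max_sequence_length, d_model/2)
        # Pair each sin with its cos on a new last axis: (max_sequence_length, d_model/2, 2)
        stacked = torch.stack([even_PE, odd_PE], dim=2)
        # Flatten the pairs so columns alternate sin, cos, sin, cos, ...
        PE = torch.flatten(stacked, start_dim=1, end_dim=2)
        return PE
```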

```python
pe = PositionalEncoding(d_model=6, max_sequence_length=10)
pe()  # Calling the module runs forward()
```

```
tensor([[ 0.0000,  1.0000,  0.0000,  1.0000,  0.0000,  1.0000],
        [ 0.8415,  0.5403,  0.0464,  0.9989,  0.0022,  1.0000],
        [ 0.9093, -0.4161,  0.0927,  0.9957,  0.0043,  1.0000],
        [ 0.1411, -0.9900,  0.1388,  0.9903,  0.0065,  1.0000],
        [-0.7568, -0.6536,  0.1846,  0.9828,  0.0086,  1.0000],
        [-0.9589,  0.2837,  0.2300,  0.9732,  0.0108,  0.9999],
        [-0.2794,  0.9602,  0.2749,  0.9615,  0.0129,  0.9999],
        [ 0.6570,  0.7539,  0.3192,  0.9477,  0.0151,  0.9999],
        [ 0.9894, -0.1455,  0.3629,  0.9318,  0.0172,  0.9999],
        [ 0.4121, -0.9111,  0.4057,  0.9140,  0.0194,  0.9998]])
```
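The `PositionalEncoding` module returns only the encoding table. In the standard Transformer this table is added element-wise to the word embeddings before the query, key, and value projections. A minimal self-contained sketch, assuming a batch of random stand-in embeddings; `sinusoidal_pe` and `add_positional_encoding` are hypothetical helper names:

```python
import torch

def sinusoidal_pe(max_sequence_length, d_model):
    """Same interleaved sin/cos table that PositionalEncoding builds."""
    even_i = torch.arange(0, d_model, 2).float()
    denominator = torch.pow(10000, even_i / d_model)
    position = torch.arange(max_sequence_length, dtype=torch.float).reshape(max_sequence_length, 1)
    stacked = torch.stack([torch.sin(position / denominator),
                           torch.cos(position / denominator)], dim=2)
    return torch.flatten(stacked, start_dim=1, end_dim=2)   # (seq_len, d_model)

def add_positional_encoding(x):
    """x holds word embeddings of shape (batch, seq_len, d_model)."""
    _, seq_len, d_model = x.shape
    pe = sinusoidal_pe(seq_len, d_model).to(x.dtype)
    return x + pe   # (seq_len, d_model) broadcasts across the batch

batch = torch.randn(2, 10, 6)   # random stand-in embeddings
encoded = add_positional_encoding(batch)
print(encoded.shape)            # torch.Size([2, 10, 6])
```

Because the table depends only on position and `d_model`, it has no learnable parameters; the same values are added to every sequence in the batch.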