## Positional Encoding

This notebook implements positional encoding for Transformer neural networks with PyTorch.

In [30]:
import torch
import torch.nn as nn

max_sequence_length = 10
d_model = 6

$$
PE(\text{position}, 2i) = \sin\bigg( \frac{ \text{position} }{10000^\frac{2i}{d_{model}}} \bigg)
$$

$$
PE(\text{position}, 2i+1) = \cos\bigg( \frac{ \text{position} }{10000^\frac{2i}{d_{model}}} \bigg)
$$

We can rewrite these in terms of a single column index $i$ that runs over every dimension of the encoding:

$$
PE(\text{position}, i) = \sin\bigg( \frac{ \text{position} }{10000^\frac{i}{d_{model}}} \bigg) \text{ when i is even}
$$

$$
PE(\text{position}, i) = \cos\bigg( \frac{ \text{position} }{10000^\frac{i-1}{d_{model}}} \bigg) \text{ when i is odd}
$$
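
As a quick sanity check on the rewrite, the sketch below compares the exponent of 10000 under both forms for every column index. The helper names `exponent_original` and `exponent_rewritten` are made up for this illustration and are not part of the notebook.

```python
d_model = 6  # same value the notebook uses

def exponent_original(i, d_model):
    # Original form: columns 2k and 2k+1 both use the exponent 2k / d_model
    k = i // 2
    return 2 * k / d_model

def exponent_rewritten(i, d_model):
    # Rewritten form: i / d_model when i is even, (i - 1) / d_model when i is odd
    return i / d_model if i % 2 == 0 else (i - 1) / d_model

for i in range(d_model):
    # Each row prints the column index followed by the two (matching) exponents
    print(i, exponent_original(i, d_model), exponent_rewritten(i, d_model))
```

Each sine/cosine pair shares one exponent, which is why a single denominator tensor turns out to be enough later on.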

In [31]:
even_i = torch.arange(0, d_model, 2).float()
even_i

In [32]:
even_denominator = torch.pow(10000, even_i/d_model)
even_denominator

In [33]:
odd_i = torch.arange(1, d_model, 2).float()
odd_i

In [34]:
odd_denominator = torch.pow(10000, (odd_i - 1)/d_model)
odd_denominator

`even_denominator` and `odd_denominator` are the same! Subtracting 1 from each odd index gives back the even indices, so we only need to compute one of them and can call the result `denominator`.
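
The equality can also be checked programmatically rather than by eye. This is a minimal, self-contained sketch that recomputes both tensors with the notebook's `d_model = 6` and compares them with `torch.equal`.

```python
import torch

d_model = 6

even_i = torch.arange(0, d_model, 2).float()  # tensor([0., 2., 4.])
odd_i = torch.arange(1, d_model, 2).float()   # tensor([1., 3., 5.])

even_denominator = torch.pow(10000, even_i / d_model)
odd_denominator = torch.pow(10000, (odd_i - 1) / d_model)

# odd_i - 1 is exactly even_i, so the two tensors match element for element
print(torch.equal(even_denominator, odd_denominator))  # True
```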

In [35]:
denominator = even_denominator

In [36]:
position = torch.arange(max_sequence_length, dtype=torch.float).reshape(max_sequence_length, 1)

In [37]:
position

In [38]:
even_PE = torch.sin(position / denominator)
odd_PE = torch.cos(position / denominator)
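
The division `position / denominator` relies on broadcasting: a column of positions is divided by a row of denominators, giving one angle per (position, frequency) pair. The sketch below only inspects shapes; the name `angles` is introduced here for illustration.

```python
import torch

max_sequence_length = 10
d_model = 6

position = torch.arange(max_sequence_length, dtype=torch.float).reshape(max_sequence_length, 1)
denominator = torch.pow(10000, torch.arange(0, d_model, 2).float() / d_model)

# (10, 1) divided by (3,) broadcasts to (10, 3)
angles = position / denominator

print(position.shape)     # torch.Size([10, 1])
print(denominator.shape)  # torch.Size([3])
print(angles.shape)       # torch.Size([10, 3])
```

Row `p` of `angles` holds position `p` divided by each of the `d_model / 2` denominators, so `torch.sin` and `torch.cos` of it give the even and odd columns for that position.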

In [39]:
even_PE

In [40]:
even_PE.shape

In [41]:
odd_PE

In [42]:
odd_PE.shape

In [43]:
stacked = torch.stack([even_PE, odd_PE], dim=2)
stacked.shape

In [44]:
PE = torch.flatten(stacked, start_dim=1, end_dim=2)
PE
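
To confirm that stacking on `dim=2` and flattening really interleaves sine and cosine columns in the order the formulas require, one option is to compare against a slow, loop-based version written directly from the original definitions. This is a sketch for checking only; `reference` is a name introduced here.

```python
import math
import torch

max_sequence_length = 10
d_model = 6

# Vectorized construction, repeated from the cells above so this block runs on its own
even_i = torch.arange(0, d_model, 2).float()
denominator = torch.pow(10000, even_i / d_model)
position = torch.arange(max_sequence_length, dtype=torch.float).reshape(max_sequence_length, 1)
even_PE = torch.sin(position / denominator)
odd_PE = torch.cos(position / denominator)
PE = torch.flatten(torch.stack([even_PE, odd_PE], dim=2), start_dim=1, end_dim=2)

# Naive reference: fill each entry directly from the sin/cos formulas
reference = torch.zeros(max_sequence_length, d_model)
for pos in range(max_sequence_length):
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        reference[pos, i] = math.sin(angle)      # even column
        reference[pos, i + 1] = math.cos(angle)  # odd column

print(PE.shape)  # torch.Size([10, 6])
print(torch.allclose(PE, reference, atol=1e-5))
```

Stacking produces a tensor of shape `(10, 3, 2)` whose last axis holds each (sin, cos) pair, and flattening the last two axes lays those pairs side by side as columns 0, 1, 2, 3, 4, 5.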

## Class

Let's combine all the code above into a cute class.

In [45]:
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):

    def __init__(self, d_model, max_sequence_length):
        super().__init__()
        self.max_sequence_length = max_sequence_length
        self.d_model = d_model

    def forward(self):
        even_i = torch.arange(0, self.d_model, 2).float()
        denominator = torch.pow(10000, even_i/self.d_model)
        position = torch.arange(self.max_sequence_length, dtype=torch.float).reshape(self.max_sequence_length, 1)
        even_PE = torch.sin(position / denominator)
        odd_PE = torch.cos(position / denominator)
        stacked = torch.stack([even_PE, odd_PE], dim=2)
        PE = torch.flatten(stacked, start_dim=1, end_dim=2)
        return PE

In [46]:
pe = PositionalEncoding(d_model=6, max_sequence_length=10)
pe.forward()
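
As a rough illustration of how the output might be used, the sketch below adds the encoding to a batch of embeddings. The `embeddings` tensor is random stand-in data and `batch_size = 2` is an arbitrary choice; neither comes from the notebook. The class is repeated so the block runs on its own.

```python
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    # Same logic as the class above, repeated here for self-containment
    def __init__(self, d_model, max_sequence_length):
        super().__init__()
        self.max_sequence_length = max_sequence_length
        self.d_model = d_model

    def forward(self):
        even_i = torch.arange(0, self.d_model, 2).float()
        denominator = torch.pow(10000, even_i / self.d_model)
        position = torch.arange(self.max_sequence_length, dtype=torch.float).reshape(self.max_sequence_length, 1)
        even_PE = torch.sin(position / denominator)
        odd_PE = torch.cos(position / denominator)
        stacked = torch.stack([even_PE, odd_PE], dim=2)
        return torch.flatten(stacked, start_dim=1, end_dim=2)

batch_size = 2  # hypothetical value for this example
embeddings = torch.randn(batch_size, 10, 6)  # random stand-in for token embeddings

pe = PositionalEncoding(d_model=6, max_sequence_length=10)
# The (10, 6) encoding broadcasts across the batch dimension
encoded = embeddings + pe()

print(encoded.shape)  # torch.Size([2, 10, 6])
```

Calling `pe()` goes through `nn.Module.__call__`, which dispatches to `forward()`, so it returns the same tensor as `pe.forward()`.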

Happy Coding!