# Transformers 101

This notebook explores the transformer architecture (Vaswani et al., 2017). Here, we'll implement the basic building blocks of the transformer in native PyTorch and then put them together into a model architecture that can live in `../models`.

In putting this together (as with my other exploratory projects), I tried to limit how much existing code I looked at online and relied primarily on my notes (PDF attached for anyone interested) as the foundation for this work.

In [1]:
import torch

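The positional encoding should produce a tensor of shape `(sequence_length, output_dim)`: one `output_dim`-dimensional vector for each position in the input sequence.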
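For reference, the sinusoidal encoding from the original paper assigns position $pos$ and pair index $i$ the values $P_{(pos,\,2i)} = \sin\!\left(pos / n^{2i/d}\right)$ and $P_{(pos,\,2i+1)} = \cos\!\left(pos / n^{2i/d}\right)$, where $d$ is `output_dim` and $n = 10000$. Below is a minimal standalone sketch of that formula, assuming an even `d_model` and positions `0..seq_len-1`. The name `sinusoidal_encoding` is hypothetical, and computing $n^{2i/d}$ as `exp(2i/d * ln n)` is a common variant that is mathematically equivalent to the `torch.float_power` call in the next cell.

```python
import math

import torch


def sinusoidal_encoding(seq_len: int, d_model: int, n: float = 10000.0) -> torch.Tensor:
    """Hypothetical helper: return a (seq_len, d_model) sinusoidal positional encoding.

    Assumes d_model is even so that sine and cosine columns pair up exactly.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(d_model // 2, dtype=torch.float32)  # (d_model/2,)
    # n^(2i/d) computed as exp((2i/d) * ln n); stays in float32
    denominators = torch.exp(2 * i / d_model * math.log(n))  # (d_model/2,)
    angles = positions / denominators  # broadcasts to (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)  # even columns: sine
    pe[:, 1::2] = torch.cos(angles)  # odd columns: cosine
    return pe


pe = sinusoidal_encoding(seq_len=50, d_model=16)
print(pe.shape)  # torch.Size([50, 16])
```

At position 0 every angle is 0, so the first row alternates `0, 1, 0, 1, ...`; each later row rotates those sine/cosine pairs at a different frequency.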

In [41]:
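def positional_encoding(input_tensor: torch.Tensor, output_dim: int, n=10000): 
    """
    The first transformer building block we implement: the positional
    encoding that is added to the token embeddings. Here, we implement the
    fixed sinusoidal encoding from the original paper (sine for even
    dimensions, cosine for odd dimensions). Assumes output_dim is even and
    that the last dimension of input_tensor indexes sequence positions.
    """
    if output_dim % 2 != 0:
        raise ValueError("output_dim must be even")  # otherwise sin/cos column counts differ
    seq_len = input_tensor.size(-1)
    P = torch.zeros((seq_len, output_dim))
    indices = torch.arange(seq_len)           # positions 0..seq_len-1
    i_values = torch.arange(output_dim // 2)  # one frequency per sin/cos column pair
    denominators = torch.float_power(n, 2 * i_values / output_dim)  # n^(2i/d), float64

    # Angle for every (position, frequency) pair via broadcasting: (seq_len, output_dim/2)
    angles = indices.unsqueeze(1) / denominators.unsqueeze(0)
    P[:, 0::2] = torch.sin(angles)  # start at 0, step by 2: sine for even columns
    P[:, 1::2] = torch.cos(angles)  # start at 1, step by 2: cosine for odd columns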
