## Transformers From Scratch

A notebook in which I build a character-level Transformer from scratch using PyTorch.

### Read In Input File

In [1]:
#read in the Shakespeare input file
with open('../notebooks/data/input.txt', 'r', encoding = 'utf-8') as file:
    text = file.read()

In [2]:
#get all the unique characters that occur in the text
chars = sorted(list(set(text)))
vocab_size = len(chars)
print(''.join(chars))
print(vocab_size)


 !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
65


In [3]:
#create a mapping of characters to integers
string_to_integer = {ch: i for i, ch in enumerate(chars)}
#create a mapping of integers to characters
integer_to_string = {i: ch for i, ch in enumerate(chars)}

Next we build a simple character-level encoder and decoder from these mappings, then use the encoder to turn the full text into a PyTorch tensor.

In [4]:
#build simple encoder that takes a string and outputs a list of integers
encoder = lambda input_string: [string_to_integer[character] for character in input_string]
#build a simple decoder that takes a list of integers, and outputs a string
decoder = lambda input_list: ''.join([integer_to_string[integer] for integer in input_list])
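A quick round trip is an easy way to check that the encoder and decoder are inverses of each other. The sketch below rebuilds the same mappings on a short hypothetical sample string (`sample_text` is an assumption made so the example runs without the Shakespeare file), and writes the two lambdas as named functions purely for readability.

```python
# Hypothetical sample text standing in for the Shakespeare input file
sample_text = "hello world"

# Same vocabulary construction as above, applied to the sample text
chars = sorted(set(sample_text))
string_to_integer = {ch: i for i, ch in enumerate(chars)}
integer_to_string = {i: ch for i, ch in enumerate(chars)}

def encoder(input_string):
    # map each character to its integer id
    return [string_to_integer[character] for character in input_string]

def decoder(input_list):
    # map each integer id back to its character and join into a string
    return ''.join(integer_to_string[integer] for integer in input_list)

encoded = encoder("hello")
decoded = decoder(encoded)
print(encoded)  # [3, 2, 4, 4, 5]
print(decoded)  # hello
```

Note that the encoder raises a `KeyError` for any character that is not in the vocabulary, so it only works on text drawn from the same character set.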

In [5]:
import torch

#encode entire text dataset and store it into a torch.Tensor
data = torch.tensor(encoder(text), dtype=torch.long)
print(data[:10])

tensor([18, 47, 56, 57, 58,  1, 15, 47, 58, 47])


In [6]:
#index marking the end of the first 90% of the encoded characters
n = int(0.9 * len(data))
#splitting our dataset into training and validation sets for future use
train_data = data[:n]
val_data = data[n:]
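The split above is contiguous rather than shuffled, so the validation set is simply the last 10% of the text. A minimal sketch of the same slicing on stand-in data (the `torch.arange(1000)` tensor is an assumption used in place of the encoded text):

```python
import torch

# Hypothetical stand-in for the encoded text: 1000 consecutive token ids
data = torch.arange(1000)

# Same split logic as above: first 90% for training, the rest for validation
n = int(0.9 * len(data))
train_data = data[:n]
val_data = data[n:]

print(len(train_data), len(val_data))  # 900 100
# The validation set starts exactly where the training set ends
print(train_data[-1].item(), val_data[0].item())  # 899 900
```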

In [7]:
#set the context length and look at one chunk of block_size + 1 tokens
block_size = 8
train_data[:block_size+1]

tensor([18, 47, 56, 57, 58,  1, 15, 47, 58])

In [12]:
#training visualizer
x = train_data[:block_size]
y = train_data[1:block_size+1]

#see training inputs
for trial in range(block_size):
    context = x[:trial+1]
    target = y[trial]
    print(f"when input is {context} the target is: {target}")

when input is tensor([18]) the target is: 47
when input is tensor([18, 47]) the target is: 56
when input is tensor([18, 47, 56]) the target is: 57
when input is tensor([18, 47, 56, 57]) the target is: 58
when input is tensor([18, 47, 56, 57, 58]) the target is: 1
when input is tensor([18, 47, 56, 57, 58,  1]) the target is: 15
when input is tensor([18, 47, 56, 57, 58,  1, 15]) the target is: 47
when input is tensor([18, 47, 56, 57, 58,  1, 15, 47]) the target is: 58


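Now that we have seen how a single chunk of `block_size + 1` tokens yields `block_size` input/target pairs, we add a batch dimension so that several such chunks can be processed in parallel.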

In [18]:
#seeding for replication
torch.manual_seed(1337)
#the number of independent sequences processed in parallel in each batch
batch_size = 4
#the maximum context length for predictions
block_size = 8

#function that generates batches
def get_batch(split):
    #differentiate between training and validation
    if split == "train":
        data = train_data
    else:
        data = val_data
    #pick random starting offsets, leaving room for a full block plus its shifted target
    random_index = torch.randint(len(data) - block_size, (batch_size, ))
    #slice a block at each offset and stack them into a (batch_size, block_size) tensor
    x = torch.stack([data[index:index+block_size] for index in random_index])
    #targets are the same blocks shifted one position to the right
    y = torch.stack([data[index+1:index+block_size+1] for index in random_index])
    return x, y
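To see what `get_batch` returns, it can help to call it once and inspect the shapes. The sketch below is self-contained and uses consecutive integers as stand-in data (`train_data` and `val_data` here are assumptions, not the real encoded text), which makes the one-position shift between inputs and targets easy to verify.

```python
import torch

torch.manual_seed(1337)
batch_size = 4  # sequences per batch
block_size = 8  # maximum context length

# Hypothetical stand-in data: consecutive integers instead of encoded text
train_data = torch.arange(100)
val_data = torch.arange(100, 120)

def get_batch(split):
    # choose the dataset to sample from
    data = train_data if split == "train" else val_data
    # random starting offsets that leave room for block_size + 1 tokens
    random_index = torch.randint(len(data) - block_size, (batch_size,))
    # inputs: one block per offset, stacked along a new batch dimension
    x = torch.stack([data[i:i + block_size] for i in random_index])
    # targets: the same blocks shifted one position to the right
    y = torch.stack([data[i + 1:i + block_size + 1] for i in random_index])
    return x, y

xb, yb = get_batch("train")
print(xb.shape, yb.shape)
# With consecutive-integer data, every target equals its input plus one
print(torch.equal(yb, xb + 1))
```

Both tensors have shape `(batch_size, block_size)`, and each row of `yb` is the matching row of `xb` shifted by one token, so a single batch contains `batch_size * block_size` training examples.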
    
