# Poem Generation using GPT

In this notebook, we generate a simple Shakespearean poem using the Generative Pretrained Transformer (GPT) we implemented. Unlike the more common word-by-word approach, the model here generates text one character at a time.

## Importing Libraries


In [16]:
import torch
import numpy as np

from gpt import GPT, GPTconfig
from gpt import TrainingConfig, Trainer
from gpt import sample_context
# Could have imported all of them at once. Doesn't matter :)

Set a manual seed so that results are reproducible from run to run.

In [17]:
from gpt.utils.utils import seed_all

seed_all(0)

## Poem Dataset class 

We now subclass `Dataset` from `torch.utils.data` to build our own dataset class for the dataloader. The class takes the raw text, builds a character vocabulary, and serves fixed-length chunks of characters. Each training example is a chunk of `block_size` (T) characters, and its target is the same chunk shifted one character to the right.
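To make the one-character offset concrete, here is a plain-Python sketch on a toy string. The string and `block_size = 4` are illustrative choices, not values from this notebook. It slices a chunk of `block_size + 1` characters the same way `__getitem__` does and splits it into inputs and targets:

```python
# Toy illustration of the input/target offset used for next-character prediction.
text = "to be or not"   # hypothetical sample text
block_size = 4          # illustrative context length (T)

idx = 0
chunk = text[idx : idx + block_size + 1]  # block_size + 1 characters
x, y = chunk[:-1], chunk[1:]              # inputs and targets, shifted by one

print(repr(x), repr(y))  # 'to b' 'o be'
for t in range(block_size):
    # at position t the model sees x[: t + 1] and should predict y[t]
    print(f"context {x[: t + 1]!r} -> next {y[t]!r}")
```

Each chunk therefore provides `block_size` training signals at once, one per position.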

In [18]:
from torch.utils.data import Dataset

class CharacterDataset(Dataset):
    def __init__(self, data, block_size):
        characters = sorted(list(set(data)))
        data_size, vocab_size = len(data), len(characters)

        print(f"Dataset has {data_size} characters. {vocab_size} of characters are unique.")

        # char-to-index mapping and its inverse
        self.stoi = {ch: i for i, ch in enumerate(characters)}
        self.itos = {i: ch for i, ch in enumerate(characters)}

        self.block_size = block_size
        self.vocab_size = vocab_size
        self.data_size = data_size
        self.data = data

    def __len__(self):
        # number of valid starting indices for a chunk of block_size + 1 characters
        return self.data_size - self.block_size

    def __getitem__(self, idx):
        # take a chunk of block_size + 1 characters starting at idx
        chunk = self.data[idx : idx + self.block_size + 1]

        # convert the chunk to integer ids
        data = [self.stoi[ch] for ch in chunk]

        # x holds every character of the chunk except the last,
        # y holds every character except the first, so y is x shifted by one.
        # This sets up language modelling: at every position the transformer
        # learns to predict the next character in the sequence.
        x = torch.tensor(data[:-1], dtype=torch.long)  # nn.Embedding expects torch.long indices
        y = torch.tensor(data[1:], dtype=torch.long)

        return x, y

In [19]:
DATA_PATH = "./data/input.txt"
BLOCK_SIZE = 128  # context length: number of characters the transformer sees at once

In [20]:
with open(DATA_PATH, "r") as file:
    data = file.read()  # reads the whole file; for very large files, pass a character count to read() instead

In [21]:
dataset = CharacterDataset(data = data, block_size = BLOCK_SIZE) 

Dataset has 0 characters. 0 of characters are unique.


### Visualizing the dataset

In [22]:
batch = dataset[0]  # (x, y) tuple of tensors for the chunk starting at idx = 0; feel free to change the idx
x, y = batch
x = x.tolist()
y = y.tolist()

for i in range(len(x)):
    x[i] = dataset.itos[x[i]]
    y[i] = dataset.itos[y[i]]

print(f"\033[1mTraining Data : \033[0m\n\n{''.join(x)}")
print(f"\n\033[1mTargets : \033[0m\n\n{''.join(y)}")

[1mTraining Data : [0m



[1mTargets : [0m



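The `stoi` and `itos` dictionaries built in `CharacterDataset` are all that is needed to move between text and integer ids; the loop above and the generation cells at the end of this notebook do this conversion by hand. A pair of small helpers keeps it in one place. `build_vocab`, `encode` and `decode` are hypothetical names, and the sketch builds its own vocabulary from a toy string so that it runs on its own:

```python
def build_vocab(text):
    """Character-level vocabulary, built the same way as in CharacterDataset."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for i, ch in enumerate(chars)}
    return stoi, itos

def encode(s, stoi):
    # Raises KeyError for characters that never appear in the training text.
    return [stoi[ch] for ch in s]

def decode(ids, itos):
    return "".join(itos[i] for i in ids)

stoi, itos = build_vocab("to be or not to be")
ids = encode("be not", stoi)
print(ids)
print(decode(ids, itos))  # be not
```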

## Configuring GPT

Now that the dataset class is set up, it's time to train our GPT model on it. Before training, we configure the GPT with suitable model hyperparameters.

Full-scale GPT models such as GPT-3 require huge computational resources, so we use a much smaller configuration here: 2 layers, 12 attention heads, and 768-dimensional embeddings. Small as it is, a model of this size can learn to produce text in the style of the training poems; even a single-layer model can learn to generate reasonably convincing poem-like text.

In [23]:
gpt_config = GPTconfig(num_layers = 2, 
                       n_heads = 12, 
                       embd_size = 768, 
                       vocab_size = dataset.vocab_size, 
                       block_size = dataset.block_size
                      )

In [24]:
model = GPT(gpt_config)

Number of Trainable Parameters :  14275584
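The printed parameter count can be sanity-checked by hand. The function below assumes a minGPT-style layout, which this notebook's `GPT` class may or may not follow exactly: per block, two LayerNorms, separate key/query/value/output projections with biases, and an MLP with a 4x hidden width; plus a learned positional embedding of shape `(block_size, embd_size)`, a final LayerNorm, a token embedding, and a bias-free output head. `estimate_gpt_params` is a hypothetical helper, not part of the `gpt` package.

```python
def estimate_gpt_params(num_layers, embd_size, vocab_size, block_size):
    """Parameter count for a minGPT-style decoder (assumed layout)."""
    d = embd_size
    layer_norm = 2 * d                                # weight + bias
    attention = 4 * (d * d + d)                       # key, query, value, output projections
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)       # d -> 4d -> d, with biases
    per_block = 2 * layer_norm + attention + mlp      # = 12*d^2 + 13*d

    token_embedding = vocab_size * d
    position_embedding = block_size * d
    final_layer_norm = layer_norm
    head = d * vocab_size                             # output projection, no bias

    return (num_layers * per_block + token_embedding + position_embedding
            + final_layer_norm + head)

# n_heads does not appear: splitting d across heads does not change the weight shapes.
print(estimate_gpt_params(num_layers=2, embd_size=768, vocab_size=0, block_size=128))
```

With `vocab_size = 0`, which is what the empty dataset in the recorded run above produced, the estimate comes to 14,275,584, the same count the model printed. With real text, the token embedding and output head add `2 * vocab_size * embd_size` parameters on top.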


## Training the GPT model

The GPT model is configured; now we train it on our dataset. As mentioned earlier, training takes a lot of compute, so how long it runs depends on your hardware. The training loop is designed to use multiple GPUs if you have access to them.

In [25]:
train_config = TrainingConfig(max_epochs = 2, 
                              batch_size = 256, 
                              lr_decay = True, 
                              lr = 6e-4,
                              warmup_tokens = 512*20,
                              final_tokens = 2 * len(dataset) * dataset.block_size,
                              ckpt_path = "./checkpoints/transformers.pt"
                             )

ValueError: __len__() should return >= 0
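`warmup_tokens` and `final_tokens` are measured in tokens (here, characters) rather than optimizer steps; `final_tokens = 2 * len(dataset) * dataset.block_size` presumably corresponds to the `max_epochs = 2` passes over the data. The sketch below shows a token-based schedule in the style of minGPT's trainer: linear warmup, then cosine decay to a floor of 10% of the base rate. Whether this notebook's `Trainer` does exactly the same when `lr_decay = True` is an assumption; `lr_at` is a hypothetical helper and `final_tokens = 1_000_000` is a placeholder value.

```python
import math

def lr_at(tokens_seen, base_lr=6e-4, warmup_tokens=512 * 20, final_tokens=1_000_000):
    """Assumed minGPT-style schedule: linear warmup, then cosine decay with a 10% floor."""
    if tokens_seen < warmup_tokens:
        # linear warmup from 0 up to base_lr
        mult = tokens_seen / max(1, warmup_tokens)
    else:
        progress = (tokens_seen - warmup_tokens) / max(1, final_tokens - warmup_tokens)
        progress = min(1.0, progress)  # safeguard added here: stay at the floor past final_tokens
        mult = max(0.1, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return base_lr * mult

for t in [0, 5_120, 10_240, 505_120, 1_000_000]:
    print(t, lr_at(t))
```

Counting tokens instead of steps keeps the schedule independent of `batch_size`: changing the batch size changes how many steps it takes to reach `final_tokens`, but not the shape of the curve.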

In [None]:
trainer = Trainer(model = model, train_set = dataset, test_set = None, configs = train_config)
trainer.train()


## Let's generate poems

Once trained, the model has learned which characters tend to follow a given sequence. We now seed it with a starting context and let it predict the next character one step at a time, appending each prediction to the context, until we have sampled the requested number of characters.
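A single step of that loop can be sketched as follows, assuming `sample_context` behaves like minGPT's `sample` function, which takes the same arguments: take the logits for the last position, divide them by `temperature`, keep only the `top_k` most likely characters, and either sample from the resulting distribution (`sample=True`) or take the argmax. `next_char_id` is a hypothetical helper that works on a logits vector directly, so it runs without a trained model.

```python
import torch
import torch.nn.functional as F

def next_char_id(logits, temperature=1.0, top_k=None, sample=True):
    """Pick the next token id from a 1-D tensor of logits over the vocabulary."""
    logits = logits / temperature  # < 1 sharpens, > 1 flattens the distribution
    if top_k is not None:
        # mask every logit below the k-th largest so it gets zero probability
        kth_largest = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_largest, float("-inf"))
    probs = F.softmax(logits, dim=-1)
    if sample:
        return torch.multinomial(probs, num_samples=1).item()
    return torch.argmax(probs).item()

torch.manual_seed(0)
logits = torch.tensor([2.0, 0.5, 1.0, -1.0, 3.0])  # toy scores for a 5-character vocab
print(next_char_id(logits, top_k=1))                # only the highest-scoring id survives
print(next_char_id(logits, top_k=2, sample=False))
```

Repeating this step, appending the chosen id to the context, and cropping the context to its last `block_size` characters gives a generation loop like the one the `steps=10000` call runs.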

In [None]:
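seed_context = "Help me!"  # every character of the seed must appear in the training text
# encode the seed and add a batch dimension: shape (1, len(seed_context))
x = torch.tensor([dataset.stoi[s] for s in seed_context], dtype=torch.long, device=trainer.device)[None, ...]
# steps: number of characters to generate; top_k=10 samples from the 10 most likely characters
y = sample_context(model=model, x=x, steps=10000, temperature=1.0, sample=True, top_k=10)[0]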


In [None]:
y = y.tolist()
y = [dataset.itos[i] for i in y]
y = "".join(y)


In [None]:
print(f"\033[1mGenerated Data : \033[0m\n\n{y}")


## Conclusion

This notebook shows that transformers can learn to generate text not just word by word but also character by character. Generating poems character by character is a hard task: the model has to learn spelling, words, and line structure from scratch, and then string sequences of characters together into meaningful sentences.

Self-attention modules in the transformer architecture learn to pay different amounts of "attention" to different tokens (here, characters) in the context. This lets the model capture dependencies across the whole context window, which is why transformers perform well on language modelling tasks.
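The core of that mechanism fits in a few lines. The sketch below is a single-head causal self-attention layer in PyTorch, written for illustration rather than taken from this notebook's `gpt` package (whose multi-head version presumably splits `embd_size` across `n_heads`). The causal mask is what makes character-level generation work: position t can only attend to positions 0..t, so the model never sees the character it is asked to predict.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention (illustrative sketch)."""

    def __init__(self, embd_size, block_size):
        super().__init__()
        self.query = nn.Linear(embd_size, embd_size)
        self.key = nn.Linear(embd_size, embd_size)
        self.value = nn.Linear(embd_size, embd_size)
        # lower-triangular mask: 1 where position i may attend to position j (j <= i)
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):  # x: (B, T, C)
        B, T, C = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(C)  # (B, T, T) similarity scores
        scores = scores.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)  # attention paid by each position to earlier ones
        return weights @ v                   # (B, T, C)

torch.manual_seed(0)
attn = CausalSelfAttention(embd_size=16, block_size=8)
x = torch.randn(2, 8, 16)
out = attn(x)
print(out.shape)  # torch.Size([2, 8, 16])
```

Changing the inputs at later positions leaves the outputs at earlier positions unchanged, which is exactly the property next-character prediction relies on.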