## Train a character-level GPT on some text data

The inputs here are plain text files, which we chop up into individual characters and then train a GPT on. So you could say this is a char-transformer instead of a char-rnn. Doesn't quite roll off the tongue as well. In this example we will feed it some Shakespeare and train it to predict the text one character at a time.

In [1]:
# set up logging
import logging
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    level=logging.INFO,
)

In [2]:
# make deterministic
from mingpt.utils import set_seed
set_seed(42)

In [3]:
import numpy as np
import torch
import torch.nn as nn
from torch.nn import functional as F
from mingpt.utils import CharDataset

In [4]:
block_size = 128 # spatial extent of the model for its context

In [5]:
# you can download this file at https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
text = open('input.txt', 'r').read() # don't worry we won't run out of file handles
train_dataset = CharDataset(text, block_size) # one line of poem is roughly 50 characters

data has 1115394 characters, 65 unique.
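
To make the "chop up into individual characters" step concrete, here is a minimal sketch of what a character-level dataset like this typically does. `TinyCharDataset` is a hypothetical stand-in written for illustration, not the actual `CharDataset` from `mingpt.utils`; it uses plain Python lists instead of tensors, and its `__len__` convention is an assumption.

```python
# Hypothetical stand-in for CharDataset (illustration only, no torch needed).
class TinyCharDataset:
    def __init__(self, data, block_size):
        chars = sorted(set(data))                       # the vocabulary is every distinct character
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
        self.itos = {i: ch for i, ch in enumerate(chars)}  # integer id -> char
        self.vocab_size = len(chars)
        self.block_size = block_size
        self.data = data

    def __len__(self):
        # Assumption: one example per start offset that still leaves room
        # for block_size inputs plus one extra character as the last target.
        return len(self.data) - self.block_size

    def __getitem__(self, idx):
        # Grab block_size + 1 characters; the targets are the inputs shifted by one.
        chunk = self.data[idx : idx + self.block_size + 1]
        ids = [self.stoi[ch] for ch in chunk]
        return ids[:-1], ids[1:]


ds = TinyCharDataset("hello world", block_size=4)
x, y = ds[0]
print("".join(ds.itos[i] for i in x), "->", "".join(ds.itos[i] for i in y))
```

Every position in `x` has its own target in `y`, so a single block of 128 characters yields 128 next-character predictions.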


In [6]:
from mingpt.model import GPT, GPTConfig
mconf = GPTConfig(train_dataset.vocab_size, train_dataset.block_size,
                  n_layer=8, n_head=8, n_embd=512)
model = GPT(mconf)

09/12/2022 15:26:12 - INFO - mingpt.model -   number of parameters: 2.535219e+07
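
The logged parameter count can be reproduced by hand. The sketch below assumes the usual GPT layout (token and position embeddings, blocks with two LayerNorms, four `n_embd x n_embd` attention projections with biases, a 4x-wide MLP, a final LayerNorm, and a bias-free output head); `gpt_param_count` is a hypothetical helper, not part of minGPT. Under those assumptions the total agrees with the log line above.

```python
def gpt_param_count(vocab_size, block_size, n_layer, n_embd):
    """Count parameters for an assumed standard GPT layout (illustrative helper)."""
    d = n_embd
    tok_emb = vocab_size * d                 # token embedding table
    pos_emb = block_size * d                 # learned position embeddings
    layernorm = 2 * d                        # weight + bias
    attn = 4 * (d * d + d)                   # key, query, value, output projection (with biases)
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # expand to 4*d, then project back to d
    block = 2 * layernorm + attn + mlp
    final_ln = layernorm
    head = d * vocab_size                    # assumed to have no bias
    return tok_emb + pos_emb + n_layer * block + final_ln + head


total = gpt_param_count(vocab_size=65, block_size=128, n_layer=8, n_embd=512)
print(f"{total:e}")
```

Note that `n_head` does not appear: splitting the attention into heads reshapes the same projections and does not change the count. Almost all of the weights sit in the eight blocks; the embeddings and head are tiny because the vocabulary has only 65 characters.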


In [8]:
from mingpt.trainer import Trainer, TrainerConfig

# initialize a trainer instance and kick off training
tconf = TrainerConfig(max_epochs=2, batch_size=512, learning_rate=6e-4,
                      lr_decay=True, warmup_tokens=512*20, final_tokens=2*len(train_dataset)*block_size,
                      num_workers=4)
trainer = Trainer(model, train_dataset, None, tconf)
trainer.train()

epoch 1 iter 1: train loss 3.65697. lr 5.999997e-04:   0%|    | 2/2179 [06:11<112:17:26, 185.69s/it]


KeyboardInterrupt: 
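
The progress bar shows 2179 iterations per epoch and a learning rate that has already dipped just below 6e-4. Both numbers are consistent with a linear warmup followed by cosine decay measured in tokens, which is what `lr_decay=True`, `warmup_tokens`, and `final_tokens` suggest. The sketch below is an assumed version of that schedule, not the trainer's actual code; `lr_at` is a hypothetical helper, and the dataset length of `len(text) - block_size` and the 10% floor are assumptions.

```python
import math


def lr_at(tokens, learning_rate, warmup_tokens, final_tokens):
    """Assumed schedule: linear warmup, then cosine decay with a 10% floor."""
    if tokens < warmup_tokens:
        mult = tokens / max(1, warmup_tokens)
    else:
        progress = (tokens - warmup_tokens) / max(1, final_tokens - warmup_tokens)
        mult = max(0.1, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return learning_rate * mult


block_size, batch_size = 128, 512
n_examples = 1115394 - block_size                 # assumption: len(train_dataset)
iters_per_epoch = math.ceil(n_examples / batch_size)
final_tokens = 2 * n_examples * block_size        # same expression as in TrainerConfig
tokens_seen = 2 * batch_size * block_size         # two batches processed (iter 0 and iter 1)
lr = lr_at(tokens_seen, 6e-4, 512 * 20, final_tokens)
print(iters_per_epoch, f"{lr:e}")
```

With `warmup_tokens=512*20` the warmup covers only 10,240 tokens, less than one batch of 512 x 128 = 65,536 tokens, so the schedule is effectively pure cosine decay from the first step. At roughly 186 seconds per iteration, a full epoch would take on the order of 112 hours, which explains the interrupt.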

In [9]:
# alright, let's sample some character-level Shakespeare
from mingpt.utils import sample

context = "O God, O God!"
x = torch.tensor([train_dataset.stoi[s] for s in context], dtype=torch.long)[None,...].to(trainer.device)
# x[None, ...] == x.unsqueeze(0)
y = sample(model, x, 2000, temperature=1.0, sample=True, top_k=10)[0]
completion = ''.join([train_dataset.itos[int(i)] for i in y])
print(completion)

O God, O God!dwnwcd$wfdf$fc$wHwnnd?f$n?dwn$w$dvd$$s$Hdss$HvnH$wHcwH$sssnHdHd$Hw$&sv?ss??HHw$&s$wcsHs$d$Hw$M$H$?$Hs?Hsv&dccdIdsc$d$Hdwvs$$dHM&$w?H?v---vvd&HswH&H$Hwvkwk-$kkXkv$HH$XR$?k$$-&-?$??&kw$w&&&$$RXv?Rw?MX--vvRwHHvXRkRvHkHk-????Rw$kRvH?wkX$$k&k&k-RXX-RR-RvRvk-v$Mhn&-Hs$-Hk?$HHknR?$swHHnkHHnnws$Hskdwn-?dH?ndIws$H?$sH$-dkknHH$nHH$&$kkksk?w?svk$?skcks?HwcsskwvskvwdHswckHHH?wndcH?vHvk$w$wcdcsssvd?vwwH$wd&skc-$HdHdHHcvc??cssH--HH-&d$$$H$&d$skHsH$Hvdcw$ssv$$&$cdk&$$wwvk-H&$&v&$&-k-$w&-v-wXkw--vR$&vXH?k?&-&XvkR$k&$H$?v?$?-w$&$MRH$w&XHHw&&HH&wwHwMRHXkH$$$kwH--MFH&-XRHRvkHXFH?$$w-Xk-HsH$wRRsohnhndIhHwwhace-ekned?whaHsRswH-$nsHs?w-vwdH???HnHkk-c-d$Hs$ncnc-ks$dnHH$c$c$dnc$sHHkndswnd$HHc$cc$&Hw$cd&Hc$$v&dwvcH$w&csHnsc$$s&HvcwwwddHssHv$dk$w&w?s$dH?$?&c-&&&&Hdd$vHM$vM$$$HH$-w?H?HHv?-HMH&$$&&&H-HvvX$M&$M$?&-&H$RRXH&-$X-$.M?vH-&RHv&$-X-X--w&HXH?$MXRH?M-Hw$H?Mw--HMw$w-kk$MXMM--R??RFRk-kXkne$RH$iHB$-Hsk$s?nkshn$e??HsRnk-shnndsH?csnc$d$sHcskH?$ncccHscnws$HkH$cHnskd$d$vwk?sdcknkwHn$newHHdcesswneHwnH
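
The completion is noise because training was interrupted after two iterations: a loss of 3.657 is barely below ln(65) ≈ 4.17, the loss of a uniform guess over 65 characters. The sampling procedure itself is still worth spelling out. Below is a minimal pure-Python sketch of temperature plus top-k sampling for a single step; `sample_top_k` is a hypothetical helper and is not the `sample` function from `mingpt.utils`, which works on batched tensors.

```python
import math
import random


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Pick one index from logits after temperature scaling and top-k filtering."""
    scaled = [l / temperature for l in logits]          # higher temperature flattens the distribution
    kth = sorted(scaled, reverse=True)[k - 1]           # value of the k-th largest logit
    masked = [l if l >= kth else float("-inf") for l in scaled]  # drop everything below it
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]            # stable softmax; exp(-inf) is 0.0
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs


idx, probs = sample_top_k([2.0, 1.0, 0.5, -1.0], k=2, rng=random.Random(0))
print(idx, [round(p, 3) for p in probs])
```

To generate text, this step would be repeated: feed the last `block_size` characters to the model, sample one index from the logits at the final position, append it, and continue. With `top_k=10` as in the cell above, only the ten most likely characters can ever be chosen at each step.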

In [None]:
# well that was fun