This is not the main notebook in this challenge. Start with `understand-engine.ipynb`.

### Add `generate()` to `my_gpt.py`

This is what he calls the "naive autoregressive streaming inference." It doesn't use a KV cache, so every step re-runs the model on the full sequence. It's similar to what I was doing by hand in earlier challenges, plus a few options such as `top_k` and `temperature`.
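
To make "naive" concrete, here is a minimal sketch of a cache-free generation loop. `ToyModel` and `naive_generate` are hypothetical stand-ins written for this note, not the code in `my_gpt.py`, and the sketch always takes the greedy token to stay short.

```python
import torch

class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for the GPT: maps token ids (B, T) to logits (B, T, V)."""
    def __init__(self, vocab_size=16, n_embd=8):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, n_embd)
        self.head = torch.nn.Linear(n_embd, vocab_size)
        self.seen_lengths = []  # sequence length T of every forward pass

    def forward(self, ids):
        self.seen_lengths.append(ids.size(1))
        return self.head(self.emb(ids))

def naive_generate(model, prompt_tokens, max_tokens):
    """Yield one token at a time, re-feeding the whole sequence on every step."""
    ids = torch.tensor([prompt_tokens], dtype=torch.long)  # (1, T)
    for _ in range(max_tokens):
        with torch.no_grad():
            logits = model(ids)[:, -1, :]  # full forward pass, keep the last position
        next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy pick, shape (1, 1)
        ids = torch.cat([ids, next_id], dim=1)  # the sequence grows by one token
        yield next_id.item()

torch.manual_seed(0)
toy = ToyModel()
out = list(naive_generate(toy, [1, 2], max_tokens=3))
print(toy.seen_lengths)  # [2, 3, 4]: each step processes one more token than the last
```

The growing `seen_lengths` is the cost a KV cache avoids: without one, step `n` recomputes everything for all previous tokens.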

In [1]:
import os
import sys
sys.path.append('../my_nanochat')
import torch
import torch.nn.functional as F
from my_nanochat.my_common import get_base_dir
from my_nanochat.my_checkpoint_manager import build_model

#### Snippets of code to help follow the code in `generate()`

top_k

In [2]:
logits = torch.randn((2,5,10)) # (B, T, V)
logits = logits[:,-1,:]
logits

tensor([[-0.4741,  0.3539,  0.8900,  1.4903, -0.6973,  1.5847, -0.8399,  1.8785,
         -0.2817,  2.0971],
        [-0.5197,  0.9771, -1.7021, -0.0772, -1.1148,  0.0348,  1.7367,  0.9720,
         -0.0634,  0.0729]])

In [3]:
v, _ = torch.topk(logits, 3)
v

tensor([[2.0971, 1.8785, 1.5847],
        [1.7367, 0.9771, 0.9720]])

In [4]:
v[:, [-1]]

tensor([[1.5847],
        [0.9720]])

In [5]:
logits[logits < v[:, [-1]]] = -float('Inf')
logits

tensor([[  -inf,   -inf,   -inf,   -inf,   -inf, 1.5847,   -inf, 1.8785,   -inf,
         2.0971],
        [  -inf, 0.9771,   -inf,   -inf,   -inf,   -inf, 1.7367, 0.9720,   -inf,
           -inf]])
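
The in-place masking above overwrites `logits`. A non-mutating variant might look like the sketch below; `top_k_filter` is a hypothetical helper name, not something defined in `my_gpt.py`.

```python
import torch

def top_k_filter(logits, k):
    """Return a copy of logits (..., V) with everything below the k-th largest value set to -inf."""
    k = min(k, logits.size(-1))  # guard against k larger than the vocabulary
    values, _ = torch.topk(logits, k, dim=-1)
    threshold = values[..., -1:]  # k-th largest value per row, kept as a column for broadcasting
    return logits.masked_fill(logits < threshold, float("-inf"))

example = torch.tensor([[1.0, 3.0, 2.0, 0.0]])
filtered = top_k_filter(example, k=2)
print(filtered)  # 3.0 and 2.0 survive; the other entries become -inf
print(example)   # the input is left untouched
```

Because the comparison is a strict `<`, values tied with the k-th largest are all kept, so more than `k` entries can survive when there are ties.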

temperature

In [6]:
temperature = 0.8

In [7]:
logits

tensor([[  -inf,   -inf,   -inf,   -inf,   -inf, 1.5847,   -inf, 1.8785,   -inf,
         2.0971],
        [  -inf, 0.9771,   -inf,   -inf,   -inf,   -inf, 1.7367, 0.9720,   -inf,
           -inf]])

In [8]:
logits = logits / temperature; logits

tensor([[  -inf,   -inf,   -inf,   -inf,   -inf, 1.9809,   -inf, 2.3481,   -inf,
         2.6213],
        [  -inf, 1.2214,   -inf,   -inf,   -inf,   -inf, 2.1709, 1.2150,   -inf,
           -inf]])

In [9]:
probs = F.softmax(logits, dim=-1); probs

tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2304, 0.0000, 0.3326, 0.0000,
         0.4371],
        [0.0000, 0.2184, 0.0000, 0.0000, 0.0000, 0.0000, 0.5645, 0.2170, 0.0000,
         0.0000]])

Compare a temperature of 0.8 with 0.1. At lower temperatures, `probs` becomes more concentrated on the most likely tokens. The higher the temperature, the more likely we are to sample an unlikely token.

In [10]:
F.softmax(torch.tensor([.3, .4]) / 0.8, dim=-1)

tensor([0.4688, 0.5312])

In [11]:
F.softmax(torch.tensor([.3, .4]) / 0.1, dim=-1)

tensor([0.2689, 0.7311])
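
One way to see where these numbers come from: with only two logits `a` and `b`, the softmax probability of `b` is `sigmoid((b - a) / temperature)`. The gap of 0.1 is stretched to 0.125 at a temperature of 0.8 and to 1.0 at a temperature of 0.1. A small check of that identity (the `results` dict is only there for illustration):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([0.3, 0.4])
results = {}
for temperature in (0.8, 0.1):
    # Probability of the second token via softmax over both logits
    p_softmax = F.softmax(logits / temperature, dim=-1)[1]
    # Same probability via the sigmoid of the scaled logit gap
    p_sigmoid = torch.sigmoid((logits[1] - logits[0]) / temperature)
    results[temperature] = (p_softmax.item(), p_sigmoid.item())
    print(f"T={temperature}: softmax={p_softmax.item():.4f}, sigmoid={p_sigmoid.item():.4f}")
```

Dividing by a small temperature widens every gap between logits, which is why the distribution sharpens.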

In [12]:
rng = torch.Generator()

In [13]:
torch.multinomial(probs, num_samples=1, generator=rng)

tensor([[9],
        [1]])

In [14]:
torch.multinomial(probs, num_samples=1, generator=rng)

tensor([[7],
        [1]])
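
Putting the snippets together, one sampling step could be sketched as below. `sample_next_token` is a hypothetical helper, and treating `temperature == 0` as a plain argmax is an assumption about how `generate()` behaves, not a copy of its code.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, rng=None, temperature=1.0, top_k=None):
    """Sample one token id per row: logits (B, V) -> ids (B, 1)."""
    if temperature == 0.0:
        # Assumed convention: temperature 0 means greedy decoding (avoids dividing by zero)
        return torch.argmax(logits, dim=-1, keepdim=True)
    if top_k is not None:
        k = min(top_k, logits.size(-1))
        values, _ = torch.topk(logits, k, dim=-1)
        # Mask everything below the k-th largest logit in each row
        logits = logits.masked_fill(logits < values[:, -1:], float("-inf"))
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1, generator=rng)

demo_rng = torch.Generator().manual_seed(42)
demo_logits = torch.randn((2, 10), generator=demo_rng)  # (B, V)
greedy = sample_next_token(demo_logits, temperature=0.0)
top1 = sample_next_token(demo_logits, demo_rng, temperature=0.8, top_k=1)
sampled = sample_next_token(demo_logits, demo_rng, temperature=0.8, top_k=3)
print(greedy.tolist(), top1.tolist(), sampled.tolist())
```

Passing an explicit `torch.Generator` keeps the sampling reproducible without touching the global random state.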

#### Try `generate()` just added to `my_gpt.py`

In [15]:
checkpoint_dir = os.path.join(get_base_dir(), "base_checkpoints", "d4")
model, tokenizer, meta_data = build_model(checkpoint_dir, step=10, device=torch.get_default_device(), phase="eval")

Building model with config: {'sequence_len': 128, 'vocab_size': 65537, 'n_layer': 4, 'n_head': 2, 'n_kv_head': 2, 'n_embd': 256}


In [16]:
prompt_tokens = tokenizer.encode('Hello', prepend=tokenizer.get_bos_token_id())
prompt_tokens

[65536, 28466]

In [17]:
tokens = []
tokens.extend(prompt_tokens)
for token in model.generate(prompt_tokens, max_tokens=5):
    tokens.append(token)
tokens

[65536, 28466, 49458, 331, 28461, 46644, 3247]

In [18]:
tokenizer.decode(tokens)

'<bos>Hello dependant on proudly fringes carry'

In [24]:
def try_it(**kwargs):
    tokens = []
    tokens.extend(prompt_tokens)
    for token in model.generate(prompt_tokens, max_tokens=5, **kwargs):
        tokens.append(token)
    print(f"{tokenizer.decode(tokens)} -- {kwargs}")

In [34]:
try_it()
try_it(seed=43)
try_it(seed=44)
try_it(temperature=0)
try_it(temperature=0, seed=43)
try_it(temperature=0, seed=44)
try_it(top_k=5)
try_it(top_k=5, seed=43)
try_it(top_k=5, seed=44)
try_it(top_k=1) # expect same as temperature = 0

<bos>Hello dependant on proudly fringes carry -- {}
<bos>Hello assumptions Auss unoccupiedibial inactive -- {'seed': 43}
<bos>Helloandy enchocr)|| inverter -- {'seed': 44}
<bos>Hello most most most most most -- {'temperature': 0}
<bos>Hello most most most most most -- {'temperature': 0, 'seed': 43}
<bos>Hello most most most most most -- {'temperature': 0, 'seed': 44}
<bos>Hello most most freight by most -- {'top_k': 5}
<bos>Hello freight most most by freight -- {'top_k': 5, 'seed': 43}
<bos>Hello most freight freight demand transport -- {'top_k': 5, 'seed': 44}
<bos>Hello most most most most most -- {'top_k': 1}
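
Why `top_k=1` matches `temperature=0`: once only one logit per row is finite, softmax puts all the probability on it, so the seed no longer matters. A minimal check, assuming `generate()` applies top-k the same way as the earlier snippets (the toy logits here are made up):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0.5, 2.0, -1.0]])
v, _ = torch.topk(logits, 1)
masked = logits.masked_fill(logits < v[:, [-1]], float("-inf"))
probs = F.softmax(masked / 0.8, dim=-1)  # the temperature has no effect on a one-hot result

# Sample with three different seeds and collect the distinct token ids
samples = {
    torch.multinomial(probs, num_samples=1, generator=torch.Generator().manual_seed(seed)).item()
    for seed in (42, 43, 44)
}
print(probs.tolist(), samples)  # [[0.0, 1.0, 0.0]] {1}
```

The two settings could still differ if several logits tie for the maximum, since top-k with a strict `<` keeps all tied values while argmax picks just one.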
