This is not the main notebook in this challenge. Start with `understand-engine.ipynb`.

In [1]:
import os
import sys
sys.path.append('../my_nanochat')
import torch

I'm now going to start copying other parts of `engine.py` to `my_engine.py`.

Much of the code looks like it only applies after pre-training, once we have special tokens, a state machine to keep track of turns, a calculator to evaluate Python expressions, etc. I'm going to try to leave all that out for now.

#### sample_next_token()

There is a function `sample_next_token(logits, rng, temperature=1.0, top_k=None)` that looks very similar to `GPT.generate()`, which we looked at in `add-generate-to-gpt.ipynb` in this same challenge.

It looks clear, except I wasn't sure why we need `idx.gather(1, choice)`: aren't we already dealing in indexes? Maybe it's an easy way to get the dimensions right? I'll hand-copy the function and then play with it a bit. Ah, in copying it, I saw that for `top_k` we first cut the logits down to just the top k, so `choice` is a position within those k values, and after choosing we have to convert back to our original vocabulary indexes. This is unlike `GPT.generate()`, which achieves the same thing by setting all the non-top-k logits to -inf so they won't be chosen. This way feels a bit cleaner and is probably more efficient.
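
To make the difference between the two approaches concrete, here is a minimal sketch of both. The function names `sample_top_k_gather` and `sample_top_k_mask` are made up for this illustration, and the bodies are my own reconstruction of the two ideas, not the code in `engine.py` or `GPT.generate()`. Temperature is left out to keep the focus on the index handling.

```python
import torch
import torch.nn.functional as F


def sample_top_k_gather(logits, rng, k):
    """Hypothetical helper: sample among the top-k logits, then map back to vocab ids."""
    vals, idx = torch.topk(logits, k, dim=-1)   # both (B, k); idx holds vocab ids
    probs = F.softmax(vals, dim=-1)             # distribution over the k survivors
    # choice is a position inside the top-k slice, with values in [0, k)
    choice = torch.multinomial(probs, num_samples=1, generator=rng)
    return idx.gather(1, choice)                # (B, 1), translated back to vocab ids


def sample_top_k_mask(logits, rng, k):
    """Hypothetical helper: keep the full vocabulary and mask the rest to -inf."""
    vals, _ = torch.topk(logits, k, dim=-1)
    masked = logits.clone()
    # anything strictly below the k-th largest logit can never be chosen
    masked[masked < vals[:, [-1]]] = float("-inf")
    probs = F.softmax(masked, dim=-1)           # (B, V), zero outside the top k
    return torch.multinomial(probs, num_samples=1, generator=rng)  # already vocab ids


logits = torch.tensor([[-0.9216, 1.8903, 0.9157, 0.5659, -0.4409],
                       [-0.7331, -0.1370, -1.1918, 0.2197, -0.8647]])
rng = torch.Generator().manual_seed(0)

top1_gather = sample_top_k_gather(logits, rng, k=1)
top1_mask = sample_top_k_mask(logits, rng, k=1)
top2_gather = sample_top_k_gather(logits, rng, k=2)
print(top1_gather.tolist(), top1_mask.tolist(), top2_gather.tolist())
```

In the gather version the softmax and the multinomial draw only touch `k` values per row instead of the whole vocabulary, which is where the efficiency hunch comes from.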

In [2]:
from my_nanochat.my_engine import sample_next_token

In [3]:
B = 2
V = 5
logits = torch.randn((B,V))
logits

tensor([[-0.9216,  1.8903,  0.9157,  0.5659, -0.4409],
        [-0.7331, -0.1370, -1.1918,  0.2197, -0.8647]])

In [4]:
rng = torch.Generator()

In [5]:
sample_next_token(logits, rng)

tensor([[2],
        [1]])

In [6]:
sample_next_token(logits, rng, temperature=5.0)

tensor([[1],
        [4]])

In [7]:
sample_next_token(logits, rng, temperature=5.0, top_k=1) # expect 1, 3

tensor([[1],
        [3]])
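
To see why `temperature=5.0` produces more varied picks than the default, here is a small sketch that compares the sampling probabilities for the first row of `logits` at two temperatures. It assumes the usual convention of dividing the logits by the temperature before the softmax; I'm treating that as an assumption here rather than quoting the function.

```python
import torch
import torch.nn.functional as F

# First row of the logits tensor printed above
row = torch.tensor([-0.9216, 1.8903, 0.9157, 0.5659, -0.4409])

probs_t1 = F.softmax(row / 1.0, dim=-1)  # default temperature
probs_t5 = F.softmax(row / 5.0, dim=-1)  # higher temperature flattens the distribution

print(probs_t1)
print(probs_t5)
```

The most likely token stays the same (index 1), but at the higher temperature it gets a smaller share of the probability mass, so the other tokens are drawn more often. With `top_k=1` only that one token survives, so the temperature no longer matters.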

#### Engine.generate()

In [1]:
import os
import sys
sys.path.append('../my_nanochat')
import torch
from my_nanochat.my_common import get_base_dir
from my_nanochat.my_checkpoint_manager import build_model
from my_nanochat.my_engine import Engine

In [2]:
checkpoint_dir = os.path.join(get_base_dir(), "base_checkpoints", "d4")
model, tokenizer, meta_data = build_model(checkpoint_dir, step=10, device=torch.get_default_device(), phase="eval")

Building model with config: {'sequence_len': 128, 'vocab_size': 65537, 'n_layer': 4, 'n_head': 2, 'n_kv_head': 2, 'n_embd': 256}


In [3]:
prompt_tokens = tokenizer.encode('Hello', prepend=tokenizer.get_bos_token_id())
prompt_tokens

[65536, 28466]

In [4]:
engine = Engine(model, tokenizer)

In [5]:
for token_column, token_masks in engine.generate(prompt_tokens, max_tokens=10):
    print(token_column)

[49458]
[331]
[28461]
[46644]
[3247]
[3493]
[33440]
[47865]
[21686]
[24330]


In [6]:
for token_column, token_masks in engine.generate(prompt_tokens, max_tokens=10, num_samples=2):
    print(token_column)

[49458, 49458]
[331, 28461]
[46644, 8527]
[3493, 33440]
[47865, 21686]
[24330, 403]
[2236, 1581]
[51928, 10607]
[50547, 40263]
[31660, 47889]


In [7]:
# temperature=0 means greedy decoding, so expect both samples to be identical
for token_column, token_masks in engine.generate(prompt_tokens, max_tokens=10, num_samples=2, temperature=0):
    print(token_column)

[668, 668]
[668, 668]
[668, 668]
[668, 668]
[668, 668]
[668, 668]
[668, 668]
[668, 668]
[668, 668]
[668, 668]
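
Both columns match because, assuming `temperature=0` is handled as a plain argmax over the logits (the common convention for greedy decoding), there is no randomness left to make the samples diverge. A tiny sketch with made-up logits:

```python
import torch

# Made-up logits for one prompt, copied into two rows to mimic num_samples=2
logits = torch.tensor([[0.1, 2.0, -1.0]])
two_rows = logits.repeat(2, 1)

# Greedy pick: no rng involved, so identical rows give identical tokens
greedy = torch.argmax(two_rows, dim=-1, keepdim=True)
print(greedy.tolist())
```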


#### Engine.generate_batch()

In [8]:
engine.generate_batch(prompt_tokens, max_tokens=10)

([[65536,
   28466,
   49458,
   331,
   28461,
   46644,
   3247,
   3493,
   33440,
   47865,
   21686,
   24330]],
 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [9]:
engine.generate_batch(prompt_tokens, max_tokens=10, num_samples=2)

([[65536,
   28466,
   49458,
   331,
   46644,
   3493,
   47865,
   24330,
   2236,
   51928,
   50547,
   31660],
  [65536,
   28466,
   49458,
   28461,
   8527,
   33440,
   21686,
   403,
   1581,
   10607,
   40263,
   47889]],
 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])