Exercises for Andrej Karpathy's [Neural Networks: Zero To Hero](https://karpathy.ai/zero-to-hero.html) videos.

This notebook is for Part 2: [The spelled-out intro to language modeling: building makemore](https://www.youtube.com/watch?v=PaCmpygFfXo).

### Preamble: Load data

Objective: Load a list of words from the file ```names.txt``` into a list variable named ```words```.

In [None]:
# TODO: Add code to load names here.
# A context manager closes the file once the names have been read.
with open("../names.txt") as f:
    words = f.read().splitlines()
# END TODO

In [None]:
def test_words():
    if not isinstance(words, list):
        print("Expected words to be a list")
        return
    if (len_words := len(words)) != (expected_words := 32033):
        print(f"Expected {expected_words} elements in words, found {len_words} elements")
        return
    if (zeroth_word := words[0]) != (expected_zeroth := "emma"):
        print(f"Expected zeroth word in words to be '{expected_zeroth}', was '{zeroth_word}'")
        return
    if (final_word := words[-1]) != (expected_final := "zzyzx"):
        print(f"Expected final word in words to be '{expected_final}', was '{final_word}'")
        return
    print("words looks good. Onwards!")
test_words()

### Step 1: Generate bigrams

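Objective: Build a list ```bigrams``` of bigrams (2-element tuples of adjacent characters) from ```words```, using '.' to mark the start and the end of each word.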
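
As a rough illustration of what a bigram list looks like, here is a minimal sketch for a single word. It pads the word with '.' and pairs each character with its successor using ```zip```, which is only one possible approach; the names ```padded``` and ```example_bigrams``` are made up for this example.

```python
# Toy illustration on one word; '.' marks both the start and the end.
word = "emma"
padded = "." + word + "."

# Pair every character with the character that follows it.
example_bigrams = list(zip(padded, padded[1:]))
print(example_bigrams)
# [('.', 'e'), ('e', 'm'), ('m', 'm'), ('m', 'a'), ('a', '.')]
```

A word of length n should contribute n + 1 bigrams once the start and end markers are counted.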

In [None]:
# TODO: Add code to build a list of bigrams here.
bigrams = []

# TODO Before releasing, find a way to hide solutions.

for word in words:
    bigrams.append(('.', word[0]))
    for pos in range(len(word) - 1):
        bigrams.append((word[pos], word[pos + 1]))
    bigrams.append((word[-1], '.'))

# END TODO

In [None]:
def test_bigrams():
    if not isinstance(bigrams, list):
        print("Expected bigrams to be a list")
        return
    if (start_m_ct := bigrams.count(('.', 'm'))) != (expected_start_m_ct := 2538):
        print(f"Expected {expected_start_m_ct} ('.', 'm') bigrams, found {start_m_ct}")
        return
    if (ab_ct := bigrams.count(('a', 'b'))) != (expected_ab_ct := 541):
        print(f"Expected {expected_ab_ct} ('a', 'b') bigrams, found {ab_ct}")
        return
    if (s_end_ct := bigrams.count(('s', '.'))) != (expected_s_end_ct := 1169):
        print(f"Expected {expected_s_end_ct} ('s', '.') bigrams, found {s_end_ct}")
        return
    print("bigrams looks good. Onwards!")
test_bigrams()

### Step 2: Map characters to indices

Objective: Build a dict ```stoi``` where each key is a character from ```words``` (plus '.' for start/end) and each value is a unique integer.
(We'll use that integer to represent the character later.)
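
To make the shape of the mapping concrete, here is a small sketch on a made-up two-word list. It assumes indices are assigned in sorted character order, which is a convenient choice rather than a requirement; ```toy_words```, ```toy_chars``` and ```toy_stoi``` are hypothetical names.

```python
# Toy illustration; the word list here is invented for the example.
toy_words = ["ab", "ba"]

# Collect every distinct character, plus '.' for the start/end marker.
toy_chars = sorted(set("".join(toy_words)) | {"."})

# Assign each character its position in the sorted list.
toy_stoi = {c: i for i, c in enumerate(toy_chars)}
print(toy_stoi)
# {'.': 0, 'a': 1, 'b': 2}
```

Because '.' sorts before the lowercase letters, it ends up with index 0 under this scheme.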

In [None]:
# TODO: Add code to build a character to index map
stoi = {}

chars = set()
for bigram in bigrams:
    chars.add(bigram[0])
    chars.add(bigram[1])
# enumerate yields (index, character) pairs; the character becomes the key.
stoi = {c: i for i, c in enumerate(sorted(chars))}
print(f"{stoi=}")
# END TODO

In [None]:
import string

def test_stoi():
    if not isinstance(stoi, dict):
        print("Expected stoi to be a dict")
        return
    for c in string.ascii_lowercase:
        if c not in stoi:
            print(f"Expected {c} to be in stoi")
            return
    print("stoi looks good. Onwards!")
test_stoi()

### Step 3: Map indices to characters

Objective: Build a dict ```itos``` that has the same key-value pairs as ```stoi```, but with each pair's key and value swapped.
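
A minimal sketch of inverting a mapping, using a hand-written toy dict rather than the real ```stoi```. The inversion is only well defined because every value is unique; ```toy_stoi``` and ```toy_itos``` are hypothetical names.

```python
# Toy illustration with a hand-written character-to-index map.
toy_stoi = {".": 0, "a": 1, "b": 2}

# Swap each (character, index) pair into (index, character).
toy_itos = {i: c for c, i in toy_stoi.items()}
print(toy_itos)
# {0: '.', 1: 'a', 2: 'b'}

# Round trip: mapping a character to its index and back returns the character.
round_trip_ok = all(toy_itos[toy_stoi[c]] == c for c in toy_stoi)
print(round_trip_ok)
# True
```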

In [None]:
### TODO: Add code here.
itos = {}

itos = {i: c for c, i in stoi.items()}
print(f"{itos=}")

### END TODO

### Step 4: Count occurrences of each bigram

Objective: Build a torch Tensor ```N``` where:
* the row is the index of the first character in the bigram
* the column is the index of the second character in the bigram
* the value is the number of times that bigram occurs (represented as an integer)
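
The layout described above can be sketched on a tiny, made-up vocabulary. This assumes a three-character map and a hand-written bigram list; ```toy_stoi```, ```toy_bigrams``` and ```toy_N``` are hypothetical names.

```python
import torch

# Toy illustration: three characters and five hand-written bigrams.
toy_stoi = {".": 0, "a": 1, "b": 2}
toy_bigrams = [(".", "a"), ("a", "b"), ("b", "."), (".", "a"), ("a", ".")]

# One row per first character, one column per second character.
toy_N = torch.zeros((3, 3), dtype=torch.int32)
for first, second in toy_bigrams:
    toy_N[toy_stoi[first], toy_stoi[second]] += 1

print(toy_N.tolist())
# [[0, 2, 0], [1, 0, 1], [1, 0, 0]]
```

Row 0 says that '.' was followed by 'a' twice, so two of the toy words start with 'a'.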

In [None]:
import torch

### TODO: Add code here.
N = torch.zeros(len(stoi), len(stoi), dtype=torch.int32)
for bigram in bigrams:
    i0 = stoi[bigram[0]]
    i1 = stoi[bigram[1]]
    N[i0, i1] += 1

print(f"{N=}")
### END TODO

### Step 5: Build probability distribution of bigrams

Objective: Build a torch Tensor ```P``` where:
* the row is the index of the first character in the bigram
* the column is the index of the second character in the bigram
* the value is the probability (as ```torch.float64```) that the second character of a bigram is the one given by the column, given that its first character is the one given by the row
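
In other words, each row of counts is divided by that row's total, so every row of ```P``` sums to 1. A minimal sketch on a hand-written 3x3 count tensor, with hypothetical names ```toy_N```, ```row_totals``` and ```toy_P```:

```python
import torch

# Toy illustration: a hand-written table of bigram counts.
toy_N = torch.tensor([[0, 2, 0], [1, 0, 1], [1, 0, 0]], dtype=torch.int32)

# keepdim=True keeps the shape (3, 1), so each row is divided by its own total.
row_totals = toy_N.sum(dim=1, keepdim=True)

# Convert the counts to float64 before dividing to get float64 probabilities.
toy_P = toy_N.to(torch.float64) / row_totals

print(toy_P.tolist())
# [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [1.0, 0.0, 0.0]]
```

Without ```keepdim=True``` the totals would have shape (3,), and broadcasting would line them up with the columns instead, dividing column j by the total of row j.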

In [None]:
P = torch.zeros(len(stoi), len(stoi), dtype=torch.float64)

# TODO

N_sum = N.sum(1, keepdim=True)
print(f"{N_sum=}")
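# Convert to float64 first: dividing the int32 counts directly would give float32.
P = N.to(torch.float64) / N_sum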
print(f"{P=}")

# END TODO

In [None]:
def test_P():
    for row_idx in itos:
        if abs(1.0 - (row_sum := P[row_idx].sum().item())) > 0.00001:
            row_c = itos[row_idx]
            print(f"Expected the sum of row {row_idx} ({row_c}) to be 1.0, was {row_sum}")
            return
    print("P looks good. Onwards!")
test_P()