# Makemore

Noted, Coded, and Created by Han Summer 2024. Part of The 20th Summer Project

------------
### Makemore

Makemore makes more of whatever it is trained on: given a dataset of examples, it generates new items that resemble them.

Under the hood, **makemore is a character-level language model**. It treats every line of the training set as a sequence of individual characters.

Put simply, a character-level language model predicts the next character from the sequence of characters that comes before it.
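
To make the "sequence of characters" idea concrete, here is a minimal sketch of how one name breaks into (previous character, next character) pairs. The helper name `bigrams` and the example word `"emma"` are assumptions for illustration; `.` marks the start and end of a word, as in the cells further down.

```python
def bigrams(word, boundary="."):
    """Return the consecutive character pairs of a word, padded with a boundary token."""
    chs = [boundary] + list(word) + [boundary]
    # zip stops at the shorter input, so the last pair is (last char, boundary)
    return list(zip(chs, chs[1:]))


pairs = bigrams("emma")
print(pairs)
# [('.', 'e'), ('e', 'm'), ('m', 'm'), ('m', 'a'), ('a', '.')]
```

Each pair is one training example: given the first character, the model should predict the second.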

----
## This page is the finished implementation of the makemore model, including the statistical bigram and the neural-network bigram

In [25]:
import torch
import torch.nn.functional as F

### Read Dataset

In [26]:
# Load the dataset (a list of names) as a Python list of strings, one name per entry
with open('names.txt', 'r') as f:
    word = f.read().splitlines()

In [27]:
# 27x27 table of bigram counts (26 letters plus the '.' token)
N=torch.zeros((27,27),dtype=torch.int32)

# Build the character-to-index lookup table
chars=sorted(list(set(''.join(word)))) # Join every word into one string, take the set to drop duplicates, then sort from a to z
# Indices start at 1 because index 0 is reserved for the start/end token
stoi={s:i+1 for i,s in enumerate(chars)} # Character-to-index dictionary: {'a': 1, 'b': 2, 'c': 3, 'd': 4, ...}
# Add the start/end token '.' to the dictionary at index 0
stoi['.']=0
# Invert the dictionary to get the index-to-character lookup
itos={v:k for k,v in stoi.items()}
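
The lookup tables can be checked on a tiny vocabulary. The following is a sketch only: `build_vocab`, `toy_words`, `toy_stoi`, and `toy_itos` are hypothetical names chosen so they do not clash with the notebook's `stoi` and `itos`, and the two toy words stand in for `names.txt`.

```python
def build_vocab(words, boundary="."):
    """Build character-to-index and index-to-character tables, reserving index 0 for the boundary."""
    chars = sorted(set("".join(words)))
    stoi = {ch: i + 1 for i, ch in enumerate(chars)}
    stoi[boundary] = 0
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


toy_words = ["ava", "bo"]
toy_stoi, toy_itos = build_vocab(toy_words)

encoded = [toy_stoi[ch] for ch in ".ava."]
decoded = "".join(toy_itos[i] for i in encoded)

print(toy_stoi)  # {'a': 1, 'b': 2, 'o': 3, 'v': 4, '.': 0}
print(encoded)   # [0, 1, 4, 1, 0]
print(decoded)   # .ava.
```

Encoding and then decoding returns the original string, which is the property the sampling loops below rely on.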

### Bigram

In [28]:
def Bigram(word,namecount):
    # Reset the global count table so repeated calls do not double-count
    N.zero_()

    for w in word:
        # Add the start and end tokens
        chs=['.']+list(w)+['.']
        for ch1, ch2 in zip(chs, chs[1:]): # Iterate over consecutive pairs of characters
            # zip stops when the shorter input runs out; here chs[1:] ends one element before chs
            ix1=stoi[ch1]
            ix2=stoi[ch2]
            # Count the occurrences of this bigram
            N[ix1, ix2]+=1
    # g = torch.Generator().manual_seed(2147483647)

    P=(N+1).float() # Add-one smoothing so that no bigram has zero probability
    P=P/P.sum(1, keepdim=True) # Normalize each row into a probability distribution
    # Generate namecount names
    for i in range (namecount):
        out=[]
        ix=0 # Start from the '.' token
        while True:
            p=P[ix]
            # Sample the next character index from this row's probabilities
            ix=torch.multinomial(p, num_samples=1, replacement=True).item()
            out.append(itos[ix])
            if ix==0: # Index 0 is '.', which marks the end of the name
                break
        print(''.join(out))


In [29]:
Bigram(word,10) #Generate 10 names

amciasanelarenthahin.
lirarinze.
fabe.
aimahinn.
bionielana.
n.
a.
malartalyon.
ziton.
mphelyadorudoarueilahkieestondriantolaliye.
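
The smoothing and normalization step inside `Bigram` (`P=(N+1).float()` followed by the row-wise division) can be traced by hand on a small table. This sketch uses plain Python lists instead of tensors. The 3x3 `toy_counts` table and the helper `smooth_and_normalize` are made up for illustration.

```python
def smooth_and_normalize(counts, alpha=1):
    """Add alpha to every count, then divide each row by its sum."""
    probs = []
    for row in counts:
        smoothed = [c + alpha for c in row]
        total = sum(smoothed)
        probs.append([c / total for c in smoothed])
    return probs


# Hypothetical counts: rows are the current character, columns the next one
toy_counts = [
    [0, 3, 1],
    [2, 0, 0],
    [1, 1, 2],
]
toy_probs = smooth_and_normalize(toy_counts)

for row in toy_probs:
    print([round(p, 3) for p in row])
# [0.143, 0.571, 0.286]
# [0.6, 0.2, 0.2]
# [0.286, 0.286, 0.429]
```

Every row now sums to 1, and bigrams that never occurred in the counts still get a small non-zero probability.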


### Bigram Neural Net

In [30]:
def NNBigram(epoch,learning_rate):

    # Build the training inputs (xs: first character of each bigram) and targets (ys: second character)
    xs, ys = [], []

    for w in word: # Iterate over every word in the dataset
        # Add the start and end tokens
        chs=['.']+list(w)+['.']
        for ch1, ch2 in zip(chs, chs[1:]): # Iterate over consecutive pairs of characters
            # zip stops when the shorter input runs out; here chs[1:] ends one element before chs
            ix1=stoi[ch1]
            ix2=stoi[ch2]
            # Store the index values in the lists
            xs.append(ix1)
            ys.append(ix2)

    # Convert to tensors. Use lowercase torch.tensor, which infers an integer dtype here; torch.Tensor would always give float32
    xs=torch.tensor(xs)
    ys=torch.tensor(ys)
    num = xs.nelement() # Number of elements in the tensor, i.e. the number of bigrams

    # Randomly initialize the weights of 27 neurons; each neuron receives 27 inputs
    # g = torch.Generator().manual_seed(2147483647)
    W = torch.randn((27, 27), requires_grad=True) # requires_grad=True so autograd tracks gradients with respect to W

    # Model training with gradient descent
    for i in range(epoch):
        # Forward pass
        xenc = F.one_hot(xs, num_classes=27).float() # One-hot encode the inputs
        logits = xenc @ W # Log-counts for the next character
        counts = logits.exp()
        probs = counts / counts.sum(1, keepdims=True) # Softmax: each row sums to 1
        loss=-probs[torch.arange(num), ys].log().mean() # Average negative log-likelihood of the targets

        # Backward pass
        W.grad=None # Clear the previous gradient (setting it to None has the same effect as zeroing it)
        loss.backward() # Compute the gradient; like micrograd, torch tracks the computation graph

        # Update the weights
        W.data += -learning_rate * W.grad
        if i % 10 == 0:
            print(f'Epoch {i}, Loss {loss.item()}')

    return W
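
One way to read `xenc @ W` in the forward pass: multiplying a one-hot vector by the weight matrix simply selects one row of `W`, so row `i` of `W` holds the log-counts for "what follows character `i`". The sketch below shows this with plain Python lists rather than tensors. The helpers `one_hot` and `vec_matmul` and the 3x3 `toy_W` are assumptions for illustration.

```python
def one_hot(index, num_classes):
    """Vector of zeros with a single 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def vec_matmul(vec, matrix):
    """Multiply a row vector by a matrix (both given as plain lists)."""
    num_cols = len(matrix[0])
    return [
        sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(num_cols)
    ]


toy_W = [
    [0.1, 0.2, 0.3],
    [1.0, 2.0, 3.0],
    [-1.0, 0.0, 1.0],
]

selected = vec_matmul(one_hot(1, 3), toy_W)
print(selected)              # [1.0, 2.0, 3.0]
print(selected == toy_W[1])  # True
```

This is why the neural-net bigram can end up close to the count-based model: each row of `W` plays the role of one row of log-counts.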


In [31]:
#Model training
W=NNBigram(100,50)

Epoch 0, Loss 3.800018787384033
Epoch 10, Loss 2.659454107284546
Epoch 20, Loss 2.5612409114837646
Epoch 30, Loss 2.5260512828826904
Epoch 40, Loss 2.5074403285980225
Epoch 50, Loss 2.4960408210754395
Epoch 60, Loss 2.4883546829223633
Epoch 70, Loss 2.482818841934204
Epoch 80, Loss 2.4786510467529297
Epoch 90, Loss 2.4754178524017334
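
To put the printed loss in context, the sketch below computes the same quantity (softmax followed by average negative log-likelihood) in plain Python for a model that knows nothing, meaning all 27 logits are equal. The helper names and target indices are assumptions. Subtracting the maximum logit before exponentiating is a numerical-stability tweak that the training loop above does not use.

```python
import math


def softmax(logits):
    """Turn logits into probabilities; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, targets):
    """Average negative log-probability assigned to the correct next character."""
    losses = [-math.log(softmax(row)[t]) for row, t in zip(logit_rows, targets)]
    return sum(losses) / len(losses)


# Hypothetical batch: four examples, all logits equal, arbitrary target indices
uniform_rows = [[0.0] * 27 for _ in range(4)]
uniform_loss = mean_nll(uniform_rows, [0, 5, 13, 26])

print(round(uniform_loss, 4))  # 3.2958, which is ln(27)
```

A uniform guess over 27 characters costs ln(27), about 3.30. The final loss of about 2.48 printed above is below that baseline, so the model has learned something about which characters follow which.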


In [32]:
# Sample names from the trained neural net, one character at a time
def predict_next_char(W):
    g = torch.Generator().manual_seed(2147483647) # Seeded generator so sampling is repeatable for a given W

    for i in range(5):
        out = []
        ix = 0 # Start from the '.' token
        while True:
            xenc = F.one_hot(torch.tensor([ix]), num_classes=27).float()
            logits = xenc @ W # predict log-counts
            counts = logits.exp() # counts, equivalent to N
            p = counts / counts.sum(1, keepdims=True) # probabilities for next character

            # Pass the generator so the seed above actually takes effect
            ix = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()
            out.append(itos[ix])
            if ix == 0: # Index 0 is '.', which marks the end of the name
                break
        print(''.join(out))

In [33]:
predict_next_char(W)

s.
ly.
washja.
jan.
tlilytlo.
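
Both generators above share the same sampling loop: start at `.`, draw the next index from the current row of probabilities, and stop when `.` is drawn again. Here is a dependency-free sketch of that loop. It uses `random.choices` in place of `torch.multinomial`. The function `sample_name`, the `max_len` safety cap, and the deterministic toy table are assumptions for illustration.

```python
import random


def sample_name(transition_probs, itos, rng, max_len=20):
    """Walk the transition table from the '.' token until '.' is drawn again."""
    out = []
    ix = 0  # index 0 is the start/end token
    while len(out) < max_len:  # safety cap, not present in the notebook's loops
        weights = transition_probs[ix]
        ix = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        out.append(itos[ix])
        if ix == 0:
            break
    return "".join(out)


toy_itos = {0: ".", 1: "a", 2: "b"}
toy_P = [
    [0.0, 1.0, 0.0],  # '.' is always followed by 'a'
    [0.0, 0.0, 1.0],  # 'a' is always followed by 'b'
    [1.0, 0.0, 0.0],  # 'b' always ends the name
]

print(sample_name(toy_P, toy_itos, random.Random(0)))  # ab.
```

With a real probability table the rows are not one-hot, so each call can produce a different name.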
