# Character-Level Language Modeling with One Neuron
This approach focuses on training a very simple model for character-level language modeling.

### Character-Level Language Modeling:
Instead of working with words or phrases, the model operates at the character level.
It learns patterns between individual characters in a sequence. For example, it might learn that 'q' is often followed by 'u' in English.
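
As a small illustration of the kind of pattern meant here, the sketch below counts adjacent character pairs in a toy word list. The list `toy_words` is a hypothetical example chosen for this illustration, not the dataset used in this notebook.

```python
from collections import Counter

# Hypothetical toy corpus, used only for illustration (not the names dataset).
toy_words = ["queen", "quick", "quote", "unique"]

# Count every pair of adjacent characters.
pair_counts = Counter()
for w in toy_words:
    for ch1, ch2 in zip(w, w[1:]):
        pair_counts[(ch1, ch2)] += 1

# Which characters follow 'q', and how often?
after_q = {pair[1]: n for pair, n in pair_counts.items() if pair[0] == "q"}
print(after_q)  # {'u': 4}
```

In this toy list every 'q' is followed by 'u', which is the kind of regularity a character-level model can pick up.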

<div style="text-align: center;">
    <img src="neural_net_model_demo.png" alt="model_architecture_demo" width="400" height="400">
</div>


The model uses just a single neuron layer (a very simple architecture): one 27x27 weight matrix with no hidden layers.
This minimal setup is designed to demonstrate how even a simple model can capture some sequential patterns in character data.
While the capacity of the model is very limited, it can still learn basic relationships between characters.


The goal is to train the model to predict the next character well enough to generate plausible outputs.<br>
Like the makemore_probabilistic method, this approach estimates the probability distribution of the next character given the current one in the dataset. As training proceeds, the model moves toward that count-based probabilistic model, which is the best closed-form solution for this setup.
The dataset typically consists of character sequences from a relevant corpus, such as names or other text.
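
One way to see why a weight matrix can reproduce a count-based model is sketched below. It assumes a hypothetical 3-symbol vocabulary and a made-up count matrix `N`. If the weights are set to the log of the counts, a softmax over each row gives exactly the normalized counts.

```python
import torch

# Hypothetical bigram counts for a 3-symbol vocabulary
# (rows: current symbol, columns: next symbol). Values are made up.
N = torch.tensor([[1., 3., 4.],
                  [2., 2., 4.],
                  [5., 1., 2.]])

# Count-based model: normalize each row into probabilities.
P = N / N.sum(dim=1, keepdim=True)

# Neural-style model: treat the log-counts as weights and apply softmax.
W = N.log()
probs = torch.softmax(W, dim=1)

print(torch.allclose(P, probs))  # True
```

Gradient descent does not set the weights this way directly, but this suggests why the trained weights can approach the same distribution.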


Once trained, the model is used to generate new sequences of characters. In this case, the goal is to generate new human names.
By starting with an initial character and using the model to predict one character at a time, a sequence can be generated.


This method highlights how even extremely simple neural network architectures can capture sequential data patterns to produce coherent outputs within their limitations.

In [1]:
import torch
import torch.nn.functional as F

In [2]:
words = open('names.txt', 'r').read().splitlines()

## Data Preprocessing



The input to the model is one character at a time.<br>
The output is the model's prediction of the next character in the sequence.<br>
For example, given the input 'h', the model might predict 'e' as the next character.
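
A minimal sketch of how one name breaks down into (input, next character) pairs. The name `emma` is only an example, and `.` is used as the start and end marker, as in the notebook's own code.

```python
# Illustrative only: 'emma' is an example name, '.' marks start and end.
name = "emma"
chars = ["."] + list(name) + ["."]

# Pair each character with the one that follows it.
pairs = list(zip(chars, chars[1:]))
for current, nxt in pairs:
    print(f"{current} -> {nxt}")
```

Each pair is one training example: the first element is the input and the second is the label.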

In [3]:
def char_to_int(data: list) -> dict:
    '''
    Given a dataset of words (names), map each unique character to an integer id.
    Used in the training step.

    Args:
        data: a list of names

    Returns:
        char_ids: a dictionary mapping each character to its integer id; '.' (the start/end marker) is mapped to 0
    '''

    char_ids = {}
    chars = sorted(set(''.join(data))) # unique characters

    for idx, c in enumerate(chars):
        char_ids[c] = idx + 1
    
    char_ids['.'] = 0

    return char_ids


def int_to_char(data: dict) -> dict:
    '''
    Given a dictionary of character ids, build the reverse mapping from id to character.
    Used in the inference step.

    Args:
        data: a dictionary of (chars, ids)

    Returns:
        int_char: a dictionary of (ids, chars)
    '''
    int_char = {v:k for k,v in data.items()}
    return int_char


char_ids = char_to_int(words)
id_char = int_to_char(char_ids)
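
A quick round-trip check of the two mappings can be sketched as follows. To stay self-contained it rebuilds the same scheme on a hypothetical three-name sample (`sample`, `stoi`, and `itos` are illustrative names) instead of reading `names.txt`.

```python
# Hypothetical three-name sample standing in for names.txt.
sample = ["ava", "liam", "mia"]

# Same scheme as char_to_int: sorted unique characters get ids 1..n, '.' gets 0.
stoi = {c: i + 1 for i, c in enumerate(sorted(set("".join(sample))))}
stoi["."] = 0

# Same scheme as int_to_char: invert the dictionary.
itos = {i: c for c, i in stoi.items()}

encoded = [stoi[c] for c in "mia"]
decoded = "".join(itos[i] for i in encoded)
print(encoded, decoded)  # [4, 2, 1] mia
```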

In [5]:
def make_dataset(data: list, ids: dict) -> tuple:
    '''
    Build the training dataset of (current character, next character) pairs.

    Args:
        data: a list of names
        ids: a dictionary mapping each character to its integer id

    Returns:
        X, y: input and label tensors of integer ids; y[i] is the character that follows X[i].
    '''
    X = []
    y = []

    for w in data:
        s = ['.'] + list(w) + ['.']  # '.' marks both the start and the end of a name
        for ch1, ch2 in zip(s, s[1:]):
            X.append(ids[ch1])
            y.append(ids[ch2])

    return torch.tensor(X), torch.tensor(y)

# Only the first 6 names are used here.
data, label = make_dataset(words[:6], char_ids)

## Train


Provide the model with many examples of sequences of characters (e.g., "John", "Jane").<br>
Train the model to minimize the error in predicting the next character.
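
The "error" minimized here is the mean negative log-probability that the model assigns to the correct next character. The sketch below uses a hypothetical toy batch of random logits to show that this hand-written quantity matches PyTorch's built-in `F.cross_entropy`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy batch: 4 inputs, 5 possible next characters.
logits = torch.randn(4, 5)
targets = torch.tensor([1, 0, 3, 2])

# Manual softmax, then the mean negative log-probability of the targets.
e = logits.exp()
probs = e / e.sum(dim=1, keepdim=True)
manual_nll = -probs[torch.arange(4), targets].log().mean()

# The built-in cross-entropy computes the same quantity from raw logits.
builtin = F.cross_entropy(logits, targets)

print(torch.allclose(manual_nll, builtin, atol=1e-5))
```

The built-in version is usually the safer choice in practice because it avoids the explicit exponentiation.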

In [101]:
def train(X: torch.Tensor, y: torch.Tensor, epochs: int) -> torch.Tensor:
    '''
    Train the model: a single 27x27 weight matrix followed by softmax.

    Args:
        X: input data
        y: labels
        epochs: number of epochs

    Returns:
        W: the trained weight tensor, which can be used to make predictions.
    '''
    W = torch.randn((27, 27), requires_grad=True)
    for epoch in range(epochs):
        xenc = F.one_hot(X, 27).float()  # one_hot needs integer ids; cast to float for the matmul
        # Each one-hot row selects one row of W, so W plays the role of the matrix P in the probabilistic method.
        logits = xenc @ W
        e = torch.exp(logits)
        probs = e / e.sum(dim=1, keepdim=True)  # softmax
        # Negative log-likelihood of the correct next character plus an L2 penalty on W.
        # The penalty is analogous to add-one count smoothing (N + 1) in the probabilistic method:
        # it pulls the weights toward zero, so the logits move closer together and the softmax
        # output becomes more uniform (less confident). There are no counts to add to here,
        # so the penalty is applied through the loss and its gradient flows into W directly.
        loss = -probs[range(len(X)), y].log().mean() + 0.1 * (W**2).mean()
        print(f"Epoch: {epoch}, loss: {loss}")

        W.grad = None
        loss.backward()
        W.data += -10 * W.grad  # gradient descent step with learning rate 10

    return W

model = train(data, label, 10)

Epoch: 0, loss: 3.8021974563598633
Epoch: 1, loss: 3.4321272373199463
Epoch: 2, loss: 3.111724615097046
Epoch: 3, loss: 2.8578903675079346
Epoch: 4, loss: 2.661515474319458
Epoch: 5, loss: 2.498030185699463
Epoch: 6, loss: 2.355353593826294
Epoch: 7, loss: 2.229495048522949
Epoch: 8, loss: 2.1183438301086426
Epoch: 9, loss: 2.020134210586548
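
The comment in `train` notes that multiplying a one-hot vector by `W` picks out one row of `W`. A minimal sketch of that equivalence, using a hypothetical random weight matrix and three example ids:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(27, 27)        # hypothetical weights, same shape as in train
X = torch.tensor([0, 5, 13])   # three example character ids

xenc = F.one_hot(X, num_classes=27).float()
via_matmul = xenc @ W          # shape (3, 27)
via_index = W[X]               # direct row lookup, same shape

print(torch.allclose(via_matmul, via_index))  # True
```

This is why the weight matrix can be read as a table with one row of next-character logits per input character.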


## Inference

Start with the start marker '.' (or with a seed character, e.g., 'J').<br>
Use the model to predict the next character (e.g., 'a').<br>
Continue this process until the model predicts the end marker '.', which completes the name (e.g., "Jane").

In [147]:
def inference(model: torch.Tensor, num_words: int, id_char: dict) -> list:
    '''
    Generate new names from a trained model.

    Args:
        model: weights of the model
        num_words: number of words to generate
        id_char: dictionary mapping integer ids back to characters

    Returns:
        names: a list of generated names as strings (not integer ids).
    '''
    names = []
    for i in range(num_words):
        name = ''
        idx = 0  # every name starts from the '.' marker (id 0)
        while True:
            x_enc = F.one_hot(torch.tensor([idx]), num_classes=27).float()
            logits = x_enc @ model
            p = logits.exp()
            p = p / p.sum(dim=1, keepdim=True)  # multinomial expects probabilities, not logits
            # A classifier would usually take the argmax of these probabilities.
            # For generation, sampling with torch.multinomial gives varied names instead of the same one every time.
            idx = torch.multinomial(p, num_samples=1, replacement=True).item()
            if idx == 0:  # '.' was sampled, so the name is complete
                break
            name += id_char[idx]

        names.append(name)

    return names


names = inference(model, 10, id_char)
print(names)

['snxdeooqhnmophyvjcxjxttehisa', 'xeeeloxd', 'kyma', 'e', 'sambuta', 'a', 'mia', 'ymqvnc', 'e', 'egjmfoxbel']
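
The variety in the generated names comes from sampling the next character instead of always taking the most likely one. The sketch below contrasts the two on a hypothetical 4-symbol distribution `p`. The values and the seed are assumptions made for illustration.

```python
import torch

# Hypothetical next-character distribution over 4 symbols.
p = torch.tensor([[0.1, 0.6, 0.2, 0.1]])

# Greedy decoding: always pick the most likely symbol.
greedy = p.argmax(dim=1).item()

# Sampling: draw symbols in proportion to their probability.
g = torch.Generator().manual_seed(0)
samples = torch.multinomial(p, num_samples=1000, replacement=True, generator=g)
freq = torch.bincount(samples.flatten(), minlength=4).float() / 1000

print(greedy)  # 1
print(freq)    # roughly follows p; exact values depend on the seed
```

Greedy decoding would produce the same name on every run, while sampling occasionally follows less likely characters.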


As a side note, diffusion models roughly do the analogous thing for images: they learn a probability distribution over pixel values based on each pixel's neighborhood of other pixels, rather than on position in a sequence.