How does an LLM work?
Current-generation large language models (LLMs) are generally transformer neural networks trained on a large corpus of text. The text is split into tokens, which are sequences of characters that are frequently seen together, e.g.:

`[Div, ided,  up,  like,  this, .]`

Each token is then assigned a number, with lower numbers generally going to more common tokens:

`[24095, 1384, 510, 588, 428, 13]`
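
The split-and-number step can be sketched in a few lines. This is a toy illustration only: the vocabulary holds just the six tokens from the example above, and the greedy longest-match rule is a simplifying assumption, not the merge-based procedure that real tokenizers typically use. The names `VOCAB` and `tokenize` are made up for this sketch.

```python
# Hypothetical vocabulary: only the six tokens from the example above.
VOCAB = {
    "Div": 24095,
    "ided": 1384,
    " up": 510,
    " like": 588,
    " this": 428,
    ".": 13,
}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into (token, id) pairs."""
    max_len = max(len(tok) for tok in vocab)
    pieces = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink until one matches.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in vocab:
                pieces.append((candidate, vocab[candidate]))
                pos += length
                break
        else:
            raise ValueError(f"no token matches text at position {pos}")
    return pieces

pieces = tokenize("Divided up like this.")
tokens = [tok for tok, _ in pieces]
ids = [tok_id for _, tok_id in pieces]
print(tokens)
print(ids)
```

Note that the leading space is part of the token, which is why `" up"` and `"up"` would be different entries in a vocabulary like this.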

The network is fed fixed-length sequences of these numbers over and over during training. The computation inside a neural network generally consists of a chain of matrix multiplications, where the output of one stage is combined with the weights of the next. The weights start off as random numbers and are adjusted continually to meet some goal. The overall shape of the network, meaning how the stages are connected, is called the model's architecture. Different architectures have different properties, but the general idea is to choose one that allows incremental adjustment of the weights toward a training goal.
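
To make "random weights, adjusted incrementally toward a goal" concrete, here is a minimal sketch with a single stage and two weights. Everything in it is an assumption chosen for illustration: the toy data, the mean-squared-error goal, the learning rate, and plain gradient descent as the adjustment rule. A real LLM stacks many such stages and has billions of weights.

```python
import random

random.seed(0)

# Toy data produced by a hidden rule: y = 2*x0 - 3*x1.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [2.0, -3.0, -1.0, 1.0]

# The weights start off as random numbers.
weights = [random.uniform(-1, 1) for _ in range(2)]

def predict(x, w):
    # A 1x2 by 2x1 matrix multiplication, i.e. a dot product.
    return sum(xi * wi for xi, wi in zip(x, w))

def mean_squared_error(w):
    total = sum((predict(x, w) - y) ** 2 for x, y in zip(inputs, targets))
    return total / len(inputs)

learning_rate = 0.1
initial_loss = mean_squared_error(weights)

for step in range(500):
    # Gradient of the mean squared error with respect to each weight.
    grads = [0.0, 0.0]
    for x, y in zip(inputs, targets):
        error = predict(x, weights) - y
        for i in range(2):
            grads[i] += 2 * error * x[i] / len(inputs)
    # Nudge each weight a small step in the direction that lowers the error.
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]

final_loss = mean_squared_error(weights)
print(weights, initial_loss, final_loss)
```

After enough small steps the weights settle close to the hidden rule's values, even though nothing in the loop knows that rule directly.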

As mentioned, current LLMs use a transformer architecture. Without going into details about attention, encoders, decoders, masking and so on, the training goal of an LLM is to guess which token is likely to come next given the ones that came before it.
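
One common way to turn "guess the next token" into a number the training process can minimize is to convert the network's raw scores into probabilities and penalize it when the token that actually came next received a low probability. The sketch below assumes a softmax followed by a negative log-probability loss; the four candidate tokens and their scores are invented for illustration.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(scores, correct_index):
    """Negative log-probability given to the token that actually came next."""
    probs = softmax(scores)
    return -math.log(probs[correct_index])

# Hypothetical scores over a tiny 4-token vocabulary for some context.
candidates = [" up", " like", " this", "."]
scores = [0.5, 2.0, 0.1, -1.0]

probs = softmax(scores)
loss_if_like = next_token_loss(scores, candidates.index(" like"))
loss_if_dot = next_token_loss(scores, candidates.index("."))
print(probs, loss_if_like, loss_if_dot)
```

The loss is small when the model already favored the correct token and large when it did not, which gives the weight adjustment something to push against.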

After training, we have a program that can guess the next word part given the list of word parts that came before it. By feeding in something like this:

```
This is a conversation between a friendly AI and a human.

Human: hello
AI: Hello, how can I help you?
Human: 
```

and adding the user's text to the end, we can generate tokens one at a time until the model outputs `Human:`, then stop and hand the turn back to the user, which simulates a dialogue. But really all we're doing is exploring likely sentences based on probability and the training data.
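
The loop described here can be sketched as follows. There is no trained model in this example: `make_fake_model` is a hypothetical stand-in that replays a scripted reply, and `reply`, `STOP`, and `max_tokens` are names invented for the sketch. A real model would instead pick each token from probabilities computed over all the text so far.

```python
PROMPT = (
    "This is a conversation between a friendly AI and a human.\n"
    "\n"
    "Human: hello\n"
    "AI: Hello, how can I help you?\n"
    "Human: "
)

STOP = "Human:"

def make_fake_model(scripted_tokens):
    """Return a hypothetical stand-in for a trained next-token predictor."""
    remaining = iter(scripted_tokens)

    def next_token(text_so_far):
        # Ignores the context and replays the script, then emits the stop text.
        return next(remaining, STOP)

    return next_token

def reply(user_text, next_token, max_tokens=50):
    text = PROMPT + user_text + "\nAI:"
    generated = ""
    for _ in range(max_tokens):
        # Generate one token at a time, feeding everything so far back in.
        generated += next_token(text + generated)
        if generated.endswith(STOP):
            # The model has started writing the human's turn: cut it off.
            generated = generated[:-len(STOP)]
            break
    return generated.strip()

model = make_fake_model([" I", "'m", " doing", " well", ".", "\n", "Human", ":"])
answer = reply("how are you?", model)
print(answer)
```

The `max_tokens` cap is there because nothing guarantees the model will ever produce the stop text on its own.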

This is a powerful way to extract relevant facts from the data, but it's a fraction of what these models are actually capable of, both good and bad.