##Language Models, the Chat Format and Tokens

###How does a Large Language Model work?

In text generation, we give an LLM a prompt such as *'I love eating'* and ask it to fill in the words that are most likely to complete that prompt.

The main tool used to train an LLM is supervised learning. In supervised learning, a computer learns an input-output (X to Y) mapping from labelled training data. The typical process is to collect labelled data, train the model on that data, and then deploy the trained model for use.

An LLM can be built by using supervised learning to repeatedly predict the next word.

For example, we have a sentence *'My favorite food is bagel with cream cheese and lox'* in our training data.

This sentence is turned into a sequence of training examples, where given a sentence fragment, you want to predict the next word.

For example,

My favorite food is **bagel**

My favorite food is bagel **with**

and so on.

Given a large training corpus of hundreds of billions of words, we can create a massive set of training examples, where we start with part of a sentence or part of a piece of text and repeatedly ask the language model to learn to predict the next word.
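
The way one sentence turns into many training examples can be sketched in a few lines of Python. This is only an illustration, assuming simple whitespace splitting into words; the helper name `make_next_word_examples` is hypothetical, and real LLM training pipelines operate on tokens rather than whole words.

```python
def make_next_word_examples(sentence, min_context=1):
    """Turn one sentence into (context, next_word) training pairs."""
    words = sentence.split()
    examples = []
    # Each prefix of the sentence becomes an input; the word after it is the label.
    for i in range(min_context, len(words)):
        context = " ".join(words[:i])
        examples.append((context, words[i]))
    return examples


sentence = "My favorite food is bagel with cream cheese and lox"
examples = make_next_word_examples(sentence)
for context, next_word in examples:
    print(f"{context!r} -> {next_word!r}")
```

Each printed pair is one labelled example: the context is the input X and the next word is the output Y.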

###Two types of Large Language Models (LLMs)

**Base LLM**

The base LLM repeatedly predicts the next word based on text training data. So, if we give it a prompt *'Once upon a time there was a unicorn'*, then it may, by repeatedly predicting one word at a time, come up with a completion that tells a story of a unicorn *'that lived in a magical forest.'*

A downside of this is that, if we were to prompt it with *'What is the capital of France?'*, it is quite possible that somewhere on the internet there is a list of quiz questions about France. So it may complete the prompt with *'What is France's largest city?'* or *'What is France's population?'* and so on.

But we probably want to know the capital of France, rather than get a list of related questions about France.


**Instruction Tuned LLM**

An instruction tuned LLM tries to follow instructions, so it will answer the above question, *'What is the capital of France?'*, with *'The capital of France is Paris.'*

###How to go from Base LLM to Instruction Tuned LLM



1.   Train the base LLM on a lot of data, possibly hundreds of billions of words. This process may take months on a large supercomputing system.
2.   After training the base LLM, we further train the model by fine-tuning it on a smaller set of examples, where the output follows an input instruction. To improve the quality of the LLM's output, a common process is to obtain human ratings of the quality of many different LLM outputs, on criteria such as whether they are helpful, honest, and harmless.
3. We can then further fine-tune the LLM to increase the probability of it generating the more highly rated outputs, using Reinforcement Learning from Human Feedback (RLHF).
4. This fine-tuning process can be done in days, on a much more modestly sized dataset and with far fewer computational resources.



In [None]:
!pip install openai

In [None]:
import openai
from google.colab import userdata
openai.api_key = userdata.get('OPENAI_API_KEY')

In [None]:
llm_model = "gpt-3.5-turbo-1106"
client = openai.OpenAI(api_key = userdata.get('OPENAI_API_KEY'))

In [None]:
def get_completion(prompt, model=llm_model):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content

###Prompting the model to get a completion

In [None]:
response = get_completion("What is the capital of France?")
print(response)

The capital of France is Paris.


###Tokens

If we ask the model to take the word '*lollipop*' and reverse its letters, it outputs a somewhat garbled word.

In [None]:
response = get_completion("Take the letters in lollipop \
and reverse them")
print(response)

pilpolol


ChatGPT is unable to do the above simple task because an LLM doesn't actually predict the next word repeatedly; instead, it repeatedly predicts the next token.

What an LLM actually does is take a sequence of characters like *'Learning new things is fun!'* and group the characters together to form tokens, each consisting of a commonly occurring sequence of characters. The sentence *'Learning new things is fun!'* is made up of fairly common words, so each token corresponds to one word or punctuation mark, which makes 6 tokens.

But if we give it input containing less frequently used words, such as *'Prompting is a powerful developer tool.'*, the word '*prompting*' is still not that common in English, so it is actually broken down into 3 tokens - 'prom', 'pt', 'ing' - because these 3 are commonly occurring sequences of letters.

Now if we were to give it the word '*lollipop*', the tokenizer breaks it into 3 tokens - 'l', 'oll', 'ipop'. And because ChatGPT isn't seeing the individual letters, instead it is seeing these 3 tokens, it becomes more difficult for it to correctly print out the letters in reverse order.

In [None]:
response = get_completion("""Take the letters in \
l-o-l-l-i-p-o-p and reverse them""")
print(response)

p-o-p-i-l-l-o-l


The above prompt, with dashes added between the letters, works because the tokenizer now splits each individual character into its own token.
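
The difference between the two prompts can be illustrated without calling the model. The sketch below is not what the model does internally; it only assumes the token split quoted above ('l', 'oll', 'ipop'), which a real tokenizer may not reproduce exactly, and shows that working with chunks is not the same as working with letters.

```python
# Token split quoted in the text; treated here as an assumption.
tokens = ["l", "oll", "ipop"]
word = "".join(tokens)

# Reversing letter by letter gives the answer we actually want.
reversed_letters = word[::-1]

# Reversing the order of the chunks gives something different, because
# the letters inside each chunk keep their original order.
reversed_tokens = "".join(reversed(tokens))

# With dashes, every letter is its own unit, so reversing the units works.
dashed = "l-o-l-l-i-p-o-p"
dashed_reversed = "-".join(reversed(dashed.split("-")))

print(word)
print(reversed_letters)
print(reversed_tokens)
print(dashed_reversed)
```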

For English, 1 token is roughly 4 characters, or about three-quarters of a word.
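
This rule of thumb can be turned into a quick estimate. The helper below is a hypothetical sketch (the name `estimate_tokens` is not part of any library) and only applies the 4-characters-per-token approximation; an exact count requires the model's own tokenizer.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate for English text from its character count."""
    return math.ceil(len(text) / chars_per_token)


text = "Learning new things is fun!"
print(len(text), "characters, roughly", estimate_tokens(text), "tokens")
```

The estimate will not always match the true token count, which is why it is best used only for rough sizing.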

Different LLMs have different limits on the number of input plus output tokens they can accept. The input is often called the context and the output is often called the completion.
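
A simple budget check makes this limit concrete. In the sketch below, the function name `fits_context`, the limit of 4096 tokens, and the token counts are all made up for illustration; the real limit depends on the model and should be taken from its documentation.

```python
def fits_context(prompt_tokens, max_output_tokens, context_limit):
    """Return True if input plus requested output stays within the limit."""
    return prompt_tokens + max_output_tokens <= context_limit


HYPOTHETICAL_LIMIT = 4096  # assumed value, not tied to any specific model

short_prompt_ok = fits_context(3000, 500, HYPOTHETICAL_LIMIT)
long_prompt_ok = fits_context(3800, 500, HYPOTHETICAL_LIMIT)
print(short_prompt_ok)
print(long_prompt_ok)
```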

###Chat Format

We can separate the system, user and assistant messages using the Chat Format.



1.   The system message sets the overall tone and behaviour we want from the LLM.
2.   The user message is a specific instruction that we want carried out within the higher-level behaviour specified in the system message.
3. The assistant message lets the model know what it said previously, so that we can continue the conversation.
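
The three roles can be combined into a single `messages` list. The sketch below only builds the list and does not call the API; the helper name `build_messages` and the example conversation are hypothetical.

```python
def build_messages(system_prompt, turns, next_user_message):
    """Build a chat-format message list.

    turns is a list of (user_text, assistant_text) pairs from earlier
    in the conversation.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        # Replaying the assistant's earlier reply gives the model its own history.
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": next_user_message})
    return messages


messages = build_messages(
    "You are a friendly chatbot.",
    [("Hi, my name is Sam.", "Hi Sam! How can I help you today?")],
    "Can you remind me what my name is?",
)
for message in messages:
    print(message["role"], ":", message["content"])
```

A list built this way can be passed as the `messages` argument of a chat completion call.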



In [None]:
def get_completion_from_messages(messages,
                                 model="gpt-3.5-turbo",
                                 temperature=0,
                                 max_tokens=500):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,  # the degree of randomness of the model's output
        max_tokens=max_tokens,  # the maximum number of tokens the model can output
    )
    return response.choices[0].message.content

In [None]:
messages =  [
{'role':'system',
 'content':"""You are an assistant who\
 responds in the style of Dr Seuss."""},
{'role':'user',
 'content':"""write me a very short poem\
 about a happy carrot"""},
]
response = get_completion_from_messages(messages, temperature=1)
print(response)

In a garden so bright and cheery,
Lived a carrot oh so merry.
With a smile wide and so kind,
In the sun and breeze, it would unwind.

Dancing in the soil so deep,
The carrot would laugh and leap.
Growing tall and full of glee,
Happy as can be, you see!

So let's all cheer for this happy carrot,
In the garden, shining like a star it!
A tale of joy and harvest so fine,
For this little carrot, the sun will always shine!


In [None]:
def get_completion_and_token_count(messages,
                                   model="gpt-3.5-turbo",
                                   temperature=0,
                                   max_tokens=500):

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )

    content = response.choices[0].message.content

    token_dict = {
        'prompt_tokens': response.usage.prompt_tokens,
        'completion_tokens': response.usage.completion_tokens,
        'total_tokens': response.usage.total_tokens,
    }

    return content, token_dict

In [None]:
messages = [
{'role':'system',
 'content':"""You are an assistant who responds\
 in the style of Dr Seuss."""},
{'role':'user',
 'content':"""write me a very short poem \
 about a happy carrot"""},
]
response, token_dict = get_completion_and_token_count(messages)

In [None]:
print(response)

Oh, the happy carrot, so bright and so bold,
In the garden, it stands tall and never grows old.
With a cheerful grin and a vibrant hue,
It brings joy to all who see it, it's true!
So here's to the carrot, so merry and gay,
Spreading happiness in its own special way!


In [None]:
print(token_dict)

{'prompt_tokens': 37, 'completion_tokens': 67, 'total_tokens': 104}
