# Building Blocks of LLM Apps

## What Kinds of Problems Can We Solve with LLMs?

The quick, simplified answer is: anything with text. Of course, it is not literally ANYTHING, but almost any problem that can be described as text-based can be at least partially tackled with a powerful LLM.

Let's look at some of these problems:

- Text Summarization: given a piece of text, the model can summarize it, effectively compressing the information into a smaller chunk of text
- Question Answering over Text: given a piece of text and a question, the model can provide the appropriate answer
- Text Generation: given a prompt asking for text that follows certain rules, the model can generate text that adheres to those rules

There are many other potential applications for LLMs, but let's stick to these 3 core problems for now. 
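As a rough illustration, each of the three problems above can be phrased as a plain text prompt. The template wording and the `build_prompt` helper below are assumptions made for this sketch, not part of any library.

```python
# Hypothetical prompt templates, one per task described above.
PROMPT_TEMPLATES = {
    "summarize": "Summarize the following text in a few sentences:\n\n{text}",
    "qa": "Answer the question using only the text below.\n\nText:\n{text}\n\nQuestion: {question}",
    "generate": "Write a text that follows these rules:\n{rules}",
}


def build_prompt(task, **fields):
    """Fill in the template for `task` with the given fields."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"Unknown task: {task!r}")
    return PROMPT_TEMPLATES[task].format(**fields)


summary_prompt = build_prompt("summarize", text="LLMs are models trained on large text corpora.")
qa_prompt = build_prompt(
    "qa",
    text="Paris is the capital of France.",
    question="What is the capital of France?",
)
print(qa_prompt)
```

The resulting strings are what would be sent to the model; only the template changes from one task to the next.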

So what would an LLM app look like for problems like these?

## 2 Core Concepts

- __Prompt__

- __Interface__

A __Prompt__ is the text input that the user gives to the model.

An __Interface__ is the UI through which the user accesses the LLM's functionality.

The LLM App will join both these concepts into one environment that is well suited to solve the task/problem at hand.

# LLM App
Let's start by looking at an LLM app for what it is: an app! As such, you'd naturally expect it to have at least 2 major components:

- A frontend
- A backend


<img src="./images/frontend_backend.png" alt="Frontend Backend Image" width=500>

On the frontend side, you'd expect to see the UI: the interface the user will interact with.

<img src="./images/frontend_llm_app.png" alt="Frontend Backend Image" width=500>

On the backend side, you'd expect to have the actual model, the LLM communicating with the frontend through some intermediary process that manages the inputs from the user to the LLM and vice-versa.

<img src="./images/backend_llm_app1.png" alt="Backend LLM App" width="500">

We'll also have some sort of prompt management going on in the backend, to make sure the app is working properly.

<img src="./images/backend_llm_app2.png" alt="Backend LLM App 2" width="500">
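The backend described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Backend` class, its `template`, and the `echo_model` stand-in are all hypothetical names, and a real app would replace `echo_model` with an actual LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Backend:
    """Hypothetical backend: wraps the user's input in a prompt and calls a model."""

    model: Callable[[str], str]  # any function mapping a prompt to a completion
    template: str = "You are a quiz assistant.\n\nUser request: {user_input}"

    def handle(self, user_input: str) -> str:
        # Prompt management: build the full prompt before it reaches the model.
        prompt = self.template.format(user_input=user_input.strip())
        return self.model(prompt)


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"[model received {len(prompt)} characters]"


backend = Backend(model=echo_model)
reply = backend.handle("  Write 3 questions about photosynthesis.  ")
print(reply)
```

Because the model is passed in as a plain callable, the frontend only ever talks to `handle`, and the prompt logic can change without touching the UI.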

Finally, to provide some practical context, let's examine the different layers of abstraction involved in building LLM apps. Our focus will be on a quiz generator as an illustrative example.

# Different Levels of an LLM App

Let's take a practical approach: we'll set up access to an LLM and interact with it at different levels of abstraction.

## Level 1 - Calling the API

In [2]:
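# Uses the legacy (pre-1.0) openai-python interface.
# The API key is assumed to be available, e.g. via the OPENAI_API_KEY environment variable.
import openai


def llm_model(prompt_question):
    """Send a single user prompt to the chat model and return the reply text."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system",
                   "content": "You are a helpful research and programming assistant"},
                  {"role": "user", "content": prompt_question}]
    )

    return response["choices"][0]["message"]["content"]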

prompt = "Give me 5 exercises to practice calculus"
response = llm_model(prompt)
response

'1. Differentiate the following functions:\n\na) f(x) = 3x^2 + 4x^3 - 2x\nb) g(x) = sin(x) + cos(x)\nc) h(x) = 2e^x - 4ln(x)\n\n2. Find the indefinite integral of the following functions:\n\na) f(x) = 2x^3 + 5x^2 - 3x + 1\nb) g(x) = sec^2(x) + tan(x)\nc) h(x) = e^x + ln(x) - 1\n\n3. Use the chain rule to differentiate the following composite functions:\n\na) f(x) = (3x^2 + 5x)^4\nb) g(x) = sin(2x^3 + 7x)\nc) h(x) = e^(2x^2 + 4x)\n\n4. Calculate the limits of the following functions:\n\na) lim(x->2) (x^3 - 8)/(x - 2)\nb) lim(x->0) (sin(2x))/x\nc) lim(x->infinity) (3x^2 + 2x)/(4x^2 - 5)\n\n5. Find the definite integral of the following functions over the given intervals:\n\na) ∫(2x + 3) dx, from x = 1 to x = 4\nb) ∫(cos(x)) dx, from x = 0 to x = π/2\nc) ∫(x^2 + 2x + 3) dx, from x = -2 to x = 2'

We want to give the user **specialized freedom**.

By that I mean the freedom to specify anything within the confines of the application and its context: in this case, anything related to the successful creation of quizzes.

One of the essential questions when building LLM apps is how to give the user this specialized freedom through clever pre- and post-prompting of the model, along with any other preparation that gives the user the most flexibility within the context of our app.
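One way to picture specialized freedom for the quiz generator is a function that lets the user choose a few settings while the app controls the rest of the prompt. The parameter names, the allowed difficulties, and the cap of 20 questions are all assumptions chosen for this sketch.

```python
ALLOWED_DIFFICULTIES = ("easy", "medium", "hard")  # assumed set of options


def build_quiz_prompt(topic, num_questions=5, difficulty="medium"):
    """Turn a few user-controlled settings into a quiz-generation prompt."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    if not 1 <= num_questions <= 20:  # arbitrary cap chosen for this sketch
        raise ValueError("num_questions must be between 1 and 20")
    if difficulty not in ALLOWED_DIFFICULTIES:
        raise ValueError(f"difficulty must be one of {ALLOWED_DIFFICULTIES}")
    return (
        f"Create a quiz with {num_questions} {difficulty} questions "
        f"about {topic}. Number each question and give the answers at the end."
    )


print(build_quiz_prompt("calculus", num_questions=3, difficulty="hard"))
```

The user is free to pick any topic, but the shape of the request always stays within what the app is built for.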

## Level 2 - LLM API Call + UI

In [2]:
import ipywidgets as widgets
from IPython.display import display
# Example of a language model
import openai
import os
from dotenv import load_dotenv

load_dotenv("../.env")

openai.api_key = os.environ["OPENAI_API_KEY"]

In [3]:
def llm_model(prompt_question):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system",
                   "content": "You are a helpful research and programming assistant"},
                  {"role": "user", "content": prompt_question}]
    )
    
    return response["choices"][0]["message"]["content"]

def handle_submit(sender):
    input_text = text_input.value
    text_completion = llm_model(input_text)
    output_text.value = text_completion

# Create the text input widget and submit button.
text_input = widgets.Text(placeholder='Enter the prompt for the model',
                          description='Prompt:')
submit_button = widgets.Button(description='Ask ChatGPT')
# Create the output widget to display the model's response.
output_text = widgets.Textarea(description='ChatGPT response:', disabled=True)

# Register the submit button's click event to call the handle_submit function.
submit_button.on_click(handle_submit)

# Display the widgets.
display(text_input, submit_button, output_text)


We set up a super simple interface with the model.

## Level 3 - Adding Prompt Management


### Pre-prompting example

For example, we can specify some desirable behaviors we want from the LLM, such as:

*"Act as an expert researcher and learning assistant and you will help students create instructive quizzes on any subject matter"*

In [2]:
import ipywidgets as widgets
from IPython.display import display
# Example of a language model
import openai

def llm_model(prompt_question):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system",
                   "content": "You are a helpful research and programming assistant"},
                  {"role": "user", "content": prompt_question}]
    )
    
    return response["choices"][0]["message"]["content"]

def handle_submit(sender):
    pre_prompt = "Act as an expert researcher and learning assistant and you will help students create instructive quizzes on any subject matter"
    input_text = text_input.value
    input_text = pre_prompt + " " + input_text
    print("Full prompt for the model: ", input_text)
    text_completion = llm_model(input_text)
    output_text.value = text_completion

# Create the text input widget and submit button.
text_input = widgets.Text(placeholder='Enter the prompt for the model',
                          description='Prompt:')
submit_button = widgets.Button(description='Ask ChatGPT')
# Create the output widget to display the model's response.
output_text = widgets.Textarea(description='ChatGPT response:', disabled=True)
# Register the submit button's click event to call the handle_submit function.
submit_button.on_click(handle_submit)
# Display the widgets.
display(text_input, submit_button, output_text)

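The example above prepends the pre-prompt to the user's text. One alternative, sketched here under the assumption that the pre-prompt is meant to set the model's overall behavior, is to send it as the system message instead. `build_messages` is a hypothetical helper; the list it returns has the same shape as the `messages` argument used in `llm_model`.

```python
PRE_PROMPT = (
    "Act as an expert researcher and learning assistant and you will help "
    "students create instructive quizzes on any subject matter"
)


def build_messages(user_text, pre_prompt=PRE_PROMPT):
    """Return chat messages with the pre-prompt as the system message."""
    return [
        {"role": "system", "content": pre_prompt},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("Write me 3 exercises about how the brain works.")
for message in messages:
    print(f'{message["role"]}: {message["content"]}')
```

Whether this works better than string concatenation depends on the model, so it is worth trying both.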

### Post-prompting example

**Grammar check**

**Post-Prompt**: *"Correct any grammar mistakes in the following text and return the corrected text"*

In [6]:
import ipywidgets as widgets
from IPython.display import display
# Example of a language model
import openai

def llm_model(prompt_question):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system",
                   "content": "You are a helpful research and programming assistant"},
                  {"role": "user", "content": prompt_question}]
    )
    
    return response["choices"][0]["message"]["content"]

def handle_submit(sender):
    pre_prompt = "Act as an expert researcher and learning assistant and you will help students create instructive quizzes on any subject matter"
    input_text = text_input.value
    input_text = pre_prompt + " " + input_text
    print("Full prompt for the model: ", input_text)
    text_completion = llm_model(input_text)
    prompt_grammar = "Correct any grammar mistakes in the following text and return the corrected text"
    # Separate the instruction from the text so the two do not run together.
    text_completion_grammar_checked = llm_model(prompt_grammar + ":\n\n" + text_completion)
    output_text.value = text_completion_grammar_checked

# Create the text input widget and submit button.
text_input = widgets.Text(placeholder='Enter the prompt for the model',
                          description='Prompt:')
submit_button = widgets.Button(description='Ask ChatGPT')
# Create the output widget to display the model's response.
output_text = widgets.Textarea(description='ChatGPT response:', disabled=True)
# Register the submit button's click event to call the handle_submit function.
submit_button.on_click(handle_submit)
# Display the widgets.
display(text_input, submit_button, output_text)


Full prompt for the model:  Act as an expert researcher and learning assistant and you will help students create instructive quizzes on any subject matter Write me 3 exercises about how the brain works.
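The pre- and post-prompting steps can be generalized into a small chain. The sketch below is one possible arrangement, not the only one: `run_with_post_prompts` and `fake_model` are hypothetical names, and `fake_model` stands in for the real LLM call so the flow can be followed offline.

```python
def run_with_post_prompts(model, user_prompt, pre_prompt="", post_prompts=()):
    """Call `model` on the (pre-prompted) input, then once per post-prompt."""
    full_prompt = f"{pre_prompt} {user_prompt}".strip()
    text = model(full_prompt)
    for post_prompt in post_prompts:
        # Each post-prompt is applied to the output of the previous step.
        text = model(f"{post_prompt}:\n\n{text}")
    return text


def fake_model(prompt):
    """Stand-in for an LLM call: records the prompt and returns a canned reply."""
    fake_model.calls.append(prompt)
    return f"reply {len(fake_model.calls)}"


fake_model.calls = []

result = run_with_post_prompts(
    fake_model,
    "Write me 3 exercises about how the brain works.",
    pre_prompt="Act as an expert researcher and learning assistant",
    post_prompts=[
        "Correct any grammar mistakes in the following text and return the corrected text"
    ],
)
print(result)                 # reply 2
print(len(fake_model.calls))  # 2
```

Each post-prompt costs one extra model call, so a longer chain means more latency and more tokens.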


This is the path to building an LLM app: construct an environment that prepares the LLM to solve the task at hand, and set up a UI that best matches the context of the task.

# Token Limits Lesson
# Understanding Token Management in the ChatGPT API

## Tokens: The Building Blocks of Text

In language models, text is parsed and interpreted in chunks known as 'tokens'. In English, tokens can be as short as a character or as long as a word. For instance, "a" or "apple" can each represent a token. It's important to note that in different languages, tokens can vary in length.

Consider the string "ChatGPT is great!". The language model breaks this down into six tokens: ["Chat", "G", "PT", " is", " great", "!"].

## The Role of Tokens in API Calls

Tokens play a crucial role in API calls. Their significance lies in three primary areas:

1. **Cost:** Your API call cost is calculated per token.
2. **Time:** The duration of your API call is influenced by the number of tokens as writing more tokens takes more time.
3. **Functionality:** An API call can only function if the total tokens used are below the model's maximum limit (for example, 4096 tokens for gpt-3.5-turbo).

Remember, both the input and output tokens count toward these quantities. For instance, if your API call uses 10 tokens for the message input and receives 20 tokens in the message output, you'll be billed for a total of 30 tokens, although input and output tokens may be priced at different rates.

To check the number of tokens used by an API call, you can refer to the `usage` field in the API response (for example, `response['usage']['total_tokens']`).
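To make the billing arithmetic concrete, here is a small sketch that reads the `usage` field and estimates a cost. The `summarize_usage` helper is hypothetical, the response is a hand-written dictionary rather than a real API reply, and the prices are placeholder values supplied by the caller.

```python
def summarize_usage(response, input_price_per_1k, output_price_per_1k):
    """Read token counts from a response's `usage` field and estimate the cost."""
    usage = response["usage"]
    prompt_tokens = usage["prompt_tokens"]
    completion_tokens = usage["completion_tokens"]
    cost = (
        prompt_tokens * input_price_per_1k + completion_tokens * output_price_per_1k
    ) / 1000
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": usage["total_tokens"],
        "estimated_cost_usd": cost,
    }


# Hand-written stand-in for an API response: 10 input tokens, 20 output tokens.
example_response = {
    "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}
}
summary = summarize_usage(
    example_response, input_price_per_1k=0.0015, output_price_per_1k=0.002
)
print(summary)
```

Input and output tokens are multiplied by their own price before being summed, which is why the two counts are kept separate.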

## Counting Tokens in Chat API Calls

The process of counting tokens for chat API calls requires careful attention. Despite chat models like gpt-3.5-turbo and gpt-4 using tokens similarly to models in the completions API, the message-based formatting makes it challenging to estimate the number of tokens used in a conversation.

Let's take a deep dive into counting tokens for chat API calls.

To help with this process, you can use the example function provided below, which counts tokens for messages passed to the gpt-3.5-turbo model. However, bear in mind that the method of converting messages into tokens may differ between models. Hence, the answers returned by this function may only be approximate for future model versions.

```python
def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    """Return the number of tokens used by a list of messages."""
    ...  # placeholder: the full definition appears later in this lesson
```

You can create a list of messages and pass it to the function defined above to see the token count. This should match the value reported in the `usage` field of the API response. An example of such a function call is:

```python
messages = [
  {"role": "system", "content": "You are a helpful, pattern-following assistant."},
  {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
]

model = "gpt-3.5-turbo"

print(f"{num_tokens_from_messages(messages, model)} prompt tokens counted.")
```

To verify the number generated by our function above, you can create a new Chat Completion and compare the value with the API returned token usage:

```python
import openai

response = openai.ChatCompletion.create(
    model=model,
    messages=messages,
    temperature=0,
)

print(f'{response["usage"]["prompt_tokens"]} prompt tokens used.')
```

For examining token count in a text string without making an API call, you can use OpenAI’s `tiktoken` Python library. 

It's important to note that if a conversation surpasses a model's maximum token limit (e.g., more than 4096 tokens for gpt-3.5-turbo), you will need to truncate, omit, or shrink your text until it fits. Also, very long conversations are more likely to receive incomplete replies due to token constraints. 
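One simple way to shrink a conversation is to drop the oldest messages first. The sketch below assumes that strategy and uses a crude word count in place of a real tokenizer so it runs without extra dependencies; `trim_conversation` and `approx_token_count` are hypothetical helpers, and a real app would pass in a proper token counter.

```python
def approx_token_count(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_conversation(messages, max_tokens, count_tokens=approx_token_count):
    """Drop the oldest non-system messages until the conversation fits.

    System messages are always kept and are placed first in the result.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    # Keep at least the most recent message, even if it alone is too long.
    while len(rest) > 1 and total(system + rest) > max_tokens:
        rest.pop(0)  # remove the oldest message first
    return system + rest


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a long story about a dragon."},
    {"role": "assistant", "content": "Once upon a time there was a dragon."},
    {"role": "user", "content": "Now summarize it."},
]
trimmed = trim_conversation(conversation, max_tokens=12)
print([m["role"] for m in trimmed])
```

Dropping old turns loses context, so summarizing earlier messages instead is another option when that context matters.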

Using the `tiktoken` package:

In [1]:
# !pip install tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

sentence = "ChatGPT is useful!"

print("Token count: ", len(enc.encode(sentence)))

print("Tokenized sentence: ", [enc.decode([tk]) for tk in enc.encode(sentence)]) 

Token count:  6
Tokenized sentence:  ['Chat', 'G', 'PT', ' is', ' useful', '!']


In [4]:
import pandas as pd

data = {
    'Model': ['GPT-4 8K context', 'GPT-4 32K context', 'GPT-3.5 Turbo 4K context', 'GPT-3.5 Turbo 16K context'],
    'Input Price ($/1K tokens)': [0.03, 0.06, 0.0015, 0.003],
    'Output Price ($/1K tokens)': [0.06, 0.12, 0.002, 0.004]
}

df = pd.DataFrame(data)

df

|   | Model | Input Price ($/1K tokens) | Output Price ($/1K tokens) |
|---|---|---|---|
| 0 | GPT-4 8K context | 0.03 | 0.06 |
| 1 | GPT-4 32K context | 0.06 | 0.12 |
| 2 | GPT-3.5 Turbo 4K context | 0.0015 | 0.002 |
| 3 | GPT-3.5 Turbo 16K context | 0.003 | 0.004 |
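To get a feel for what these prices mean in practice, the sketch below compares the cost of the same call across the four models. The prices are copied from the table above; the `estimate_cost` helper and the workload of 500 input and 1,000 output tokens are assumptions made for illustration.

```python
# Prices copied from the table above, in USD per 1K tokens: (input, output).
PRICES = {
    "GPT-4 8K context": (0.03, 0.06),
    "GPT-4 32K context": (0.06, 0.12),
    "GPT-3.5 Turbo 4K context": (0.0015, 0.002),
    "GPT-3.5 Turbo 16K context": (0.003, 0.004),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one call from its token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1000


# Hypothetical workload: a 500-token prompt with a 1,000-token reply.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 500, 1000):.4f}")
```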


In [8]:
def calculate_num_tokens(prompt, model="gpt-3.5-turbo"):
    """Count the tokens in a prompt string using the model's tokenizer."""
    enc = tiktoken.encoding_for_model(model)

    prompt_size = len(enc.encode(prompt))
    print("Token count: ", prompt_size)

    return prompt_size


def calculate_cost_of_prompt(prompt, max_num_tokens,
                             price_per_1000_tk_input=0.0015,
                             price_per_1000_tk_output=0.002,
                             min_output_tokens=10):
    """Estimate the minimum and maximum total price (USD) of one API call.

    The minimum assumes a reply of `min_output_tokens` tokens and the
    maximum assumes a reply of `max_num_tokens` tokens.
    """
    prompt_tokens = calculate_num_tokens(prompt)
    price_input = (prompt_tokens * price_per_1000_tk_input) / 1000
    price_output_min = min_output_tokens * price_per_1000_tk_output / 1000
    price_output_max = max_num_tokens * price_per_1000_tk_output / 1000
    price_total_min = price_input + price_output_min
    price_total_max = price_input + price_output_max
    print(f"Estimated total price (USD): {price_total_min} - {price_total_max}")

    return price_total_min, price_total_max


prompt = "Tell me a joke"
max_num_tokens = 2000
calculate_cost_of_prompt(prompt, max_num_tokens)

Token count:  4
Estimated total price (USD): 2.6000000000000002e-05 - 0.004006


(2.6000000000000002e-05, 0.004006)

In [2]:
# source: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
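# The arithmetic above can be followed by hand with a toy encoder.
# This is a sketch only: `count_chat_tokens` and `toy_encode` are hypothetical
# names, and counting one token per word is an assumption that does not
# match the real tokenizer.

```python
def count_chat_tokens(messages, encode, tokens_per_message=3, tokens_per_name=1):
    """Same arithmetic as num_tokens_from_messages, with a pluggable encoder."""
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message      # fixed overhead per message
        for key, value in message.items():
            num_tokens += len(encode(value))  # role, name and content all count
            if key == "name":
                num_tokens += tokens_per_name
    return num_tokens + 3                     # the reply is primed with 3 tokens


def toy_encode(text):
    """Toy encoder (an assumption for this sketch): one token per word."""
    return text.split()


messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "name": "example_user", "content": "Hello there"},
]
# First message: 3 + 1 + 3 = 7. Second: 3 + 1 + (1 + 1) + 2 = 8. Plus 3 for the reply.
print(count_chat_tokens(messages, toy_encode))  # 18
```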

In [None]:
# source: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
# let's verify the function above matches the OpenAI API response

import openai

example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]

for model in [
    "gpt-3.5-turbo-0301",
    "gpt-3.5-turbo-0613",
    "gpt-3.5-turbo",
    "gpt-4-0314",
    "gpt-4-0613",
    "gpt-4",
    ]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = openai.ChatCompletion.create(
        model=model,
        messages=example_messages,
        temperature=0,
        max_tokens=1,  # we're only counting input tokens here, so let's not waste tokens on the output
    )
    print(f'{response["usage"]["prompt_tokens"]} prompt tokens counted by the OpenAI API.')
    print()

# References
- [Build LLM Systems with the ChatGPT API](https://learn.deeplearning.ai/chatgpt-building-system/lesson/2/language-models,-the-chat-format-and-tokens)
- [FSDL LLM Bootcamp](https://fullstackdeeplearning.com/llm-bootcamp/spring-2023/llmops/)
- [Visual Guide to LLM-Powered App Architecture](https://medium.com/@remitoffoli/a-visual-guide-to-llm-powered-app-architecture-57e47426a92f)
- [Anatomy of LLM-Based Chatbot Applications](https://towardsdatascience.com/anatomy-of-llm-based-chatbot-applications-monolithic-vs-microservice-architectural-patterns-77796216903e)
- [Streamlit LangChain LLM App Architecture](https://blog.streamlit.io/langchain-tutorial-1-build-an-llm-powered-app-in-18-lines-of-code/)
- [ChatGPT API Documentation](https://platform.openai.com/docs/guides/gpt/chat-completions-api)
- [OpenAI Cookbook - How to Handle Rate Limits](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_handle_rate_limits.ipynb)
- [OpenAI Cookbook - How to Format Inputs for ChatGPT Models](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb)