# COMP3132 - Lab Week 1

# Building an LLM-Powered Chatbot: A Hands-On Guide in Google Colab

In [None]:
# Name: Drasti Parikh
# Student Id: 101419828

## Google Colab Configuration

### Some of our labs this semester will be painfully slow without a GPU. The easiest way to get access to a GPU-accelerated Jupyter notebook is to enable the `T4 GPU` runtime on Google Colab:

### 1. Navigate to `Runtime`.
### 2. Select `Change runtime type`.
### 3. Choose `Hardware accelerator`.
### 4. Select `T4 GPU`.

### **Note:** This notebook can be run on `CPU` without any noticeable difference in performance.
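
To confirm which runtime you ended up with, a quick check like the one below can help. It is a minimal sketch using only the standard library: it assumes an NVIDIA GPU runtime exposes the `nvidia-smi` tool on the PATH, and the helper name `gpu_available` is made up for this illustration.

```python
import shutil
import subprocess

def gpu_available():
    """Return True if nvidia-smi is on PATH and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU it can see
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU detected" if gpu_available() else "No GPU detected; running on CPU")
```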

In [None]:
from IPython.display import Image, display

In [None]:
!pip install python-dotenv
!pip install jupyter_bokeh

# Online Chatbot

### Go to https://api.together.ai/playground/chat/meta-llama/Llama-2-7b-chat-hf to chat with the model online on the `together.ai` website. Experiment with the chatbot by changing its configuration and hyperparameters.

# A Brief Theory




## Training a Language Model

In [None]:
image_path = './assets/LLM_train.png'
display(Image(image_path, width=600))

## Base Vs. Chat Models

### After training the LLMs with this paradigm on a very large amount of data (such as the entire internet), we will have a model, also known as a `foundation` model or `base` model, that can predict the next word repeatedly to form a sentence.
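
The idea of "predicting the next word repeatedly" can be illustrated with a toy model. The sketch below is not how an LLM works internally. It only counts which word follows which in a tiny made-up corpus and then always picks the most frequent follower. The corpus and the names `follows` and `generate` are assumptions for this illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up training text, split into words
corpus = "the cat sat on the mat . the cat sat down .".split()

# For every word, count the words that come right after it
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def generate(start, n_words=5):
    """Extend `start` by repeatedly appending the most frequent next word."""
    words = [start]
    for _ in range(n_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # this word was never followed by anything in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("the", n_words=2))
```

A real base model replaces the counting table with a neural network trained on far more text, but the generation loop has the same shape: predict one token, append it, and repeat.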

### To enable the model to engage in conversations, we further fine-tune the base model using instructions, such as question-answer pairs. These models are referred to as `instruction-tuned` or `chat` models.

### You can observe the different behaviors of the base and instruction-tuned models in the following slide.

In [None]:
image_path = './assets/baseVSinstruct.png'
display(Image(image_path, width=800))


## Interacting with Model Programmatically

In [None]:
image_path = './assets/modelaccess.png'
display(Image(image_path, width=500))

# Designing Our Own Chatbot

## API Call to the Model

### Getting API KEY

#### - Go to https://api.together.xyz/settings/api-keys to get your API key.

#### Importing the API Key to Colab

1. On the left-side vertical menu, select the `key` icon.
2. Add a secret key with the following details:
   - **Name**: `TOGETHER_API_KEY`
   - **Value**: `<your API key>`

In [None]:
from google.colab import userdata
api_key = userdata.get('TOGETHER_API_KEY')
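
If you also run this notebook outside Colab, `google.colab` cannot be imported. The sketch below shows one possible fallback: read the key from Colab secrets when available, otherwise from an environment variable of the same name. The helper name `load_api_key` is hypothetical and not part of any library.

```python
import os

def load_api_key(name="TOGETHER_API_KEY"):
    """Read a secret from Colab when available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(name)
    except ImportError:
        # Outside Colab: returns None if the variable is not set
        return os.environ.get(name)

api_key = load_api_key()
```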

### Function to call the API

In [None]:
import os
# from dotenv import load_dotenv, find_dotenv
import warnings
import requests
import json
import time

warnings.filterwarnings('ignore')
url = "https://api.together.xyz/inference"

headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }


def llama(prompt,
          add_inst=True,
          model="meta-llama/Llama-2-7b-chat-hf",
          temperature=0.0,
          max_tokens=1024,
          verbose=False,
          url=url,
          headers=headers,
          base = 2, # number of seconds to wait
          max_tries=3):

    if add_inst:
        prompt = f"[INST]{prompt}[/INST]"

    if verbose:
        print(f"Prompt:\n{prompt}\n")
        print(f"model: {model}")

    data = {
            "model": model,
            "prompt": prompt,
            "temperature": temperature,
            "max_tokens": max_tokens
        }

    # Allow multiple attempts to call the API in case of downtime.
    # Return the last response to the user after max_tries failed attempts.
    wait_seconds = [base**i for i in range(max_tries)]

    for num_tries in range(max_tries):
        response = None  # reset so a failed request never reads an undefined or stale value
        try:
            response = requests.post(url, headers=headers, json=data)
            return response.json()['output']['choices'][0]['text']
        except Exception as e:
            # Retry only on network errors or HTTP 500; return any other error body as-is.
            if response is not None and response.status_code != 500:
                return response.json()

            print(f"error message: {e}")
            print(f"response object: {response}")
            print(f"num_tries {num_tries}")
            print(f"Waiting {wait_seconds[num_tries]} seconds before automatically trying again.")
            time.sleep(wait_seconds[num_tries])

    print(f"Tried {max_tries} times to make API call to get a valid response object")
    print("Returning provided response")
    return response


### **Note:** The default model is `"meta-llama/Llama-2-7b-chat-hf"`, but you can change it by looking up another model name at https://api.together.ai/playground/chat
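
The retry logic inside `llama` waits `base**i` seconds between attempts (1, 2, 4, ... seconds for `base=2`). The sketch below isolates that pattern so you can try it without calling the API. `call_with_retries` and `flaky` are made-up names, and unlike `llama` this variant does not wait after the final failed attempt.

```python
import time

def call_with_retries(func, max_tries=3, base=2, sleep=time.sleep):
    """Call func(); on failure wait base**attempt seconds, then try again."""
    wait_seconds = [base**i for i in range(max_tries)]
    last_error = None
    for attempt in range(max_tries):
        try:
            return func()
        except Exception as e:
            last_error = e
            if attempt < max_tries - 1:
                sleep(wait_seconds[attempt])
    raise last_error

# Stand-in for an unreliable API call: fails twice, then succeeds
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

waits = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky, sleep=waits.append)
print(result, waits)
```

Passing `sleep=waits.append` replaces the real pause with a recorder, which makes the waiting schedule visible without slowing the notebook down.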

## General testing of the model

In [None]:
# pass prompt to the llama function, store output as 'response' then print
prompt = "Tell me a joke about software developers."
response = llama(prompt, temperature=0.0)  # temperature is a hyperparameter that controls randomness in the response
print(response)

In [None]:
prompt = "What is the capital of France?"
response = llama(prompt, verbose=True) # verbose=True will print the prompt
print(response)

## Exercise 1: General testing

#### 1. Change the `temperature` parameter from 0.0 to 0.9 and see the difference in the responses.
#### Note: in this lab, the `temperature` parameter is a number between 0.0 and 1.0. It controls the randomness of the responses: 0.0 gives the most repeatable output, and higher values give more varied output.
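
To see why temperature changes randomness, it helps to look at how scores are commonly turned into probabilities. The sketch below applies a temperature-scaled softmax to three made-up scores. Treating a temperature of 0 as "always pick the top score" is an assumption about typical behaviour, not a statement about how the Together API implements it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Assumed convention: temperature 0 means greedy, all mass on the best score
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate words
for t in (0.0, 0.5, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As the temperature rises, probability moves from the top candidate to the others, so sampled responses vary more between runs.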

In [None]:
prompt = "Tell me a joke about software developers."
response = llama(prompt, temperature=0.9)
print(response)

prompt = "What is the capital of France?"
response = llama(prompt, verbose=True)
print(response)


#### 2. What does the `verbose` argument do?




#### your answer here

## Role prompting

#### - Roles give LLMs context about what type of answers are desired.
#### - LLMs often give more consistent responses when provided with a role.
#### - First, try a standard prompt and see the response.

In [None]:
prompt = """
How can I answer this question from my friend:
What is the meaning of life?
"""
response = llama(prompt)
print(response)

### Now, try giving the model a `role` and, within the role, a `tone` in which it should respond.

In [None]:
role = """
Your role is a life coach \
who gives advice to people about living a good life. \
You attempt to provide unbiased advice.
You respond in the tone of Amitabh Bachchan.
"""

prompt = f"""
{role}
How can I answer this question from my friend:
What is the meaning of life?
"""
response = llama(prompt)
print(response)
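
The role-plus-question pattern used above can be wrapped in a small helper so that roles are easy to swap. This is only a sketch. `build_role_prompt` is a made-up name, and the resulting string would still be passed to `llama` as before.

```python
def build_role_prompt(role, question):
    """Put the role description first, then the user's question."""
    return f"{role.strip()}\n\n{question.strip()}"

# Hypothetical example role and question
example_role = """
Your role is a life coach who gives advice about living a good life.
You respond in a calm, encouraging tone.
"""
example_question = "What is the meaning of life?"

example_prompt = build_role_prompt(example_role, example_question)
print(example_prompt)
```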

## Exercise 2: Role prompting

#### Role: Beginner python tutor
#### Task: Explain how to create a list and add an element to it.

In [None]:
role = """
Your role is a beginner Python tutor \
who teaches people who are new to programming. \
You explain each step simply and include short code examples.
"""

prompt = f"""
{role}
Explain how to create a list and add an element to it.
"""
response = llama(prompt)
print(response)

#### Change the role to `friendly coding mentor` and see how the response changes for the same task.

In [None]:
# your code here
role = """
Your role is a friendly coding mentor
who gives advice to people learning to code.
You provide clear, beginner-friendly explanations.
You respond in a warm, encouraging tone.
"""
prompt = f"""
{role}
how to create a list and add an element to it.
"""
response = llama(prompt)
print(response)


## Asking follow-up questions

### Does the model have memory of the previous conversation?

In [None]:
prompt_1 = """
What are fun activities I can do this weekend?
"""

response_1 = llama(prompt_1)
print(response_1)

In [None]:
prompt_2 = """
Which of these would be good for my health?
"""
response_2 = llama(prompt_2)
print(response_2)

#### Is the second answer related to the first answer?
#### **Note:** LLMs are `stateless` models, so they don't have memory of the previous conversation.

## Multi-turn prompting (chatting)
#### In order to give the model memory of the previous conversation, you need to provide prior prompts and responses as part of the context of each new turn in the conversation.
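
One way to picture this is a small object that stores every completed turn and rebuilds the full prompt each time. The sketch below uses the Llama 2 style tags that appear in this notebook (`<s>`, `[INST]`, `[/INST]`, `</s>`). The class name `ChatHistory` and its methods are made up for illustration, and no API call is made.

```python
class ChatHistory:
    """Keeps earlier turns so they can be replayed as context."""

    def __init__(self):
        self.prompts = []
        self.responses = []

    def add_turn(self, prompt, response):
        self.prompts.append(prompt)
        self.responses.append(response)

    def build_prompt(self, new_prompt):
        parts = []
        # Completed turns are closed with the end tag </s>
        for p, r in zip(self.prompts, self.responses):
            parts.append(f"<s>[INST] {p} [/INST]\n{r}\n</s>")
        # The newest prompt is left open for the model to answer
        parts.append(f"<s>[INST] {new_prompt} [/INST]")
        return "\n".join(parts)

history = ChatHistory()
history.add_turn("What are fun activities I can do this weekend?",
                 "(the model's first answer would go here)")
full_prompt = history.build_prompt("Which of these would be good for my health?")
print(full_prompt)
```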

In [None]:
image_path = './assets/multi_turn.png'
display(Image(image_path, width=600))

### Note: you don't need the end tag (`</s>`) for the last prompt.

In [None]:
chat_prompt = f"""
<s>[INST] {prompt_1} [/INST]
{response_1}
</s>
<s>[INST] {prompt_2} [/INST]
"""
print(chat_prompt)

### Note: pay attention to add_inst (add instruction) argument below

In [None]:
response_2 = llama(chat_prompt,
                 add_inst=False,
                 verbose=True)

In [None]:
print(response_2)

### Helper function to handle multi-turn prompting

### **Note:** You don’t need to understand every part of the helper function. In the next section, you’ll see how to use it in your code.

In [None]:
def llama_chat(prompts,
               responses,
               model="meta-llama/Llama-2-7b-chat-hf",
               temperature=0.0,
               max_tokens=1024,
               verbose=False,
               url=url,
               headers=headers,
               base=2,
               max_tries=3
              ):

    prompt = get_prompt_chat(prompts,responses)

    # Allow multiple attempts to call the API in case of downtime.
    # Return the last response to the user after max_tries failed attempts.
    wait_seconds = [base**i for i in range(max_tries)]

    for num_tries in range(max_tries):
        response = None  # reset so a failed call never reads an undefined or stale value
        try:
            response = llama(prompt=prompt,
                             add_inst=False,
                             model=model,
                             temperature=temperature,
                             max_tokens=max_tokens,
                             verbose=verbose,
                             url=url,
                             headers=headers
                            )
            return response
        except Exception as e:
            # llama() normally returns text or a parsed error body, so only
            # inspect status_code when the object actually has one.
            status_code = getattr(response, "status_code", None)
            if status_code is not None and status_code != 500:
                return response.json()

            print(f"error message: {e}")
            print(f"response object: {response}")
            print(f"num_tries {num_tries}")
            print(f"Waiting {wait_seconds[num_tries]} seconds before automatically trying again.")
            time.sleep(wait_seconds[num_tries])

    print(f"Tried {max_tries} times to make API call to get a valid response object")
    print("Returning provided response")
    return response


def get_prompt_chat(prompts, responses):
  prompt_chat = f"<s>[INST] {prompts[0]} [/INST]"
  for n, response in enumerate(responses):
    prompt = prompts[n + 1]
    prompt_chat += f"\n{response}\n </s><s>[INST] \n{ prompt }\n [/INST]"

  return prompt_chat

### How to use the helper function

In [None]:
prompt_1 = """
    What are fun activities I can do this weekend?
"""

response_1 = llama(prompt_1)

In [None]:
prompt_2 = """
Which of these would be good for my health?
"""

In [None]:
prompts = [prompt_1,prompt_2]
responses = [response_1]

# Pass prompts and responses to llama_chat function
response_2 = llama_chat(prompts,responses,verbose=True)

In [None]:
print(response_2)

## Exercise 3: Multi-turn prompting

### Ask this follow-up question: "Which of these activities would be fun with friends?"

In [None]:
prompt_3 = "Which of these activities would be fun with friends?"

# your code here
# Pass the whole conversation so far, so the model knows what "these" refers to
prompts = [prompt_1, prompt_2, prompt_3]
responses = [response_1, response_2]

response_3 = llama_chat(prompts, responses, verbose=True)
print(response_3)


### OrderBot

#### We can `automate the collection of user prompts and model responses` to build an OrderBot.

#### The OrderBot will take orders at a pizza restaurant.

In [None]:
# Define the bot's role and menu
role = """
You are OrderBot, an automated service to collect orders for a pizza restaurant. \
You first greet the customer, then start collecting the order, \
and then ask if it's a pickup or delivery. \
You wait to collect the entire order, then summarize it and check for a final \
time if the customer wants to add anything else. \
If it's a delivery, you ask for an address. \
Finally you collect the payment. \
Make sure to clarify all options, extras and sizes to uniquely \
identify the item from the menu. \
You respond in a short, very conversational friendly style. \
The menu includes \
pepperoni pizza  12.95, 10.00, 7.00 \
cheese pizza   10.95, 9.25, 6.50 \
eggplant pizza   11.95, 9.75, 6.75 \
fries 4.50, 3.50 \
greek salad 7.25 \
Toppings: \
extra cheese 2.00, \
mushrooms 1.50 \
sausage 3.00 \
canadian bacon 3.50 \
AI sauce 1.50 \
peppers 1.00 \
Drinks: \
coke 3.00, 2.00, 1.00 \
sprite 3.00, 2.00, 1.00 \
bottled water 5.00 \
""" # accumulate messages

## Exercise 4: OrderBot

In [None]:
prompts = []
responses = []

prompts.append(role)
response = llama_chat(prompts, responses)
responses.append(response)
print(f"\nChatbot: {response}\n")

while True:

    # your code here
    # get the input from the user with the input() function
    prompt = input("You: ")

    # terminate the loop if the user types 'done', 'exit', 'quit', or 'bye'
    if prompt.strip().lower() in {"exit", "done", "quit", "bye"}:
        break

    # change prompts and responses accordingly
    prompts.append(prompt)
    response = llama_chat(prompts, responses)
    responses.append(response)
    print(f"\nChatbot: {response}\n")

### Printing the order

In [None]:
role = 'Create a JSON summary of the food order. Itemize the price for each item. \
The fields should be 1) pizza, include size 2) list of toppings 3) list of drinks, include size \
4) list of sides, include size 5) total price.'

messages = prompts.copy()
messages.append(role)

response = llama_chat(messages, responses)
print(response)
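
Chat models often wrap the requested JSON in extra sentences, so the printed summary may not be directly usable as data. The sketch below shows one simple way to pull the JSON object out of such a reply. The `sample` text is invented for illustration and is not real model output, and `extract_json` is a hypothetical helper.

```python
import json

def extract_json(text):
    """Return the first {...} span in text parsed as JSON, or None."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

# Invented reply in the style a chat model might produce
sample = (
    "Sure! Here is the summary:\n"
    '{"pizza": {"type": "cheese", "size": "small", "price": 6.5}, '
    '"toppings": [], "drinks": [], "sides": [], "total price": 6.5}\n'
    "Enjoy your meal!"
)

order = extract_json(sample)
print(order["total price"] if order else "No JSON found")
```

Returning `None` instead of raising keeps the notebook cell from crashing when the model ignores the requested format.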

## Exercise 5: Improving the OrderBot

### Try to improve the performance of your chat by

### 1. modifying the prompt
### 2. using different models (larger ones)

In [None]:
role = """
You are OrderBot, an AI assistant for a pizza restaurant. Follow these steps:

1. Greet the customer warmly.
2. Present the menu clearly, including sizes and prices.
3. Take the order, asking for specifics (size, toppings, etc.) one item at a time.
4. Confirm each item before moving to the next.
5. Ask if it's for pickup or delivery.
6. If delivery, request the address.
7. Summarize the entire order and total cost.
8. Ask if they want to add anything else.
9. Collect payment information.
10. Thank the customer and provide an estimated ready/delivery time.

Respond concisely and conversationally. Be friendly and helpful.

Menu:
Pizzas (Large/Medium/Small):
- Pepperoni: $12.95/$10.00/$7.00
- Cheese: $10.95/$9.25/$6.50
- Eggplant: $11.95/$9.75/$6.75

Sides:
- Fries: $4.50 (Large), $3.50 (Small)
- Greek Salad: $7.25

Toppings:
- Extra Cheese: $2.00
- Mushrooms: $1.50
- Sausage: $3.00
- Canadian Bacon: $3.50
- AI Sauce: $1.50
- Peppers: $1.00

Drinks (Large/Medium/Small):
- Coke: $3.00/$2.00/$1.00
- Sprite: $3.00/$2.00/$1.00
- Bottled Water: $5.00
"""


In [None]:
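# Requires the openai package, version 1.0 or later.
from openai import OpenAI

# Assumption: Together exposes an OpenAI-compatible endpoint at this base URL.
client = OpenAI(api_key=api_key, base_url="https://api.together.xyz/v1")

# Note: this redefines the llama_chat helper from earlier in the notebook.
def llama_chat(prompts, responses, model="meta-llama/Llama-2-7b-chat-hf"):
    # prompts[0] is the role (system message); responses[i] answers prompts[i].
    messages = [{"role": "system", "content": prompts[0]}]
    for response, prompt in zip(responses, prompts[1:]):
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": prompt})

    # Swap in a larger model name from the playground to compare results.
    completion = client.chat.completions.create(model=model, messages=messages)
    return completion.choices[0].message.content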
