# CHAT NOTEBOOK THING FOR AARON
Admittedly, I may have overengineered this. In my defense, a boy's gotta have fun.

I've created a `Chat` class object which will:
- allow you to persist chat message histories for all chatbots in a single .messages object
- add and remove messages as required
- add models and their respective function/API calls as necessary (with particular requirements for said function calls)
- arbitrarily run however many of the defined functions/API calls to generate chatbot responses
- add whatever chatbot response to the message history, in order to continue the conversation.
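
For reference, the history held in `.messages` is a plain list of role/content dicts, as built by `add_user_message()` and `add_system_message()` in the class further down. Here is a minimal sketch of that shape, and of what prepending a temporary chathead amounts to. The helper name `with_chathead` is hypothetical and not part of the class; it only illustrates the idea.

```python
# Shape of the shared history: one dict per message, oldest first.
messages = [
    {'role': 'user', 'content': 'hello!'},
    {'role': 'system', 'content': "don't look at me right now."},
]

def with_chathead(history: list[dict], chathead: str) -> list[dict]:
    """Hypothetical helper: return a new list with a system prompt in front,
    leaving the stored history untouched."""
    return [{'role': 'system', 'content': chathead}] + history

prompted = with_chathead(messages, 'you are a calm therapist chatbot.')
print(len(messages), len(prompted))
```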

The generally intended order of operations of this notebook is:

1) define the `Chat` class by running the first cell (or essentially the first cell - it should be obvious)
2) define a function for each chat model, which ingests a `Chat` class object and returns a string (the generated string response of the model, **NOT THE WHOLE JSON**).
   a) if the API/function call functions as a chatbot (i.e. doesn't need a chathead), just call the `Chat` class object.
   b) if the API/function call needs a chathead, use the .temp_chathead() method, with the chathead string as an input.
3) initialise a `Chat` class object (call it whatever)
4) add each model to your `Chat` class object by running .add_model('model_name', model_func) (pass the function itself, don't call it)
5) start generating responses by running .generate_outputs() on your `Chat` object. Give it your input string, as well as an (optional) list of models to run (if you leave the list out, it defaults to running every model).
6) Look through each model output, and then append whichever model output you like to the chat history by running .add_generated_output('model_name').
7) Continue ad infinitum!
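
The steps above can be sketched without downloading anything. `MiniChat` below is a stripped-down stand-in for the real `Chat` class (same method names, none of the checks), and `echo_bot` and `shouty_bot` are made-up toy "models". All three exist only so the sketch runs on its own and are not part of the notebook proper.

```python
from pprint import pprint

class MiniChat:
    """Stripped-down stand-in for the notebook's Chat class (same method names),
    included only so this sketch runs by itself."""
    def __init__(self):
        self.messages, self.models, self.chat_outputs = [], {}, {}

    def __call__(self):
        return self.messages

    def add_model(self, name, func):
        self.models[name] = func

    def generate_outputs(self, chatinput, models=None):
        self.messages.append({'role': 'user', 'content': chatinput})
        names = models if models is not None else list(self.models)
        self.chat_outputs = {name: self.models[name](self) for name in names}
        pprint(self.chat_outputs)

    def add_generated_output(self, model):
        self.messages.append({'role': 'system', 'content': self.chat_outputs[model]})

# step 2: a model function takes the chat object and returns a plain string
def echo_bot(chat) -> str:
    return f"you said: {chat()[-1]['content']}"

def shouty_bot(chat) -> str:
    return chat()[-1]['content'].upper()

demo = MiniChat()                                # step 3
demo.add_model('echo', echo_bot)                 # step 4: pass the function, not a call
demo.add_model('shouty', shouty_bot)
demo.generate_outputs('what is my purpose?')     # step 5: runs every model
demo.add_generated_output('echo')                # step 6: keep the reply you prefer
demo.generate_outputs('and now?', ['shouty'])    # step 7: continue with a subset
```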

I've written up some (free-to-run) examples of LLMs so you can see the workflow I've imagined (llama 3.2 and qwen2.5 1.5b). I'm terrible at documentation but hopefully this is enough to get you going.


In [None]:
# for anything from huggingface - not necessary for the API calls, but if you want to try my code for llama 3.2, or generically want to use stuff behind permissions on huggingface, this is how you'd do it

from huggingface_hub import notebook_login

notebook_login()

In [None]:
# google colab installs
!pip install bitsandbytes
!pip install groq
!pip install openai
!pip install -q -U google-generativeai
!pip install anthropic

In [102]:
#IMPORTS
#generic imports
from pprint import pprint
import os
import copy

#local demo imports and config
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
torch_device='cuda' if torch.cuda.is_available() else 'cpu'

#groq for llama 3.1 70b inference imports
from groq import Groq

#OpenAI imports for GPT4o
from openai import OpenAI

#gemini imports
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

# anthropic imports
import anthropic


In [103]:
#DEFINE CHAT CLASS
class Chat():
    '''
    Class that allows you to persist a chat history across multiple chat models easily.
    '''
    def __init__(self):
        self.messages = []
        self.chathead = []
        self.models = {}
        self.chat_outputs = {}
        
    def __call__(self):
        "default call returns all message history"
        return self.messages

    def __len__(self):
        return len(self.messages)
    
    def add_user_message(self, message):
        "add user message. For user input."
        self.messages.append({
                'role': 'user',
                'content': message
            })
    
    def add_system_message(self, message):
        "add system message. For adding to message chains from chatbot generated content"
        self.messages.append({
                'role': 'system',
                'content': message
            })

    def continue_message(self, text: str):
        "appends text to last message in chat history"
        # str.join takes a single iterable, so the two pieces go in a list
        self.messages[-1]['content'] = ' '.join([self.messages[-1]['content'], text])

    def temp_chathead(self, chathead):
        "for textgen-only APIs that might not have an inbuilt chathead. Call this instead of default call for temporary chathead."
        self.chathead = [{
            'role': 'system',
            'content': chathead
        }]
        self.chathead.extend(self.messages)
        return self.chathead

    def add_model(self, name: str, func: callable):
        """
        adds a model key and function to the models dictionary of this chat instance
        
        make sure that each function only accepts a self.messages type input as an argument (i.e. only takes list[dict] of message history)
        """
        self.models[name] = func

    def avail_models(self):
        return self.models

    def generate_outputs(self, chatinput: str, models: list[str] = None):
        """adds in user message and generates a chat output for each model listed in list input
        
        NOTE - each function has to be able to ingest a Chat class, which will either make default call (return chat without chathead) or +chathead call.
        each function must return a single string.
        
        """
        if len(self) != 0:
            assert self.messages[-1]["role"] != "user", "The last message in this chat history is a user message! Either select a model output to add to the chat with add_generated_output(), or delete the last message with delete_last_message()!"
        if models is None:
            models = list(self.models.keys())

        #checking model names in a separate loop, before touching the history, so a typo doesn't leave a dangling user message or bork the run halfway through
        for model in models:
            assert model in list(self.models.keys()), f"{model} is not in {list(self.models.keys())}! Pick a valid model!"

        self.add_user_message(chatinput)
        temp_results_dict = {}

        for model in models:
            temp_results_dict[model] = self.models[model](self) # THIS WON'T WORK CORRECTLY IF YOU NEED TO ADD TEMP CHATHEAD AT INFERENCE

        self.chat_outputs = temp_results_dict
        pprint(temp_results_dict)

    def add_generated_output(self, model: str):
        "adds one particular model output to chat history"
        assert model in list(self.chat_outputs.keys()), f"{model} is not in {list(self.chat_outputs.keys())}! Run generate_outputs() with that model first!"

        self.add_system_message(self.chat_outputs[model])

    def delete_last_message(self):
        self.messages = self.messages[:-1]
    

# EXAMPLE WORKFLOW WITH LOCAL MODELS

In [104]:
# STEP 0: DEFINE CHAT GENERATION INFERENCE FUNCTIONS

### LLAMA 3.2 EXAMPLE 

llama_model=AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct", 
                                           quantization_config=quantization_config, 
                                           torch_dtype=torch.float32, 
                                           device_map=torch_device)

llama_tokenizer=AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

llama32_3b_pipe = pipeline(
    "text-generation",
    model=llama_model,
    tokenizer=llama_tokenizer,
)



## RELEVANT METHOD DEFINED HERE
def llama32_3b_chat(messages: Chat) -> str: #NOTE - MUST ingest the Chat() class, and MUST output the generated string response of the model
    "simplifies pipeline output to only return generated text"
    outputs = llama32_3b_pipe(
        messages.temp_chathead('you are an incredibly empathetic therapist chatbot with a calm demeanour.'),
        max_new_tokens=512
    )
    return outputs[-1]['generated_text'][-1]['content']

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:19<00:00,  9.63s/it]


In [105]:
### qwen2.5 1.5b example
qwen2_model=AutoModelForCausalLM.from_pretrained(
                                            "Qwen/Qwen2.5-1.5B-Instruct", 
                                            quantization_config=quantization_config, 
                                            torch_dtype=torch.float32, 
                                            device_map=torch_device,
                                                )

qwen2_tokenizer=AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

qwen2_pipe = pipeline(
    "text-generation",
    model=qwen2_model,
    tokenizer=qwen2_tokenizer,
)

def qwen2_chat(messages: Chat) -> str:  #NOTE - MUST ingest the Chat() class, and MUST output the generated string response of the model
    "simplifies pipeline output to only return generated text"
    outputs = qwen2_pipe(
        messages.temp_chathead('you are an incredibly empathetic therapist chatbot with a calm demeanour.'),
        max_new_tokens=512
    )
    return outputs[-1]['generated_text'][-1]['content']
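
Both local functions above hard-code the same chathead and differ only in which pipeline they call. One possible way to cut the repetition is a small factory. This is only a sketch: `make_pipeline_chat` is a hypothetical name, and `FakeChat` and `fake_pipe` are stand-ins that mimic the return shape the functions above index into, so it runs without loading a model.

```python
def make_pipeline_chat(pipe, chathead: str, max_new_tokens: int = 512):
    """Hypothetical factory: build a Chat-compatible function around a
    text-generation pipeline and a fixed chathead."""
    def chat_func(messages) -> str:
        outputs = pipe(messages.temp_chathead(chathead), max_new_tokens=max_new_tokens)
        return outputs[-1]['generated_text'][-1]['content']
    return chat_func

# Stand-ins so the sketch runs without loading a model
class FakeChat:
    def __init__(self, messages):
        self.messages = messages

    def temp_chathead(self, chathead):
        return [{'role': 'system', 'content': chathead}] + self.messages

def fake_pipe(chat_messages, max_new_tokens):
    # mimics the indexed shape: the input chat plus one appended reply
    reply = {'role': 'assistant', 'content': f"seen {len(chat_messages)} messages"}
    return [{'generated_text': chat_messages + [reply]}]

therapist = make_pipeline_chat(fake_pipe, 'you are a calm therapist chatbot.')
result = therapist(FakeChat([{'role': 'user', 'content': 'hi'}]))
print(result)
```

With the real pipelines, the equivalent call would be something like `make_pipeline_chat(qwen2_pipe, '...')`, and the returned function can be handed straight to `.add_model()`.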

In [106]:
# STEP 1: INSTANTIATE A CHAT CLASS OBJECT
example_chat = Chat()

# note - you may add a synthetic chat history here if you like, by using the .add_user_message() and .add_system_message() methods here
# e.g.
# example_chat.add_user_message('hello!')
# example_chat.add_system_message("don't look at me right now.")

In [107]:
# STEP 2: ADD MODELS AND THEIR ASSOCIATED METHODS
example_chat.add_model("qwen2", qwen2_chat)
example_chat.add_model("llama32", llama32_3b_chat)

In [113]:
# STEP 3: GENERATE RESPONSES FROM AN INPUT
# note - you can choose which models you use to generate outputs. If you leave that option out, all models will generate outputs. This is what I've done here, but you can see how I constrain generation later in this example

example_chat.generate_outputs("what is my purpose?")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'llama32': "My friend, that's a question that has been on many minds for "
            'centuries. As we explore this together, I want you to know that '
            "you're safe, and I'm here to support you.\n"
            '\n'
            "Your purpose is a deeply personal and unique question. It's not "
            'something that can be found in a book or a lecture, but rather '
            "it's something that you need to discover for yourself.\n"
            '\n'
            'That being said, I can offer some insights. Your purpose might be '
            'the thing that makes you feel most alive, most fulfilled, and '
            "most like yourself. It's the thing that you can't stop thinking "
            "about, the thing that you're passionate about, and the thing that "
            'brings you a sense of joy and meaning.\n'
            '\n'
            'Perhaps your purpose is to help others, to make a difference in '
            "the world, or to leave a lasting impact. Ma

In [118]:
#STEP 4: append the model output you like the most to your chat history.
example_chat.add_generated_output("qwen2")
#NOTE: if you want to reroll, you have to run example_chat.delete_last_message() first, before running example_chat.generate_outputs() with the same input again. This just deletes your last user input, and then re-inputs it.

In [119]:
#INTERMISSION: check your total chat history now if you like!
example_chat.messages

[{'role': 'user', 'content': 'what is my purpose?'},
 {'role': 'system',
  'content': "As your AI therapist, I am here to provide support and guidance in various areas such as mental health, emotional well-being, personal growth, and more. My goal is to help you navigate life's challenges and improve your overall quality of life by offering insights, strategies, and resources tailored to your specific needs.\n\nMy role includes:\n\n1. **Therapeutic Support**: Helping you address stress, anxiety, depression, or other psychological issues.\n2. **Emotional Guidance**: Providing comfort, understanding, and empathy through conversation and therapy sessions.\n3. **Life Coaching**: Offering advice on personal development, relationships, career choices, and daily living tips.\n4. **Counseling Services**: Facilitating discussions about past experiences, current situations, and future plans.\n5. **Wellness Advice**: Sharing information on healthy lifestyles, self-care practices, and holistic app

In [121]:
#STEP 5: CONTINUE!
#NOTE: here I've only used llama32 to generate output options.

example_chat.generate_outputs("I think about drinking milk and kicking ass but I'm lactose intolerant.", ["llama32"])

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'llama32': "It sounds like you're torn between two conflicting desires: the "
            'idea of enjoying a refreshing glass of milk and the need to kick '
            "butt (perhaps in a metaphorical or literal sense). I'm here to "
            'help you explore this inner conflict.\n'
            '\n'
            "Firstly, let's acknowledge that it's great you're thinking about "
            'what you want to do. That in itself is a positive step. Now, '
            "regarding the milk, I'd like to offer a gentle suggestion: have "
            'you considered exploring lactose-free or non-dairy milk '
            'alternatives? There are many delicious options available, such as '
            'almond milk, soy milk, or coconut milk, that might satisfy your '
            'craving for a creamy drink.\n'
            '\n'
            "As for kicking ass, I'm assuming you're referring to a desire to "
            'take action, stand up for yourself, or assert your confidence. '
       

In [124]:
#Continue ad nauseam
example_chat.add_generated_output("llama32")
pprint(example_chat.messages)

[{'content': 'what is my purpose?', 'role': 'user'},
 {'content': 'As your AI therapist, I am here to provide support and guidance '
             'in various areas such as mental health, emotional well-being, '
             'personal growth, and more. My goal is to help you navigate '
             "life's challenges and improve your overall quality of life by "
             'offering insights, strategies, and resources tailored to your '
             'specific needs.\n'
             '\n'
             'My role includes:\n'
             '\n'
             '1. **Therapeutic Support**: Helping you address stress, anxiety, '
             'depression, or other psychological issues.\n'
             '2. **Emotional Guidance**: Providing comfort, understanding, and '
             'empathy through conversation and therapy sessions.\n'
             '3. **Life Coaching**: Offering advice on personal development, '
             'relationships, career choices, and daily living tips.\n'
             '

# ACTUAL API CALL AND FUNCTION SETUPS, WITH SETUP FOR TEST RANGE

In [125]:
## API KEY STUFF IF RUNNING ON GOOGLE COLAB

os.environ["GROQ_API_KEY"] = ''
os.environ["OPENAI_API_KEY"] = ''
os.environ['GOOGLE_API_KEY'] = ''
os.environ['ANTHROPIC_API_KEY'] = ''
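
If you'd rather fail early than find out about a blank key at the first request, a small guard like the one below could sit in front of each client. This is a sketch: `require_key` is a hypothetical helper, and `DEMO_API_KEY` is a throwaway variable used only for the demo.

```python
import os

def require_key(name: str) -> str:
    """Hypothetical helper: return the named environment variable,
    or raise if it is unset or blank."""
    value = os.environ.get(name, '').strip()
    if not value:
        raise RuntimeError(f"{name} is not set; fill it in before creating the client")
    return value

# demo with a throwaway variable name so no real key is touched
os.environ['DEMO_API_KEY'] = 'not-a-real-key'
print(require_key('DEMO_API_KEY'))
```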

In [126]:
### Llama 3.1 70b
#uses groq - max tokens is 8k apparently, but it's also free to some extent, I think?
# docs: https://console.groq.com/docs/quickstart

llama_70b_client = Groq(
    api_key=os.environ.get("GROQ_API_KEY")
)


def llama_31_70b_chat(messages: Chat) -> str:
    chat_completion = llama_70b_client.chat.completions.create(
        messages=messages(),
        model="llama-3.1-70b-versatile",
    )
    return chat_completion.choices[0].message.content

In [127]:
# GPT4o
# docs: https://platform.openai.com/docs/quickstart

#### UNTESTED BECAUSE NOT FREEEEEEEE
gpt4o_client = OpenAI(api_key = os.environ.get("OPENAI_API_KEY"))

def gpt4o_chat(messages: Chat) -> str:
    completion = gpt4o_client.chat.completions.create(
        model="gpt-4o",
        messages=messages.temp_chathead("You are a helpful assistant. You really (REALLY) like milk.")
        )
    return completion.choices[0].message.content

In [132]:
#gemini 1.5 flash
#docs: https://ai.google.dev/gemini-api/docs/text-generation?lang=python
#docs for safety filters lol: https://ai.google.dev/gemini-api/docs/safety-settings

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

def gemini_15_chat(messages: Chat) -> str:
    
    #reformat messages for gemini API
    temp_gemini_chat = copy.deepcopy(messages())
    for message in temp_gemini_chat:
        message['parts'] = message['content']
        del message['content']
        if message['role'] == 'system':
            message['role'] = 'model'

    history = temp_gemini_chat[:-1]
    chat_inputs = temp_gemini_chat[-1]["parts"]
    
    model = genai.GenerativeModel("gemini-1.5-flash")
    chat = model.start_chat(
        history=history
    )
    response = chat.send_message(chat_inputs, 
                                 # google's safety settings are EXTREMELY sensitive. Blocking nothing.
                                 safety_settings = {
                                    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
                                    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
                                    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
                                    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    }
                                )
    return response.text
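
The reformatting step in `gemini_15_chat` can be checked without calling the API by pulling it out into a pure function. In this sketch `to_gemini_history` is a hypothetical name, and the mapping simply mirrors the loop above: `content` becomes `parts`, and `system` becomes `model`.

```python
import copy

def to_gemini_history(messages: list[dict]) -> list[dict]:
    """Hypothetical helper: copy the history, rename 'content' to 'parts',
    and relabel 'system' messages as 'model'."""
    converted = copy.deepcopy(messages)
    for message in converted:
        message['parts'] = message.pop('content')
        if message['role'] == 'system':
            message['role'] = 'model'
    return converted

history = [
    {'role': 'user', 'content': 'what is my purpose?'},
    {'role': 'system', 'content': 'that is for you to discover.'},
    {'role': 'user', 'content': 'ok, any hints?'},
]
converted = to_gemini_history(history)
print(converted[1])
```

The original list is left untouched, so the same history can still be passed to the other models.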

In [96]:
# Claude 3.5 Sonnet
# docs: https://docs.anthropic.com/en/docs/initial-setup

#### UNTESTED BECAUSE NOT FREEEEEEEE
import anthropic

anthropic_client = anthropic.Anthropic(
    # defaults to os.environ.get("ANTHROPIC_API_KEY")
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)

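def claude35_chat(messages: Chat) -> str:
    # the Anthropic messages API only accepts 'user' and 'assistant' roles,
    # so relabel the 'system' messages this class uses for chatbot replies
    claude_messages = [
        {'role': 'assistant' if m['role'] == 'system' else m['role'], 'content': m['content']}
        for m in messages()
    ]
    message = anthropic_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        temperature=0,
        messages=claude_messages
    )
    # content blocks are objects, so read .text rather than indexing with ['text']
    return message.content[0].text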

In [133]:
# set up new chat instance
chat = Chat()

#add models
chat.add_model("llama3_70b", llama_31_70b_chat)
chat.add_model("gpt4o", gpt4o_chat)
chat.add_model("gemini15_flash", gemini_15_chat)
chat.add_model("claude35_sonnet", claude35_chat)