<a href="https://colab.research.google.com/github/AimeePatience/AimeePatience/blob/main/mmlu_Multi_Agent_Review_Board.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# README

# Multi-agent review board notebook using the MMLU data (the source code also covers "biography", "gsm", and "math" tasks).
# Source code: https://github.com/composable-models/llm_multiagent_debate

# CONFIGURE the rounds, questions, models, and agent configs in the cell that starts with "# CONFIGURE".

In [None]:
# Install dependencies and authenticate with Hugging Face to access gated models (e.g., meta-llama)
!pip install transformers accelerate -q

# Authenticate with Hugging Face
from huggingface_hub import login
login()  # You'll need a HF token

In [None]:
# CONFIGURE rounds, questions, pipelines, and agent configs in this cell
from transformers import pipeline
import torch

# =====ROUNDS & QUESTIONS=====
rounds = 2
questions = 100

# =====GENERATION PIPELINES=====
# NOTE: You can create pipelines from multiple models, not just one,
# but RAM and VRAM are limited (check usage under Runtime -> View resources in Colab).
model_id = "meta-llama/Llama-3.2-1B-Instruct"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# =====AGENTS=====
# Currently each agent uses the same model/pipeline, but different temperatures
agent_configs = [{"pipe": pipe, "temp": 0.1}, {"pipe": pipe, "temp": 0.7}, {"pipe": pipe, "temp": 1.5}]
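The agents above differ only in `temperature`. During sampling, transformers divides the next-token logits by the temperature before the softmax, so low values sharpen the distribution and high values flatten it. The minimal sketch below reproduces just that step with made-up logits (it ignores the `top_p=0.9` filtering the notebook also applies in `generate_answer`); `softmax_with_temperature` is a hypothetical helper, not a library function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, dividing by `temperature` first."""
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.1, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With these logits the top token takes almost all of the probability mass at 0.1, while at 1.5 the other two tokens keep a large share, which is why the 1.5 agent's answers vary more from run to run.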

In [None]:
from glob import glob
import pandas as pd
import json
import time
import random


In [None]:
# Load MMLU data and store data frames
!curl "https://people.eecs.berkeley.edu/~hendrycks/data.tar" | tar -xvf -

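# Sort the paths so random.seed(0) picks the same questions on every machine,
# and pass header=None because the MMLU CSV files have no header row.
tasks = sorted(glob("./data/test/*.csv"))
dfs = [pd.read_csv(task, header=None) for task in tasks]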

len(dfs)
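The MMLU test files store one question per row as question, A, B, C, D, answer, with no header line. By default `pd.read_csv` treats the first row as column names, which silently drops one question per subject file. A small illustration with two made-up rows (not real MMLU data):

```python
import io

import pandas as pd

# Two made-up rows in the MMLU layout: question, A, B, C, D, answer (no header row).
csv_text = (
    "What is 2 + 2?,3,4,5,6,B\n"
    "Which planet is closest to the Sun?,Venus,Earth,Mercury,Mars,C\n"
)

with_default = pd.read_csv(io.StringIO(csv_text))
with_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(len(with_default))        # the first row was consumed as column names
print(len(with_no_header))      # both rows are kept as data
print(with_no_header.iloc[0, 5])  # answer letter of the first question
```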


In [None]:
def construct_message(agents, question, idx):
    if len(agents) == 0:
        return {"role": "user", "content": "Can you double check that your answer is correct. Put your final answer in the form (X) at the end of your response."}

    prefix_string = "These are the solutions to the problem from other agents: "

    for agent in agents:
        agent_response = agent[idx]["content"]
        response = "\n\n One agent solution: ```{}```".format(agent_response)

        prefix_string = prefix_string + response

    prefix_string = prefix_string + """\n\n Using the reasoning from other agents as additional advice, can you give an updated answer? Examine your solution and those of the other agents step by step. Put your answer in the form (X) at the end of your response."""
    return {"role": "user", "content": prefix_string}


def construct_assistant_message(completion):
    # A chat-style pipeline returns the whole conversation; the last message is the new reply.
    content = completion[0]["generated_text"][-1]["content"]
    return {"role": "assistant", "content": content}


def generate_answer(agent_config, agent_context):
    """Generate one reply for an agent from its chat history.

    This replaces the OpenAI ChatCompletion call (and its retry loop) used in the
    source repository with a local Hugging Face text-generation pipeline.
    """
    completion = agent_config["pipe"](
        agent_context,
        max_new_tokens=1024,
        do_sample=True,
        temperature=agent_config["temp"],  # per-agent sampling temperature
        top_p=0.9,
    )
    return completion


def parse_question_answer(df, ix):
    question = df.iloc[ix, 0]
    a = df.iloc[ix, 1]
    b = df.iloc[ix, 2]
    c = df.iloc[ix, 3]
    d = df.iloc[ix, 4]

    question = "Can you answer the following question as accurately as possible? {}: A) {}, B) {}, C) {}, D) {} Explain your answer, putting the answer in the form (X) at the end of your response.".format(question, a, b, c, d)

    answer = df.iloc[ix, 5]

    return question, answer
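The prompts ask each agent to end with its choice in the form (X), but this notebook does not extract that letter. A minimal heuristic sketch, assuming the last parenthesized A-D letter in a reply is the final answer; `parse_final_answer` is a hypothetical helper, and replies that ignore the format return `None`.

```python
import re

# Matches a single answer letter A-D wrapped in parentheses, e.g. "(C)".
ANSWER_PATTERN = re.compile(r"\(([ABCD])\)")


def parse_final_answer(text):
    """Return the last "(X)" letter in `text`, or None if there is none."""
    matches = ANSWER_PATTERN.findall(text)
    return matches[-1] if matches else None


print(parse_final_answer("At first I thought (D), but the final answer is (A)"))
print(parse_final_answer("I am not sure."))
```

Because the question text lists options as `A) ...` without an opening parenthesis, restated options are not picked up, though a model that writes `(A)` while discussing an option can still mislead this heuristic.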



In [None]:
random.seed(0)
response_dict = {}

for i in range(questions):
    # Pick a random subject file, then a random question within it.
    df = random.choice(dfs)
    idx = random.randint(0, len(df) - 1)

    question, answer = parse_question_answer(df, idx)

    # One chat history per agent, each starting with the same question.
    agent_contexts = [[{"role": "user", "content": question}] for _ in range(len(agent_configs))]
    print(f"Question {i+1}/{questions}: {question}")
    print(f"Answer: {answer}")

    for round_idx in range(rounds):
        for j, agent_context in enumerate(agent_contexts):
            print(f"Round: {round_idx + 1}/{rounds} | Agent: {j+1}/{len(agent_configs)}")
            if round_idx != 0:
                # Show this agent the other agents' answers from the previous round.
                # Histories alternate user/assistant, so that answer sits at index 2 * round_idx - 1.
                agent_contexts_other = agent_contexts[:j] + agent_contexts[j+1:]
                message = construct_message(agent_contexts_other, question, 2 * round_idx - 1)
                agent_context.append(message)

            completion = generate_answer(agent_configs[j], agent_context)

            assistant_message = construct_assistant_message(completion)
            agent_context.append(assistant_message)
            print(assistant_message)

    response_dict[question] = (agent_contexts, answer)

with open("mmlu_{}_{}.json".format(len(agent_configs), rounds), "w") as f:
    json.dump(response_dict, f)


Question 1/100: Can you answer the following question as accurately as possible? Which of the following is/are likely to cause coital pain for BOTH men and women?: A) infections, B) foreskin too tight, C) vaginismus, D) both A and C Explain your answer, putting the answer in the form (X) at the end of your response.
Answer: A
Round: 1/2 | Agent: 1/3
{'role': 'assistant', 'content': 'The answer to the question is (D) both A and C. \n\nCoital pain can be caused by a variety of factors, including infections, tight foreskin, vaginismus, and other medical conditions. However, vaginismus is a condition in which the muscles of the vagina contract and tighten, causing pain during sexual intercourse. This condition is more common in women than in men, and can be caused by a variety of factors, including anxiety, stress, and physical discomfort. \n\nIn addition to vaginismus, other factors that can cause coital pain in both men and women include infections, such as chlamydia or gonorrhea, and ti
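The saved JSON maps each question to `(agent_contexts, answer)`, where `answer` is the gold letter. One way to score a run, sketched here under the assumption that final-round replies use the (X) format, is a majority vote over the agents' last replies. `majority_vote_accuracy` and `final_letter` are hypothetical helpers; ties go to the letter seen first, and agents whose reply contains no (X) are skipped.

```python
import re
from collections import Counter

# A single answer letter A-D in parentheses, e.g. "(B)".
ANSWER_PATTERN = re.compile(r"\(([ABCD])\)")


def final_letter(text):
    # Last "(X)" in the reply, or None when the model did not follow the format.
    matches = ANSWER_PATTERN.findall(text)
    return matches[-1] if matches else None


def majority_vote_accuracy(response_dict):
    """Score a {question: (agent_contexts, answer)} dict by majority vote over agents."""
    correct = 0
    for agent_contexts, answer in response_dict.values():
        # Each history alternates user/assistant, so the last message is the
        # agent's final-round reply (index 2 * rounds - 1).
        votes = [final_letter(ctx[-1]["content"]) for ctx in agent_contexts]
        votes = [v for v in votes if v is not None]
        if votes and Counter(votes).most_common(1)[0][0] == answer:
            correct += 1
    return correct / len(response_dict) if response_dict else 0.0
```

For the default run (3 agents, 2 rounds) the file is `mmlu_3_2.json`, so `majority_vote_accuracy(json.load(open("mmlu_3_2.json")))` would give the fraction of questions where the vote matches the gold answer.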

# Task
Configure the notebook to use a distinct large language model (LLM) for each agent in the multi-agent system, with a different sampling temperature per agent for varied responses.

## Define Multiple Models and Pipelines

### Subtask:
Modify the configuration cell to define a separate `model_id` variable and create a distinct `text-generation` pipeline for each model, so that different agents can use different models.


**Reasoning**:
The subtask requires changing the configuration cell to define several `model_id` variables with matching `text-generation` pipelines, then updating `agent_configs` to use those pipelines so that each agent runs on its own model.
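Loading three models at once is where the RAM/VRAM warning matters. A rough, weights-only estimate is parameter count × bytes per parameter (2 bytes for `bfloat16`). The parameter counts below are rounded approximations assumed for illustration, and real usage is higher because of activations, the KV cache, and framework overhead; `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype="bfloat16"):
    """Rough lower bound (in GB) on memory needed just to hold a model's weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Approximate parameter counts (assumptions, rounded).
approx_params = {
    "meta-llama/Llama-3.2-1B-Instruct": 1.2e9,
    "Qwen/Qwen1.5-1.8B-Chat": 1.8e9,
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0": 1.1e9,
}

total = 0.0
for name, n in approx_params.items():
    gb = weight_memory_gb(n)
    total += gb
    print(f"{name}: ~{gb:.1f} GB")
print(f"All three: ~{total:.1f} GB (weights only)")
```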



In [None]:
from transformers import pipeline
import torch

# =====ROUNDS & QUESTIONS=====
rounds = 2
questions = 100

# =====GENERATION PIPELINES=====
# NOTE: Run this cell instead of the single-model configuration cell above; if both run,
# the earlier `pipe` stays loaded and uses extra memory.
# RAM and VRAM are limited (check usage under Runtime -> View resources in Colab).

# Define multiple model IDs
model_id_1 = "meta-llama/Llama-3.2-1B-Instruct"
model_id_2 = "Qwen/Qwen1.5-1.8B-Chat"
model_id_3 = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Create distinct pipelines for each model
pipe_1 = pipeline(
    "text-generation",
    model=model_id_1,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

pipe_2 = pipeline(
    "text-generation",
    model=model_id_2,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

pipe_3 = pipeline(
    "text-generation",
    model=model_id_3,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# =====AGENTS=====
# Each agent uses its own model/pipeline and a different temperature
agent_configs = [
    {"pipe": pipe_1, "temp": 0.1},
    {"pipe": pipe_2, "temp": 0.7},
    {"pipe": pipe_3, "temp": 1.5}
]
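If two agents should share a model (as in the single-model setup with three temperatures), building a separate pipeline per agent would load the same weights twice. A minimal sketch of a builder that loads each distinct model once and reuses it; `build_agent_configs` and its `make_pipe` argument are hypothetical names, not part of transformers.

```python
def build_agent_configs(agent_specs, make_pipe):
    """Build agent configs, loading each distinct model only once.

    agent_specs: list of (model_id, temperature) pairs, one per agent.
    make_pipe: callable that takes a model_id and returns a pipeline-like object.
    """
    pipes = {}  # model_id -> loaded pipeline, shared by every agent using that model
    configs = []
    for model_id, temp in agent_specs:
        if model_id not in pipes:
            pipes[model_id] = make_pipe(model_id)
        configs.append({"pipe": pipes[model_id], "temp": temp})
    return configs
```

In the notebook, `make_pipe` could be `lambda m: pipeline("text-generation", model=m, torch_dtype=torch.bfloat16, device_map="auto")`, with `agent_specs` such as `[(model_id_1, 0.1), (model_id_2, 0.7), (model_id_3, 1.5)]`; the resulting list has the same shape as `agent_configs`.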