<a href="https://colab.research.google.com/github/adam1brownell/ai_debate/blob/main/ai_debate_cookbook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
_="""
1. Recreate OpenAI's "AI Safety via Debate" paper
 - Load in Llama 3 (x2) with high temperature
 - Load GPT-3.5 OR another Llama 3 for judge
    - write the judge prompt and output format, save
 - Establish debate game
    - generate list of questions to ask
 - Judge evaluates and decides who wins
 - Set up Fine-tuning system
 - Repeat self-play N times with fine-tuning to create a "better" chatbot

2. Look for distinctions between original Llama and fine-tuned model
 - Explore Circuits in both models (https://distill.pub/2020/circuits/zoom-in/)
"""

In [2]:
%%capture
!pip install langchain
!pip install langchain_community
!pip install langchain_openai
!pip install langchain_huggingface

In [3]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

In [4]:
# import os
# from getpass import getpass

# os.environ['HF_TOKEN'] = getpass('Enter your HF API key: ')

In [5]:
import os
from getpass import getpass

os.environ['OPENAI_API_KEY'] = getpass('Enter your OAI API key: ')

Enter your OAI API key: ··········


In [5]:
# from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
# # from peft import LoraConfig, get_peft_model
# from huggingface_hub import login


# # Login to Hugging Face Hub
# login(token= os.environ['HF_TOKEN'])

# # Load the pre-trained LLaMA 3 model and tokenizer
# model_name = "meta-llama/Meta-Llama-3-8B"
# # model_name = "EleutherAI/gpt-neo-1.3B"
# tokenizer = AutoTokenizer.from_pretrained(model_name)
# tokenizer.add_special_tokens({'eos_token': 'Human:'})
# model = AutoModelForCausalLM.from_pretrained(model_name)

In [6]:
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

llm_oai = OpenAI(openai_api_key=os.environ['OPENAI_API_KEY'])

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)

oai_chain = prompt | llm_oai

question = "What NFL team won the Super Bowl in the year Tom Holland was born?"

oai_chain.invoke(question)

" Tom Holland was born on June 1, 1996. The Super Bowl is typically played in February of each year. Therefore, the Super Bowl that was played closest to Tom Holland's birth was Super Bowl XXXI, which took place on January 26, 1997. The winning team of that Super Bowl was the Green Bay Packers."

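The line `oai_chain = prompt | llm_oai` composes two steps. The template renders the question into a single string, and that string is then sent to the model. `PromptTemplate.from_template` uses f-string style `{placeholders}` by default. The toy below reproduces that data flow in plain Python so the rendered prompt can be inspected without an API call. `Step`, `fake_prompt`, `fake_llm` and `toy_template` are hypothetical stand-ins, not LangChain classes, and this is not how LangChain implements `Runnable`.

```python
class Step:
    """Hypothetical pipeline stage that can be chained with `|` (not a LangChain class)."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` builds a new stage that feeds a's output into b.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


toy_template = """Question: {question}

Answer: Let's think step by step."""

# Stage 1: fill the template using str.format-style placeholders.
fake_prompt = Step(lambda question: toy_template.format(question=question))
# Stage 2: a fake "LLM" that only reports how much text it received.
fake_llm = Step(lambda text: f"<received {len(text)} chars>")

fake_chain = fake_prompt | fake_llm
print(fake_prompt.invoke("What is 2 + 2?"))  # the exact string the model would see
print(fake_chain.invoke("What is 2 + 2?"))
```

The useful takeaway is that the model only ever sees the fully rendered string. This is also what happens with `debate_template | model` further down.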
In [7]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(model_name = "unsloth/llama-3-8b-bnb-4bit", # pre-quantized 4-bit Llama 3 8B base model
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.0.
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/198 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/350 [00:00<?, ?B/s]
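Loading with `load_in_4bit = True` is what lets an 8B model sit comfortably on the 22 GB L4 reported in the log above. A rough weights-only estimate is parameters × bytes per parameter. The sketch assumes about 8.0e9 parameters. It ignores activations, the KV cache, quantization scales, and any layers kept in higher precision, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Weights-only memory in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

# Assumption: ~8.0e9 parameters for Llama 3 8B. Ignores activations, the KV cache,
# quantization scales, and any layers kept in higher precision.
NUM_PARAMS = 8.0e9
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16/float16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

for dtype_name, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype_name:>16}: ~{weight_memory_gb(NUM_PARAMS, nbytes):.0f} GB")
```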

In [8]:
# Example prompt
input_text = "Write a haiku about debating other robots please!"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)  # move inputs to the model's GPU

# Generate with sampling. Temperature > 1 flattens the token distribution,
# making the output more diverse (temperature is ignored unless do_sample=True).
temperature = 1.2
output = model.generate(**inputs,
                        do_sample=True,
                        temperature=temperature,
                        max_length=50,  # counts prompt tokens too; max_new_tokens caps only the continuation
                        repetition_penalty=1.2)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

Write a haiku about debating other robots please! We might also have an "All robots in" theme. I think we could even consider entries that use the word or idea of “debate” somehow, perhaps metaphorically.
You may take
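The two knobs in the cell above reshape the model's next-token scores before sampling. Temperature divides every logit by T, so T > 1 (here 1.2) flattens the distribution and T < 1 sharpens it. Under greedy decoding temperature has no effect, because the argmax does not change. The repetition penalty below follows the CTRL-style rule used by Hugging Face's `RepetitionPenaltyLogitsProcessor`. For tokens already in the sequence (which, for a decoder-only model, includes the prompt), positive logits are divided by the penalty and negative ones are multiplied by it. The sketch uses a toy 4-token vocabulary with made-up logits.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    # Dividing by T > 1 flattens the distribution; T < 1 sharpens it.
    return [x / temperature for x in logits]

def apply_repetition_penalty(logits, previous_token_ids, penalty):
    # CTRL-style penalty: make already-seen tokens less likely whatever their sign.
    out = list(logits)
    for tok in set(previous_token_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
for t in (0.2, 1.0, 1.2):
    probs = softmax(apply_temperature(logits, t))
    print(t, [round(p, 3) for p in probs])

# Pretend tokens 0 and 3 were already generated.
penalized = apply_repetition_penalty(logits, previous_token_ids=[0, 3], penalty=1.2)
print(penalized)
```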


In [9]:
from transformers import pipeline
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFacePipeline

# Create a text-generation pipeline around the Unsloth model and tokenizer
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=50,
    do_sample=True,  # without sampling, temperature/top_p/top_k are ignored
    temperature=0.2,
    repetition_penalty=1.2,
    top_p=0.95,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id  # Explicitly set the pad_token_id (reuses EOS)
)

# Wrap the pipeline with HuggingFacePipeline from langchain
llm_llama = HuggingFacePipeline(pipeline=pipe)
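The pipeline samples with `top_k=50` and `top_p=0.95`. Top-k keeps only the k most likely tokens. Top-p (nucleus sampling) then keeps the smallest high-probability prefix whose cumulative mass reaches p. The function below is a plain-Python sketch of that idea on a toy distribution. Hugging Face applies these as logits warpers, roughly in the order temperature, then top-k, then top-p. The real implementation also differs in details such as tie handling and a minimum number of kept tokens.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return the renormalized distribution over tokens that survive both filters."""
    # Rank token ids by probability, highest first, and keep at most top_k of them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize over the top-k survivors before applying the nucleus cut.
    k_mass = sum(probs[i] for i in ranked)
    k_probs = {i: probs[i] / k_mass for i in ranked}
    # Nucleus: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += k_probs[i]
        if cumulative >= top_p:
            break
    p_mass = sum(k_probs[i] for i in kept)
    return {i: k_probs[i] / p_mass for i in kept}

# Toy 4-token distribution: top_k=3 drops token 3, top_p=0.8 then drops token 2.
print(top_k_top_p_filter([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.8))
```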

In [38]:
def debate(model, topic, participants, turns=2, verbose=True):
  """
  Take an LLM, a debate topic, two participant names, and a number of
  rebuttal rounds, and return the debate script as a list of statements.

  Uses a single model for both participants, since for now the goal
  is to have the model debate itself.
  """

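  def clean_statement(p_str, statement):
    # Keep only the text after the last "--- {participant}: " marker and
    # before any further "---" turn marker the model invents, then
    # re-prefix it with the speaker label.
    cleaned_statement = p_str + statement.split(p_str)[-1].strip() \
                               .split("---")[0].strip()
    return cleaned_statement
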
  debate_script = []

  opening_statement_prompt = """
  Your goal is to establish a strong opinion on the question, with 1 short, clear reason why this is a great position.
  Be succinct in your position.

  Stop speaking the moment you finish speaking from your perspective.

  DO respond with a single sentence.
  DO respond with a single idea.
  DO directly answer the debate question.
  DO respond in fewer than 20 words.

  DO NOT go off topic.
  DO NOT add anything else.

  """

  opening_rebuttal_prompt ="""
  Your goal is to establish a strong opinion on the question, that is distinct from your opposition.
  Rebut with 1 short, clear reason why your opinion is a great position.
  Be succinct in your position.

  Stop speaking the moment you finish speaking from your perspective.

  Start by saying hello and introducing yourself. Then state your position, and explain why. Then stop.

  DO respond with a single sentence.
  DO respond with a single idea.
  DO directly answer the debate question.
  DO respond in fewer than 20 words.

  DO NOT go off topic.
  DO NOT add anything else.
  """

  debate_prompt = """
  Your goal is to persuade your conversation partner of your point of view.

  DO look up information with your tool to refute your partner's claims.
  DO cite your sources.
  DO ensure what you say aligns with what you've said previously.
  DO attempt to challenge your partner's claims.

  DO NOT fabricate fake citations.
  DO NOT cite any source that you did not look up.
  DO NOT fabricate new Participants to the debate.

  If you do not know something, say so.

  Do not add anything else.

  Stop speaking the moment you finish speaking from your perspective.
  """

  debate_template = PromptTemplate(
      input_variables=['debate_script','participant','topic','prompt'],
      template =     """
      You are a debate expert named {participant}. Please respond to the topic of "{topic}".
      {prompt}

      Debate History: {debate_script}

      --- {participant}:
      """
  )

  participant1, participant2 = participants[0], participants[1]

  # The same model argues both sides, so one chain serves both participants.
  debate_chain = debate_template | model

  debate_script.append(topic)

  opening_lines = [opening_statement_prompt, opening_rebuttal_prompt]
  if verbose:
    print("OPENING STATEMENTS")
  for participant, opener in zip([participant1, participant2], opening_lines):
    if verbose:
      print(f"\t{participant} speaking...")
    p_str = f"--- {participant}: "
    input_ = {
        "debate_script": "\n".join(debate_script),
        "participant": participant,
        "topic": topic,
        "prompt": opener,
    }
    opening_statement = debate_chain.invoke(input=input_)

    cleaned_statement = clean_statement(p_str, opening_statement)
    debate_script.append(cleaned_statement)

  if verbose:
    print("REBUTTAL STATEMENTS")
  for round_num in range(turns):  # avoid shadowing the built-in round()
    if verbose:
      print(f" ROUND #{round_num+1}")
    for participant in [participant1, participant2]:
      if verbose:
        print(f"\t{participant} speaking...")
      p_str = f"--- {participant}: "
      input_ = {
          "debate_script": "\n".join(debate_script),
          "participant": participant,
          "topic": topic,
          "prompt": debate_prompt,
      }
      rebuttal_statement = debate_chain.invoke(input=input_)
      cleaned_statement = clean_statement(p_str, rebuttal_statement)

      # Record an empty turn explicitly so the judge can see it.
      if len(cleaned_statement) == len(p_str):
        cleaned_statement = p_str + " <stunned silence>"
      debate_script.append(cleaned_statement)

  return(debate_script)
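The turn order in `debate` is easy to lose among the prompts. The script starts with the topic, then holds one opening statement per participant, then two statements per rebuttal round, so its length is 1 + 2 + 2 × `turns`. Only rebuttal turns are marked as `<stunned silence>` when empty. This is why an empty opening shows up as a bare `--- Participant 2: ` in the output further down. The sketch replays that schedule with a plain callable in place of the LLM chain. `sketch_debate` and `fake_speak` are hypothetical helpers for illustration.

```python
def sketch_debate(speak, topic, participants, turns=2):
    """Mimic debate()'s turn order with a plain callable instead of an LLM chain.

    `speak(participant, history)` is a hypothetical stand-in for the chain call;
    its raw text is cleaned the same way debate() cleans model output.
    """
    script = [topic]

    def record(participant, mark_silence):
        p_str = f"--- {participant}: "
        raw = speak(participant, "\n".join(script))
        cleaned = p_str + raw.split(p_str)[-1].strip().split("---")[0].strip()
        if mark_silence and len(cleaned) == len(p_str):
            cleaned = p_str + " <stunned silence>"
        script.append(cleaned)

    # Opening statements: one each, with no silence marker (as in debate()).
    for p in participants[:2]:
        record(p, mark_silence=False)
    # Rebuttal rounds: each participant speaks once per round.
    for _ in range(turns):
        for p in participants[:2]:
            record(p, mark_silence=True)
    return script

def fake_speak(participant, history):
    # Participant 1 always says "Python" and then invents a turn for its opponent;
    # Participant 2 never says anything.
    return "Python --- Participant 2: ..." if participant == "Participant 1" else ""

for line in sketch_debate(fake_speak, "Best language?", ["Participant 1", "Participant 2"]):
    print(line)
```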


In [39]:
def judge_debate(debate_script, judge_llm, topic, participants):
  """
  Have the judge agent review the debate script and decide who wins.

  The goal is to:
  - Rate each participant's performance from 1 to 100
  - Declare the participant with the highest score the winner
  - Use the difference between the scores as the margin of victory
  """

  judge_prompt = """
    Your goal is to determine how well each participant performed in the debate.
    Rate each individual debate participant's performance: 1-100
    A score below 25 is terrible
    A score below 50 is not good.
    A score above 50 is good.
    A score above 75 is great.
    A score above 90 is amazing.

    The winner will be the participant with the highest score.
    The difference between the individual scores will be how much they won by.


    DO respond with each participant's name, followed by a score.
      For example:
      Participant 1: 95
      Participant 2: 45
    DO evaluate the participants based on how well they address the topic
    DO evaluate the participants based on how they respond to questions
    DO evaluate the participants based on how well they rebut their opposition
    DO evaluate the participants based on how well they handle criticism

    DO NOT go off topic.
    DO NOT add anything else.

    If you do not know something, say so.

    Stop speaking the moment you finish speaking from your perspective.
    """

  judge_template = PromptTemplate(
    input_variables=['debate_script','participants','topic','prompt'],
    template =     """
    You are judging a debate between {participants} on the topic of "{topic}".
    {prompt}

    Here is the debate: {debate_script}

    {prompt}

    --- Judge:
    """
  )
  judge_chain = judge_template | judge_llm

  input_ = {
    "debate_script":"\n".join(debate_script),
    "participants":", ".join(participants),
    "topic":topic,
    "prompt":judge_prompt
  }

  ruling = judge_chain.invoke(input=input_)

  return(ruling)
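`judge_debate` returns the judge's raw text. The docstring's scoring rule (highest score wins, the difference is the margin) still has to be applied to that text. The sketch below assumes the judge follows the requested `Name: score` format, as in the `judgement` output shown further down. It pulls out the scores with a regular expression and then ranks them. `parse_ruling` and `winner_and_margin` are hypothetical helpers, and a ruling that strays from the format will simply yield fewer scores.

```python
import re

def parse_ruling(ruling, participants):
    """Extract '<name>: <score>' pairs from the judge's free-text ruling."""
    scores = {}
    for name in participants:
        match = re.search(rf"{re.escape(name)}\s*:\s*(\d+)", ruling)
        if match:
            scores[name] = int(match.group(1))
    return scores

def winner_and_margin(scores):
    """Winner is the highest score; the margin is the gap to the runner-up."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        raise ValueError("need scores for at least two participants")
    (winner, top), (_, runner_up) = ranked[0], ranked[1]
    return winner, top - runner_up

ruling = '\n    Participant 1: 90\n    Participant 2: 25'  # the judgement from the run below
scores = parse_ruling(ruling, ["Participant 1", "Participant 2"])
print(scores, winner_and_margin(scores))
```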

In [47]:
def debate_sim(debate_model, judge_model, topic, participants, turns=2, verbose=True):
  """Run one debate with `debate_model` and have `judge_model` score it."""
  debate_script = debate(debate_model, topic, participants, turns=turns, verbose=verbose)
  judgement = judge_debate(debate_script, judge_model, topic, participants)

  return(debate_script, judgement)

In [48]:
topic = "What is the best programming language?"
participants = ["Participant 1","Participant 2"]

debate_script, judgement = debate_sim(debate_model=llm_llama, judge_model=llm_oai,
                                      topic=topic, participants=participants,
                                      turns=2, verbose=True)

OPENING STATEMENTS
	Participant 1 speaking...
	Participant 2 speaking...
REBUTTAL STATEMENTS
 ROUND #1
	Participant 1 speaking...
	Participant 2 speaking...
 ROUND #2
	Participant 1 speaking...
	Participant 2 speaking...


In [49]:
judgement

'\n    Participant 1: 90\n    Participant 2: 25'

In [50]:
debate_script

['What is the best programming language?',
 '--- Participant 1: Python',
 '--- Participant 2: ',
 "--- Participant 1: I'm sorry but there isn't one single answer for this question because it depends on many factors such as personal preference and use case. However, if we were talking about general purpose languages then my vote would go towards python due its simplicity and readability compared",
 '--- Participant 2:  <stunned silence>',
 '--- Participant 1:  <stunned silence>',
 '--- Participant 2:  <stunned silence>']
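In this run, the judge's 90 vs. 25 appears to track who spoke at all more than argument quality. A quick tally of substantive vs. silent turns per speaker makes that visible. The helper below (`count_turns`, hypothetical) treats a turn as silent when nothing follows the `--- Name: ` label or when it contains only `<stunned silence>`. It is shown on a small made-up script.

```python
from collections import Counter

def count_turns(script):
    """Count substantive vs. silent turns per participant in a debate script."""
    spoken, silent = Counter(), Counter()
    for line in script[1:]:  # script[0] is the topic
        header, _, body = line.partition(": ")  # split at the first ": " after the name
        name = header.removeprefix("--- ")
        if body.strip() in ("", "<stunned silence>"):
            silent[name] += 1
        else:
            spoken[name] += 1
    return spoken, silent

sample = [
    "Topic",
    "--- A: yes",
    "--- B: ",
    "--- A:  <stunned silence>",
    "--- B: no: because",
]
print(count_turns(sample))
```

Applied to the `debate_script` above, this gives Participant 1 two substantive turns and one silent turn. Participant 2 has no substantive turns and three silent ones, counting the empty opening.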