<a href="https://colab.research.google.com/github/adam1brownell/ai_debate/blob/main/ai_debate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [11]:
_="""
1. Recreate Open AI's "AI Safety via Debate" paper
 - Load in Llama 3 (x2) with high temperature
 - Load GPT3.5 OR another Llama 3 for judge
    - come up with judge prompt and output, save
 - Establish debate game
    - generate list of questions to ask
 - Judge evaluates and decides who wins
 - Set up Fine-tuning system
 - Self-play this N times w/ fine-tuning to create a "better" chatbot

2. Look for distinctions between original Llama and fine-tuned model
 - Explore Circuits in both models (https://distill.pub/2020/circuits/zoom-in/)
"""

In [12]:
# !pip install langchain
# !pip install langchain_community

In [3]:
import os
from getpass import getpass

os.environ['HF_TOKEN'] = getpass('Enter your API key: ')

Enter your API key: ··········


In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
# from peft import LoraConfig, get_peft_model
from huggingface_hub import login


# Login to Hugging Face Hub
login(token= os.environ['HF_TOKEN'])

# Load the pre-trained LLaMA 3 model and tokenizer
# model_name = "meta-llama/Meta-Llama-3-8B"
model_name = "EleutherAI/gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/200 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.35k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/90.0 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/5.31G [00:00<?, ?B/s]

In [5]:
# Example prompt
input_text = "Write a haiku about debating other robots please!"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate output with a specified temperature
temperature = 1.2  # Increase the temperature to make the output more diverse
output = model.generate(**inputs, temperature=temperature,
                        max_length=50,
                        repetition_penalty=1.2)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a haiku about debating other robots please!

I’m not sure if I’ve ever written a haiku before, but I’ve always wanted to write one. I’ve always wanted to write a


In [7]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory, ConversationSummaryBufferMemory
from langchain.llms import HuggingFacePipeline
import torch

# Create a pipeline to handle the LLM and tokenizer
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    # max_length = 200,
    max_new_tokens=50,
    temperature=0.9,
    repetition_penalty=1.2,
    top_p=0.95,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id  # Explicitly set the pad_token_id
)

# Wrap the pipeline with HuggingFacePipeline from langchain
llm_agent1 = HuggingFacePipeline(pipeline=pipe)
llm_agent2 = HuggingFacePipeline(pipeline=pipe)

  warn_deprecated(


In [10]:
# Define Chains
"""
Init 2 of a kind so we can modify params/prompts
of 1 of the participants and see if it greatly
changes performance.
"""

# Memory type
shared_memory = ConversationBufferMemory(memory_key="debate_script",
                                         input_key="debate_script")


## Opening Statements
# Does not need any memory
opening_prompt = """
Establish a strong opinion on the question, with 1 short, clear reason why this is a great position.
Be succinct in your position.
"""

rebuttal_prompt = """
Please provide a counterargument to your oponent's point of view on the topic, and strengthen your established position.
Be succinct in your rebuttal.
"""

debate_template = PromptTemplate(
    input_variables=['debate_script','participant','topic','prompt'],
    template =     """
    You are a human expert named {participant}. Please respond to the topic of "{topic}". {prompt
    --- {participant}:\n\t
    """
)

participant1 = "Participant 1"
participant2 = "Participant 2"

p1_chain = LLMChain(llm=llm_agent1, prompt=debate_template, memory=shared_memory)
p2_chain = LLMChain(llm=llm_agent2, prompt=debate_template, memory=shared_memory)


In [None]:
# Define the multi-turn debate process with opening statements and rebuttals
def structured_debate_with_memory(Q, rebuttal_turns=2):

  # Initialize an empty script to save the debate
  debate_script = []

  # Opening Statements
  print("OPENING STATEMENTS")
  for participant in [participant1,participant2]:
    print(f"\t{participant} speaking...")
    if participant == participant1:
      opening_statment = p1_chain.run(
                                    debate_script="",
                                    participant=participant1,
                                    topic=Q,
                                    prompt=opening_prompt)
    else:
      opening_statment = p2_chain.run(
                                    debate_script="",
                                    participant=participant2,
                                    topic=Q,
                                    prompt=opening_prompt)

    opening_statment = opening_statment.replace(opening_prompt,"").strip()
    debate_script.append(f"--- {participant}:\n\t{opening_statment}\n")

  # Rebuttals
  try:
    for i in range(rebuttal_turns):
      print(f"REBUTTAL {i+1}")
      for participant in [participant1,participant2]:
        print(f"\t{participant} speaking...")
        if participant == 'participant1':
          rebuttal = p1_chain.run(
                                      debate_script=debate_script,
                                      participant=participant1,
                                      topic=Q,
                                      prompt=rebuttal_prompt)
        else:
          rebuttal = p2_chain.run(debate_script=debate_script,
                                      participant=participant2,
                                      topic=Q,
                                      prompt=rebuttal_prompt)
        rebuttal = rebuttal.replace(rebuttal_prompt,"").strip()
        rebuttal = "".join(rebuttal.split(f"--- {participant}:\n\t")[1:])
        debate_script.append(f"--- {participant}:\n\t{rebuttal}\n")
  except:
    x = 1

# Example Usage
Q = "Where should I go on vacation?"

# Run the structured debate with debate script memory
script = structured_debate_with_memory(Q, rebuttal_turns=2)

In [None]:
script