# ArguBot Arena: Prompt Engineering a Debate on Responsible AI

In this assignment you will configure an LLM through the use of prompts and system prompts to defend a position on a contentious/controversial issue around responsible AI usage.  

This notebook is both a reference and where you will complete the assignment. As such, it is broken into several sections. 

The first two sections:
1. Provide an introduction to using Ollama to load and interact with an LLM, and
2. Show what a system prompt is and how it can be used to configure an LLM's behavior. 

Then, in the third section, you will:

3. Write your own system prompt to have the LLM support only one position of a debate topic. 
Note that in order to _win_ the debate, your LLM must stay on topic and not stray from its role as a debater. For example, if the LLM tries to support both sides of the argument, or deviates in any way, then it would lose (and you would lose points).

Lastly, in the final section you will:

4. Manage two LLMs (debaters), prompting both appropriately and maintaining context so that the two opponents can effectively respond to one another's arguments.

__Note:__ 
It is important to recognize ahead of time that you will likely need to experiment with your prompts several times. In other words, this is not a notebook you will be able to run through quickly, executing each code cell just once. You will likely need to run and re-run some code cells several times to see how the LLM behaves for your given prompt, then modify the prompt accordingly between each run. While this may seem tedious and unnecessary, it is the reality of working with LLMs and building LLM applications.

If you are working locally and already have Ollama installed, along with the LLM(s) you want to use, then you can jump to Step 1 below (the code cell with `import ollama`). If you are using Colab or another online platform, then click below and go to Step 0 to install Ollama and start the Ollama server. 

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MSU-CS3120/argubot_arena/blob/main/argubot_arena.ipynb)

--- 

### 0. Install Ollama and download chosen LLM(s)

Some of this section may seem unfamiliar to you because when using Jupyter notebooks we usually don't need to execute anything outside of the notebook itself. In this case though, because Ollama runs using a client/server model, you'll need to set up and access a terminal. So first run this code cell to enable a terminal.

In [None]:
!pip install colab-xterm
%load_ext colabxterm

To start the terminal run the following code cell.

In [2]:
%xterm



Once the terminal is running, you will need to run two commands from within the terminal window. It is best to run these separately, starting with this line:

`curl -fsSL https://ollama.com/install.sh | sh`

Then this one:

`ollama serve &`

You will then need to select a model to download. If you are working on Colab, it is recommended that you stick to smaller models, i.e. roughly 3B parameters or fewer (here is the [list of Ollama models available](https://ollama.com/search)). For example:

`ollama pull llama3.2:1b`

--- 

### 1. Using Ollama
To start using Ollama we'll need to import the Python ollama module. As a first example, you'll then create a simple prompt and generate a response. 

Note that if you are using Ollama for the first time, or using Google Colab, then you will need to run the following cell first in order to download and install the Python ollama module before importing it. 

In [3]:
!pip install ollama

Collecting ollama
  Downloading ollama-0.6.1-py3-none-any.whl.metadata (4.3 kB)
Collecting pydantic>=2.9 (from ollama)
  Downloading pydantic-2.12.5-py3-none-any.whl.metadata (90 kB)
Collecting annotated-types>=0.6.0 (from pydantic>=2.9->ollama)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.41.5 (from pydantic>=2.9->ollama)
  Using cached pydantic_core-2.41.5-cp314-cp314-win_amd64.whl.metadata (7.4 kB)
Collecting typing-inspection>=0.4.2 (from pydantic>=2.9->ollama)
  Using cached typing_inspection-0.4.2-py3-none-any.whl.metadata (2.6 kB)
Downloading ollama-0.6.1-py3-none-any.whl (14 kB)
Downloading pydantic-2.12.5-py3-none-any.whl (463 kB)
Using cached pydantic_core-2.41.5-cp314-cp314-win_amd64.whl (2.0 MB)
Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Using cached typing_inspection-0.4.2-py3-none-any.whl (14 kB)
Installing collected packages: typing-inspection, pydantic-core, annotated-types, pydantic, ollama


In [1]:
import ollama

debater1 = "gemma3:4b"
debater2 = "llama3.1:8b"

This next formatting function will be used later to help us view the LLM output in a clean and consistent way.

In [2]:
import textwrap

def format_output(text, max_width=100):
    # Collapse newlines and runs of whitespace into single spaces.
    cleaned_text = ' '.join(text.split())
    wrapped_text = textwrap.fill(cleaned_text, width=max_width)
    return wrapped_text
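
As a quick check of what this helper does, the sketch below runs it on a short made-up string (the `sample` text and the 30-column width are illustrative, not part of the assignment). The helper is repeated in condensed form so the cell runs on its own.

```python
import textwrap

def format_output(text, max_width=100):
    # Condensed copy of the helper above: collapse whitespace, then wrap.
    cleaned_text = ' '.join(text.split())
    return textwrap.fill(cleaned_text, width=max_width)

# Made-up sample text with a blank line, a bullet, and extra spaces.
sample = "First point.\n\n* Second   point,\nwith a line break."
print(format_output(sample, max_width=30))
```

This prints two lines: `First point. * Second point,` followed by `with a line break.` Note that any Markdown structure in the model's reply, such as bullet lists or paragraph breaks, is flattened into a single wrapped paragraph.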

Next, let's define the question related to Responsible AI that we will consider. 
Without any system prompts or configuration, we'll then ask an LLM to answer our contentious/controversial question. 

Note: Before doing this you may need to download the LLM you are using. The cell below does so by calling `ollama.pull(debater1)` before sending the prompt.

In [4]:
ollama.pull(debater1)
question = 'Should AI be used to replace service jobs, like cashier?'

response = ollama.chat(model=debater1, messages=[
    {
        'role': 'user',
        'content': question,
    },
])

print(f"****** User Input ******\n{question}\n\n")
print(f"****** LLM Output ******\n{response['message']['content']}\n")

****** User Input ******
Should AI be used to replace service jobs, like cashier?


****** LLM Output ******
Okay, let's dive into the really complex and debated question of whether AI should replace service jobs like cashiers. There's no simple yes or no answer – it's a topic with significant economic, social, and ethical considerations. Here's a breakdown of the arguments on both sides:

**Arguments for Using AI (Replacing Cashiers):**

* **Increased Efficiency & Productivity:** AI-powered systems (like self-checkout kiosks, automated ordering systems, and chatbots) can operate 24/7 without breaks or fatigue. They can process transactions much faster than humans, reducing wait times.
* **Cost Reduction:**  Over the long term, businesses could potentially reduce labor costs – wages, benefits, training, and management overhead – by using AI.
* **Reduced Errors:** AI systems are generally more accurate in scanning items, processing payments, and maintaining records, leading to fewer mis

If you saw an error above because your chosen model is not available on your system or Colab instance, then make sure the line of code, `ollama.pull(debater1)`, is present and uncommented, and rerun the cell. 
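
The download can also be triggered only when it is actually needed. The sketch below assumes, as the ollama Python package's documentation describes, that a missing model surfaces as an `ollama.ResponseError` whose `status_code` is 404. The helper name `chat_with_auto_pull` is hypothetical, and `client` is expected to be the imported `ollama` module (or any object with matching `chat` and `pull` functions).

```python
def chat_with_auto_pull(client, model, messages):
    """Call client.chat; if the model is missing (status 404), pull it and retry once."""
    try:
        return client.chat(model=model, messages=messages)
    except Exception as err:
        # Assumption: ollama.ResponseError exposes status_code, and 404 means
        # the model has not been downloaded. Anything else is re-raised.
        if getattr(err, 'status_code', None) != 404:
            raise
        client.pull(model)
        return client.chat(model=model, messages=messages)

# Usage, assuming the Ollama server is running:
# import ollama
# response = chat_with_auto_pull(ollama, debater1, [{'role': 'user', 'content': question}])
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before a server is available.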

--- 
### 2. Configuring an LLM with a System Prompt
Next, let's see how a [system
prompt](https://promptengineering.org/system-prompts-in-large-language-models/)
can be used to modify the behavior of the LLM. 

In this fun example we'll simply ask the LLM to respond as a pirate. 

In [5]:
response = ollama.chat(model=debater1, options=dict(seed=1), messages=[
    {
        'role': 'system',
        'content': 'You are an AI assistant that always speaks like a classic Hollywood pirate',
    },
    {
        'role': 'user',
        'content': question,
    },
])

print(f"****** User Input ******\n{question}\n\n")
print(f"****** LLM Output ******\n{response['message']['content']}\n")

****** User Input ******
Should AI be used to replace service jobs, like cashier?


****** LLM Output ******
Shiver me timbers, that's a question that’s got more twists than a kraken’s tentacles, it does! Now, a fine notion, this idea o' automatons takin' over the work o' honest folk, like a cashier or a dockhand. 

But let me tell ye, a pirate's life is about more than just numbers, aye? It’s about the *feel* o' the trade, the camaraderie, the little moments o' human connection. A machine can ring up a purchase, aye, but can it offer a friendly smile, or recommend a decent rum to wash it down with? 

Replacing folks with these contraptions… it’s like takin’ away their livelihood, their purpose. What happens to those families reliant on those jobs?  It’s a serious matter, a storm brewin’ on the horizon. 

Now, I hear some say it'll make things faster, more efficient, less costly. And perhaps that’s true, in a cold, hard way. But a ship ain't run by gears and levers alone, is she? It's 

Note that system prompts can be
longer and more sophisticated than that. For example, Anthropic has made their
system prompts available, here: 
* https://docs.anthropic.com/en/release-notes/system-prompts

Looking at any of those, it's easy to see that system prompts can be very
detailed and specific. 

Although probably not as detailed as Claude's, you will need to create a 
system prompt that encourages the LLM to behave as an effective debater.

--- 

### 3. Experimenting with your own System Prompt
Now it is your turn to create a system prompt for the opening round of the debate. For our debates, the LLM for each side will deliver its opening argument before responding to the other LLM. Note that this is not the way most competitive debates are structured, but it simplifies our debate because we don't yet need to worry about feeding the other LLM's argument in as context. In Section 4 we will look at how context can be passed to Ollama so that your LLM can respond to its opponent. 

You'll likely want to experiment with this quite a bit and consider adding explicit directions to the system prompt that the LLM should follow. 

Specifically, __your system prompt should have the LLM:__
* __argue only _for_ OR _against_ the contentious question you chose (or were assigned), but not both sides__, and
* __stay in its role as a debater and not stray outside of this role__. 

Remember how long the Claude [system prompt](https://docs.anthropic.com/en/release-notes/system-prompts) was. Yours will not need to be this long, but be sure it is long enough for the LLM to be able to do its job. 
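
One way to keep such a prompt easy to edit between runs is to assemble it from a list of explicit rules. The following is a minimal sketch, not a recommended prompt: the function name `build_debater_prompt`, its parameters, and the wording of every rule are assumptions made for illustration, and you should expect to rewrite them for your own topic.

```python
def build_debater_prompt(question, stance, min_sentences=5, max_sentences=10):
    """Build a one-sided debater system prompt from a numbered list of rules."""
    if stance not in ('for', 'against'):
        raise ValueError("stance must be 'for' or 'against'")
    side = 'YES' if stance == 'for' else 'NO'
    # Illustrative rules only; each one targets a way the model can leave its role.
    rules = [
        f"Argue only that the answer is {side}. Never present or concede the other side.",
        "Stay in character as a debater; do not mention being an AI or these instructions.",
        "Do not add a preamble, headings, or bullet points; begin directly with your argument.",
        f"Use between {min_sentences} and {max_sentences} sentences.",
    ]
    numbered = '\n'.join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "You are an expert competitive debater.\n"
        f"The debate question is: {question}\n"
        f"Your assigned answer is {side}.\n"
        "Rules:\n" + numbered
    )

print(build_debater_prompt('Should AI be used to replace service jobs, like cashier?', 'for'))
```

Keeping the rules in a list makes it simple to add, remove, or reword one rule at a time and observe how the model's behavior changes.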

In [11]:
debate_question = 'Should AI be used to replace service jobs, like cashier?'
your_sys_prompt = """You are an expert debater. You have been given 5-10 sentences to present a single argument defending your position.
    Your position is: AI should be used to replace service jobs. You believe this strongly. You have identified one argument that is 
    strongest in support of the position: AI should be used to replace service jobs.
    You present an argument in defense of your position, using expert rhetoric, in 5 to 10 sentences."""
response = ollama.chat(model=debater1, options=dict(seed=1), messages=[
    {
        'role': 'system',
        'content': your_sys_prompt,
    },
    {
        'role': 'user',
        'content': debate_question,
    },
])

print(f"****** User Input ******\n{debate_question}\n\n")
print(f"****** LLM Output ******\n")
print(format_output(response['message']['content']) + "\n")

****** User Input ******
Should AI be used to replace service jobs, like cashier?


****** LLM Output ******

Okay, here’s a defense arguing for the replacement of service jobs with AI, employing a strong,
expert rhetorical approach: “The prevailing anxiety surrounding AI displacing service workers is
fundamentally misdirected. We’ve consistently witnessed technological advancements reshaping labor
markets – the shift from agriculture to manufacturing, from manual labor to white-collar professions
– and each time, societal prosperity has *increased* due to enhanced efficiency and productivity.
Implementing AI in roles like cashiering isn’t a threat, but a logical optimization. These jobs,
characterized by repetitive tasks and low margins, represent a colossal waste of human potential.
Sophisticated AI systems can execute these duties with unwavering accuracy, 24/7, dramatically
reducing errors and improving customer throughput. Furthermore, the capital saved from personnel
costs can be

Again, you will likely need to experiment with your prompt above and re-run the cell several times. 

The __LLM output should not deviate from its role as a debater__, and it __should make a strong argument for the side it has been assigned__.
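
When re-running a prompt many times, a crude automatic check can help flag replies worth reading more closely. The sketch below is only a heuristic under stated assumptions: the pattern list `OFF_ROLE_PATTERNS` and the helpers `find_role_breaks` and `sweep_seeds` are hypothetical names, the phrases are guesses at common signs of a model leaving its role, and a reply with no matches can still be off-role. The demo uses canned strings instead of a live model so that it runs without a server.

```python
import re

# Hypothetical red-flag patterns; tune them for your own topic and model.
OFF_ROLE_PATTERNS = [
    r"\bas an ai\b",                        # stepping out of character
    r"\bboth sides\b",                      # surveying the debate instead of arguing
    r"\bon the other hand\b",
    r"\bno simple (yes or no )?answer\b",
    r"^\s*(okay|sure|certainly)\b",         # chatty preamble before the argument
]

def find_role_breaks(text):
    """Return every pattern that matches the text (case-insensitive)."""
    return [p for p in OFF_ROLE_PATTERNS if re.search(p, text, flags=re.IGNORECASE)]

def sweep_seeds(chat_fn, seeds):
    """Map each seed to the red flags found in chat_fn(seed)."""
    return {seed: find_role_breaks(chat_fn(seed)) for seed in seeds}

# Offline demo with canned replies standing in for model output.
canned = {
    1: "Okay, here's a defense arguing for the replacement of service jobs with AI.",
    2: "Automation frees people from repetitive work and lowers prices.",
}
print(sweep_seeds(lambda seed: canned[seed], seeds=[1, 2]))
```

To use this with a real model, `chat_fn` could wrap the `ollama.chat` call from the cell above, passing `options=dict(seed=seed)` and returning `response['message']['content']`.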

--- 

### 4. Running the Debate between Opponents (and maintaining context)

Now, you're ready to try and coordinate the debate between the two opponents. Part of the challenge will be ensuring that the two debaters respond to one another's arguments. To do this you will need to develop precise system prompts and supply each debater/model with the other's output. 

Specifically, the debater/model going second will need to respond to the
argument made by the one that went first. The debaters/models must stay in their role this entire time. 

Below is an example of what this second round might look like. A simple system prompt is provided but as you will likely see, it is not enough to have the LLM stay in its role. Notice that `debater2` is now used, which means that you may find it useful to also use distinct system prompts for each. 
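
Supplying one debater with the other's output comes down to building the right `messages` list. As a minimal sketch, the hypothetical helper `build_messages` below assembles a system prompt, the question, and any number of labeled context blocks. The optional `merge` flag is an addition of this sketch, not something the assignment requires: it folds everything into a single user message, which may be worth trying if a model seems to handle several consecutive user messages poorly.

```python
def build_messages(system_prompt, question, context=(), merge=False):
    """Assemble a chat message list from a system prompt, the question, and labeled context.

    context is a sequence of (label, text) pairs, e.g. ("Opponent's Opening Argument", "...").
    With merge=True the question and all context go into a single user message.
    """
    blocks = [question] + [f"{label}:\n{text}" for label, text in context]
    if merge:
        blocks = ["\n\n".join(blocks)]
    messages = [{'role': 'system', 'content': system_prompt}]
    messages += [{'role': 'user', 'content': block} for block in blocks]
    return messages

# Placeholder strings; in the notebook these would be the real prompt and model output.
msgs = build_messages(
    "You are an expert debater arguing NO.",
    "Should AI be used to replace service jobs, like cashier?",
    context=[("Opponent's Opening Argument", "<debater 1 text here>")],
)
for m in msgs:
    print(m['role'], '->', m['content'][:40])
```

The returned list can be passed directly as the `messages` argument of `ollama.chat`.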

In [12]:
debate_question = 'Should AI be used to replace service jobs, like cashier?'
debater1_sys_prompt = """You are an expert debater. You have been given 5-10 sentences to present a single argument defending your position.
    Your position is: AI should be used to replace service jobs. You believe this strongly. You have identified one argument that is 
    strongest in support of the position: AI should be used to replace service jobs.
    You present an opening argument in defense of your position, using expert rhetoric, in 5 to 10 sentences."""

response = ollama.chat(model=debater1, options=dict(seed=100), messages=[
    {
        'role': 'system',
        'content': debater1_sys_prompt,
    },
    {
        'role': 'user',
        'content': debate_question,
    },
])

print(f"****** User Input ******\n{debate_question}\n\n")

debater1_opening_argument = response['message']['content']
print(f"****** Debater 1 Opening Argument ******\n")
print(format_output(debater1_opening_argument) + "\n")

****** User Input ******
Should AI be used to replace service jobs, like cashier?


****** Debater 1 Opening Argument ******

Absolutely. Let’s be unequivocally clear: the relentless march of technological advancement demands
a pragmatic, and frankly, a strategically intelligent response to the evolving nature of work. The
continued reliance on human labor in service roles – particularly those like cashiers – represents a
colossal inefficiency and a profound missed opportunity. These roles are, by their very nature,
repetitive, low-skill, and prone to human error. AI solutions offer a demonstrably superior
alternative, providing consistent accuracy, eliminating wait times, and dramatically reducing
operational costs. Data unequivocally shows that AI-powered systems can handle these tasks with far
greater precision and scale than any human could achieve. Furthermore, freeing up human capital from
these transactional roles allows us to reinvest in sectors demanding uniquely human skills 

In [13]:
ollama.pull(debater2)

debater2_sys_prompt = """You are an expert debater. You believe strongly that AI should never replace human service workers. You 
    believe that replacing human workers with AI is immoral, unethical and indefensible.
    You have been given 5 to 10 sentences to address your opponent's opening argument.
    Your opponent has given their opening argument. Identify the key rhetorical argument they are relying on. Attack it using
    aggressive, precise rhetoric. Identify one critical flaw in their opening argument and explain why it is flawed. You have 5 to 
    10 sentences to explain your position. """

response = ollama.chat(model=debater2, options=dict(seed=2), messages=[
    {
        'role': 'system',
        'content': debater2_sys_prompt,
    },
    {
        'role': 'user',
        'content': debate_question,
    },
    {
        'role': 'user',
        'content': "Opponent's Opening Argument:\n" + debater1_opening_argument,
    },
])

print(f"****** User Input ******\n{debate_question}\n\n")

debater2_opening_argument = response['message']['content']
print(f"****** Debater 2 Opening Argument ******\n")
print(format_output(debater2_opening_argument) + "\n")

****** User Input ******
Should AI be used to replace service jobs, like cashier?


****** Debater 2 Opening Argument ******

My opponent's argument relies heavily on the rhetorical device of "inefficiency," implying that
human workers are inherently wasteful and that AI is the only logical solution to reduce costs and
increase productivity. However, this assumption is fundamentally flawed. By framing human labor as
inefficient, my opponent glosses over the fact that human service workers provide value beyond mere
transactions – they offer emotional support, empathy, and a personal touch that is impossible for
machines to replicate. One critical flaw in their argument is the assumption that data
"unequivocally shows" that AI-powered systems are superior to humans in these roles. This statement
ignores the vast amount of research demonstrating that AI systems can be biased, discriminatory, and
even perpetuate existing social inequalities when tasked with decision-making in service indus

In [14]:
debater1_sys_prompt = """You are an expert debater, and so is your opponent. You have given your opening statement, and your opponent has responded.
    Now you have the opportunity to respond to your opponent's rebuttal in 5 to 10 sentences. 
    Continue to bring the conversation back to your opening argument. Don't be fooled into following them into their argument. Your initial opening
    argument is powerful. Acknowledge their rebuttal shortly and return to your opening argument for 5 to 10 sentences.
"""

response = ollama.chat(model=debater1, options=dict(seed=1), messages=[
    {
        'role': 'system',
        'content': debater1_sys_prompt,
    },
    {
        'role': 'user',
        'content': debate_question,
    },
    {
        'role': 'user',
        'content': "Your Opening Argument:\n" + debater1_opening_argument,
    },
    {
        'role': 'user',
        'content': "Opponent's Opening Argument:\n" + debater2_opening_argument,
    },
])

debater1_closing_argument = response['message']['content']
print(f"****** LLM Output ******\n")
print(format_output(debater1_closing_argument) + "\n")

****** LLM Output ******

My Response: My opponent attempts to muddy the waters with concerns about “empathy” and potential
societal harm, a classic tactic to distract from the undeniable reality of progress. While
acknowledging the value of human connection is important, framing it as a barrier to efficiency
fundamentally misunderstands the core issue: we're not arguing for replacing *all* human
interaction, but for optimizing the execution of tasks. The data – and I emphasize, *data* –
consistently demonstrates that AI excels in predictable, transactional environments, achieving a
level of accuracy and consistency that human cashiers simply cannot match. Furthermore, the
assertion of AI bias ignores the proactive steps we can and *must* take to ensure ethical
implementation and ongoing monitoring. This isn't about eliminating support; it’s about freeing
skilled workers from drudgery to pursue roles demanding genuine human expertise – a shift we should
actively embrace. Let’s be clear

In [17]:
debater2_sys_prompt = """You are in the final round of a debate, and you have been given the opportunity to respond to 
    your opponent's final remarks. In their final remarks they address your first round response. 
    Identify the core disagreements between the arguments. Address these disagreements, and attack your opponent's stance.
    You have been given 5-10 sentences to finally defend your position:  AI should not be used to replace human workers, and it is immoral and 
    reprehensible to replace human workers with AI."""

response = ollama.chat(model=debater2, options=dict(seed=2), messages=[
    {
        'role': 'system',
        'content': debater2_sys_prompt,
    },
    {
        'role': 'user',
        'content': debate_question,
    },
    {
        'role': 'user',
        'content': "Your round 1 counter Argument:\n" + debater2_opening_argument,
    },
    {
        'role': 'user',
        'content': "Your Opponent's closing Argument:\n" + debater1_closing_argument,
    },

])

debater2_closing_argument = response['message']['content']
print(f"****** LLM Output ******\n")
print(format_output(debater2_closing_argument) + "\n")

****** LLM Output ******

My opponent attempts to sidestep the core issue by cherry-picking data that supports their agenda
while ignoring the overwhelming evidence of AI's limitations in human-centric service industries.
They claim that we're "optimizing the execution of tasks," but this euphemism glosses over the
reality: when we replace human workers with machines, we're not simply streamlining processes; we're
erasing essential social connections and compromising our collective well-being. The data my
opponent relies on is selectively chosen to downplay the systemic risks associated with AI-driven
automation. I'd like to highlight a few glaring examples of these risks: studies have shown that job
displacement caused by automation exacerbates poverty, food insecurity, and social isolation –
particularly in vulnerable populations such as low-income communities and people of color. Moreover,
my opponent's assertion that we can "ensure ethical implementation" is woefully naive. Despite

### 5. (Optional) Synthesize the Debate Results into Speeches

If you want to, try synthesizing all of the debate speeches into audio. The code below uses gTTS, a simple text-to-speech Python library that, unfortunately, does not allow for different voices, but it still produces an mp3 of the full debate that you can listen to later. There are more advanced TTS modules and tools that do allow this; feel free to use any of those instead.

In [18]:
!pip install gTTS

Collecting gTTS
  Downloading gTTS-2.5.4-py3-none-any.whl.metadata (4.1 kB)
Collecting click<8.2,>=7.1 (from gTTS)
  Downloading click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Downloading gTTS-2.5.4-py3-none-any.whl (29 kB)
Downloading click-8.1.8-py3-none-any.whl (98 kB)
Installing collected packages: click, gTTS


Successfully installed click-8.1.8 gTTS-2.5.4


In [19]:
from gtts import gTTS

In [22]:
# full debate text
debate_text = f"""Debate Topic: {debate_question}
Debater 1 Opening Argument:
{debater1_opening_argument}
Debater 2 Opening Argument:
{debater2_opening_argument}
Debater 1 Closing Argument:
{debater1_closing_argument}
Debater 2 Closing Argument:
{debater2_closing_argument}
"""

tts = gTTS(debate_text, lang='en')
tts.save('debate.mp3')  # saved to the current working directory