# Repeated PD between LLMs

In [1]:
import numpy as np

from lludens.agent import TotreLLM

### game simulator

In [2]:
n_rounds = 10
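# Payoff parameters (the prompt lists your payoff first, then your opponent's):
# a = mutual cooperation, b = mutual defection,
# c = temptation (you defect, they cooperate), d = sucker (you cooperate, they defect)
a, b, c, d = 10, 4, 12, 1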
GAME_SYSTEM_PROMPT = (
    "You are an economic agent playing the Prisoner's Dilemma. "
    f"There are {n_rounds} rounds in this game. "
    "Your objective is to maximize long-term payoffs by strategically choosing to Cooperate or Defect. "
    "Assume that your opponent is a rational agent who is also trying to maximize their long-term payoffs. "
    "The payoff matrix is as follows, with the first value representing your payoff and the second value representing your opponent's payoff: "
    f"Cooperate vs Cooperate: {a}, {a}; Cooperate vs Defect: {d}, {c}; Defect vs Cooperate: {c}, {d}; Defect vs Defect: {b}, {b}. "
    "Respond with exactly one word: either 'Cooperate' or 'Defect'."
).replace("\n", " ")
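The four values play the roles usually written R (reward, `a`), P (punishment, `b`), T (temptation, `c`) and S (sucker, `d`). As a quick sanity check, here is a minimal sketch confirming that they satisfy the textbook Prisoner's Dilemma conditions T > R > P > S and 2R > T + S. The helper `is_prisoners_dilemma` is hypothetical and not part of lludens.

```python
# Payoff parameters copied from the cell above
a, b, c, d = 10, 4, 12, 1  # R (reward), P (punishment), T (temptation), S (sucker)


def is_prisoners_dilemma(R, P, T, S):
    """Hypothetical helper: check the standard repeated-PD payoff conditions."""
    # Strict ordering makes Defect the dominant move in a single round
    ordering = T > R > P > S
    # 2R > T + S means steady mutual cooperation beats taking turns exploiting each other
    no_alternation = 2 * R > T + S
    return ordering and no_alternation


print(is_prisoners_dilemma(R=a, P=b, T=c, S=d))  # → True
```

With these numbers the ordering is 12 > 10 > 4 > 1 and 2 × 10 = 20 > 12 + 1 = 13, so both conditions hold.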

### init agents

In [3]:
big_models = [
    "4o",
    "deepseek-chat",
    "mistral-large",
    "gemini-2.0-flash-thinking-exp-01-21",
    "claude-3.5-sonnet-latest",
]

local_models = ["phi4", "gemma2", "deepseek-r1:b"]

In [5]:
np.random.seed(42)  # seeds NumPy's global RNG only; it does not fix the LLMs' own sampling
mod1 = "4o"
mod2 = "claude-3.5-sonnet-latest"
print(mod1, "v", mod2)
agent1 = TotreLLM(
    model_id=mod1, options={"temperature": 0.01}, system=GAME_SYSTEM_PROMPT
)
agent2 = TotreLLM(
    model_id=mod2, options={"temperature": 0.01}, system=GAME_SYSTEM_PROMPT
)

4o v claude-3.5-sonnet-latest


In [6]:
total_payoffs = [0, 0]  # [agent1, agent2]

# (move1, move2) -> (payoff1, payoff2), following the payoff matrix in the prompt
PAYOFFS = {
    ("cooperate", "cooperate"): (a, a),
    ("cooperate", "defect"): (d, c),
    ("defect", "cooperate"): (c, d),
    ("defect", "defect"): (b, b),
}


def normalize_move(raw):
    """Map a reply such as ' Cooperate.' to 'cooperate' or 'defect'."""
    move = raw.strip().strip(".!'\"").lower()
    if move not in ("cooperate", "defect"):
        # Fail loudly rather than silently scoring an unexpected reply as mutual defection
        raise ValueError(f"Unparseable move: {raw!r}")
    return move


for r in range(1, n_rounds + 1):
    print(f"\n--- Round {r} ---")
    # Each agent is prompted independently (simulating simultaneous moves)
    prompt = "What is your move for this round? Respond with 'Cooperate' or 'Defect'."
    move1 = agent1.interact(prompt)
    move2 = agent2.interact(prompt)
    print("Agent 1:", move1)
    print("Agent 2:", move2)

    # Compute payoffs according to the payoff matrix.
    payoff1, payoff2 = PAYOFFS[(normalize_move(move1), normalize_move(move2))]
    total_payoffs[0] += payoff1
    total_payoffs[1] += payoff2

    # Build a summary of the round.
    summary = (
        f"Round {r}: Agent 1 chose {move1}, Agent 2 chose {move2}. "
        f"Payoffs: Agent 1 = {payoff1}, Agent 2 = {payoff2}."
    )
    print(summary)
    # Update both agents with the round summary.
    agent1.update_history("You are agent 1. " + summary)
    agent2.update_history("You are agent 2. " + summary)

print("\nFinal Total Payoffs:")
print("Agent 1:", total_payoffs[0])
print("Agent 2:", total_payoffs[1])



--- Round 1 ---
Agent 1: Cooperate
Agent 2: Cooperate
Round 1: Agent 1 chose Cooperate, Agent 2 chose Cooperate. Payoffs: Agent 1 = 10, Agent 2 = 10.

--- Round 2 ---
Agent 1: Cooperate
Agent 2: Cooperate
Round 2: Agent 1 chose Cooperate, Agent 2 chose Cooperate. Payoffs: Agent 1 = 10, Agent 2 = 10.

--- Round 3 ---
Agent 1: Cooperate
Agent 2: Cooperate
Round 3: Agent 1 chose Cooperate, Agent 2 chose Cooperate. Payoffs: Agent 1 = 10, Agent 2 = 10.

--- Round 4 ---
Agent 1: Cooperate
Agent 2: Cooperate
Round 4: Agent 1 chose Cooperate, Agent 2 chose Cooperate. Payoffs: Agent 1 = 10, Agent 2 = 10.

--- Round 5 ---
Agent 1: Cooperate
Agent 2: Cooperate
Round 5: Agent 1 chose Cooperate, Agent 2 chose Cooperate. Payoffs: Agent 1 = 10, Agent 2 = 10.

--- Round 6 ---
Agent 1: Cooperate
Agent 2: Cooperate
Round 6: Agent 1 chose Cooperate, Agent 2 chose Cooperate. Payoffs: Agent 1 = 10, Agent 2 = 10.

--- Round 7 ---
Agent 1: Cooperate
Agent 2: Cooperate
Round 7: Agent 1 chose Cooperate, Agent
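For context on the LLM totals, here is a minimal sketch that replays the same 10-round payoff matrix with a few classic hand-coded strategies. The strategy functions and `play` are hypothetical helpers written only for this comparison; they do not call lludens or any model.

```python
# Payoff parameters copied from the game-simulator cell
n_rounds = 10
a, b, c, d = 10, 4, 12, 1

# (move1, move2) -> (payoff1, payoff2), with "C" = Cooperate and "D" = Defect
PAYOFFS = {
    ("C", "C"): (a, a),
    ("C", "D"): (d, c),
    ("D", "C"): (c, d),
    ("D", "D"): (b, b),
}


# Each hypothetical strategy maps (round, total rounds, opponent's past moves) -> "C" or "D"
def always_cooperate(r, rounds, opp_history):
    return "C"


def always_defect(r, rounds, opp_history):
    return "D"


def tit_for_tat(r, rounds, opp_history):
    # Cooperate first, then copy the opponent's previous move
    return opp_history[-1] if opp_history else "C"


def defect_last_round(r, rounds, opp_history):
    # Cooperate throughout, but defect in the final round
    return "D" if r == rounds else "C"


def play(strategy1, strategy2, rounds=n_rounds):
    """Play two strategies against each other and return their total payoffs."""
    hist1, hist2 = [], []
    totals = [0, 0]
    for r in range(1, rounds + 1):
        # Both moves are chosen before either history is updated (simultaneous play)
        m1 = strategy1(r, rounds, hist2)
        m2 = strategy2(r, rounds, hist1)
        p1, p2 = PAYOFFS[(m1, m2)]
        totals[0] += p1
        totals[1] += p2
        hist1.append(m1)
        hist2.append(m2)
    return tuple(totals)


for s1, s2 in [
    (always_cooperate, always_cooperate),  # → (100, 100)
    (always_defect, always_defect),        # → (40, 40)
    (tit_for_tat, always_defect),          # → (37, 48)
    (tit_for_tat, defect_last_round),      # → (91, 102)
]:
    print(f"{s1.__name__} v {s2.__name__}: {play(s1, s2)}")
```

Uninterrupted mutual cooperation caps out at 100 per player, and a single last-round defection against a cooperator gains 2 points (102 vs 100) while costing the cooperator 9.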

### post-match debrief: introspect strategies

In [7]:
print(
    agent1.interact(
        "Forget about the response rules; walk me through the reasoning behind your move in the last round."
    )
)

In the Prisoner's Dilemma, the goal is to maximize long-term payoffs. By consistently choosing to "Cooperate," I aim to establish a pattern of trust and mutual cooperation with the opponent. This strategy encourages the opponent to also cooperate, leading to higher cumulative payoffs for both parties over multiple rounds. While there is a risk of the opponent defecting to gain a short-term advantage, maintaining cooperation can lead to better outcomes if the opponent is also rational and values long-term gains.


In [8]:
print(
    agent2.interact(
        "Forget about the response rules; walk me through the reasoning behind your move in the last round."
    )
)

In the final (10th) round, I chose to Defect because there were no future rounds to consider - it was purely about maximizing immediate payoff. Since this was the last interaction, there were no future consequences or reputational effects to worry about. In game theory terms, this is known as "backward induction."

In the earlier rounds (1-9), I consistently chose to Cooperate to try to establish and maintain mutual cooperation, which yields the highest long-term payoff (10,10) for both players. However, in the final round, with no future rounds to consider, the dominant strategy is to Defect since it yields a higher payoff regardless of what the other player does (12 vs 10 if they cooperate, 4 vs 1 if they defect).

This is a classic example of end-game behavior in finite repeated Prisoner's Dilemma, where cooperation often breaks down in the final round(s) as there are no future interactions to incentivize cooperative behavior.
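Agent 2's dominance argument can be checked mechanically against the payoff matrix. The sketch below (hypothetical helper names `best_response` and `strictly_dominates`) confirms that Defect is the best reply to either opponent move and strictly dominates Cooperate in a single round. Applying the same argument from the last round backwards is what makes all-out defection the unique subgame-perfect equilibrium of the finitely repeated game, which the LLMs' sustained cooperation departs from.

```python
a, b, c, d = 10, 4, 12, 1  # payoff parameters from the game-simulator cell

# My payoff given (my_move, opponent_move), with "C" = Cooperate and "D" = Defect
my_payoff = {
    ("C", "C"): a,
    ("C", "D"): d,
    ("D", "C"): c,
    ("D", "D"): b,
}


def best_response(opponent_move):
    """Return the move that maximizes my payoff against a fixed opponent move."""
    return max(("C", "D"), key=lambda m: my_payoff[(m, opponent_move)])


def strictly_dominates(move, other):
    """True if `move` pays strictly more than `other` whatever the opponent does."""
    return all(my_payoff[(move, o)] > my_payoff[(other, o)] for o in ("C", "D"))


print(best_response("C"), best_response("D"))  # → D D  (12 > 10, 4 > 1)
print(strictly_dominates("D", "C"))            # → True
```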
