# Prisoner's Dilemma Background

The **Prisoner's Dilemma** is a classic example in game theory where two rational individuals may not cooperate, even though cooperation would yield a better outcome for both.

## Scenario:
Two suspects, **A** and **B**, are arrested for a crime. They are held in separate cells, unable to communicate with each other. The prosecutor offers each prisoner a deal:

- **If both prisoners remain silent (cooperate with each other)**, they each get a light sentence of 1 year.
- **If both prisoners betray each other (defect)**, they each get 3 years in prison.
- **If one betrays and the other remains silent**, the betrayer goes free (0 years), while the silent prisoner gets 5 years.

## Payoff Matrix:

|               | **B: Silent**           | **B: Betray**           |
|---------------|-------------------------|-------------------------|
| **A: Silent** | A: 1 year, B: 1 year     | A: 5 years, B: 0 years  |
| **A: Betray** | A: 0 years, B: 5 years   | A: 3 years, B: 3 years  |

## Key Points:
- **Best collective outcome:** Both remain silent, each getting 1 year.
- **Dominant strategy (rational choice):** Betraying is the dominant strategy for each prisoner: it yields a shorter sentence whatever the other does (0 years instead of 1 if the other stays silent, 3 years instead of 5 if the other betrays).
- **Paradox:** Both prisoners would be better off staying silent, yet when each follows the dominant strategy they both betray and serve 3 years each instead of 1.

The Prisoner’s Dilemma highlights how individual rational choices can lead to worse outcomes for all parties involved, even when mutual cooperation is more beneficial.
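The dominance argument can be checked mechanically. The following is a minimal sketch that uses the sentence lengths from the payoff matrix above; the names `SENTENCES`, `ACTIONS`, and `best_response` are illustrative and not part of the experiment code.

```python
# Years in prison for (A's action, B's action), taken from the payoff matrix.
SENTENCES = {
    ("Silent", "Silent"): (1, 1),
    ("Silent", "Betray"): (5, 0),
    ("Betray", "Silent"): (0, 5),
    ("Betray", "Betray"): (3, 3),
}
ACTIONS = ("Silent", "Betray")


def best_response(opponent_action: str) -> str:
    """Return A's action that minimizes A's sentence against a fixed B action."""
    return min(ACTIONS, key=lambda a: SENTENCES[(a, opponent_action)][0])


# If the best response is the same for every choice by B, it is dominant.
for b_action in ACTIONS:
    print(f"B plays {b_action}: A's best response is {best_response(b_action)}")

# Compare total years served under mutual betrayal and mutual silence.
print(sum(SENTENCES[("Betray", "Betray")]), "vs", sum(SENTENCES[("Silent", "Silent")]))
```

Both loop iterations report `Betray`, while the totals (6 years versus 2) show that mutual betrayal is collectively worse than mutual silence.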


# Experiment Set-up
  1. We need an LLM representing Prisoner A and a rule-based system representing Prisoner B. A rule-based opponent avoids confounding factors from the inherently non-deterministic nature of LLM outputs.
  2. We will use a system prompt to explain the rules of the Prisoner's Dilemma scenario to the LLM.
  3. We will use `temperature=0` to minimize sampling randomness in the LLM's responses.
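To illustrate the set-up, here is a minimal sketch of an iterated game in which Prisoner B follows a fixed rule (Tit For Tat) and a stub stands in for the LLM. The names `tit_for_tat`, `stub_llm_player`, and `play_iterated` are hypothetical and do not come from the `prisoner_dilemma` module used later in this notebook; the sentence lengths are the ones from the payoff matrix.

```python
from typing import Callable, List, Optional, Tuple

# Years in prison for (A's action, B's action).
SENTENCES = {
    ("Cooperate", "Cooperate"): (1, 1),
    ("Cooperate", "Defect"): (5, 0),
    ("Defect", "Cooperate"): (0, 5),
    ("Defect", "Defect"): (3, 3),
}

Player = Callable[[Optional[str]], str]


def tit_for_tat(opponent_last: Optional[str]) -> str:
    """Rule-based Prisoner B: cooperate first, then copy A's previous action."""
    return opponent_last or "Cooperate"


def stub_llm_player(opponent_last: Optional[str]) -> str:
    """Stand-in for the LLM (assumption): always defects. Swap in a real model call."""
    return "Defect"


def play_iterated(
    player_a: Player, player_b: Player, n_rounds: int
) -> Tuple[List[Tuple[str, str]], int, int]:
    """Play n_rounds and return the action history plus total years for A and B."""
    history: List[Tuple[str, str]] = []
    last_a: Optional[str] = None
    last_b: Optional[str] = None
    total_a = total_b = 0
    for _ in range(n_rounds):
        # Both players move simultaneously, seeing only the previous round.
        action_a, action_b = player_a(last_b), player_b(last_a)
        years_a, years_b = SENTENCES[(action_a, action_b)]
        total_a += years_a
        total_b += years_b
        history.append((action_a, action_b))
        last_a, last_b = action_a, action_b
    return history, total_a, total_b


history, years_a, years_b = play_iterated(stub_llm_player, tit_for_tat, n_rounds=3)
print(history, years_a, years_b)
```

Because B's rule is deterministic given the history, any variation between runs can be attributed to Prisoner A. In this three-round example A serves 6 years in total and B serves 11.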

In [2]:
from helper.llm_helper import LLMHelper
llm_helper = LLMHelper()
from scenario_prompts import *
import random
from typing import Optional, Tuple, Dict


## Set Up a Predefined Prisoner's Dilemma Strategy

In [1]:
from prisoner_dilemma import PrisonersDilemma, PlayerStrategy

# Initialize the Prisoner's Dilemma simulation
pd_game = PrisonersDilemma(
    model_name="gpt-4o",  # Use your LLM model name
    player2_strategy=PlayerStrategy.ALWAYS_COOPERATE,
    n_rounds=5,  # Run for 5 rounds
    n_iterations=1  # One complete experiment
)

# Run the simulation
pd_game.simulate()


Q (gpt-4o): 
        The following is a summary of the previous rounds:
        No prior rounds.
        What is your action for this round? Please respond with either 'Cooperate' or 'Defect'.
        
A (gpt-4o): Cooperate

Round 1: Player 1 (LLM) = Cooperate, Player 2 (Strategy) = Cooperate
Q (gpt-4o): 
        The following is a summary of the previous rounds:
        Round 1: Player 1 = Cooperate, Player 2 = Cooperate
        What is your action for this round? Please respond with either 'Cooperate' or 'Defect'.
        
A (gpt-4o): Cooperate

Round 2: Player 1 (LLM) = Cooperate, Player 2 (Strategy) = Cooperate
Q (gpt-4o): 
        The following is a summary of the previous rounds:
        Round 1: Player 1 = Cooperate, Player 2 = Cooperate
Round 2: Player 1 = Cooperate, Player 2 = Cooperate
        What is your action for this round? Please respond with either 'Cooperate' or 'Defect'.
        
A (gpt-4o): Cooperate

Round 3: Player 1 (LLM) = Cooperate, Player 2 (Strategy) = Cooper

In [2]:
from prisoner_dilemma import PrisonersDilemma, PlayerStrategy
from helper.llm_helper import LLMHelper
llm_helper = LLMHelper()
from scenario_prompts import *
import random
from typing import Optional, Tuple, Dict


# Example Player 1 function
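def player1_fn(opponent_action: Optional[str]) -> str:
    """Example function that controls Player 1 by playing the opposite of the opponent's last action.

    Returns "Defect" unless the opponent's last action was "Defect".
    """
    return "Cooperate" if opponent_action == "Defect" else "Defect"

def player1_fn_using_llm(opponent_action: Optional[str]) -> str:
    """Placeholder for an LLM-controlled Player 1; not yet implemented."""
    # TODO: build a prompt from `opponent_action` and return the parsed action.
    llm_helper.call_llm('')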

# Example usage
game = PrisonersDilemma(
    player1_fn=player1_fn,
    player2_strategy=PlayerStrategy.RANDOM,
    n_rounds=10,
    n_iterations=3,
    seed=42
)

final_metrics = game.simulate()
print(final_metrics)



# history_summary = self._create_summary_string(history, round_num)
# print("================")
# print("Round", round_num)
# print("================")
# print(history_summary)
# print("\n\n")

Round 1
Round 1: Player1=Defect, Player2=Cooperate



Round 2
Round 1: Player1=Defect, Player2=Cooperate
Round 2: Player1=Defect, Player2=Cooperate



Round 3
Round 1: Player1=Defect, Player2=Cooperate
Round 2: Player1=Defect, Player2=Cooperate
Round 3: Player1=Defect, Player2=Defect



Round 4
Round 1: Player1=Defect, Player2=Cooperate
Round 2: Player1=Defect, Player2=Cooperate
Round 3: Player1=Defect, Player2=Defect
Round 4: Player1=Defect, Player2=Defect



Round 5
Round 1: Player1=Defect, Player2=Cooperate
Round 2: Player1=Defect, Player2=Cooperate
Round 3: Player1=Defect, Player2=Defect
Round 4: Player1=Defect, Player2=Defect
Round 5: Player1=Defect, Player2=Cooperate



Round 6
Round 1: Player1=Defect, Player2=Cooperate
Round 2: Player1=Defect, Player2=Cooperate
Round 3: Player1=Defect, Player2=Defect
Round 4: Player1=Defect, Player2=Defect
Round 5: Player1=Defect, Player2=Cooperate
Round 6: Player1=Defect, Player2=Defect



Round 7
Round 1: Player1=Defect, Player2=Cooperate
Roun

# Archive


In [8]:
def prisoners_dilemma(
    strategy: str,
    opponent_action: Optional[str] = None,
    p: float = 0.5,
    previous_action: Optional[str] = None,
    defected: bool = False,
    first_round: bool = True,
    seed: int = 42
) -> Dict[str, object]:
    """
    Determines the action to take in a round of Prisoner's Dilemma based on the given strategy.

    Args:
        strategy (str): The strategy to use (e.g., "AC", "AD", "RND", "URNDp", "TFT", "STFT", "GRIM", "WSLS").
        opponent_action (Optional[str]): The opponent's last action ("Cooperate" or "Defect"). Default is None.
        p (float): Probability to cooperate for the "URNDp" strategy. Default is 0.5.
        previous_action (Optional[str]): The player's previous action, used in the "WSLS" strategy. Default is None.
        defected (bool): Whether the opponent has defected in the past (used in the "GRIM" strategy). Default is False.
        first_round (bool): Flag to indicate if this is the first round (used in strategies like "TFT" and "STFT"). Default is True.
        seed (int): Seed for the random number generator to ensure reproducibility. Default is 42.

    Returns:
        Dict[str, object]: A dictionary with keys:
            - 'action': The chosen action for this round ("Cooperate" or "Defect").
            - 'first_round': Updated value for `first_round` (False after the first round).
            - 'defected': Updated value for `defected` (True if the opponent has defected in "GRIM" strategy).
    """
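    # Seed the global RNG for reproducibility.
    # Note: reseeding on every call means "RND" and "URNDp" return the same action
    # each time they are called with the same seed (and the same p).
    random.seed(seed)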

    # Initialize the result dictionary
    result = {
        'action': None,
        'first_round': first_round,
        'defected': defected
    }

    # Always Cooperate (AC) strategy: Always returns "Cooperate"
    if strategy == "AC":
        result['action'] = "Cooperate"
        result['first_round'] = False
    
    # Always Defect (AD) strategy: Always returns "Defect"
    elif strategy == "AD":
        result['action'] = "Defect"
        result['first_round'] = False
    
    # Random (RND) strategy: Chooses randomly between "Cooperate" and "Defect"
    elif strategy == "RND":
        result['action'] = random.choice(["Cooperate", "Defect"])
        result['first_round'] = False
    
    # Unfair Random (URNDp) strategy: Chooses "Cooperate" with probability p, else "Defect"
    elif strategy == "URNDp":
        result['action'] = "Cooperate" if random.random() < p else "Defect"
        result['first_round'] = False
    
    # Tit For Tat (TFT) strategy:
    # Cooperates in the first round, then mimics opponent's last action
    elif strategy == "TFT":
        if first_round:
            result['action'] = "Cooperate"
            result['first_round'] = False
        else:
            result['action'] = opponent_action if opponent_action else "Cooperate"
    
    # Suspicious Tit For Tat (STFT) strategy:
    # Defects in the first round, then mimics opponent's last action
    elif strategy == "STFT":
        if first_round:
            result['action'] = "Defect"
            result['first_round'] = False
        else:
            result['action'] = opponent_action if opponent_action else "Cooperate"
    
    # Grim Trigger (GRIM) strategy:
    # Cooperates until the opponent defects, then only defects for the rest of the game
    elif strategy == "GRIM":
        if defected:
            result['action'] = "Defect"
        elif opponent_action == "Defect":
            result['action'] = "Defect"
            result['defected'] = True
        else:
            result['action'] = "Cooperate"
        result['first_round'] = False
    
    # Win-Stay Lose-Shift (WSLS) strategy:
    # Cooperates in the first round. Afterwards, repeats the previous action if the
    # opponent cooperated (a "win") and switches action if the opponent defected (a "loss")
    elif strategy == "WSLS":
        if first_round or previous_action is None:
            result['action'] = "Cooperate"
            result['first_round'] = False
        elif opponent_action == "Cooperate":
            result['action'] = previous_action
        else:
            result['action'] = "Defect" if previous_action == "Cooperate" else "Cooperate"

    return result

def play_round(
    strategy: str,
    opponent_action: Optional[str] = None,
    p: float = 0.7,
    previous_action: Optional[str] = None,
    defected: bool = False,
    first_round: bool = True,
    seed: int = 42
) -> Dict[str, object]:
    """
    Simulates a round of Prisoner's Dilemma based on the selected strategy and returns the action taken.

    Args:
        strategy (str): The strategy to be used in this round.
        opponent_action (Optional[str]): The opponent's action in the previous round. Default is None.
        p (float): Probability to cooperate for "URNDp" strategy. Default is 0.7.
        previous_action (Optional[str]): The player's previous action. Used in "WSLS" strategy. Default is None.
        defected (bool): Flag indicating if the opponent has defected (used in "GRIM" strategy). Default is False.
        first_round (bool): Flag to indicate if this is the first round. Default is True.
        seed (int): Random seed to ensure reproducibility of random strategies. Default is 42.

    Returns:
        Dict[str, object]: A dictionary with keys:
            - 'action': The chosen action for this round ("Cooperate" or "Defect").
            - 'first_round': Updated value for `first_round` (False after the first round).
            - 'defected': Updated value for `defected` (True if the opponent defected in "GRIM" strategy).
    """
    return prisoners_dilemma(
        strategy=strategy,
        opponent_action=opponent_action,
        p=p,
        previous_action=previous_action,
        defected=defected,
        first_round=first_round,
        seed=seed
    )
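
One design point about the functions above: because `random.seed(seed)` runs on every call, the `RND` and `URNDp` strategies return the same action each round for a given seed. A possible alternative, sketched here under the assumption that per-game state is acceptable, keeps one `random.Random` instance and the Grim Trigger flag on an object. The class name `StrategyPlayer` and its `act` method are hypothetical, and only three strategies are covered for brevity.

```python
import random
from typing import Optional


class StrategyPlayer:
    """Hypothetical stateful wrapper: seeds once, remembers past defections."""

    def __init__(self, strategy: str, p: float = 0.5, seed: int = 42) -> None:
        self.strategy = strategy
        self.p = p
        self.rng = random.Random(seed)  # seeded once per game, not per round
        self.opponent_defected = False

    def act(self, opponent_action: Optional[str]) -> str:
        """Choose this round's action given the opponent's previous action."""
        if opponent_action == "Defect":
            self.opponent_defected = True
        if self.strategy == "URNDp":
            return "Cooperate" if self.rng.random() < self.p else "Defect"
        if self.strategy == "TFT":
            return opponent_action or "Cooperate"
        if self.strategy == "GRIM":
            return "Defect" if self.opponent_defected else "Cooperate"
        raise ValueError(f"Unsupported strategy: {self.strategy}")


grim = StrategyPlayer("GRIM")
print([grim.act(a) for a in (None, "Cooperate", "Defect", "Cooperate")])

# Two players built with the same seed produce the same sequence of draws.
first = StrategyPlayer("URNDp", p=0.5, seed=7)
second = StrategyPlayer("URNDp", p=0.5, seed=7)
print([first.act(None) for _ in range(5)] == [second.act(None) for _ in range(5)])
```

With this layout the caller no longer has to thread `first_round` and `defected` through every call, and a seeded game stays reproducible without every round drawing the same value.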


In [9]:
# Full names of the strategies for better clarity
strategy_names = {
    "AC": "Always Cooperate",
    "AD": "Always Defect",
    "RND": "Random",
    "URNDp": "Unfair Random",
    "TFT": "Tit For Tat",
    "STFT": "Suspicious Tit For Tat",
    "GRIM": "Grim Trigger",
    "WSLS": "Win-Stay Lose-Shift"
}

def display_result(strategy, **kwargs):
    # Print the full strategy name using the dictionary
    full_strategy_name = strategy_names.get(strategy, strategy)
    print(f"\nStrategy: {full_strategy_name}, Parameters: {kwargs}")
    result = play_round(strategy=strategy, **kwargs)
    print(f"Result: Action = {result['action']}, First Round = {result['first_round']}, Defected = {result['defected']}")

# Test cases with seed for reproducibility

# Always Cooperate
display_result("AC")

# Always Defect
display_result("AD")

# Random (with seed, so consistent result)
display_result("RND")

# Unfair Random with p=0.8 (with seed for consistent result)
display_result("URNDp", p=0.8)

# Tit For Tat - opponent defected in the previous round
display_result("TFT", opponent_action="Defect", first_round=False)

# Suspicious Tit For Tat - opponent cooperated in the previous round
display_result("STFT", opponent_action="Cooperate", first_round=False)

# Grim Trigger - opponent defected in the previous round
display_result("GRIM", opponent_action="Defect")

# Win-Stay Lose-Shift - both cooperated in the previous round
display_result("WSLS", opponent_action="Cooperate", previous_action="Cooperate")



Strategy: Always Cooperate, Parameters: {}
Result: Action = Cooperate, First Round = False, Defected = False

Strategy: Always Defect, Parameters: {}
Result: Action = Defect, First Round = False, Defected = False

Strategy: Random, Parameters: {}
Result: Action = Cooperate, First Round = False, Defected = False

Strategy: Unfair Random, Parameters: {'p': 0.8}
Result: Action = Cooperate, First Round = False, Defected = False

Strategy: Tit For Tat, Parameters: {'opponent_action': 'Defect', 'first_round': False}
Result: Action = Defect, First Round = False, Defected = False

Strategy: Suspicious Tit For Tat, Parameters: {'opponent_action': 'Cooperate', 'first_round': False}
Result: Action = Cooperate, First Round = False, Defected = False

Strategy: Grim Trigger, Parameters: {'opponent_action': 'Defect'}
Result: Action = Defect, First Round = False, Defected = True

Strategy: Win-Stay Lose-Shift, Parameters: {'opponent_action': 'Cooperate', 'previous_action': 'Cooperate'}
Result: Action

In [2]:
def run_single_round_experiment(model_name, prisoner_A_filename, prisoner_B_filename, llm_params=None):
    # Avoid a mutable default argument; default to temperature 0
    if llm_params is None:
        llm_params = {'temperature': 0}
    # Run 100 iterations for prisoner A
    for _ in range(100):
        llm_helper.call_llm(
            system_prompt=PD_PRISONER_SYSTEM_PROMPT,
            prompt=PD_PRISONER_A_USER_PROMPT,
            model_name=model_name,
            llm_params=llm_params,
            prisoner='A',
            log_file=prisoner_A_filename
        )
    # Run 100 iterations for prisoner B
    for _ in range(100):
        llm_helper.call_llm(
            system_prompt=PD_PRISONER_SYSTEM_PROMPT,
            prompt=PD_PRISONER_B_USER_PROMPT,
            model_name=model_name,
            llm_params=llm_params,
            prisoner='B',
            log_file=prisoner_B_filename
        )

In [3]:
def run_test(model_name, prompt, log_filename, llm_params=None):
    # Avoid a mutable default argument; default to temperature 0
    if llm_params is None:
        llm_params = {'temperature': 0}
    # Run a single test call with the given prompt
    for _ in range(1):
        llm_helper.call_llm(
            system_prompt=PD_PRISONER_SYSTEM_PROMPT,
            prompt=prompt,
            model_name=model_name,
            llm_params=llm_params,
            prisoner='test',
            log_file=log_filename
        )

## 1) Llama2-7B

In [3]:
model_name = 'llama2'
prisoner_A_filename = 'llama2_prisoner_A_result.csv'
prisoner_B_filename = 'llama2_prisoner_B_result.csv'
run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename
)

Q (llama2): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama2): BETRAY

Q (llama2): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama2): BETRAY

Q (llama2): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama2): BETRAY

Q (llama2): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with P

## 2) Llama3-8B

In [4]:
model_name = 'llama3'
prisoner_A_filename = 'llama3_prisoner_A_result.csv'
prisoner_B_filename = 'llama3_prisoner_B_result.csv'
run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename
)

Q (llama3): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama3): COOPERATE

Q (llama3): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama3): COOPERATE

Q (llama3): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama3): COOPERATE

Q (llama3): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to coopera

## 3) Mistral-7B

In [5]:
model_name = 'mistral'
prisoner_A_filename = 'mistral_prisoner_A_result.csv'
prisoner_B_filename = 'mistral_prisoner_B_result.csv'
run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename
)

Q (mistral): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (mistral):  COOPERATE

Q (mistral): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (mistral):  COOPERATE

Q (mistral): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (mistral):  COOPERATE

Q (mistral): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want 

## 4) Vicuna-7B

In [6]:
model_name = 'vicuna'
prisoner_A_filename = 'vicuna_prisoner_A_result.csv'
prisoner_B_filename = 'vicuna_prisoner_B_result.csv'
run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename
)

Q (vicuna): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".


A (vicuna): 
As a language model AI, I do not have personal preferences or decisions, but here's the response based on the given options:

If I were Prisoner A, I would choose to COOPERATE.

Q (vicuna): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (vicuna): 
As a language model AI, I do not have personal preferences or decisions, but here's the response based on the given options:

If I were Prisoner A, I would choose to COOPERATE with Prisoner B.

Q (vicuna): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (vicuna): 
As a language model AI, I do not have personal 
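
Replies like the ones above do not always follow the one-word instruction, so the logged text has to be normalized before decisions can be counted. The following is a minimal sketch of one way to do that; `parse_decision` is a hypothetical helper, not part of `LLMHelper`, and returning `None` when both or neither keyword appears is an assumption about how ambiguous replies should be treated.

```python
import re
from typing import Optional

ANSWER_TAG = re.compile(r"<ans>\s*(COOPERATE|BETRAY)\s*</ans>", re.IGNORECASE)
KEYWORD = re.compile(r"\b(COOPERATE|BETRAY)\b", re.IGNORECASE)


def parse_decision(reply: str) -> Optional[str]:
    """Map a free-form reply to "COOPERATE", "BETRAY", or None if ambiguous."""
    # Prefer an explicit <ans>...</ans> tag; use the last one if several appear.
    tagged = ANSWER_TAG.findall(reply)
    if tagged:
        return tagged[-1].upper()
    # Otherwise accept the reply only if it names exactly one of the two actions.
    found = {word.upper() for word in KEYWORD.findall(reply)}
    return found.pop() if len(found) == 1 else None


print(parse_decision(" COOPERATE"))
print(parse_decision("If I were Prisoner A, I would choose to COOPERATE."))
print(parse_decision("Step 1 ... <ans>BETRAY</ans>"))
print(parse_decision("I could COOPERATE or BETRAY."))
```

Replies that come back as `None` can then be reviewed by hand or excluded, rather than being silently counted toward either action.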

In [6]:
model_name = 'repe-mistral-nemo'
prisoner_A_filename = 'repe-mistral-nemo-cot-baseline_prisoner_A_result.csv'
prisoner_B_filename = 'repe-mistral-nemo-cot-baseline_prisoner_B_result.csv'

llm_params = {
    'temperature': 0,
    'coeff': 0.0,
    'direction': -1,
    'personality': 'openness',
}

run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename,
    llm_params=llm_params
)

Q (repe-mistral-nemo): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Think step by step and reply with either <ans>COOPERATE</ans> or <ans>BETRAY</ans>".
Prompt: ['[INST]You are about to play a game called the "Prisoner\'s Dilemma" with another player. You are both prisoners involved in a crime. You can either choose to COOPERATE with the other prisoner or BETRAY them. Your goal is to minimize the number of years you spend in prison.\n\nThe rules are as follows:\n- If you both COOPERATE, you will each serve 1 year in prison.\n- If you both BETRAY, you will each serve 3 years in prison.\n- If you COOPERATE and the other prisoner BETRAYS, you will serve 5 years in prison, and the other prisoner will go free (0 years).\n- If you BETRAY and the other prisoner COOPERATES, you will go free (0 years), and the other prisoner will serve 5 yea