# Prisoner's Dilemma

The **Prisoner's Dilemma** is a classic example in game theory where two rational individuals may not cooperate, even though cooperation would yield a better outcome for both.

## Scenario:
Two suspects, **A** and **B**, are arrested for a crime. They are held in separate cells, unable to communicate with each other. The prosecutor offers each prisoner a deal:

- **If both prisoners remain silent (cooperate with each other)**, they each get a light sentence of 1 year.
- **If both prisoners betray each other (defect)**, they each get 3 years in prison.
- **If one betrays and the other remains silent**, the betrayer goes free (0 years), while the silent prisoner gets 5 years.

## Payoff Matrix:

|               | **B: Silent**           | **B: Betray**           |
|---------------|-------------------------|-------------------------|
| **A: Silent** | A: 1 year, B: 1 year     | A: 5 years, B: 0 years  |
| **A: Betray** | A: 0 years, B: 5 years   | A: 3 years, B: 3 years  |
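The matrix is easier to query as data. Below is a minimal sketch that stores it as a Python dictionary. `PAYOFFS` and `total_years` are illustrative names and do not come from the experiment code. It finds the cell with the smallest combined sentence, which is the best collective outcome.

```python
# Years in prison for (A's action, B's action) -> (A's years, B's years),
# copied from the payoff matrix above. Fewer years is better.
PAYOFFS = {
    ("silent", "silent"): (1, 1),
    ("silent", "betray"): (5, 0),
    ("betray", "silent"): (0, 5),
    ("betray", "betray"): (3, 3),
}


def total_years(outcome):
    """Combined sentence of both prisoners for one cell of the matrix."""
    years_a, years_b = PAYOFFS[outcome]
    return years_a + years_b


# The cell with the smallest combined sentence is the best collective outcome.
best_collective = min(PAYOFFS, key=total_years)

for outcome in PAYOFFS:
    print(outcome, PAYOFFS[outcome], "total:", total_years(outcome))
print("best collective outcome:", best_collective)
```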

## Key Points:
- **Best collective outcome:** Both remain silent, each getting 1 year.
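- **Dominant strategy (rational choice):** Betraying is each prisoner's dominant strategy. Whatever the other prisoner does, betraying gives a shorter sentence: 0 years instead of 1 if the other stays silent, and 3 years instead of 5 if the other betrays.
- **Paradox:** If both prisoners follow this individually rational reasoning, both betray and each serves 3 years. Both would have been better off staying silent and serving 1 year each. Mutual betrayal is nevertheless the game's only Nash equilibrium.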
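The dominance argument can be checked mechanically. This sketch tests every action against every opponent action to find a strictly dominant strategy. It also tests each cell for the Nash property, meaning neither prisoner can shorten their own sentence by changing only their own action. The function names are illustrative and not part of the experiment code.

```python
ACTIONS = ("silent", "betray")

# (A's action, B's action) -> (A's years, B's years); fewer years is better.
PAYOFFS = {
    ("silent", "silent"): (1, 1),
    ("silent", "betray"): (5, 0),
    ("betray", "silent"): (0, 5),
    ("betray", "betray"): (3, 3),
}


def years_for(player, own_action, other_action):
    """Sentence served by `player` ("A" or "B") for the given pair of actions."""
    if player == "A":
        return PAYOFFS[(own_action, other_action)][0]
    return PAYOFFS[(other_action, own_action)][1]


def dominant_action(player):
    """Action that gives `player` a strictly shorter sentence than every
    alternative, whatever the opponent does; None if no such action exists."""
    for candidate in ACTIONS:
        alternatives = [a for a in ACTIONS if a != candidate]
        if all(
            years_for(player, candidate, opp) < years_for(player, alt, opp)
            for alt in alternatives
            for opp in ACTIONS
        ):
            return candidate
    return None


def is_nash_equilibrium(action_a, action_b):
    """True if neither prisoner can shorten their own sentence by deviating alone."""
    a_stays = all(years_for("A", action_a, action_b) <= years_for("A", alt, action_b) for alt in ACTIONS)
    b_stays = all(years_for("B", action_b, action_a) <= years_for("B", alt, action_a) for alt in ACTIONS)
    return a_stays and b_stays


for player in ("A", "B"):
    print(f"dominant action for {player}:", dominant_action(player))
for cell in PAYOFFS:
    print(cell, "Nash equilibrium:", is_nash_equilibrium(*cell))
```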

The Prisoner’s Dilemma highlights how individual rational choices can lead to worse outcomes for all parties involved, even when mutual cooperation is more beneficial.


# Experiment Set-up
  1. We need one LLM playing Prisoner A and another LLM playing Prisoner B.
  2. Both LLMs receive the same system prompt, which explains the rules of the game.
  3. We use `temperature=0` (greedy decoding) to remove sampling randomness as a confounding factor. As a consequence, repeated runs with an identical prompt are expected to return (near-)identical answers.
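Why `temperature=0` removes randomness can be seen from how temperature enters token sampling. The following toy sketch samples one token index from raw logits. By convention, temperature 0 is treated as argmax because `softmax(logits / T)` is undefined at `T = 0`. The logits in the check are made up. Real serving backends may still show small run-to-run differences from numerical nondeterminism.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random):
    """Pick a token index from raw logits.

    temperature == 0 is treated as greedy decoding (always the argmax);
    otherwise we sample from softmax(logits / temperature).
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy logits for the tokens ["COOPERATE", "BETRAY"] (illustrative values only).
toy_logits = [2.0, 1.5]
print([sample_with_temperature(toy_logits, 0) for _ in range(5)])
print([sample_with_temperature(toy_logits, 1.0, rng=random.Random(0)) for _ in range(5)])
```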
   
TBC:
- Whether to play multiple rounds, so that the LLM can remember what happened in the previous round.
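If the multi-round variant is adopted, one simple option is to prepend earlier results to the single-round question. The sketch below uses a hypothetical helper, `build_round_prompt`. Its wording is an assumption, not an existing prompt from `scenario_prompts`.

```python
def build_round_prompt(base_prompt, history, other_label="the other prisoner"):
    """Prefix the single-round prompt with the results of earlier rounds.

    `history` is a list of (my_decision, other_decision) pairs, oldest first.
    With an empty history the base prompt is returned unchanged, so round 1
    matches the single-round experiment.
    """
    if not history:
        return base_prompt
    lines = ["Results of previous rounds:"]
    for i, (mine, theirs) in enumerate(history, start=1):
        lines.append(f"Round {i}: you chose {mine}, {other_label} chose {theirs}.")
    lines.append("")  # blank line between the history and the question
    lines.append(base_prompt)
    return "\n".join(lines)


history = [("COOPERATE", "BETRAY"), ("BETRAY", "BETRAY")]
print(build_round_prompt("What is your decision?", history))
```

The history is written from the perspective of the prisoner being prompted. When building the other prisoner's prompt, swap each pair.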

Remove confounding factors:
- Don't label the prisoners as A or B.
- Run 100 times with varying prompts.
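The two items above could be combined by generating prompt variants that never mention "Prisoner A" or "Prisoner B" and that list the options in both orders. Listing both orders helps ensure that a preference for whichever option comes first is not mistaken for a preference for cooperation or betrayal. All names and wordings below are hypothetical. The shared system prompt would need the same relabelling.

```python
import itertools

# Hypothetical wording: each action's description, with a placeholder for
# how the other player is referred to.
ACTION_TEXT = {
    "COOPERATE": "cooperate with {other}",
    "BETRAY": "betray {other}",
}
# Neutral names that avoid "Prisoner A" / "Prisoner B".
OTHER_NAMES = ["the other prisoner", "your fellow suspect"]
# Both option orders, to control for position bias.
OPTION_ORDERS = [("COOPERATE", "BETRAY"), ("BETRAY", "COOPERATE")]


def make_prompt(other, order):
    """Build one user prompt for a given name of the other player and option order."""
    lines = ["Please choose one of the following actions:"]
    for action in order:
        desc = ACTION_TEXT[action].format(other=other)
        lines.append(f'- Reply "{action}" if you want to {desc}.')
    lines.append("")
    lines.append(
        f'What is your decision? Only reply with one word, either "{order[0]}" or "{order[1]}".'
    )
    return "\n".join(lines)


def prompt_variants():
    """Every combination of neutral name and option order."""
    return [make_prompt(o, order) for o, order in itertools.product(OTHER_NAMES, OPTION_ORDERS)]


variants = prompt_variants()
# Cycle through the variants so that 100 runs are spread evenly across them.
prompts_for_100_runs = [variants[i % len(variants)] for i in range(100)]
print(len(variants), "variants;", len(prompts_for_100_runs), "prompts")
```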

In [1]:
from helper.llm_helper import LLMHelper
from scenario_prompts import *  # provides PD_PRISONER_SYSTEM_PROMPT, PD_PRISONER_A_USER_PROMPT, PD_PRISONER_B_USER_PROMPT

llm_helper = LLMHelper()

In [7]:
def run_single_round_experiment(model_name, prisoner_A_filename, prisoner_B_filename, llm_params=None, n_iterations=100):
    # Use None instead of a mutable dict default; fall back to greedy decoding
    if llm_params is None:
        llm_params = {'temperature': 0}
    # Each prisoner gets the shared system prompt, its own user prompt and its own log file
    prisoner_runs = [
        ('A', PD_PRISONER_A_USER_PROMPT, prisoner_A_filename),
        ('B', PD_PRISONER_B_USER_PROMPT, prisoner_B_filename),
    ]
    for prisoner, user_prompt, log_file in prisoner_runs:
        # Run n_iterations independent single-round calls for this prisoner
        for _ in range(n_iterations):
            llm_helper.call_llm(
                system_prompt=PD_PRISONER_SYSTEM_PROMPT,
                prompt=user_prompt,
                model_name=model_name,
                llm_params=llm_params,
                prisoner=prisoner,
                log_file=log_file,
            )
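Prisoners A and B are queried in separate calls, so neither sees the other's answer. This matches a simultaneous one-shot game. One way to turn the two logs into games is to pair the i-th A reply with the i-th B reply and look up both sentences in the payoff matrix. Because the calls are independent, pairing by index is an arbitrary but valid choice. The sketch assumes the replies are already reduced to "COOPERATE" or "BETRAY", and that COOPERATE means staying silent. `score_paired_runs` and `YEARS` are illustrative names. The example input is toy data, not experiment results.

```python
from collections import Counter

# (A's decision, B's decision) -> (A's years, B's years), from the payoff matrix.
# Assumes COOPERATE corresponds to staying silent and BETRAY to betraying.
YEARS = {
    ("COOPERATE", "COOPERATE"): (1, 1),
    ("COOPERATE", "BETRAY"): (5, 0),
    ("BETRAY", "COOPERATE"): (0, 5),
    ("BETRAY", "BETRAY"): (3, 3),
}


def score_paired_runs(decisions_a, decisions_b):
    """Pair the i-th decision of A with the i-th decision of B.

    Returns a Counter of (A, B) outcomes and the mean sentence for each prisoner.
    Decisions must already be normalized to "COOPERATE" or "BETRAY".
    """
    if len(decisions_a) != len(decisions_b) or not decisions_a:
        raise ValueError("need the same, non-zero number of runs for A and B")
    games = list(zip(decisions_a, decisions_b))
    mean_a = sum(YEARS[g][0] for g in games) / len(games)
    mean_b = sum(YEARS[g][1] for g in games) / len(games)
    return Counter(games), mean_a, mean_b


# Toy input only.
outcomes, mean_a, mean_b = score_paired_runs(["COOPERATE", "BETRAY"], ["BETRAY", "BETRAY"])
print(outcomes, mean_a, mean_b)
```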

## 1) Llama2-7B

In [3]:
model_name = 'llama2'
prisoner_A_filename = 'llama2_prisoner_A_result.csv'
prisoner_B_filename = 'llama2_prisoner_B_result.csv'
run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename
)

Q (llama2): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama2): BETRAY

Q (llama2): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama2): BETRAY

Q (llama2): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama2): BETRAY

Q (llama2): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with P
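The printed output is truncated, so the logged CSV is the place to count the decisions. The format `LLMHelper` writes is not shown here. This sketch assumes a header row and a column named `response` that holds the raw reply. Both are assumptions, so pass the real column name if it differs. Replies are stripped and upper-cased so that " COOPERATE" and "COOPERATE" count together.

```python
import csv
from collections import Counter


def tally_csv(lines, column="response"):
    """Count normalized replies in a CSV log.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    ASSUMPTION: the reply is stored in a column called "response"; check the
    header LLMHelper actually writes and pass the real name via `column`.
    """
    reader = csv.DictReader(lines)
    # Strip whitespace and upper-case so " COOPERATE" and "cooperate" match.
    return Counter(row[column].strip().upper() for row in reader)
```

For example: `with open('llama2_prisoner_A_result.csv', newline='', encoding='utf-8') as f: print(tally_csv(f))`.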

## 2) Llama3-8B

In [4]:
model_name = 'llama3'
prisoner_A_filename = 'llama3_prisoner_A_result.csv'
prisoner_B_filename = 'llama3_prisoner_B_result.csv'
run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename
)

Q (llama3): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama3): COOPERATE

Q (llama3): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama3): COOPERATE

Q (llama3): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (llama3): COOPERATE

Q (llama3): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to coopera

## 3) Mistral-7B

In [5]:
model_name = 'mistral'
prisoner_A_filename = 'mistral_prisoner_A_result.csv'
prisoner_B_filename = 'mistral_prisoner_B_result.csv'
run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename
)

Q (mistral): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (mistral):  COOPERATE

Q (mistral): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (mistral):  COOPERATE

Q (mistral): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (mistral):  COOPERATE

Q (mistral): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want 

## 4) Vicuna-7B

In [6]:
model_name = 'vicuna'
prisoner_A_filename = 'vicuna_prisoner_A_result.csv'
prisoner_B_filename = 'vicuna_prisoner_B_result.csv'
run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename
)

Q (vicuna): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".


A (vicuna): 
As a language model AI, I do not have personal preferences or decisions, but here's the response based on the given options:

If I were Prisoner A, I would choose to COOPERATE.

Q (vicuna): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (vicuna): 
As a language model AI, I do not have personal preferences or decisions, but here's the response based on the given options:

If I were Prisoner A, I would choose to COOPERATE with Prisoner B.

Q (vicuna): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (vicuna): 
As a language model AI, I do not have personal 
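Vicuna ignores the one-word instruction, and Mistral prefixes its answer with a space. Counting exact strings would therefore misclassify these replies. A more tolerant option, sketched below under the assumption that a reply naming exactly one option means that option, extracts the keyword with a regular expression. Replies that name both options or neither are returned as `None` so they can be counted separately. The sketch does not handle negations such as "I would not BETRAY".

```python
import re

# Whole-word match, case-insensitive, so "cooperated" does not count as COOPERATE.
_DECISION = re.compile(r"\b(COOPERATE|BETRAY)\b", re.IGNORECASE)


def parse_decision(reply):
    """Map a raw model reply to "COOPERATE", "BETRAY", or None.

    Returns None when the reply mentions neither option or mentions both,
    so ambiguous answers are flagged instead of guessed.
    """
    found = {m.upper() for m in _DECISION.findall(reply)}
    if len(found) == 1:
        return found.pop()
    return None


print(parse_decision(" COOPERATE"))
print(parse_decision("If I were Prisoner A, I would choose to COOPERATE with Prisoner B."))
print(parse_decision('Either "COOPERATE" or "BETRAY".'))
```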

## 5) repe-mistral-nemo (openness, `direction=-1`, `coeff=3.5`)

In [11]:
model_name = 'repe-mistral-nemo'
prisoner_A_filename = 'repe-mistral-nemo-openness-minus_prisoner_A_result.csv'
prisoner_B_filename = 'repe-mistral-nemo-openness-minus_prisoner_B_result.csv'

llm_params = {
    'temperature': 0,
    'coeff': 3.5,
    'direction': -1,
    'personality': 'openness',
}

run_single_round_experiment(
    model_name=model_name,
    prisoner_A_filename=prisoner_A_filename,
    prisoner_B_filename=prisoner_B_filename,
    llm_params=llm_params
)

Q (repe-mistral-nemo): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (repe-mistral-nemo): COOPERATE

Q (repe-mistral-nemo): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (repe-mistral-nemo): COOPERATE

Q (repe-mistral-nemo): You are Prisoner A. Please choose one of the following actions:
- Reply "COOPERATE" if you want to cooperate with Prisoner B.
- Reply "BETRAY" if you want to betray Prisoner B.

What is your decision? Only reply with one word, either "COOPERATE" or "BETRAY".
A (repe-mistral-nemo): COOPERATE

Q (repe-mistral-nemo): You are Prisoner A. Please ch
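The `llm_params` for `repe-mistral-nemo` include `coeff`, `direction` and `personality`. This suggests representation-engineering-style activation steering, but the helper's implementation is not shown. Under that assumption, one common form adds a scaled concept vector to a layer's hidden state: h' = h + direction · coeff · v. Here v would be a hypothetical "openness" direction, and `direction=-1` pushes away from the trait. The toy sketch below shows only this arithmetic on plain lists. It is not taken from `LLMHelper`.

```python
def steer_hidden_state(hidden, concept_vector, coeff, direction):
    """Add a scaled concept direction to one hidden-state vector.

    ASSUMPTION: mirrors a common activation-addition form of representation
    engineering (h' = h + direction * coeff * v); not LLMHelper's actual code.
    """
    if len(hidden) != len(concept_vector):
        raise ValueError("hidden state and concept vector must have the same size")
    scale = direction * coeff
    return [h + scale * v for h, v in zip(hidden, concept_vector)]


# Toy 2-dimensional example using the cell's coeff=3.5 and direction=-1.
print(steer_hidden_state([1.0, 0.0], [1.0, 2.0], coeff=3.5, direction=-1))
```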