# Simulating Prisoner's Dilemma

In this game we're simulating the interactions between different strategies in the Prisoner's Dilemma.  
Each strategy is a function that takes the history of the game and returns a decision (cooperate or defect).

The game makes no assumptions about the internals or state of the strategies. They may keep state over multiple turns or be completely stateless.  
For simplicity, it is recommended to depend entirely on the values passed in each turn.  

Strategies that depend on information from outside this game are considered cheaters.  

Strategies can see each other's scores because we want to allow behaviour that minimises the other strategy's score, as opposed to maximising its own.

## Setup

In [2]:
from typing import Callable

## The Game Loop

The game loop is stable and shouldn't change.  
We may implement different games in the future. One example would be an N×N game, in which multiple strategies play against multiple strategies.

In [14]:
Move = str
Score = int
Turn = tuple[Move, Score, Score]
History = list[Turn]
Strategy = Callable[[History], Move]

ScoreFunc = Callable[[Move, Move], tuple[Score, Score]]

In [26]:
def simulate_game(strategy1: Strategy, strategy2: Strategy,
                  update_scores: ScoreFunc, num_turns: int) -> tuple[int, int]:
    """Simulate a prisoner's dilemma game and return the total scores for
    both players.

    On every turn each strategy receives the opposing player's history: a list
    of (opponent_move, opponent_score, own_score) tuples, one per past turn.
    """
    history1, history2 = [], []
    total_score1, total_score2 = 0, 0

    for _ in range(num_turns):
        move1 = strategy1(history2)
        move2 = strategy2(history1)

        score1, score2 = update_scores(move1, move2)
        total_score1 += score1
        total_score2 += score2

        history1.append((move1, score1, score2))
        history2.append((move2, score2, score1))
    
    return total_score1, total_score2
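
Because every history entry carries both scores, a strategy can react to the score gap rather than only to moves. The following is a minimal sketch of that idea; `spoiler` is a hypothetical strategy that is not part of the library below, and the constants and type aliases are repeated only so the snippet runs on its own.

```python
COOPERATE, DEFECT = "C", "D"

Move = str
Score = int
Turn = tuple[Move, Score, Score]
History = list[Turn]


def spoiler(history: History) -> Move:
    """Hypothetical strategy: defect whenever the opponent is ahead on points."""
    # Each entry is (opponent_move, opponent_score, own_score) for one turn,
    # which is the layout simulate_game appends to the history it passes in.
    opponent_total = sum(opp_score for _, opp_score, _ in history)
    own_total = sum(own_score for _, _, own_score in history)
    return DEFECT if opponent_total > own_total else COOPERATE
```

On the first turn the history is empty, both totals are zero, and the sketch cooperates.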

In [16]:
COOPERATE = "C"
DEFECT = "D"

### The Rules

Let's define our scoring functions.

In [19]:
ScoreMatrix = dict[tuple[Move, Move], tuple[Score, Score]]

In [20]:
def create_score_func_from_matrix(score_matrix: ScoreMatrix) -> ScoreFunc:
    
    def update_scores(move1, move2) -> tuple[Score, Score]:
        return score_matrix[(move1, move2)]
    
    return update_scores

In [22]:
classic_score_matrix: ScoreMatrix = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1)
}
update_classic_scores = create_score_func_from_matrix(classic_score_matrix)

In [23]:
# Let's test it
update_classic_scores(COOPERATE, COOPERATE)

(3, 3)
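
Since any matrix can be turned into a scoring function, it may be worth checking that a custom matrix still describes a Prisoner's Dilemma. A rough sketch follows, assuming the usual conditions (temptation > reward > punishment > sucker's payoff, and two rewards worth more than temptation plus sucker's payoff) and looking only at the first player's payoffs; `is_prisoners_dilemma` is a hypothetical helper, not part of the notebook.

```python
COOPERATE, DEFECT = "C", "D"

ScoreMatrix = dict[tuple[str, str], tuple[int, int]]


def is_prisoners_dilemma(matrix: ScoreMatrix) -> bool:
    """Hypothetical helper: check the usual payoff ordering for player 1."""
    reward = matrix[(COOPERATE, COOPERATE)][0]   # both cooperate
    sucker = matrix[(COOPERATE, DEFECT)][0]      # cooperate, opponent defects
    temptation = matrix[(DEFECT, COOPERATE)][0]  # defect, opponent cooperates
    punishment = matrix[(DEFECT, DEFECT)][0]     # both defect
    return (temptation > reward > punishment > sucker
            and 2 * reward > temptation + sucker)


classic = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}
print(is_prisoners_dilemma(classic))  # True
```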

## Let's Play

Let's define some strategies and run the game.

### Strategy Library

#### Optimist
Simple strategy, always cooperates.

In [24]:
def optimist(_: History) -> Move:
    return COOPERATE

#### Criminal
Always defects.

In [25]:
def criminal(_: History) -> Move:
    return DEFECT

#### Tit for Tat
This strategy starts by cooperating and from then on repeats the opposing strategy's previous move.

In [None]:
def tit_for_tat(history: History) -> Move:
    # Cooperate first; afterwards mirror the opponent's last move.
    return history[-1][0] if history else COOPERATE

### Games

TODO:
- Change the code to take a list of strategies to evaluate and render the leaderboard comparing each strategy

In [31]:
# Let's test the simulation works
simulate_game(criminal, criminal, update_classic_scores, 200)

(200, 200)

## Evaluating Strategies

Let's evaluate the different strategies available.  
We do so by looking at a few metrics:  
* Total score across all games  
* Best game score  
* Worst game score  
* Number of wins  
* Number of ties  

To calculate these metrics we play every pair of strategies against each other, 
including each strategy against itself.  
This gives a results matrix in which the entry for row A and column B holds the scores
from the game between strategies A and B, and the diagonal holds the self-play games.

We then plot the results of processing this matrix into the various metrics.
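
The evaluation described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's implementation: `play`, `round_robin`, and `summarise` are hypothetical names, `play` is a condensed stand-in for `simulate_game` with the classic payoffs, metrics are taken from the row player's point of view, and a self-play game is counted as a tie. Plotting is left out.

```python
from typing import Callable

COOPERATE, DEFECT = "C", "D"
History = list[tuple[str, int, int]]
Strategy = Callable[[History], str]

# Classic payoffs, repeated here so the sketch is self-contained.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def play(s1: Strategy, s2: Strategy, num_turns: int) -> tuple[int, int]:
    """Condensed stand-in for simulate_game using the classic payoffs."""
    h1, h2 = [], []
    t1 = t2 = 0
    for _ in range(num_turns):
        m1, m2 = s1(h2), s2(h1)  # each strategy sees the opponent's history
        p1, p2 = PAYOFFS[(m1, m2)]
        t1, t2 = t1 + p1, t2 + p2
        h1.append((m1, p1, p2))
        h2.append((m2, p2, p1))
    return t1, t2


def optimist(_: History) -> str:
    return COOPERATE


def criminal(_: History) -> str:
    return DEFECT


def round_robin(strategies: dict[str, Strategy], num_turns: int):
    """Build results[a][b] = (score of a, score of b), self-play included."""
    return {a: {b: play(strategies[a], strategies[b], num_turns)
                for b in strategies}
            for a in strategies}


def summarise(results):
    """Reduce each row of the results matrix to the metrics listed above."""
    summary = {}
    for name, row in results.items():
        own = [mine for mine, _ in row.values()]
        summary[name] = {
            "total": sum(own),
            "best": max(own),
            "worst": min(own),
            "wins": sum(mine > theirs for mine, theirs in row.values()),
            "ties": sum(mine == theirs for mine, theirs in row.values()),
        }
    return summary


results = round_robin({"optimist": optimist, "criminal": criminal}, 10)
summary = summarise(results)
print(summary["criminal"])
# {'total': 60, 'best': 50, 'worst': 10, 'wins': 1, 'ties': 1}
```

Keeping the matrix as a dictionary keyed by strategy name avoids having to track row and column indices separately; a nested list or array would work equally well if the results are going to be plotted as a heatmap.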