# Payoff Matrices (part 1)

> This module contains payoff matrices for different evolutionary games.
>
> Part 1 contains payoff matrices for the following games:
> - DSAIR
> - DSAIR with peer punishment or reward
> - DSAIR with voluntary commitments
> - DSAIR with collective risk
>
> Note that all of the payoff matrices here replicate the models from Han et al. 2020, 2021, 2022.

In [None]:
#| default_exp payoffs

In [None]:
#| hide
#| export
from nbdev.showdoc import *
from fastcore.test import test_eq, test_close
from gh_pages_example.utils import *
from gh_pages_example.types import *
import typing

import numpy as np
import nptyping

In [None]:
np.set_printoptions(suppress=True) # don't use scientific notation

## DSAIR Model Parameters

| keyword | value type | range | optional | description |
|---------|------------|-------|----------|-------------|
| b | NDArray | b > 0 | | The size of the per-round benefit of leading the AI development race |
| c | NDArray | c > 0 | | The cost of implementing safety recommendations per round |
| s | NDArray | s > 1 | | The speed advantage from choosing to ignore safety recommendations |
| p | NDArray | [0, 1] | | The probability that unsafe firms avoid an AI disaster |
| B | NDArray | B >> b | | The size of the prize from winning the AI development race |
| W | NDArray | $[10, 10^6]$ | | The anticipated timeline until the development race has a winner if everyone behaves safely |
| pfo | NDArray | [0, 1] | Yes | The probability that firms who ignore safety precautions are found out |
| α | NDArray | | Yes | The cost of rewarding/punishing a peer |
| γ | NDArray | | Yes | The effect of a reward/punishment on a developer's speed |
| epsilon | NDArray | ϵ > 0 | Yes | The cost of setting up and maintaining a voluntary commitment |
| ω | NDArray | [0, 1] | Yes | Noise in arranging an agreement: with some probability, firms fail to make an agreement |
| collective_risk | NDArray | [0, 1] | Yes | The likelihood that a disaster affects all actors |

In [None]:
show_doc(ModelTypeDSAIR)

---

[source](https://github.com/PaoloBova/gh-pages-example/blob/main/gh_pages_example/types.py#LNone){target="_blank" style="float:right; font-size:smaller"}

### ModelTypeDSAIR

>      ModelTypeDSAIR (b:gh_pages_example.types.Array1D,
>                      c:gh_pages_example.types.Array1D,
>                      s:gh_pages_example.types.Array1D,
>                      p:gh_pages_example.types.Array1D,
>                      B:gh_pages_example.types.Array1D,
>                      W:gh_pages_example.types.Array1D,
>                      pfo:gh_pages_example.types.Array1D=None,
>                      α:gh_pages_example.types.Array1D=None,
>                      γ:gh_pages_example.types.Array1D=None,
>                      epsilon:gh_pages_example.types.Array1D=None,
>                      ω:gh_pages_example.types.Array1D=None,
>                      collective_risk:gh_pages_example.types.Array1D=None)

This is the schema for the inputs to a DSAIR model.

Note: This schema is not enforced and is here purely for documentation
purposes.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| b | Array1D |  | benefit: The size of the per round benefit of leading the AI development race, b>0 |
| c | Array1D |  | cost: The cost of implementing safety recommendations per round, c>0 |
| s | Array1D |  | speed: The speed advantage from choosing to ignore safety recommendations, s>1 |
| p | Array1D |  | avoid_risk: The probability that unsafe firms avoid an AI disaster, p ∈ [0, 1] |
| B | Array1D |  | prize: The size of the prize from winning the AI development race, B>>b |
| W | Array1D |  | timeline: The anticipated timeline until the development race has a winner if everyone behaves safely, W ∈ [10, 10**6] |
| pfo | Array1D | None | detection risk: The probability that firms who ignore safety precautions are found out, pfo ∈ [0, 1] |
| α | Array1D | None | the cost of rewarding/punishing a peer |
| γ | Array1D | None | the effect of a reward/punishment on a developer's speed |
| epsilon | Array1D | None | commitment_cost: The cost of setting up and maintaining a voluntary commitment, ϵ > 0 |
| ω | Array1D | None | noise: Noise in arranging an agreement, with some probability they fail to succeed in making an agreement, ω ∈ [0, 1] |
| collective_risk | Array1D | None | The likelihood that a disaster affects all actors |

In [None]:
show_doc(Array1D)

---

[source](https://github.com/PaoloBova/gh-pages-example/blob/main/gh_pages_example/types.py#LNone){target="_blank" style="float:right; font-size:smaller"}

### Array1D

>      Array1D (ModelVector:nptyping.base_meta_classes.NDArray)

An alias for a 1D numpy array.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| ModelVector | NDArray | A 1D numpy array suitable for stacks of scalar parameter values |

In [None]:
#| export
valid_dtypes = typing.Union[float, list[float], np.ndarray, dict]
def build_DSAIR(b:valid_dtypes=4, # benefit: The size of the per round benefit of leading the AI development race, b>0
                c:valid_dtypes=1, # cost: The cost of implementing safety recommendations per round, c>0
                s:valid_dtypes={"start":1, # speed: The speed advantage from choosing to ignore safety recommendations, s>1
                                "stop":5.1,
                                "step":0.1}, 
                p:valid_dtypes={"start":0, # avoid_risk: The probability that unsafe firms avoid an AI disaster, p ∈ [0, 1]
                                "stop":1.02,
                                "step":0.02}, 
                B:valid_dtypes=10**4, # prize: The size of the prize from winning the AI development race, B>>b
                W:valid_dtypes=100, # timeline: The anticipated timeline until the development race has a winner if everyone behaves safely, W ∈ [10, 10**6]
                pfo:valid_dtypes=0, # detection risk: The probability that firms who ignore safety precautions are found out, pfo ∈ [0, 1]
                α:valid_dtypes=0, # the cost of rewarding/punishing a peer
                γ:valid_dtypes=0, # the effect of a reward/punishment on a developer's speed
                epsilon:valid_dtypes=0, # commitment_cost: The cost of setting up and maintaining a voluntary commitment, ϵ > 0
                ω:valid_dtypes=0, # noise: Noise in arranging an agreement, with some probability they fail to succeed in making an agreement, ω ∈ [0, 1]
                collective_risk:valid_dtypes=0, # The likelihood that a disaster affects all actors
                β:valid_dtypes=0.01, # learning_rate: the rate at which players imitate each other
                Z:int=100, # population_size: the number of players in the evolutionary game
                strategy_set:list[str]=["AS", "AU"], # the set of available strategies
                exclude_args:list[str]=['Z', 'strategy_set'], # a list of arguments that should be returned as they are
                override:bool=False, # whether to build the grid if it is very large
                drop_args:list[str]=['override', 'exclude_args', 'drop_args'], # a list of arguments to drop from the final result
               ) -> dict: # A dictionary containing items from `ModelTypeDSAIR` and `ModelTypeEGT`
    """Initialise baseline DSAIR models for all combinations of the provided
    parameter values. By default, we create models for replicating Figure 1
    of Han et al. 2021."""
    
    saved_args = locals()
    models = model_builder(saved_args,
                           exclude_args=exclude_args,
                           override=override,
                           drop_args=drop_args)
    return models

## DSAIR Payoff Matrix (Short Run)

| Strategy | Safe | Unsafe |
|----------|---|---|
| **Safe** | $$\frac{b}{2} - c$$|  $$\frac{b}{s+1} - c$$ |
| **Unsafe** | $$b \frac{s}{s+1}$$| $$\frac{b}{2} $$|
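As a worked check of the table, the sketch below evaluates the short-run matrix at b = 4, c = 1, s = 1.5, the values used in the tests further down. The helper `short_run_payoffs` is hypothetical and only mirrors the formulas above; it is not part of the module.

```python
import numpy as np

def short_run_payoffs(b, c, s):
    """Hypothetical helper: 2x2 short-run DSAIR matrix (rows/cols: Safe, Unsafe)."""
    return np.array([[b / 2 - c, b / (s + 1) - c],
                     [b * s / (s + 1), b / 2]])

pi = short_run_payoffs(b=4, c=1, s=1.5)
# pi is approximately [[1.0, 0.6], [2.4, 2.0]]
```

The off-diagonal entries, πAB = 3/5 and πBA = 12/5, are the constants that appear in `expected_fn1` and `expected_fn2`.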

In [None]:
#| export
def payoffs_sr(models:dict, # A dictionary containing the items in `ModelTypeDSAIR`
              ) -> dict : # The `models` dictionary with added payoff matrix `payoffs_sr`
    """The short run payoffs for the DSAIR game."""
    s, b, c = [models[k] for k in ['s', 'b', 'c']]
    πAA = -c + b/2
    πAB = -c + b/(s+1)
    πBA = s*b/(s+1)
    πBB = b/2
    
    # Promote all stacks to 3D arrays
    πAA = πAA[:, None, None]
    πAB = πAB[:, None, None]
    πBA = πBA[:, None, None]
    πBB = πBB[:, None, None]
    matrix = np.block([[πAA, πAB], 
                       [πBA, πBB]])
    return {**models, 'payoffs_sr':matrix}
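The `[:, None, None]` promotion in `payoffs_sr` is what lets `np.block` assemble one 2x2 matrix per model in a stack. A minimal standalone sketch, using three hypothetical models that differ only in `s`:

```python
import numpy as np

# A stack of three hypothetical models that differ only in s
s = np.array([1.5, 2.0, 3.0])
b, c = np.full(3, 4.0), np.full(3, 1.0)

πAA = (-c + b / 2)[:, None, None]        # shape (3, 1, 1)
πAB = (-c + b / (s + 1))[:, None, None]
πBA = (s * b / (s + 1))[:, None, None]
πBB = (b / 2)[:, None, None]

# np.block joins the innermost lists along the last axis and the outer
# list along the second-to-last axis, giving one 2x2 matrix per model.
matrix = np.block([[πAA, πAB],
                   [πBA, πBB]])
# matrix.shape == (3, 2, 2)
```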

In [None]:
show_doc(payoffs_sr)

---

[source](https://github.com/PaoloBova/gh-pages-example/blob/main/gh_pages_example/payoffs.py#L55){target="_blank" style="float:right; font-size:smaller"}

### payoffs_sr

>      payoffs_sr (models:dict)

The short run payoffs for the DSAIR game.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| models | dict | A dictionary containing the items in `ModelTypeDSAIR` |
| **Returns** | **dict** | **The `models` dictionary with added payoff matrix `payoffs_sr`** |

## DSAIR Payoff Matrix (Short Run) with probability of being found out

| Strategy | Safe | Unsafe |
|----------|---|---|
| **Safe** | $$\frac{b}{2} - c$$|  $$(1 - p_{fo}) \frac{b}{s+1} + p_{fo} b - c$$ |
| **Unsafe** | $$ (1 - p_{fo}) b \frac{s}{s+1}$$| $$(1 - p_{fo}^2) \frac{b}{2} $$|
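A minimal check of the detection extension, using a hypothetical helper `short_run_payoffs_pfo` that mirrors the table above: with $p_{fo} = 0$ the matrix reduces to the baseline short-run matrix, and raising $p_{fo}$ shifts payoff from the unsafe firm to its safe co-player.

```python
import numpy as np

def short_run_payoffs_pfo(b, c, s, pfo):
    """Hypothetical helper mirroring the table above (rows/cols: Safe, Unsafe)."""
    return np.array([
        [b / 2 - c, (1 - pfo) * b / (s + 1) + pfo * b - c],
        [(1 - pfo) * b * s / (s + 1), (1 - pfo**2) * b / 2],
    ])

base = short_run_payoffs_pfo(b=4, c=1, s=1.5, pfo=0.0)
risky = short_run_payoffs_pfo(b=4, c=1, s=1.5, pfo=0.5)
# base  is approximately [[1.0, 0.6], [2.4, 2.0]]  (the pfo-free matrix)
# risky is approximately [[1.0, 1.8], [1.2, 1.5]]
```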

In [None]:
#| export

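def payoffs_sr_pfo_extension(models:dict, # A dictionary containing the items in `ModelTypeDSAIR`
                            ) -> dict : # The `models` dictionary with added payoff matrix `payoffs_sr`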
    """The short run payoffs for the DSAIR game with a chance of unsafe
    behaviour being spotted."""
    s, b, c, pfo = [models[k] for k in ['s', 'b', 'c', 'pfo']]
    πAA = -c + b/2
    πAB = -c + b/(s+1) * (1 - pfo) + pfo * b
    πBA = (1 - pfo) * s * b / (s+1)
    πBB = (1 - pfo**2) * b/2
    
    # Promote all stacks to 3D arrays
    πAA = πAA[:, None, None]
    πAB = πAB[:, None, None]
    πBA = πBA[:, None, None]
    πBB = πBB[:, None, None]
    matrix = np.block([[πAA, πAB],
                       [πBA, πBB]])
    return {**models, 'payoffs_sr':matrix}

## DSAIR Payoff Matrix (Long Run)

Let $\pi$ denote one of the short run payoff matrices above, with rows and columns indexed by A (Safe) and B (Unsafe).

| Strategy | Always Safe | Always Unsafe |
|----------|---|---|
| **Always Safe** | $$πAA + \frac{B}{2W}$$|  $$πAB$$ |
| **Always Unsafe** | $$p \, (s \frac{B}{W} + πBA)$$| $$p \, (s \frac{B}{2W} + πBB)$$|

*Note: In a model where we suffer a collective risk of an AI disaster if the winner is unsafe, payoffs for firms who play safe when facing an unsafe firm are also multiplied by $p$.*
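A numerical sketch of the long-run table, assuming the short-run matrix for b = 4, c = 1, s = 1.5 and the values p = 0.25, B = 10^4, W = 100 used in the tests below. The helper `long_run_payoffs` is hypothetical and simply applies the table entry by entry.

```python
import numpy as np

def long_run_payoffs(pi, p, s, B, W):
    """Hypothetical helper applying the long-run table to a 2x2 short-run matrix."""
    (πAA, πAB), (πBA, πBB) = pi
    return np.array([
        [πAA + B / (2 * W), πAB],
        [p * (s * B / W + πBA), p * (s * B / (2 * W) + πBB)],
    ])

pi = np.array([[1.0, 0.6], [2.4, 2.0]])  # short-run matrix for b=4, c=1, s=1.5
Π = long_run_payoffs(pi, p=0.25, s=1.5, B=1e4, W=100)
# Π is approximately [[51.0, 0.6], [38.1, 19.25]]
```

Here ΠAA = 51 > ΠBA = 38.1 and ΠBB = 19.25 > ΠAB = 0.6, so for this parameterisation the pairwise game is a coordination game: each strategy is the best response to itself.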

In [None]:
#| export

def payoffs_lr(models:dict, # A dictionary containing the items in `ModelTypeDSAIR`
              ) -> dict : # The `models` dictionary with added payoff matrix `payoffs`
    """The long run average payoffs for the DSAIR game."""
    # All 1D arrays must be promoted to 3D Arrays for broadcasting
    s, p, B, W = [models[k][:, None, None]
                  for k in ['s', 'p', 'B', 'W']]
    πAA,πAB,πBA,πBB = [models['payoffs_sr'][:, i:i+1, j:j+1]
                       for i in range(2) for j in range(2)]    
    πAA = πAA + B/(2*W)
    πAB = πAB
    πBA = p*(s*B/W + πBA)
    πBB = p*(s*B/(2*W) + πBB)
    payoffs = np.block([[πAA, πAB],
                        [πBA, πBB]])
    return {**models, 'payoffs': payoffs}

## DSAIR Payoff Matrix with punishments (Long Run)

Let $\pi$ denote one of the short run payoff matrices above, with rows and columns indexed by A (Safe) and B (Unsafe).

**Always Safe** and **Always Unsafe** play as they usually do.

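**Punish Unsafe** always plays Safe. However, they will pay a cost to punish their co-player if the co-player plays Unsafe: the punishment lowers the punished firm's speed from $s$ to $s - \gamma$ and the punisher's own speed from $1$ to $1 - \alpha$.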

| Strategy | Always Safe | Always Unsafe | Punish Unsafe |
|----------|---|---|---|
| **Always Safe** | $$πAA + \frac{B}{2W}$$|  $$πAB$$ | $$πAA + \frac{B}{2W}$$ |
| **Always Unsafe** | $$p \, (s \frac{B}{W} + πBA)$$| $$p \, (s \frac{B}{2W} + πBB)$$| punished_payoff|
| **Punish Unsafe** | $$πAA + \frac{B}{2W}$$| sanctioner_payoff | $$πAA + \frac{B}{2W}$$ |

In [None]:
#| export

def punished_and_sanctioned_payoffs(models:dict, # A dictionary containing the items in `ModelTypeDSAIR`
                                   ) -> dict : # The `models` dictionary with added `sanctioner_payoff` and `punished_payoff` arrays
    """Compute the payoffs for the punished and sanctioner players in a DSAIR
    model with peer punishment."""
    # All 1D arrays must be promoted to 3D Arrays for broadcasting
    s,b,c, p, B, W, pfo = [models[k][:, None, None]
                      for k in ['s', 'b', 'c', 'p', 'B', 'W', 'pfo']]
    α, γ = [models[k][:, None, None] for k in ['α', 'γ']]
    πAA,πAB,πBA,πBB = [models['payoffs_sr'][:, i:i+1, j:j+1]
                       for i in range(2) for j in range(2)]
    
    s_punished = s - γ
    s_sanctioner = 1 - α
    sum_of_speeds = np.maximum(1e-20, s_punished + s_sanctioner)
    punished_wins = (s_punished > 0) & (((W-s)*np.maximum(0, s_sanctioner))
                                        <= ((W-1) * s_punished))
    punished_draws = (s_punished > 0) & (((W-s) * s_sanctioner)
                                         == ((W-1) * s_punished))
    sanctioner_wins = (s_sanctioner > 0) & (((W-s) * s_sanctioner)
                                            >= ((W-1)*np.maximum(0,s_punished)))
    no_winner = (s_punished <= 0) & (s_sanctioner <= 0)

    both_speeds_positive = (s_punished > 0) & (s_sanctioner > 0)
    only_sanctioner_speed_positive = (s_punished <= 0) & (s_sanctioner > 0)
    only_punisher_speed_positive = (s_punished > 0) & (s_sanctioner <= 0)

    p_loss = np.where(punished_wins | punished_draws, p, 1)
    R = np.where(no_winner,
                 1e50,
                 1 + np.minimum((W-s)/ np.maximum(s_punished, 1e-10),
                                (W-1)/ np.maximum(s_sanctioner, 1e-10)))
    # Check draws first: a draw also satisfies both (non-strict) win conditions
    B_s = np.where(punished_draws, B/2, np.where(sanctioner_wins, B, 0))
    B_p = np.where(punished_draws, B/2, np.where(punished_wins, B, 0))
    b_s = np.where(both_speeds_positive,
                   (1-pfo) * b * s_sanctioner / sum_of_speeds + pfo * b,
                   np.where(only_sanctioner_speed_positive, b, 0))
    b_p = np.where(both_speeds_positive,
                   (1-pfo) * b * s_punished / sum_of_speeds,
                   np.where(only_punisher_speed_positive, (1 - pfo)*b, 0))
    sanctioner_payoff = (1 / R) * (πAB + B_s - (b_s - c)) + (b_s - c)
    # sanctioner_payoff = (1 / R) * (πAB + B_s + (R-1)*(b_s - c))
    punished_payoff = (p_loss / R) * (πBA + B_p - b_p) + p_loss * b_p
    # punished_payoff = (p_loss / R) * (πBA + B_p + (R-1)*b_p)
    return {**models,
            'sanctioner_payoff':sanctioner_payoff,
            'punished_payoff':punished_payoff}

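Below, I test that `punished_and_sanctioned_payoffs` produces the expected punished and sanctioner payoffs, first for a single hand-computed case and then across a grid of α and γ values.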

In [None]:
models = build_DSAIR(b=4,
                     c=1,
                     p=0.25,
                     s=1.5,
                     B=10**4,
                     W=10**2,
                     pfo=0,
                     α=np.array([0]),
                     γ=np.array([0]),
                     β=0.01,
                     Z=100,
                     strategy_set=["AS", "AU", "PS"],
                     collective_risk=0)

results = thread_macro(models,
                       payoffs_sr,
                       punished_and_sanctioned_payoffs)

expected_result = (1/4 * 3 / 200 * (12/5 + 10**4 + 197/3 * 12/5))
test_close(results['punished_payoff'], expected_result)

In [None]:
models = build_DSAIR(b=4,
                     c=1,
                     p=0.25,
                     s=1.5,
                     B=10**4,
                     W=10**2,
                     pfo=0,
                     α= np.arange(0, 3, 0.1),
                     γ= np.arange(0, 3, 0.1),
                     β=0.01,
                     Z=100,
                     strategy_set=["AS", "AU", "PS"],
                     collective_risk=0)

results = thread_macro(models,
                       payoffs_sr,
                       punished_and_sanctioned_payoffs)

In [None]:
def expected_fn1(α, γ):
    p_punish = np.where((3/2 - γ) * (100 - 1) > (1 - α) * (100 - 3/2),
                        1/4,
                        1)
    origin_speed = np.where((3/2 - γ) * (100 - 1) > (1 - α) * (100 - 3/2),
                         3/2, 
                         1)
    win_speed = np.where((3/2 - γ) * (100 - 1) > (1 - α) * (100 - 3/2),
                         (3/2 - γ), 
                         (1 - α))
    Bp = np.where((3/2 - γ) * (100 - 1) > (1 - α) * (100 - 3/2),
                  10**4,
                  np.where((3/2 - γ) * (100 - 1) == (1 - α) * (100 - 3/2),
                           10**4 / 2,
                           0))
    sum_of_speeds = np.maximum(1e-20, (3/2 - γ) + (1 - α))
    b_p = np.where((3/2 > γ) & (1 > α),
                   4 * (3/2 - γ) / sum_of_speeds,
                   np.where((3/2 > γ),
                            4,
                            0))
    R_inv = (np.maximum(0, win_speed) 
                          / (100 - origin_speed + np.maximum(0, win_speed)))
    punished_payoff = (p_punish * R_inv * (12/5 + Bp)
                       + p_punish * b_p
                       - p_punish * b_p * R_inv
                      )
    return punished_payoff

In [None]:
test_close(results['punished_payoff'][:, 0, 0],
           expected_fn1(results['α'], results['γ']))

In [None]:
def expected_fn2(α, γ):
    origin_speed = np.where((3/2 - γ) * (100 - 1) > (1 - α) * (100 - 3/2),
                         3/2, 
                         1)
    win_speed = np.where((3/2 - γ) * (100 - 1) > (1 - α) * (100 - 3/2),
                         (3/2 - γ), 
                         (1 - α))
    Bs = np.where((3/2 - γ) * (100 - 1) < (1 - α) * (100 - 3/2),
                  10**4,
                  np.where((3/2 - γ) * (100 - 1) == (1 - α) * (100 - 3/2),
                           10**4 / 2,
                           0))
    sum_of_speeds = np.maximum(1e-20, (3/2 - γ) + (1 - α))
    b_s = np.where((3/2 > γ) & (1 > α),
                   4 * (1 - α) / sum_of_speeds,
                   np.where((1 > α),
                            4,
                            0))
    R_inv = (np.maximum(0, win_speed) 
             / (100 - origin_speed + np.maximum(0, win_speed)))
    sanctioner_payoff = (R_inv * (3/5 + Bs)
                         + (b_s - 1)
                         - (b_s - 1) * R_inv
                        )
    return sanctioner_payoff

In [None]:
test_close(results['sanctioner_payoff'][:, 0, 0],
           expected_fn2(results['α'], results['γ']))

In [None]:
#| export
def payoffs_lr_peer_punishment(models:dict, # A dictionary containing the items in `ModelTypeDSAIR`
              ) -> dict : # The `models` dictionary with added payoff matrix `payoffs`
    """The long run average payoffs for the DSAIR game with peer punishment."""
    # All 1D arrays must be promoted to 3D Arrays for broadcasting
    s,b,c, p, B, W = [models[k][:, None, None]
                      for k in ['s', 'b', 'c', 'p', 'B', 'W']]
    α, γ = [models[k][:, None, None] for k in ['α', 'γ']]
    πAA,πAB,πBA,πBB = [models['payoffs_sr'][:, i:i+1, j:j+1]
                       for i in range(2) for j in range(2)]
    models = punished_and_sanctioned_payoffs(models)
    
    ΠAA = πAA + B/(2*W)
    ΠAB = πAB
    ΠAC = πAA + B/(2*W)
    ΠBA = p*(s*B/W + πBA)
    ΠBB = p*(s*B/(2*W) + πBB)
    ΠBC = models["punished_payoff"]
    ΠCA = πAA + B/(2*W)
    ΠCB = models["sanctioner_payoff"]
    ΠCC = πAA + B/(2*W)
    matrix = np.block([[ΠAA, ΠAB, ΠAC], 
                       [ΠBA, ΠBB, ΠBC],
                       [ΠCA, ΠCB, ΠCC],
                       ])
    return {**models, 'payoffs':matrix}

### Expressions for the sanctioner and punished payoffs

For convenience, we use several auxiliary variables, defined below, to simplify the expressions for the sanctioner and punished payoffs:

\begin{equation}
\text{sanctioner payoff} = \frac{1}{R} \left(\pi_{AB} + B_s + (R-1) (b_s - c)\right)
\end{equation}

\begin{equation}
\text{punished payoff} = \frac{p_{punish}}{R} \left(\pi_{BA} + B_p + (R-1) b_p\right)
\end{equation}

*Note: In a model where we suffer a collective risk of an AI disaster if the winner is unsafe, payoffs for firms who play safe when facing an unsafe firm are also multiplied by $p_{punish}$.*

We can read the above payoffs as telling us the average payoffs over the R rounds of the race for each firm, assuming the punishment is levied at the end of the first round and the remaining $R - 1$ rounds are played with the punishment in effect.
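A sanity check of these averages, using a hypothetical helper `race_payoffs`: when the punishment has no effect (α = γ = 0), the race lasts R = W/s rounds (the first round plus (W - s)/s further rounds), the unsafe firm wins, and the two averages should collapse to the baseline long-run entries πAB and p(sB/W + πBA). The parameter values are the ones used in the tests above.

```python
import numpy as np

def race_payoffs(πAB, πBA, R, B_s, B_p, b_s, b_p, c, p_punish):
    """Hypothetical helper: averages over R rounds, with the first round at
    short-run payoffs and the remaining R - 1 rounds under the punishment."""
    sanctioner = (πAB + B_s + (R - 1) * (b_s - c)) / R
    punished = p_punish / R * (πBA + B_p + (R - 1) * b_p)
    return sanctioner, punished

# b=4, c=1, s=1.5, p=0.25, B=1e4, W=100 and no punishment effect:
# the unsafe firm keeps its share b*s/(s+1) = 2.4, the safe firm b/(s+1) = 1.6
s, W = 1.5, 100
R = W / s
sanctioner, punished = race_payoffs(πAB=0.6, πBA=2.4, R=R, B_s=0.0, B_p=1e4,
                                    b_s=1.6, b_p=2.4, c=1.0, p_punish=0.25)
# sanctioner is approximately 0.6  (= πAB)
# punished   is approximately 38.1 (= p (s B / W + πBA))
```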

Here $s_{\beta} = s - \gamma$ denotes the new speed of the punished firm and $s_{\alpha} = 1 - \alpha$ the new speed of the firm that levies the punishment.

Below we define the four possible outcomes (ignoring disaster) of a race between a sanctioner and a punished firm. After the first round the punished firm has covered a distance $s$ and the sanctioner a distance $1$, so they need $\frac{W-s}{s_{\beta}}$ and $\frac{W-1}{s_{\alpha}}$ further rounds respectively to reach the finish line; a firm whose speed is not positive never finishes.

\begin{equation}
\text{punished wins} = (s_{\beta} > 0) \, \& \, (\frac{W-s}{s_{\beta}} < \frac{W-1}{s_{\alpha}})
\end{equation}

\begin{equation}
\text{sanctioner wins} = (s_{\alpha} > 0) \, \& \, (\frac{W-1}{s_{\alpha}} < \frac{W-s}{s_{\beta}})
\end{equation}

\begin{equation}
\text{draw} = (s_{\beta} > 0) \, \& \, (\frac{W-s}{s_{\beta}} = \frac{W-1}{s_{\alpha}})
\end{equation}

\begin{equation}
\text{no winner} = (s_{\beta} \leq 0)  \, \& \,  (s_{\alpha} \leq 0)
\end{equation}

We can use the above expressions to define the following variables:

$p_{punish}$ is the probability of avoiding an AI disaster once a punishment has been levied. It depends on who wins the race: the risk only applies if the punished (unsafe) firm wins or draws.

\begin{equation}
p_{punish} = \begin{cases} 1 & \text{sanctioner wins | no winner} \\
p & \text{otherwise}
\end{cases}
\end{equation}

$R$ is the number of rounds that the race lasts: the first round, in which the punishment is levied, plus the rounds until the first firm reaches the finish line.

\begin{equation}
R = \begin{cases} \infty & \text{no winner} \\
1 + \frac{W - 1}{s_{\alpha}} & \text{sanctioner wins} \\
1 + \frac{W - s}{s_{\beta}} & \text{punished wins | draw} \\
\end{cases}
\end{equation}
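A scalar sketch of how the race outcome and its length can be computed, assuming (as in `punished_and_sanctioned_payoffs`) that the punishment takes effect after the first round, so the total length counts that first round plus the remaining rounds. `race_length` is a hypothetical helper, not part of the module.

```python
import numpy as np

def race_length(s, α, γ, W):
    """Hypothetical helper: (rounds until the race ends, outcome label).

    After round 1 the punished firm has covered s and the sanctioner 1;
    afterwards they move at s_β = s - γ and s_α = 1 - α per round.
    """
    s_β, s_α = s - γ, 1 - α
    if s_β <= 0 and s_α <= 0:
        return np.inf, "no winner"
    t_β = (W - s) / s_β if s_β > 0 else np.inf  # remaining rounds, punished firm
    t_α = (W - 1) / s_α if s_α > 0 else np.inf  # remaining rounds, sanctioner
    if t_β < t_α:
        outcome = "punished wins"
    elif t_α < t_β:
        outcome = "sanctioner wins"
    else:
        outcome = "draw"
    return 1 + min(t_β, t_α), outcome

R1, outcome1 = race_length(s=1.5, α=0.0, γ=0.0, W=100)  # R1 ≈ 66.67 (= 200/3), punished wins
R2, outcome2 = race_length(s=1.5, α=0.0, γ=1.0, W=100)  # R2 = 100.0, sanctioner wins
```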

$B_s$ is the prize that the sanctioner receives at the end of the race.

\begin{equation}
B_s = \begin{cases} B & \text{sanctioner wins} \\
\frac{B}{2} & \text{draw} \\
0 & \text{otherwise} \\
\end{cases}
\end{equation}

$B_p$ is the prize that the punished firm receives at the end of the race.

\begin{equation}
B_p = \begin{cases} B & \text{punished wins} \\
\frac{B}{2} & \text{draw} \\
0 & \text{otherwise} \\
\end{cases}
\end{equation}
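A vectorised sketch of the prize allocation. Because the win masks in `punished_and_sanctioned_payoffs` are computed with non-strict inequalities, a draw also satisfies both of them; checking the draw case first keeps a tie at B/2 for each firm. `allocate_prize` is a hypothetical helper and the boolean masks are made-up examples.

```python
import numpy as np

def allocate_prize(sanctioner_wins, punished_wins, draw, B):
    """Hypothetical helper: split the prize B given boolean outcome arrays.

    The draw case is tested first, so a tie pays B/2 to each firm even if
    the win masks overlap with it.
    """
    B_s = np.where(draw, B / 2, np.where(sanctioner_wins, B, 0.0))
    B_p = np.where(draw, B / 2, np.where(punished_wins, B, 0.0))
    return B_s, B_p

# Three races: sanctioner wins, punished wins, and a draw whose win masks overlap
sanctioner_wins = np.array([True, False, True])
punished_wins = np.array([False, True, True])
draw = np.array([False, False, True])
B_s, B_p = allocate_prize(sanctioner_wins, punished_wins, draw, B=1e4)
# B_s -> [10000., 0., 5000.], B_p -> [0., 10000., 5000.]
```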

$b_s$ is the benefit the sanctioner receives in each round after the punishment. A firm gains a benefit only if its speed is positive, and it gains the whole benefit if it is the only firm with positive speed.

\begin{equation}
b_s = \begin{cases} p_{fo} b + (1-p_{fo}) b \frac{s_{\alpha}}{s_{\alpha} + s_{\beta}} & s_{\alpha}, s_{\beta} > 0\\
b & s_{\alpha} > 0 \geq s_{\beta} \\
0 & s_{\alpha} \leq 0 \\
\end{cases}
\end{equation}

$b_p$ is the corresponding per-round benefit for the punished firm. Because this firm ignores safety precautions, its benefit is scaled by $1 - p_{fo}$, even when it is the only firm with positive speed.

\begin{equation}
b_p = \begin{cases} (1-p_{fo}) b \frac{s_{\beta}}{s_{\alpha} + s_{\beta}} & s_{\alpha}, s_{\beta} > 0\\
(1-p_{fo}) b & s_{\beta} > 0 \geq s_{\alpha} \\
0 & s_{\beta} \leq 0 \\
\end{cases}
\end{equation}
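A vectorised sketch of the per-round benefit split, following the case analysis in `punished_and_sanctioned_payoffs`. `per_round_benefits` is a hypothetical helper, and the speeds are arbitrary examples covering all four sign cases.

```python
import numpy as np

def per_round_benefits(b, pfo, s_α, s_β):
    """Hypothetical helper returning (b_s, b_p) for arrays of speeds."""
    both = (s_α > 0) & (s_β > 0)
    # Guard against division by zero; the ratio is only used when both > 0
    total = np.maximum(1e-20, s_α + s_β)
    b_s = np.where(both, pfo * b + (1 - pfo) * b * s_α / total,
                   np.where(s_α > 0, b, 0.0))
    b_p = np.where(both, (1 - pfo) * b * s_β / total,
                   np.where(s_β > 0, (1 - pfo) * b, 0.0))
    return b_s, b_p

# Cases: both positive, only sanctioner, only punished, neither
b_s, b_p = per_round_benefits(b=4.0, pfo=0.5,
                              s_α=np.array([1.0, 1.0, -0.5, -0.5]),
                              s_β=np.array([1.5, -0.2, 1.5, -0.2]))
# b_s ≈ [2.8, 4.0, 0.0, 0.0], b_p ≈ [1.2, 0.0, 2.0, 0.0]
```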

## DSAIR Payoff Matrix with rewards (Long Run)

Let $\pi$ denote one of the short run payoff matrices above, with rows and columns indexed by A (Safe) and B (Unsafe).

**Always Safe** and **Always Unsafe** play as they usually do.

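**Reward Safe** always plays Safe. However, they will pay a cost to reward their co-player if the co-player plays Safe: the reward raises the co-player's speed by $\gamma$, while the rewarder's own speed falls by $\alpha$.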

| Strategy | Always Safe | Always Unsafe | Reward Safe |
|----------|---|---|---|
| **Always Safe** | $$πAA + \frac{B}{2W}$$|  $$πAB$$ | $$πAA + \frac{B (1 + \gamma)}{W}$$ |
| **Always Unsafe** | $$p \, (s \frac{B}{W} + πBA)$$| $$p \, (s \frac{B}{2W} + πBB)$$| $$p \, (s \frac{B}{W} + πBA)$$|
| **Reward Safe** | $$ πAA $$| $$ πAB $$| $$πAA + \frac{B \max(0, 1 + \gamma - \alpha)}{2W}$$ |
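A numerical sketch of the reward-specific entries, using a hypothetical helper `reward_entries` that follows `payoffs_lr_peer_reward` (rewarded speed $1 + \gamma$, mutual-reward speed $\max(0, 1 + \gamma - \alpha)$). The values α = 0.2 and γ = 0.5 are arbitrary examples, and πAA = 1 is the short-run value for b = 4, c = 1.

```python
import numpy as np

def reward_entries(πAA, B, W, α, γ):
    """Hypothetical helper for the reward-specific entries of the table above."""
    as_vs_rs = πAA + B * (1 + γ) / W                           # Always Safe, rewarded by Reward Safe
    rs_vs_as = πAA                                             # Reward Safe loses the race
    rs_vs_rs = πAA + B * np.maximum(0, 1 + γ - α) / (2 * W)    # mutual rewards
    return as_vs_rs, rs_vs_as, rs_vs_rs

as_vs_rs, rs_vs_as, rs_vs_rs = reward_entries(πAA=1.0, B=1e4, W=100, α=0.2, γ=0.5)
# as_vs_rs ≈ 151.0, rs_vs_as = 1.0, rs_vs_rs ≈ 66.0
```

In this example Always Safe earns more against a Reward Safe co-player (151) than two Reward Safe players earn against each other (66).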

In [None]:
#| export

def payoffs_lr_peer_reward(models:dict, # A dictionary containing the items in `ModelTypeDSAIR`
              ) -> dict : # The `models` dictionary with added payoff matrix `payoffs`
    """The long run average payoffs for the DSAIR game with peer reward."""
    # All 1D arrays must be promoted to 3D Arrays for broadcasting
    s,b,c, p, B, W = [models[k][:, None, None]
                      for k in ['s', 'b', 'c', 'p', 'B', 'W']]
    α, γ = [models[k][:, None, None] for k in ['α', 'γ']]
    πAA,πAB,πBA,πBB = [models['payoffs_sr'][:, i:i+1, j:j+1]
                       for i in range(2) for j in range(2)]
    
    s_rewarded = 1 + γ
    s_helper = np.maximum(0, 1 - α)
    s_colaborative = np.maximum(0, 1 + γ - α)
    ΠAA = πAA + B/(2*W)
    ΠAB = πAB
    ΠAC = πAA + B * s_rewarded / W
    ΠBA = p*(s*B/W + πBA)
    ΠBB = p*(s*B/(2*W) + πBB)
    ΠBC = p*(s*B/W + πBA)
    ΠCA = πAA
    ΠCB = πAB
    ΠCC = πAA + B * s_colaborative/(2*W)
    matrix = np.block([[ΠAA, ΠAB, ΠAC], 
                       [ΠBA, ΠBB, ΠBC],
                       [ΠCA, ΠCB, ΠCC],
                       ])
    return {**models, 'payoffs':matrix}

## DSAIR Payoff Matrix with voluntary commitments (Long Run)

Let $\pi$ denote one of the short run payoff matrices above, with rows and columns indexed by A (Safe) and B (Unsafe).

The strategies below are less obvious than in earlier models. **Always Safe Out** and **Always Unsafe Out** are the same strategies we are used to.

**Always Safe In** is willing to form a commitment to play Safe. Otherwise, they will always play Unsafe.

**Always Unsafe In** is willing to form a commitment but will violate it by always playing Unsafe. This way, they anticipate that they can encourage other firms to play safe and so pull ahead of them in the race.

**Punish Violator** is willing to form a commitment to play Safe. Otherwise, they will always play Unsafe. If the other party to the commitment violates it by playing Unsafe, then this player pays a cost to levy a punishment on the violator.

| Strategy| Always Safe Out | Always Unsafe Out |  Always Safe In | Always Unsafe In  | Punish Violator |
|----------|---|---|---|---|---|
| **Always Safe Out** | $$πAA + \frac{B}{2W}$$| $$πAB$$ | $$πAB$$ | $$πAB$$ | $$πAB$$|
| **Always Unsafe Out** | $$p \, (s \frac{B}{W} + πBA)$$| $$p \, (s \frac{B}{2W} + πBB)$$| $$p \, (s \frac{B}{2W} + πBB)$$| $$p \, (s \frac{B}{2W} + πBB)$$ | $$p \, (s \frac{B}{2W} + πBB)$$ |
| **Always Safe In** | $$p \, (s \frac{B}{W} + πBA)$$|  $$p \, (s \frac{B}{2W} + πBB)$$ | $$πAA + \frac{B}{2W} - \epsilon$$| $$πAB - \epsilon$$| $$πAA + \frac{B}{2W} - \epsilon$$ |
| **Always Unsafe In** | $$p \, (s \frac{B}{W} + πBA)$$| $$p \, (s \frac{B}{2W} + πBB)$$| $$p \, (s \frac{B}{W} + πBA) - \epsilon$$| $$p \, (s \frac{B}{2W} + πBB) - \epsilon$$ | punished_payoff - ϵ |
| **Punish Violator** | $$p \, (s \frac{B}{W} + πBA)$$| $$p \, (s \frac{B}{2W} + πBB)$$| $$πAA + \frac{B}{2W} - \epsilon$$ | sanctioner_payoff - ϵ| $$πAA + \frac{B}{2W} - \epsilon$$ |

The punished and sanctioner payoffs in this table are the same as in the peer punishment model above, so I do not repeat their expressions here.
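The 5x5 table is easier to audit when built programmatically. The sketch below uses a hypothetical helper `voluntary_matrix`, the short-run entries for b = 4, c = 1, s = 1.5, and placeholder values for ϵ and for the punished and sanctioner payoffs. It makes one structural property of the table visible: **Always Safe In** and **Punish Violator** only differ when they meet **Always Unsafe In**.

```python
import numpy as np

def voluntary_matrix(πAA, πAB, πBA, πBB, p, s, B, W, epsilon, punished, sanctioner):
    """Hypothetical helper: the 5x5 voluntary-commitment table for scalar inputs.

    Strategy order: AS Out, AU Out, AS In, AU In, Punish Violator.
    `punished` and `sanctioner` stand in for the peer-punishment payoffs.
    """
    safe_pair = πAA + B / (2 * W)               # both firms play Safe
    unsafe_vs_safe = p * (s * B / W + πBA)      # Unsafe firm facing a Safe firm
    unsafe_pair = p * (s * B / (2 * W) + πBB)   # both firms play Unsafe
    e = epsilon                                 # commitment cost
    return np.array([
        [safe_pair, πAB, πAB, πAB, πAB],
        [unsafe_vs_safe, unsafe_pair, unsafe_pair, unsafe_pair, unsafe_pair],
        [unsafe_vs_safe, unsafe_pair, safe_pair - e, πAB - e, safe_pair - e],
        [unsafe_vs_safe, unsafe_pair, unsafe_vs_safe - e, unsafe_pair - e, punished - e],
        [unsafe_vs_safe, unsafe_pair, safe_pair - e, sanctioner - e, safe_pair - e],
    ])

# epsilon, punished and sanctioner are placeholder values for illustration
M = voluntary_matrix(πAA=1.0, πAB=0.6, πBA=2.4, πBB=2.0, p=0.25, s=1.5,
                     B=1e4, W=100, epsilon=0.5, punished=10.0, sanctioner=20.0)
# Rows "AS In" (index 2) and "Punish Violator" (index 4) differ only in column "AU In" (index 3)
```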

In [None]:
#| export

def payoffs_lr_voluntary(models:dict, # A dictionary containing the items in `ModelTypeDSAIR`
              ) -> dict : # The `models` dictionary with added payoff matrix `payoffs`
    """The long run average payoffs for the DSAIR game with voluntary
    commitments."""
    # All 1D arrays must be promoted to 3D Arrays for broadcasting
    s,b,c, p, B, W = [models[k][:, None, None]
                      for k in ['s', 'b', 'c', 'p', 'B', 'W']]
    α, γ, ϵ = [models[k][:, None, None] for k in ['α', 'γ', 'epsilon']]
    πAA,πAB,πBA,πBB = [models['payoffs_sr'][:, i:i+1, j:j+1]
                       for i in range(2) for j in range(2)]
    models = punished_and_sanctioned_payoffs(models)
    
    ΠAA = πAA + B/(2*W)
    ΠAB = πAB
    ΠAC = πAB
    ΠAD = πAB
    ΠAE = πAB
    ΠBA = p*(s*B/W + πBA)
    ΠBB = p*(s*B/(2*W) + πBB)
    ΠBC = p*(s*B/(2*W) + πBB)
    ΠBD = p*(s*B/(2*W) + πBB)
    ΠBE = p*(s*B/(2*W) + πBB)
    ΠCA = p*(s*B/W + πBA)
    ΠCB = p*(s*B/(2*W) + πBB)
    ΠCC = πAA + B/(2*W) - ϵ
    ΠCD = πAB - ϵ
    ΠCE = πAA + B/(2*W) - ϵ
    ΠDA = p*(s*B/W + πBA)
    ΠDB = p*(s*B/(2*W) + πBB)
    ΠDC = p*(s*B/W + πBA) - ϵ
    ΠDD = p*(s*B/(2*W) + πBB) - ϵ
    ΠDE = models['punished_payoff'] - ϵ
    ΠEA = p*(s*B/W + πBA)
    ΠEB = p*(s*B/(2*W) + πBB)
    ΠEC = πAA + B/(2*W) - ϵ
    ΠED = models['sanctioner_payoff'] - ϵ
    ΠEE = πAA + B/(2*W) - ϵ
    matrix = np.block([[ΠAA, ΠAB, ΠAC, ΠAD, ΠAE], 
                       [ΠBA, ΠBB, ΠBC, ΠBD, ΠBE],
                       [ΠCA, ΠCB, ΠCC, ΠCD, ΠCE],
                       [ΠDA, ΠDB, ΠDC, ΠDD, ΠDE],
                       [ΠEA, ΠEB, ΠEC, ΠED, ΠEE]
                       ])
    return {**models, 'payoffs':matrix}

## DSAIR Payoff Matrix (Long Run) with collective risk

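Let $\pi$ denote one of the short run payoff matrices above, with rows and columns indexed by A (Safe) and B (Unsafe). The table below shows the case where a disaster caused by an unsafe firm always affects both firms. In general, `payoffs_lr_collective` multiplies the payoffs of any firm facing an Unsafe co-player by $1 - (1 - p)\, r$, where $r$ is the `collective_risk` parameter, so $r = 0$ recovers the baseline long-run matrix and $r = 1$ gives the table below.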

| Strategy | Always Safe | Always Unsafe |
|----------|---|---|
| **Always Safe** | $$πAA + \frac{B}{2W}$$|  $$p \, πAB$$ |
| **Always Unsafe** | $$p \, (s \frac{B}{W} + πBA)$$| $$p^2 \, (s \frac{B}{2W} + πBB)$$|
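`payoffs_lr_collective` scales the entries for a firm facing an Unsafe co-player by 1 - (1 - p) · collective_risk. A small sketch of that factor, using a hypothetical helper `collective_factor` and the baseline long-run entries ΠAB = 0.6 and ΠBB = 19.25 for b = 4, c = 1, s = 1.5, p = 0.25, B = 10^4, W = 100:

```python
import numpy as np

def collective_factor(p, collective_risk):
    """Hypothetical helper: share of a payoff that survives an unsafe
    co-player's disaster risk, 1 - (1 - p) * collective_risk."""
    return 1 - (1 - p) * collective_risk

p = 0.25
cr = np.array([0.0, 0.5, 1.0])   # example collective_risk values
f = collective_factor(p, cr)     # [1.0, 0.625, 0.25]
ΠAB_cr = 0.6 * f                 # Safe facing Unsafe
ΠBB_cr = 19.25 * f               # Unsafe facing Unsafe
```

At collective_risk = 1 the factor equals p, which reproduces the $p \, πAB$ and $p^2 (\dots)$ entries of the table.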

In [None]:
def payoffs_lr_collective(models:dict, # A dictionary containing the items in `ModelTypeDSAIR`
              ) -> dict : # The `models` dictionary with added payoff matrix `payoffs`
    """Long run average payoffs for the DSAIR model with collective risk."""
    # All 1D arrays must be promoted to 3D Arrays for broadcasting
    s,b,c, p, B, W = [models[k][:, None, None]
                      for k in ['s', 'b', 'c', 'p', 'B', 'W']]
    collective_risk = models["collective_risk"][:, None, None]
    πAA,πAB,πBA,πBB = [models['payoffs_sr'][:, i:i+1, j:j+1]
                       for i in range(2) for j in range(2)]
    πAA = πAA + B/(2*W)
    πAB = πAB * (1 - (1-p)*collective_risk)
    πBA = p*(s*B/W + πBA)
    πBB = p*(s*B/(2*W) + πBB) * (1 - (1-p)*collective_risk)
    matrix = np.block([[πAA, πAB],
                       [πBA, πBB]])
    return {**models, 'payoffs':matrix}

In [None]:
#| hide
import nbdev; nbdev.nbdev_export()