In [1]:
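import importlib.util

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sympy as sym
import axelrod as axl
import axelrod.interaction_utils as iu

import testzd as zd

C, D = axl.Action.C, axl.Action.D

In [2]:
# Load the parameters module from its file path (`imp` was removed in Python 3.12).
spec = importlib.util.spec_from_file_location('parameters', 'data/raw/parameters.py')
parameters = importlib.util.module_from_spec(spec)
spec.loader.exec_module(parameters)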

# Extortionate zero determinant.

In [1], the concept of Zero Determinant strategies is introduced for a match between 2 memory one strategies. It was shown that a player $p\in\mathbb{R}^4$ against a player $q\in\mathbb{R}^4$ could force a linear relationship between the scores.

Assuming the following:

- The utilities for player $p$: $S_x = (R, S, T, P)$ and for player $q$: $S_y = (R, T, S, P)$.
- The normalised long run score for player $p$: $s_x$ and for player $q$: $s_y$.
- Given $p=(p_1, p_2, p_3, p_4)$ a transformed (but equivalent) vector: $\tilde p=(p_1 - 1, p_2 - 1, p_3, p_4)$, similarly: $\tilde q=(1 - q_1, 1 - q_2, q_3, q_4)$

The main result of [1] is that:

if $\tilde p = \alpha S_x + \beta S_y + \gamma 1$ **or** if $\tilde q = \alpha S_x + \beta S_y + \gamma 1$ then:

$$
\alpha s_x + \beta s_y + \gamma = 0
$$

where $\alpha, \beta, \gamma \in \mathbb{R}$
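The result above can be checked numerically. The following is a minimal sketch, not code from [1] or from `testzd`: it assumes the payoffs $(R, S, T, P) = (3, 0, 5, 1)$, builds the Markov chain of two memory one players, and evaluates $\alpha s_x + \beta s_y + \gamma$ for a $p$ that satisfies the condition. The helper `long_run_scores` and the values $\alpha = 1/18$, $\beta = -1/9$, $\gamma = 1/18$ are introduced here for illustration only.

```python
import numpy as np

R, S, T, P = 3, 0, 5, 1  # assumed payoffs
S_x = np.array([R, S, T, P])
S_y = np.array([R, T, S, P])


def long_run_scores(p, q):
    """Return (s_x, s_y) for memory one players p and q.

    Both vectors list cooperation probabilities after (CC, CD, DC, DD),
    each written from that player's own point of view.
    """
    # Reorder q so that it is indexed by the states as seen by player p.
    q_seen_by_x = np.array([q[0], q[2], q[1], q[3]])
    M = np.array([
        [px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
        for px, qy in zip(p, q_seen_by_x)
    ])
    # Stationary distribution: solve v M = v together with sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0, 0, 0, 0, 1])
    v = np.linalg.lstsq(A, b, rcond=None)[0]
    return v @ S_x, v @ S_y


p = np.array([8 / 9, 1 / 2, 1 / 3, 0])
alpha, beta, gamma = 1 / 18, -1 / 9, 1 / 18  # illustrative values for this p

# Check that tilde p really is alpha S_x + beta S_y + gamma 1.
tilde_p = p - np.array([1, 1, 0, 0])
assert np.allclose(tilde_p, alpha * S_x + beta * S_y + gamma)

# The linear relationship should hold whatever the opponent does.
rng = np.random.default_rng(0)
for q in rng.uniform(0.1, 0.9, size=(3, 4)):
    s_x, s_y = long_run_scores(p, q)
    print(np.round(alpha * s_x + beta * s_y + gamma, 6))
```

Each printed value should be zero up to floating point error, whichever opponent vector $q$ is drawn.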

As an example consider the `extort-2` strategy defined in [2]. This is given by:

$$p=(8/9, 1/2, 1/3, 0)$$

Let us use the `Axelrod` library [4, 5] to simulate some matches against some of the best strategies in the library:

In [3]:
extort2 = axl.ZDExtort2()
players = (extort2, axl.EvolvedFSM16())
axl.seed(0)
match = axl.Match(players, turns=parameters.TURNS)
interactions = match.play()
scores = match.final_score_per_turn()
np.round((scores[0] - 1) / (scores[1] - 1), 3)

1.998

In [4]:
players = (extort2, axl.EvolvedANN5())
axl.seed(0)
match = axl.Match(players, turns=parameters.TURNS)
interactions = match.play()
scores = match.final_score_per_turn()
np.round((scores[0] - 1) / (scores[1] - 1), 3)

2.0

In [5]:
players = (extort2, axl.PSOGamblerMem1())
axl.seed(0)
match = axl.Match(players, turns=parameters.TURNS)
interactions = match.play()
scores = match.final_score_per_turn()
np.round((scores[0] - 1) / (scores[1] - 1), 3)

2.051

In [6]:
players = (extort2, extort2)
axl.seed(0)
match = axl.Match(players, turns=parameters.TURNS)
interactions = match.play()
scores = match.final_score_per_turn()
(scores[0] - 1) / (scores[1] - 1)

1.0

We see that `extort2` beats all these strategies but gets a low score against itself.

In fact, [1] considers a specific type of Zero Determinant strategy: if $\gamma=-(\alpha + \beta)P$ then the relationship $s_x - P = \chi (s_y - P)$ holds, where $\chi = \frac{-\beta}{\alpha}$, so that $s_x - P$ will be $\chi$ times bigger than $s_y - P$ as long as $\chi > 1$. We can obtain a simple linear equation and an inequality that checks if a strategy is of this form:
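As a sketch of where such a check comes from (derived here from the definitions above, not quoted from [1]), substituting $\gamma = -(\alpha + \beta)P$ into $\tilde p = \alpha S_x + \beta S_y + \gamma 1$ gives, component by component:

$$
\begin{aligned}
p_1 - 1 &= (\alpha + \beta)(R - P)\\
p_2 - 1 &= \alpha (S - P) + \beta (T - P)\\
p_3 &= \alpha (T - P) + \beta (S - P)\\
p_4 &= 0
\end{aligned}
$$

Assuming the payoffs $(R, S, T, P) = (3, 0, 5, 1)$, the vector $p=(8/9, 1/2, 1/3, 0)$ satisfies all four equations with $\alpha = 1/18$ and $\beta = -1/9$, which gives $\chi = -\beta / \alpha = 2 > 1$.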

In [7]:
p = np.array([8 / 9, 1 / 2, 1 / 3, 0])
zd.is_ZD(p)

True

In [8]:
np.round(p, 3)

array([0.889, 0.5  , 0.333, 0.   ])

Note, however, that even a slight measurement error causes this check to fail:

In [9]:
np.random.seed(0)
approximate_p = p + 10 ** -5 * np.random.random(4)
np.round(np.max(np.abs(p - approximate_p)), 3)

0.0

In [10]:
zd.is_ZD(approximate_p)

False

Thus, this work proposes a statistical approach for recognising extortionate behaviour. It applies least squares minimisation to the underlying linear algebraic problem.
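To make the idea concrete, here is a minimal sketch of such a least squares fit. It is an illustration under stated assumptions, not the implementation in `testzd`: the function name `fit_extortionate` is hypothetical, the payoffs are assumed to be $(R, S, T, P) = (3, 0, 5, 1)$, and $\gamma$ is fixed to $-(\alpha + \beta)P$ so that only $\alpha$ and $\beta$ are fitted.

```python
import numpy as np


def fit_extortionate(p, rstp=(3, 0, 5, 1)):
    """Least squares fit of tilde p = alpha (S_x - P) + beta (S_y - P).

    Returns (alpha, beta, sum of squared errors). The payoffs default to
    the assumed values (R, S, T, P) = (3, 0, 5, 1).
    """
    R, S, T, P = rstp
    # One row per state (CC, CD, DC, DD); the columns multiply alpha and beta.
    A = np.array([
        [R - P, R - P],
        [S - P, T - P],
        [T - P, S - P],
        [0, 0],
    ], dtype=float)
    tilde_p = np.asarray(p, dtype=float) - np.array([1, 1, 0, 0])
    x = np.linalg.lstsq(A, tilde_p, rcond=None)[0]
    residual = A @ x - tilde_p
    return x[0], x[1], float(residual @ residual)


exact = np.array([8 / 9, 1 / 2, 1 / 3, 0])
noisy = exact + 1e-5 * np.random.default_rng(0).random(4)

for vector in (exact, noisy):
    alpha, beta, sse = fit_extortionate(vector)
    print(np.round(-beta / alpha, 3), sse)
```

Unlike an exact equality check, the fit still returns a ratio $-\beta / \alpha$ close to 2 for the perturbed vector, and the sum of squared errors quantifies how far the vector is from satisfying the equations exactly.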

In [11]:
x, SSError = zd.compute_least_squares(approximate_p)
alpha, beta = x
chi = -beta / alpha
np.round(chi, 3)

2.0

We see that in the case of an approximation of `extort2` we recover the value of $\chi=2$ (to the fifth decimal place).

The value that is being minimised is called $\text{SSError}$. It gives us a measure of how far a given strategy vector $p$ is from being an extortionate strategy.

Not all strategies are memory one, so they do not necessarily have a representation as a 4 dimensional vector. Their transition rates from every state to each action can still be measured.
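As a small illustration of what measuring these rates means, the sketch below counts, for a hypothetical history of plays, how often the first player cooperates after each state. It deliberately avoids the `Axelrod` library: plain strings stand in for actions, and the `history` and the function `estimate_p` are made up for this example.

```python
from collections import Counter

# Hypothetical plays: (player's action, opponent's action) for each turn.
history = [
    ("C", "C"), ("C", "D"), ("D", "D"), ("C", "C"),
    ("C", "C"), ("D", "C"), ("D", "D"), ("D", "C"),
]


def estimate_p(history):
    """Estimate P(cooperate | previous state) for the first player."""
    counts = Counter()
    # Pair every state with the action the player takes on the next turn.
    for state, (next_action, _) in zip(history, history[1:]):
        counts[(state, next_action)] += 1
    p = []
    for state in (("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")):
        cooperations = counts[(state, "C")]
        total = cooperations + counts[(state, "D")]
        # A state that was never visited has no defined rate.
        p.append(cooperations / total if total else float("nan"))
    return p


print(estimate_p(history))
```

Here the player cooperates after 2 of the 3 visits to $(C, C)$, after neither of the single visits to $(C, D)$ and $(D, C)$, and after 1 of the 2 visits to $(D, D)$, so the estimate is $(2/3, 0, 0, 1/2)$.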

Let us see how this works, using the 3 strategies above:

In [12]:
def get_p_from_interactions(interactions):
    """Estimate each player's probability of cooperating after each state."""
    vectors = []
    for state_counter in iu.compute_state_to_action_distribution(interactions):
        p = []
        for state in ((C, C), (C, D), (D, C), (D, D)):
            cooperations = state_counter[(state, C)]
            defections = state_counter[(state, D)]
            try:
                p.append(cooperations / (cooperations + defections))
            except ZeroDivisionError:
                # The state was never visited, so the rate is undefined.
                p.append(np.nan)
        vectors.append(p)
    return vectors

In [13]:
players = (extort2, axl.EvolvedFSM16())
axl.seed(0)
match = axl.Match(players, turns=parameters.TURNS)
interactions = match.play()
p = get_p_from_interactions(interactions=interactions)[1]

In [14]:
np.round(p, 3)

array([0.375, 0.472, 0.544, 0.517])

We can check how close this strategy is to being extortionate:

In [15]:
x, SSError = zd.compute_least_squares(p)
np.round(SSError, 3)

0.215

In [16]:
players = (extort2, axl.EvolvedANN5())
axl.seed(0)
match = axl.Match(players, turns=parameters.TURNS)
interactions = match.play()
p = get_p_from_interactions(interactions=interactions)[1]
x, SSError = zd.compute_least_squares(p)
SSError

nan

This strategy in fact does not visit all states so it is not possible to give a valid calculation:

In [17]:
p

[1.0, nan, 0.8, 0.0]

In [18]:
players = (extort2, axl.PSOGambler2_2_2())
axl.seed(0)
match = axl.Match(players, turns=parameters.TURNS)
interactions = match.play()
p = get_p_from_interactions(interactions=interactions)[1]
x, SSError = zd.compute_least_squares(p)
np.round(SSError, 3)

0.175

So it seems that the `PSOGambler2_2_2` is "more" extortionate than the other two. Note: it is certainly not an extortionate strategy as $p_4 > 0$:

In [19]:
np.round(p, 3)

array([0.105, 0.518, 0.002, 0.505])

We can in fact classify all potentially extortionate strategies; this is Figure 1 of the paper.

The paper extends this work to consider a large number of strategies, and identifies if and when strategies actually exhibit extortionate behaviour.

We note that the strategies that exhibit strong evolutionary fitness are ones that are able to adapt their behaviour: they do not extort strong strategies (thus cooperation evolves) but they do extort weaker ones. For example, here is a list of strategies against which `EvolvedFSM16` is close to being ZD ($\text{SSError} < 0.05$) and is close to being extortionate ($p_4 < 0.05$):

In [20]:
for opponent in parameters.PLAYER_GROUPS["full"]:
    players = (axl.EvolvedFSM16(), opponent)
    axl.seed(0)
    match = axl.Match(players, turns=parameters.TURNS)
    interactions = match.play()
    p = get_p_from_interactions(interactions=interactions)[0]
    x, SSError = zd.compute_least_squares(p)
    if SSError < 0.05 and p[3] < 0.05:
        alpha, beta = x
        scores = match.final_score_per_turn()
        print(f"vs {opponent}, chi={round(-beta / alpha, 2)}, (S_X - 1)/(S_Y - 1)={round((scores[0] - 1) / (scores[1] - 1), 2)}")

vs AntiCycler, chi=-5.25, (S_X - 1)/(S_Y - 1)=-4.02
vs Arrogant QLearner, chi=-12.09, (S_X - 1)/(S_Y - 1)=-6.24
vs Bush Mosteller: 0.5, 0.5, 3.0, 0.5, chi=-2.79, (S_X - 1)/(S_Y - 1)=-5.75
vs Cautious QLearner, chi=-12.09, (S_X - 1)/(S_Y - 1)=-6.24
vs Colbert, chi=-8.44, (S_X - 1)/(S_Y - 1)=-4.15
vs Hesitant QLearner, chi=-12.09, (S_X - 1)/(S_Y - 1)=-6.24
vs Knowledgeable Worse and Worse, chi=-11.62, (S_X - 1)/(S_Y - 1)=-5.97
vs Prober 4, chi=-47.75, (S_X - 1)/(S_Y - 1)=1.25
vs Random: 0.5, chi=-3.02, (S_X - 1)/(S_Y - 1)=-4.29
vs Risky QLearner, chi=-12.09, (S_X - 1)/(S_Y - 1)=-6.24
vs Stochastic Cooperator, chi=-4.2, (S_X - 1)/(S_Y - 1)=-6.12
vs ThueMorseInverse, chi=-5.25, (S_X - 1)/(S_Y - 1)=-4.04
vs Tranquilizer, chi=9.25, (S_X - 1)/(S_Y - 1)=1.57
vs Tullock: 11, chi=3.11, (S_X - 1)/(S_Y - 1)=0.52
vs Worse and Worse, chi=-6.72, (S_X - 1)/(S_Y - 1)=-5.41
vs Worse and Worse 2, chi=-8.75, (S_X - 1)/(S_Y - 1)=0.62
vs ZD-Mem2, chi=-3.97, (S_X - 1)/(S_Y - 1)=-35.48


This work shows not only that there is a mathematical basis for suspicion (the calculation of $\text{SSError}$), but also that some high performing strategies seem to exhibit suspicious behaviour that allows them to adapt.

## References

[1] Press, William H., and Freeman J. Dyson. "Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent." Proceedings of the National Academy of Sciences 109.26 (2012): 10409-10413

[2] Stewart, Alexander J., and Joshua B. Plotkin. "Extortion and cooperation in the Prisoner’s Dilemma." Proceedings of the National Academy of Sciences 109.26 (2012): 10134-10135.

[3] Golub, Gene H., and Charles F. Van Loan. Matrix computations. Vol. 3. JHU Press, 2012.

[4] The Axelrod project developers. Axelrod: v4.2.0. 2016. http://doi.org/10.5281/zenodo.1252994

[5] Knight, Vincent, et al. "An Open Framework for the Reproducible Study of the Iterated Prisoner’s Dilemma." Journal of Open Research Software 4.1 (2016).