# Non-cooperative Game Theory

In non-cooperative game theory, we study games in which players cannot make binding agreements and so they cannot form coalitions. Non-cooperative games are also called *strategic games*.
Strategic games typically describe situations of conflict between self-interested players who, through their actions (choices), want to maximise their own utility.
In strategic games, players typically have different utilities and different incentives to choose their actions. 

When we consider strategic games, we are typically interested in answering the following questions:
- What is the optimal outcome?
- What is a player's best response to another player's action?
- Which course of action represents an equilibrium in a given game? 

## Definition of Normal Form games
Let's begin by formalizing games and defining what a normal form game is.
An $N$ player normal form game consists of:
- A finite set of $N$ players
- An action set for each player: $\{ A_1, A_2, \dots, A_N \}$
- A payoff (utility) function for each player $i \in N$: $u_i: A_1 \times A_2 \times \dots \times A_N \longrightarrow \mathbb{R}$

In this notebook we will limit our treatment to 2 player games. The concepts we are going to see can be extended to games with an arbitrary number of players. However, going beyond two player games is hard, and the best known algorithms for solving such games have exponential complexity.
In fact, even for general 2 player games the best known algorithms to compute all the equilibria run in exponential time.


### Representing strategic 2 player games
The standard way to represent 2 player games is by representing each player's payoff function with a matrix.
Let's consider the classic *prisoner's dilemma* game defined by the following table:
$$
\begin{pmatrix}
-1, -1 & -4, 0 \\
0, -4 & -3, -3
\end{pmatrix}
$$
The standard way to represent this payoff is to use two matrices; the matrix $A$ describes the payoff of the row player and the matrix $B$ describes the payoff for the column player:
$$ A =
\begin{pmatrix}
-1 & -4 \\
0 & -3
\end{pmatrix}
\qquad
B = 
\begin{pmatrix}
-1 & 0 \\
-4 & -3
\end{pmatrix}
$$

In [88]:
import numpy as np
A = np.array([[-1, -4], [0, -3]])
B = np.array([[-1, 0], [-4, -3]])

print("A = \n", A)
print("B = \n", B)

A = 
 [[-1 -4]
 [ 0 -3]]
B = 
 [[-1  0]
 [-4 -3]]


## Best Response and Nash Equilibrium
If a player $i$ knew what all the other players would play, it would be easy to choose the best action.
Let $a_{-i} = \langle a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n \rangle$ be the action profile of all the players other than $i$, and let $a = (a_{-i}, a_i)$ be the full action profile.
> **Definition (Best response):**  </br>
$a_i^* \in \text{BR}(a_{-i})\; \text{iff}\; \forall a_i \in A_i,\; u_i(a_i^*, a_{-i}) \ge u_i(a_i, a_{-i})$

That is, the best response might not be unique, but an action $a_i^*$ is a player $i$'s best response to an opponents' action profile $a_{-i}$, if and only if the utility given by $a_i^*$ is at least as big as the utility provided by all the other actions available to player $i$.
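
The definition can be sketched in a few lines of NumPy. This is only an illustrative sketch: the helper name `pure_best_responses` and its `player` argument are assumptions made for this example, and the payoff matrices are those of the prisoner's dilemma introduced earlier. The function returns every action that attains the maximum, so ties are reported rather than hidden.

```python
import numpy as np

def pure_best_responses(payoff, opponent_action, player="row"):
    """Return the indices of all pure best responses to a fixed opponent action.

    `payoff` is the player's own payoff matrix (rows index the row player's
    actions, columns index the column player's actions).
    """
    if player == "row":
        utilities = payoff[:, opponent_action]   # vary the row, keep the column fixed
    else:
        utilities = payoff[opponent_action, :]   # vary the column, keep the row fixed
    # Every action that attains the maximum utility is a best response
    return np.flatnonzero(utilities == utilities.max())

# Prisoner's dilemma payoffs (hypothetical indexing: 0 = first action, 1 = second action)
A = np.array([[-1, -4], [0, -3]])
B = np.array([[-1, 0], [-4, -3]])

br_row = pure_best_responses(A, opponent_action=0, player="row")
br_col = pure_best_responses(B, opponent_action=1, player="column")
print(br_row, br_col)
```

In both calls the only best response is the second action; a payoff column with two equal maxima would return both indices.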

The problem is that a player would not know what the other players would do. But this is not a big issue, since the notion of best response is actually used to define another notion, much more useful in practice: Nash equilibrium. </br>
The idea of Nash equilibrium is to look for stable action profiles. That is, sets of actions for each player such that, once the actions of all the other players are known, no player wants to change their action.

> **Definition (Nash Equilibrium)** </br>
The action profile $a = \langle a_1, \dots, a_n \rangle$ is a *pure strategy Nash equilibrium* if and only if $\forall i\; a_i \in \text{BR}(a_{-i})$
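
For small two player games, this definition can be checked by brute force over all action profiles. The following is a minimal sketch under that assumption; the function name `pure_nash_equilibria` is hypothetical, and the example reuses the prisoner's dilemma matrices introduced earlier.

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """List all (row, column) action pairs that are pure strategy Nash equilibria."""
    equilibria = []
    n_rows, n_cols = A.shape
    for i in range(n_rows):
        for j in range(n_cols):
            row_is_best = A[i, j] == A[:, j].max()   # the row player cannot gain by deviating
            col_is_best = B[i, j] == B[i, :].max()   # the column player cannot gain by deviating
            if row_is_best and col_is_best:
                equilibria.append((i, j))
    return equilibria

A = np.array([[-1, -4], [0, -3]])
B = np.array([[-1, 0], [-4, -3]])
print(pure_nash_equilibria(A, B))
```

The enumeration costs one pass over the $m \times n$ cells, which is fine for the small games treated here.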

## Nash equilibria in different games
In the following, we analyse the pure strategy Nash equilibria of several types of two player games that we often encounter.

### Nash equilibrium in the game "Prisoner's dilemma"
The prisoner's dilemma is a game with the following structure:

|   | C   | D   |           
:---|:---:|:---:|           
| **C** | $b$, $b$ | $d$, $a$ | 
| **D** | $a$, $d$ | $c$, $c$ |

with $a > b > c > d$. **C** and **D** indicate the two possible actions for each player: Cooperate, and Defect. A player can cooperate with the other by not accusing the other player and can defect, instead, by accusing the other player.

The table below describes one specific instance of the prisoner's dilemma:

|   | C   | D   |
:---|:---:|:---:|
| **C** |-1, -1 | -4,  0 |
| **D** | 0, -4 | -3, -3 |

If both prisoners cooperate, they get a light punishment. If both defect, they get a more severe punishment. If one cooperates and the other defects, the cooperator gets the maximum punishment and the defector goes free.
In this game, defecting is a dominant strategy: it is the better choice no matter what the other player does. If the other prisoner cooperates, defecting yields 0 instead of -1; if the other prisoner defects, defecting yields -3 instead of -4. Consequently, the action profile (**D**, **D**) is the only pure strategy Nash equilibrium of this game, even though both players would be better off at (**C**, **C**).
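
The dominance argument can also be checked mechanically. The sketch below assumes the specific payoffs of the table above; the helper `strictly_dominant_row_action` is a hypothetical name introduced only for this illustration. For the column player we pass the transposed matrix so that their actions index the rows.

```python
import numpy as np

def strictly_dominant_row_action(payoff):
    """Return the index of a strictly dominant row action, or None if there is none."""
    for i in range(payoff.shape[0]):
        others = np.delete(payoff, i, axis=0)     # payoffs of every other row action
        if np.all(payoff[i] > others):            # strictly better against every column
            return i
    return None

A = np.array([[-1, -4], [0, -3]])   # row player (0 = cooperate, 1 = defect)
B = np.array([[-1, 0], [-4, -3]])   # column player

dominant_row = strictly_dominant_row_action(A)
dominant_col = strictly_dominant_row_action(B.T)
print(dominant_row, dominant_col)
```

Both calls return index 1, that is, defect.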

### Nash equilibrium in games of pure coordination
A game of pure coordination is a game with the following structure:

|   | left   | right   |           
:---|:---:|:---:|           
| **left** | $a$, $a$ | 0, 0 | 
| **right** | 0, 0 | $a$, $a$ |

We can think of such a game as follows: the two players are walking towards each other on a road. If they both walk on their right or both on their left, they avoid a collision and get a positive payoff. If they do not coordinate, and one walks on their right and the other on their left, they will collide. This game has two pure strategy Nash equilibria: (**left**, **left**) and (**right**, **right**).

### Nash equilibrium in the game "Battle of the sexes"
The game battle of the sexes is a game that contains elements of both competition and cooperation. The two player version of such a game has the following normal form representation:

|   | A   | C   |           
:---|:---:|:---:|           
| **A** | 2, 1 | 0, 0 | 
| **C** | 0, 0 | 1, 2 |

This game describes the situation of a couple who want to go to the cinema to watch a movie. The couple is considering two movies: the husband is heavily leaning towards an action movie (denoted by **A**); the wife is heavily leaning towards a comedy (denoted by **C**). More importantly, they want to go together. So, if they both go to watch movie **A**, the husband gets a payoff of 2 and the wife gets a payoff of 1; conversely, if they both go to watch movie **C**, the husband gets a payoff of 1 and the wife gets a payoff of 2. If they end up watching two different movies (e.g. the husband goes for the action movie and the wife goes for the comedy), neither of them gets any payoff.

So, this game has two pure strategy Nash equilibria, (**A**, **A**) and (**C**, **C**), where the best pure response is to match what the other player chooses. In this regard, the game is similar to a pure coordination game. However, there is an important difference between the battle of the sexes and a pure coordination game: the payoffs are asymmetric. If one of the players always gets their way, the other player will lose quite a lot. In order to balance both players' payoffs, **ideally**, the couple should arrange to watch action movies 50% of the time and comedies the other 50% of the time. However, they have no way to do this if they do not know in advance what the other player will play.
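
To make the last point concrete, the following sketch compares two ways of splitting the movies 50/50, using probability-weighted average payoffs. It is only an illustration under the payoffs of the table above; the variable names are made up for this example. In the first case the couple coordinates (both go to **A** half of the time and both go to **C** the other half); in the second case each of them independently flips a fair coin.

```python
import numpy as np

A = np.array([[2, 0], [0, 1]])   # husband's payoffs (rows: his choice, A then C)
B = np.array([[1, 0], [0, 2]])   # wife's payoffs (columns: her choice, A then C)

# Coordinated alternation: (A, A) half of the time, (C, C) the other half
alternating = (0.5 * (A[0, 0] + A[1, 1]), 0.5 * (B[0, 0] + B[1, 1]))

# Independent fair coin flips: each of the four cells has probability 0.25
coin = np.array([0.5, 0.5])
independent = (float(coin @ A @ coin), float(coin @ B @ coin))

print(alternating, independent)
```

Coordinated alternation averages 1.5 for each player, while independent coin flips average only 0.75 each, because half of the time the couple ends up in different cinemas.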

The difference between games of pure coordination and the battle of the sexes will become more evident when we introduce *mixed strategies* and *mixed equilibria*.

### Nash equilibrium in the game "Matching Pennies"
The "Matching Pennies" game is a zero-sum game described by the following matrix:

|   | Head   | Tail   |           
:---|:---:|:---:|           
| **Head** | 1, -1 | -1, 1 | 
| **Tail** | -1, 1 | 1, -1 |

The game is zero-sum because the payoffs in each cell of the matrix sum to 0.
In this game, the two players have to choose between head and tail. The row player wins if the column player chooses the same option as the row player. Conversely, the column player wins if the players' choices are different.
This game does not have any *pure strategy* Nash equilibrium, because there is no action profile such that neither player would want to change their action after learning what the opponent has played. If the row player plays Head (respectively Tail), the column player wants to play Tail (respectively Head). But if the column player plays Head (respectively Tail), then the row player wants to play Head (respectively Tail). So, there is no *pure* action profile that is stable for both players.
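
The cycle described above can be traced in code. The sketch below is an illustration rather than a general algorithm: starting from (Head, Head), it repeatedly lets a dissatisfied player switch to a better action and records the profiles it passes through. The variable names are assumptions made for this example.

```python
import numpy as np

A = np.array([[1, -1], [-1, 1]])   # row player's payoffs (0 = Head, 1 = Tail)
B = -A                             # zero-sum: the column player gets the opposite

profile = (0, 0)                   # start from (Head, Head)
visited = []
equilibrium = None
while profile not in visited:
    visited.append(profile)
    i, j = profile
    if A[i, j] < A[:, j].max():            # the row player wants to deviate
        profile = (int(np.argmax(A[:, j])), j)
    elif B[i, j] < B[i, :].max():          # the column player wants to deviate
        profile = (i, int(np.argmax(B[i, :])))
    else:                                  # nobody wants to deviate: pure equilibrium
        equilibrium = profile
        break

print(visited, equilibrium)
```

The loop passes through all four action profiles and returns to its starting point without ever finding a stable one, so `equilibrium` stays `None`.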

However, if we ponder the "Matching Pennies" game, we quickly realise that, since one player does not know what the other player plays, the best thing a player can do is to choose completely at random (in this case with probability 0.5) between Head and Tail. Such a strategy, which involves a probability distribution over the alternatives, is known as a *mixed strategy*. We will see mixed strategies later on in this notebook.

## Testing whether a pure strategy is a Best Response and/or a Nash equilibrium
Let's now write a function that, given a 2-person game in normal form, determines whether a given pure strategy is a best response.
As we did before, we represent the game with two payoff matrices; one for each player. We call these matrices P1 and P2. The strategy will be given as a binary vector for each player. We call these vectors s1 and s2.
If a player selects the second strategy, then the second component of the vector is set to one and all the other components will be set to zero.
The function will check, for both players, if the strategy is the best response. As such, the function will return a tuple of two boolean values.

Once we can write such a function, we can check if a pure strategy is a pure Nash equilibrium easily.

In [89]:
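def is_best_response(P1, P2, s1, s2):
    # Each strategy must be a one-hot vector: exactly one action is played
    assert (s1 == 1).sum() == 1 and (s2 == 1).sum() == 1, "Players can choose only one action!"
    row, col = np.flatnonzero(s1 == 1)[0], np.flatnonzero(s2 == 1)[0]  # indices of the chosen actions
    # An action is a best response if it attains the maximum payoff against the
    # opponent's action; comparing with max() (rather than argmax) handles ties correctly
    row_is_best = P1[row, col] == P1[:, col].max()
    col_is_best = P2[row, col] == P2[row, :].max()
    return (bool(row_is_best), bool(col_is_best))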

def is_nash_equilibrium(P1, P2, s1, s2):
    return all(is_best_response(P1, P2, s1, s2))

In [90]:
# The following game is a prisoner's dilemma
A = np.array([[-1, -4], [0, -3]])
B = np.array([[-1, 0], [-4, -3]])

# Player one (the row player) plays "defect"; player two (the column player) plays "cooperate"
s_1 = np.array([0, 1])
s_2 = np.array([1, 0])

# It should print (True, False) since the best response for player two, given that player one has chosen "defect", is to defect.
print(is_best_response(A, B, s_1, s_2))

(True, False)


In [91]:
# The following game is a prisoner's dilemma
A = np.array([[-1, -4], [0, -3]])
B = np.array([[-1, 0], [-4, -3]])

# The two players play the Nash equilibrium 
s_1 = np.array([0, 1])
s_2 = np.array([0, 1])

# It should print True
print(is_nash_equilibrium(A, B, s_1, s_2))

True


In [92]:
A = np.array([[2, 3], [1, 4]])
B = np.array([[1, 4], [3, 4]])

s_1 = np.array([1, 0])
s_2 = np.array([1, 0])

# It should print (True, False)
print(is_best_response(A, B, s_1, s_2))

(True, False)


In [93]:
A = np.array([[2, 3], [1, 4]])
B = np.array([[1, 4], [3, 4]])

s_1 = np.array([0, 1])
s_2 = np.array([0, 1])

# It should print True
print(is_nash_equilibrium(A, B, s_1, s_2))

True


## Mixed strategies and Nash equilibrium
Now that we have an understanding of the many types of games we can encounter, and we know that playing a deterministic action in a game such as "Matching Pennies" is a pretty bad idea, let us formally define the concept of strategy.
> **Definition (Strategy):** </br>
> A strategy $s_i$ for a player $i$ is any probability distribution over the possible actions $A_i$
>
> It follows that:
> - *pure* strategies are strategies in which only one action is played with probability 1
> - *mixed* strategies are strategies in which more than one action is played with positive probability
>
> The actions played with positive probability under a strategy are called the *support* of the strategy.
>
> Finally, we denote the set of all strategies for the player $i$ with $S_i$ and the set of all strategy profiles as $S = S_1 \times \dots \times S_n$
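
In code, a strategy is simply a vector of probabilities, and the notions above become one-liners. This is a small sketch; the helper names `support` and `is_pure` and the tolerance `tol` are assumptions made for this example.

```python
import numpy as np

def support(strategy, tol=1e-12):
    """Indices of the actions played with positive probability."""
    return np.flatnonzero(np.asarray(strategy) > tol)

def is_pure(strategy):
    """A strategy is pure when its support contains exactly one action."""
    return len(support(strategy)) == 1

mixed = [0.5, 0.0, 0.5]
pure = [0, 1]
print(support(mixed), is_pure(mixed))
print(support(pure), is_pure(pure))
```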

### Utilities under mixed strategies
Under mixed strategies, if we want to know what is a player's utility for a given strategy profile $s \in S$, we can no longer read it in the payoff matrix; we need to compute the expected utility as follows:

$$
u_i(s) = \sum_{a \in A} u_i(a)\,\text{Pr}(a|s)
$$
where
$$
\text{Pr}(a|s) = \prod_{j \in N} s_j(a_j)
$$

That is, the expected utility of a player $i$ under a mixed strategy profile $s$ is the sum, over all the action profiles in the game, of $i$'s utility for the action profile times the probability that the action profile occurs under the strategy profile $s$. This probability is the product of the probabilities with which each player plays their action in the profile.

Intuitively, in a normal form game, this would be the sum of the value of each cell in player $i$'s payoff matrix, multiplied by its respective probability. As an example, consider the "Matching Pennies" payoff matrix for the row player:

$$
A = \begin{pmatrix}
1 & -1 \\
-1 & 1
\end{pmatrix}
$$

Suppose that both the row player and the column player are playing the mixed strategy $s_A = s_B = [0.5, 0.5]$. Each cell of $A$ is played with probability $0.5 \times 0.5 = 0.25$. The utility of the row player, given the strategy profile $s = [s_A, s_B] = [[0.5, 0.5], [0.5, 0.5]]$, is $0.25 - 0.25 - 0.25 + 0.25 = 0$.
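
The same computation can be sketched with NumPy for any two player game. The function name `expected_utilities` is an assumption made for this illustration; `np.outer` builds the table of probabilities $\text{Pr}(a|s)$, one entry per cell of the payoff matrix.

```python
import numpy as np

def expected_utilities(A, B, s_row, s_col):
    """Expected utility of both players under the mixed strategy profile (s_row, s_col)."""
    # Probability of each action profile (i, j): Pr(a | s) = s_row[i] * s_col[j]
    profile_probabilities = np.outer(s_row, s_col)
    u_row = float((A * profile_probabilities).sum())
    u_col = float((B * profile_probabilities).sum())
    return u_row, u_col

# Matching Pennies with both players mixing uniformly
A = np.array([[1, -1], [-1, 1]])
s = np.array([0.5, 0.5])
u_row, u_col = expected_utilities(A, -A, s, s)
print(u_row, u_col)
```

For two players the weighted sum is the same as the matrix product `s_row @ A @ s_col`, which is the form used in the rest of the notebook.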

## Best responses and mixed Nash equilibrium

Now we can extend the previous definitions of Best Response and Nash equilibrium to accommodate mixed strategies.

> **Definition (Best response):**  </br>
$s_i^* \in \text{BR}(s_{-i})\; \text{iff}\; \forall s_i \in S_i,\; u_i(s_i^*, s_{-i}) \ge u_i(s_i, s_{-i})$

That is, a strategy $s_i^*$ is a best response of player $i$ to the opponents' strategy profile $s_{-i}$ if and only if the utility given by $s_i^*$ is at least as large as the utility provided by any other strategy available to player $i$.

> **Definition (Nash Equilibrium)** </br>
The strategy profile $s = \langle s_1, \dots, s_n \rangle$ is a *Nash equilibrium* if and only if $\forall i\; s_i \in \text{BR}(s_{-i})$

By introducing mixed strategies, we can find Nash equilibria for games such as "Matching Pennies", which do not have a pure strategy Nash equilibrium. In fact, one of the reasons why John Nash won the Nobel Prize in Economics is that he proved the following theorem.

> **Theorem (Nash 1950):** </br>
Every finite game (i.e. a game with finite number of players and finite number of actions) has a Nash equilibrium.

## Testing whether a mixed strategy is a Best Response and/or a Nash equilibrium
In a two player game $(A, B) \in (\mathbb{R}^{m \times n})^2$, a strategy $\sigma_r^*$ of the row player is a best response to a column player's strategy $\sigma_c$ if and only if:
$$
\sigma_r^* \in \operatorname*{arg\,max}_{\sigma_r \in S_1} \sigma_r A \sigma_c^T
$$
where $S_1$ denotes the space of all strategies for the row player and $A$ is the row player's payoff matrix.

Similarly, for the column player, a mixed strategy $\sigma_c^*$ is a best response to a row player's strategy $\sigma_r$ if and only if:
$$
\sigma_c^* \in \operatorname*{arg\,max}_{\sigma_c \in S_2} \sigma_r B \sigma_c^T
$$
where $S_2$ denotes the space of all strategies for the column player and $B$ is the column player's payoff matrix.
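
Because the expected payoff $\sigma_r A \sigma_c^T$ is linear in $\sigma_r$, the maximum over all mixed strategies is always attained by some pure action. The sketch below illustrates this numerically on the battle of the sexes payoffs, under the assumption of an arbitrarily chosen column strategy $(0.5, 0.5)$ and a grid of row strategies $(p, 1-p)$; the variable names are made up for this example.

```python
import numpy as np

A = np.array([[2, 0], [0, 1]])          # row player's payoffs in the battle of the sexes
sigma_c = np.array([0.5, 0.5])          # a fixed column strategy (chosen for illustration)

row_payoffs = A @ sigma_c               # expected payoff of each pure row action

# Expected payoff of the mixed strategies (p, 1 - p) on a grid of values of p
grid = np.linspace(0, 1, 101)
mixed_payoffs = np.array([np.array([p, 1 - p]) @ row_payoffs for p in grid])

best_on_grid = float(mixed_payoffs.max())
best_p = float(grid[mixed_payoffs.argmax()])
print(best_on_grid, float(row_payoffs.max()), best_p)
```

No mixture does better than the best pure action, which is why it is enough to compare the entries of $A \sigma_c^T$.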

In practice, when we test whether a strategy is a best response in a two player game, we use the *general condition for a best response*: in a two player game $(A, B) \in (\mathbb{R}^{m \times n})^2$, a strategy $\sigma_r^*$ of the row player is a best response to a column player's strategy $\sigma_c$ if and only if:
$$
\sigma^*_{r,i} > 0 \Longrightarrow (A \sigma_c^T)_i = \max_{k \in \mathcal{A}_1}(A \sigma_c^T)_k \quad \forall i \in \mathcal{A}_1
$$
where $\mathcal{A}_1$ and $\mathcal{A}_2$ denote the action sets of the row player and of the column player, respectively. Likewise, a strategy $\sigma_c^*$ of the column player is a best response to a row player's strategy $\sigma_r$ if and only if:
$$
\sigma^*_{c,i} > 0 \Longrightarrow (\sigma_r B)_i = \max_{k \in \mathcal{A}_2}(\sigma_r B)_k \quad \forall i \in \mathcal{A}_2
$$
Using this last formulation, let's implement a function (similar to the one we implemented above) to test whether a mixed strategy is a best response. Again, the function takes as input the two payoff matrices and the two strategies and outputs a tuple of Boolean values.
Once we have implemented this function, a function to check whether a pair of strategies is a Nash equilibrium follows trivially.
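
As a small worked illustration of the condition, assuming the battle of the sexes payoffs introduced earlier, take the row player's matrix $A$ and two candidate column strategies:

$$
A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
A \begin{pmatrix} 1/3 \\ 2/3 \end{pmatrix} = \begin{pmatrix} 2/3 \\ 2/3 \end{pmatrix}, \qquad
A \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1/2 \end{pmatrix}
$$

Against $\sigma_c = (1/3, 2/3)$ both row actions attain the maximum $2/3$, so every row strategy, including $\sigma_r = (2/3, 1/3)$, is a best response. Against $\sigma_c = (1/2, 1/2)$ only the first action attains the maximum, so $\sigma_r = (2/3, 1/3)$, which puts positive probability on the second action, is not a best response.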

In [94]:
def is_best_response(P1, P2, s1, s2):
    # Compare with a tolerance: sums of floating point probabilities are rarely exactly 1
    assert np.isclose(s1.sum(), 1) and np.isclose(s2.sum(), 1), "Strategies need to be probability distributions (they have to sum up to one!)"

    # Consider the row player first
    util_P1 = (P1 @ s2.T).round(10)     # Expected payoff of each row action; rounded for numerical stability
    max_utility = util_P1.max()
    # Every action in the support of s1 must attain the maximum expected payoff
    is_row_best_strategy = (util_P1[s1 > 0] == max_utility).all()

    # Now we consider the column player
    util_P2 = (s1 @ P2).round(10)       # Expected payoff of each column action
    max_utility = util_P2.max()
    is_column_best_strategy = (util_P2[s2 > 0] == max_utility).all()

    return (bool(is_row_best_strategy), bool(is_column_best_strategy))

def is_nash_equilibrium(P1, P2, s1, s2):
    return all(is_best_response(P1, P2, s1, s2))

In [95]:
# Battle of the sexes
A = np.array([[2, 0], [0, 1]])
B = np.array([[1, 0], [0, 2]])

# The players decide to randomise and play 2/3 of the times the action associated with their maximum utility and 1/3 of the time the second best choice.
s_1 = np.array([2/3, 1/3])
s_2 = np.array([1/3, 2/3])

# It should print (True, True) and then True
print(is_best_response(A, B, s_1, s_2))
print(is_nash_equilibrium(A, B, s_1, s_2))

(True, True)
True


In [96]:
# Battle of the sexes
A = np.array([[2, 0], [0, 1]])
B = np.array([[1, 0], [0, 2]])

# The row player does not best respond to the column player.
s_1 = np.array([2/3, 1/3])
s_2 = np.array([1/2, 1/2])

# It should print (False, True)
print(is_best_response(A, B, s_1, s_2))
# It should print False
print(is_nash_equilibrium(A, B, s_1, s_2))

(False, True)
False


In [97]:
# Note that the function to check if a mixed strategy is a best response works also for pure strategies!

# Battle of the sexes
A = np.array([[2, 0], [0, 1]])
B = np.array([[1, 0], [0, 2]])

# Both players coordinate and play the same strategy: this is an equilibrium in the battle of the sexes
s_1 = np.array([1, 0])
s_2 = np.array([1, 0])

print(is_nash_equilibrium(A, B, s_1, s_2))

True


In [98]:
# Battle of the sexes
A = np.array([[2, 0], [0, 1]])
B = np.array([[1, 0], [0, 2]])

# We try with an "illegal" strategy for player one (the probabilities do not sum up to one)
s_1 = np.array([0.8, 0.8])
s_2 = np.array([1, 0])

# This should throw an assertion error!
print(is_nash_equilibrium(A, B, s_1, s_2))

AssertionError: Strategies need to be probability distributions (they have to sum up to one!)

## Computing mixed Nash equilibria
Computing mixed Nash equilibria in general N-player games is surprisingly hard. All known algorithms exhibit exponential computational complexity in the worst case. Complexity analyses have shown that the problem of finding a Nash equilibrium is PPAD-complete. As such, we are not going to implement an algorithm to find a Nash equilibrium in a general game. Instead, in the following, we are going to implement algorithms to compute mixed equilibria in:
1. Two player zero-sum games, where we can compute a mixed Nash equilibrium in polynomial time by solving a linear program;
2. Two player games described by 2x2 matrices, where we can compute the mixed Nash equilibrium by solving two linear equations.


### Computing mixed Nash equilibria in 2-player zero-sum games
A two player normal form game with payoff matrices $(A, B)$ is said to be a zero-sum game if and only if $A = -B$. As such, zero-sum games can be represented with just one matrix. Consider a zero-sum game with payoff matrix $A \in \mathbb{R}^{m \times n}$ and a column player with strategy $y \in \mathbb{R}^n$. The row player aims to find a best response strategy $x \in \mathbb{R}^m$, which attains:
$$
\max_{x \in S_1} xAy^T = \max_{i \le m} (Ay^T)_i
$$
The equality holds because the expected payoff $xAy^T$ is a weighted average of the entries of $Ay^T$, so it is maximised by putting all the weight on a largest entry. With their choice of $y$, the column player determines the upper bound $v$ on $\max_{i \le m} (Ay^T)_i$. Since the game is zero-sum, the column player wants to choose a strategy $y$ such that $v$ is as low as possible. For a fixed $y$, $\max_{i \le m} (Ay^T)_i = \min\{v \in \mathbb{R} \mid Ay^T \le \mathbf{\vec{1}}v\}$.
As such, the *minimax* strategy $y$ of the column player is a solution to the following linear program:
$$
\begin{aligned}
    &\min_{v, y} v \\
    \text{s.t.:} \\
    &Ay^T \le \mathbf{\vec{1}}v\\
    &y \in S_2
\end{aligned}
$$
Similarly, the *maximin* strategy $x$ for the row player is given by a solution to the following linear program:
$$
\begin{aligned}
    &\max_{x, u} u \\
    \text{s.t.:} \\
    &xA \ge \mathbf{\vec{1}}u\\
    &x \in S_1
\end{aligned}
$$
John von Neumann proved in 1928 that if there exist optimal values of

1. the max-min value $u$ and the max-min strategy $x$
2. the min-max value $v$ and the min-max strategy $y$

then $u = v$. This is known as the *minimax* theorem, and it is considered the starting point of game theory!

#### Reformulation of the linear program for zero-sum games
Given a row player payoff matrix with $m$ rows and $n$ columns $A \in \mathbb{R}^{m \times n}$, the mixed equilibrium for a zero-sum game can be computed with the following equivalent linear program:
$$
\begin{aligned}
&\min_{x \in \mathbb{R}^{(m+1) \times 1}} cx \\
&\text{s.t.:}\\
&\qquad M_{\text{ub}}x \le b_{\text{ub}} \\
&\qquad M_{\text{eq}}x = b_{\text{eq}} \\
&\qquad x_i \ge 0 \qquad \forall i \le m
\end{aligned}
$$
where the first $m$ components of $x$ are the row player's probabilities, the last component $x_{m+1}$ is the game value $u$ (unconstrained in sign), and the coefficients are defined as follows:
$$
\begin{aligned}
    c &= (\underbrace{0, \dots, 0}_{m}, -1) && c\in\{0, -1\}^{1 \times (m + 1)}\\
    M_{\text{ub}} &= \begin{pmatrix}(-A^T)_{11}&\dots&(-A^T)_{1m}&1\\
                                    \vdots     &\ddots&\vdots           &\vdots\\
                                    (-A^T)_{n1}&\dots&(-A^T)_{nm}&1\end{pmatrix} && M_{\text{ub}}\in\mathbb{R}^{n\times (m + 1)}\\
    b_{\text{ub}} &= (\underbrace{0, \dots, 0}_{n})^T && b_{\text{ub}}\in\{0\}^{n\times 1}\\
    M_{\text{eq}} &= (\underbrace{1, \dots, 1}_{m}, 0) && M_{\text{eq}}\in\{0, 1\}^{1\times(m + 1)}\\
    b_{\text{eq}} &= 1
\end{aligned}
$$

Following the above reformulation, we can write a function to compute the mixed minimax strategy of the row player, which is their Nash equilibrium strategy, for zero-sum games. We use the *linprog* function from scipy.optimize.


In [99]:
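import scipy.optimize

def minimax_LP(payoff_matrix):
    nrows, ncols = payoff_matrix.shape

    # Decision variables: the row strategy x_1, ..., x_m followed by the game value u
    c = np.zeros(nrows + 1)
    c[-1] = -1                                   # minimising -u maximises u
    M_ub = np.hstack((-payoff_matrix.T, np.ones(shape=(ncols, 1))))  # u - (xA)_j <= 0 for every column j
    b_ub = np.zeros(ncols)
    M_eq = np.ones(shape=(1, nrows + 1))         # the probabilities sum to one...
    M_eq[0, -1] = 0                              # ...and u is not part of that sum
    b_eq = [1]
    # Probabilities are non-negative, but the game value u must be free:
    # linprog's default bounds would force u >= 0 and fail on games with a negative value
    bounds = [(0, None)] * nrows + [(None, None)]

    # Setting the LP optimizer
    res = scipy.optimize.linprog(
        c=c,
        A_ub=M_ub,
        b_ub=b_ub,
        A_eq=M_eq,
        b_eq=b_eq,
        bounds=bounds,
    )
    return res.x[:-1]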

In [100]:
# Rock-paper-scissors game
A = np.array([
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0]
])

# It should print [1/3, 1/3, 1/3]
print(minimax_LP(A))

[0.33333333 0.33333333 0.33333333]


In [101]:
# Matching Pennies game
A = np.array([
    [1, -1],
    [-1, 1]
])

# It should print [1/2, 1/2]
print(minimax_LP(A))

[0.5 0.5]


In [102]:
# Random matrix game with integer payoffs in [a, b) (the upper bound of randint is exclusive)
np.random.seed(42)
N = 5
a = 1
b = 100
A = np.random.randint(low=a, high=b, size=(N,N))

# It should print [0, 0.56291391, 0.43708609, 0, 0] if you keep the random seed to 42
print(minimax_LP(A))

[0.         0.56291391 0.43708609 0.         0.        ]
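
The minimax theorem stated earlier can also be checked numerically by solving both linear programs and comparing their optimal values. The following is a sketch under stated assumptions: the function names `row_maximin` and `column_minimax` and the example matrix are made up for this illustration, and the game value is left unbounded in sign through the `bounds` argument of `linprog`.

```python
import numpy as np
from scipy.optimize import linprog

def row_maximin(A):
    """Solve max u s.t. xA >= 1u, x in the simplex. Variables: (x_1, ..., x_m, u)."""
    m, n = A.shape
    c = np.append(np.zeros(m), -1.0)                   # minimise -u
    A_ub = np.hstack((-A.T, np.ones((n, 1))))          # u - (xA)_j <= 0 for every column j
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)   # probabilities sum to one
    bounds = [(0, None)] * m + [(None, None)]          # u is free
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:m], float(res.x[-1])

def column_minimax(A):
    """Solve min v s.t. Ay^T <= 1v, y in the simplex. Variables: (y_1, ..., y_n, v)."""
    m, n = A.shape
    c = np.append(np.zeros(n), 1.0)                    # minimise v
    A_ub = np.hstack((A, -np.ones((m, 1))))            # (Ay^T)_i - v <= 0 for every row i
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)   # probabilities sum to one
    bounds = [(0, None)] * n + [(None, None)]          # v is free
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:n], float(res.x[-1])

# A small zero-sum game without a pure equilibrium (hypothetical example)
A = np.array([[2, -1], [-1, 1]])
x, u = row_maximin(A)
y, v = column_minimax(A)
print(x, u)
print(y, v)
```

Up to solver tolerance, the two optimal values should coincide, and the pair $(x, y)$ is then a mixed Nash equilibrium of the zero-sum game.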


### Computing mixed Nash equilibria in games described by $2 \times 2$ matrices
If we are dealing with non-zero-sum games, things become a bit more complicated and the minimax principle we have seen before no longer applies. For $2 \times 2$ games, there still exists a simple algorithm we can use to compute a mixed Nash equilibrium, which we are going to implement here. The underlying idea is called the *indifference condition*. It can be generalised to compute equilibria in generic games, although the resulting algorithms are not efficient.

Let's recall that a *support* is the set of pure strategies (actions) that receive positive probability under a player's mixed strategy. In a $2 \times 2$ game, the support of a properly mixed strategy contains both actions. As such, a strategy for the row player is a probability distribution $\sigma_r = (p, 1-p)$ over the two possible actions and, similarly, a strategy for the column player is a probability distribution $\sigma_c = (q, 1-q)$. Recalling the general condition for a best response discussed previously, if a $\sigma_r$ with full support is a best response to $\sigma_c$, then $(A\sigma_c^T)_i = \text{max}_{k\in\{1, 2\}} (A\sigma_c^T)_k \text{ for all }i \in \{1, 2\}$. In other words, letting $A$ be the payoff matrix of the row player, the following *indifference condition* must hold: $qA_{11} + (1-q)A_{12} = qA_{21} + (1-q)A_{22}$. From this indifference condition it follows that:

$$
q = \frac{A_{22} - A_{12}}{A_{11} - A_{12} - A_{21} + A_{22}}
$$
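
For completeness, the intermediate algebra can be spelled out as follows, assuming that the denominator $A_{11} - A_{12} - A_{21} + A_{22}$ is non-zero:

$$
\begin{aligned}
qA_{11} + (1-q)A_{12} &= qA_{21} + (1-q)A_{22} \\
q(A_{11} - A_{12}) + A_{12} &= q(A_{21} - A_{22}) + A_{22} \\
q(A_{11} - A_{12} - A_{21} + A_{22}) &= A_{22} - A_{12}
\end{aligned}
$$

For example, with the row player's battle of the sexes payoffs $A_{11} = 2$, $A_{12} = A_{21} = 0$, $A_{22} = 1$, this gives $q = \frac{1 - 0}{2 - 0 - 0 + 1} = \frac{1}{3}$.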

Likewise, applying the same reasoning to the column player, who best responds to the row player, we have that $pB_{11} + (1-p)B_{21} = pB_{12} + (1-p)B_{22}$, which implies:

$$
p = \frac{B_{22} - B_{21}}{B_{11} - B_{12} - B_{21} + B_{22}}
$$

We can now implement a function to find a mixed Nash equilibrium in $2 \times 2$ games.

In [103]:
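def mixed_nash_equilibrium_2x2(A, B):
    # Denominators of the two indifference conditions
    den_q = A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1]
    den_p = B[0, 0] - B[0, 1] - B[1, 0] + B[1, 1]
    if den_q == 0 or den_p == 0:
        raise ValueError("The indifference conditions have no unique solution for this game")

    # q makes the row player indifferent between their two actions;
    # p makes the column player indifferent between their two actions
    q = (A[1, 1] - A[0, 1]) / den_q
    p = (B[1, 1] - B[1, 0]) / den_p

    # Note: the result is a valid mixed equilibrium only if 0 <= p <= 1 and 0 <= q <= 1
    sigma_A = np.array([p, 1 - p])
    sigma_B = np.array([q, 1 - q])

    return (sigma_A, sigma_B)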

In [104]:
# Battle of the sexes
A = np.array([[2,0],[0,1]])
B = np.array([[1,0],[0,2]])

s1,s2 = mixed_nash_equilibrium_2x2(A, B)
# It should print [2/3, 1/3] [1/3, 2/3]
print(s1, s2)
is_nash_equilibrium(A, B, s1, s2)

[0.66666667 0.33333333] [0.33333333 0.66666667]


True

In [105]:
# We can clearly compute the equilibrium also for 2x2 zero-sum games
# Matching Pennies
A = np.array([
    [1, -1],
    [-1, 1]
])

s1,s2 = mixed_nash_equilibrium_2x2(A, -A)
# It should print [1/2, 1/2] [1/2, 1/2]
print(s1, s2)
is_nash_equilibrium(A,-A, s1,s2)

[0.5 0.5] [0.5 0.5]


True

In [106]:
A = np.array([[4,5],[6,3]])
B = np.array([[2,1],[0,3]])

s1,s2 = mixed_nash_equilibrium_2x2(A, B)

# Equilibrium should be [0.75, 0.25] [0.5, 0.5]
print(s1, s2)
is_nash_equilibrium(A, B, s1, s2)

[0.75 0.25] [0.5 0.5]


True

In [107]:
A = np.array([[3,1],[0,2]])
B = np.array([[2,1],[0,3]])

s1,s2 = mixed_nash_equilibrium_2x2(A, B)

# Equilibrium should be [0.75, 0.25] [0.25, 0.75]
print(s1, s2)
is_nash_equilibrium(A, B, s1, s2)

[0.75 0.25] [0.25 0.75]


True

In [108]:
A = np.array([[2,3],[3,-1]])
B = np.array([[-5,-6],[-2,0]])

s1,s2 = mixed_nash_equilibrium_2x2(A, B)

# Equilibrium should be [2/3, 1/3] [0.8, 0.2]
print(s1, s2)
is_nash_equilibrium(A, B, s1, s2)

[0.66666667 0.33333333] [0.8 0.2]


True
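
The pure strategy check and the indifference condition can be combined to list the equilibria of a $2 \times 2$ game. The sketch below is an illustration that assumes a non-degenerate game (no ties between payoffs); the function name `all_equilibria_2x2` is hypothetical. It collects the pure equilibria by enumeration and adds the fully mixed one when the indifference conditions yield probabilities strictly between 0 and 1.

```python
import numpy as np

def all_equilibria_2x2(A, B):
    """Pure equilibria plus the fully mixed one (if any) of a non-degenerate 2x2 game."""
    equilibria = []
    identity = np.eye(2)
    # Pure equilibria: no player can gain by deviating unilaterally
    for i in range(2):
        for j in range(2):
            if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max():
                equilibria.append((identity[i], identity[j]))
    # Fully mixed equilibrium: both indifference conditions must have a solution in (0, 1)
    den_q = A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1]
    den_p = B[0, 0] - B[0, 1] - B[1, 0] + B[1, 1]
    if den_q != 0 and den_p != 0:
        q = (A[1, 1] - A[0, 1]) / den_q
        p = (B[1, 1] - B[1, 0]) / den_p
        if 0 < p < 1 and 0 < q < 1:
            equilibria.append((np.array([p, 1 - p]), np.array([q, 1 - q])))
    return equilibria

# Battle of the sexes: two pure equilibria and one mixed equilibrium
bos = all_equilibria_2x2(np.array([[2, 0], [0, 1]]), np.array([[1, 0], [0, 2]]))
# Prisoner's dilemma: a single pure equilibrium
pd = all_equilibria_2x2(np.array([[-1, -4], [0, -3]]), np.array([[-1, 0], [-4, -3]]))
print(len(bos), len(pd))
```

Games with payoff ties can have further equilibria that this sketch does not report.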