Implement lp formulation of minimax theorem (#217)
* Write formulation of LP in docs.
* Write functions to build components of LP
* Implement LP formulation
* Add more tests and documentation.
* Add how to documentation.

Closes #23
1 parent be4530a, commit f215795
Showing 8 changed files with 369 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
.. _how-to-use-minimax: | ||
|
||
Use the minimax theorem | ||
======================= | ||
|
||
One of the algorithms implemented in :code:`Nashpy` is based on :ref:`the | ||
minimax theorem <the-minimax-theorem>`. This is implemented as a | ||
method on the :code:`Game` class:: | ||
|
||
>>> import nashpy as nash | ||
>>> import numpy as np | ||
>>> A = np.array([[1, -1], [-1, 1]]) | ||
>>> matching_pennies = nash.Game(A) | ||
|
||
The :code:`linear_program` method returns a pair of Nash equilibrium strategies | ||
by solving the underlying :ref:`linear program | ||
<formulation-of-linear-program>`:: | ||
|
||
>>> matching_pennies.linear_program() | ||
(array([0.5, 0.5]), array([0.5, 0.5])) | ||
|
||
Note that this is only defined for :ref:`zero sum games <zero-sum-games>`:: | ||
|
||
>>> A = np.array([[1, -1], [-1, 1]]) | ||
>>> B = np.array([[2, -2], [-2, 2]]) | ||
>>> game = nash.Game(A, B) | ||
>>> game.linear_program() | ||
Traceback (most recent call last): | ||
... | ||
ValueError: The Linear Program corresponding to the minimax theorem is defined only for Zero Sum games. |
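A brief sketch of the condition behind this error: a game is zero sum when the two payoff matrices satisfy A + B = 0. The helper `is_zero_sum` below is a hypothetical name used for illustration only; it is not taken from Nashpy, and how Nashpy performs this check internally is not shown in this excerpt.

```python
import numpy as np


def is_zero_sum(A, B):
    """Return True if the payoff matrices satisfy A + B = 0 entrywise.

    Hypothetical helper for illustration; not part of Nashpy.
    """
    A, B = np.asarray(A), np.asarray(B)
    return A.shape == B.shape and bool(np.all(A + B == 0))


A = np.array([[1, -1], [-1, 1]])
B = np.array([[2, -2], [-2, 2]])

pennies_is_zero_sum = is_zero_sum(A, -A)  # column player receives -A
second_game_is_zero_sum = is_zero_sum(A, B)  # the (A, B) pair used above
```

Under this check the second game is not zero sum, which is consistent with the `ValueError` shown in the how-to.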
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
.. _zero-sum-games: | ||
|
||
Zero Sum Games | ||
============== | ||
|
||
.. _motivating-example-zero-sum-games: | ||
|
||
Motivating example: changing Rock Paper Scissors | ||
------------------------------------------------ | ||
|
||
We modify the game of :ref:`Rock Paper Scissors | ||
<motivating-example-strategy-for-rps>` by adding a single new strategy, "Spock": | ||
|
||
- Spock smashes Scissors and Rock. | ||
- Paper disproves Spock. | ||
|
||
The other modification is that the game is no longer symmetric: only the row | ||
player can use Spock. | ||
|
||
The mathematical representation of this game is given by: | ||
|
||
.. math:: | ||
A = \begin{pmatrix} | ||
0 & -1 & 1 \\ | ||
1 & 0 & -1\\ | ||
-1 & 1 & 0\\ | ||
1 & -1 & 1 | ||
\end{pmatrix} | ||
Is there a way that the Row player can play that guarantees a particular | ||
expected proportion of wins? | ||
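One way to make this question concrete is to compute, for a fixed row strategy, the smallest expected payoff over all of the column player's pure strategies. The sketch below does this for the matrix above. The function name `worst_case_payoff` is hypothetical, and the row order (Rock, Paper, Scissors, Spock) is inferred from the payoffs rather than stated in the text.

```python
import numpy as np

# Row player payoff matrix of the modified game.
# Assumed row order: Rock, Paper, Scissors, Spock.
A = np.array(
    [
        [0, -1, 1],
        [1, 0, -1],
        [-1, 1, 0],
        [1, -1, 1],
    ]
)


def worst_case_payoff(strategy, payoff_matrix):
    """Smallest expected payoff over the column player's pure strategies."""
    expected_payoffs = np.asarray(strategy) @ payoff_matrix
    return float(np.min(expected_payoffs))


# Playing every row with equal probability.
uniform = np.full(4, 1 / 4)
guarantee = worst_case_payoff(uniform, A)
print(guarantee)
```

For the uniform strategy the expected payoffs against the three columns are 1/4, -1/4 and 1/4, so the guarantee is -1/4. The linear program formulated later in this document searches for the strategy that makes this guarantee as large as possible.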
|
||
Optimising worst case outcomes | ||
------------------------------ | ||
|
||
The value of a game | ||
------------------- | ||
|
||
.. _formulation-of-linear-program: | ||
|
||
Formulation of the linear program | ||
--------------------------------- | ||
|
||
In a :ref:`Zero Sum Game <definition-of-zero-sum-game>`, given a row player | ||
payoff matrix :math:`A` with :math:`m` rows and :math:`n` columns, the following | ||
linear program gives a strategy for the row player that guarantees the best | ||
possible worst case utility, as well as the value of the game. The first | ||
:math:`m` components of :math:`x` are the strategy and the final component is | ||
the value of the game: | ||
|
||
.. math:: | ||
\max_{x\in\mathbb{R}^{(m + 1)\times 1}} cx | ||
Subject to: | ||
|
||
.. math:: | ||
\begin{align} | ||
M_{\text{ub}}x &\leq b_{\text{ub}} \\ | ||
M_{\text{eq}}x &= b_{\text{eq}} \\ | ||
x_i &\geq 0&&\text{ for }i\leq m | ||
\end{align} | ||
Where the parameters of the linear program are defined by: | ||
|
||
.. math:: | ||
\begin{align} | ||
c &= (\underbrace{0, \dots, 0}_{m}, 1) && c\in\{0, 1\}^{1 \times (m + 1)}\\ | ||
M_{\text{ub}} &= \begin{pmatrix}(-A^T)_{11}&\dots&(-A^T)_{1m}&1\\ | ||
\vdots &\ddots&\vdots &1\\ | ||
(-A^T)_{n1}&\dots&(-A^T)_{nm}&1\end{pmatrix} && M_{\text{ub}}\in\mathbb{R}^{n\times (m + 1)}\\ | ||
b_{\text{ub}} &= (\underbrace{0, \dots, 0}_{n})^T && b_{\text{ub}}\in\{0\}^{n\times 1}\\ | ||
M_{\text{eq}} &= (\underbrace{1, \dots, 1}_{m}, 0) && M_{\text{eq}}\in\{0, 1\}^{1\times(m + 1)}\\ | ||
b_{\text{eq}} &= 1 | ||
\end{align} | ||
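The parameters above can be assembled directly with NumPy and passed to `scipy.optimize.linprog`. The following is a minimal sketch for the modified Rock Paper Scissors matrix, not the implementation in this commit. Since `linprog` minimises its objective, the sketch passes `-c` in order to maximise `cx`. The variable names mirror the symbols in the formulation.

```python
import numpy as np
import scipy.optimize

# Row player payoff matrix of the modified Rock Paper Scissors game.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0], [1, -1, 1]])
m, n = A.shape

c = np.append(np.zeros(m), 1)  # objective picks out the value v
M_ub = np.hstack((-A.T, np.ones((n, 1))))  # v - (A^T x)_j <= 0 for each column j
b_ub = np.zeros(n)
M_eq = np.append(np.ones(m), 0).reshape(1, m + 1)  # probabilities sum to 1
b_eq = np.array([1])
bounds = [(0, None)] * m + [(None, None)]  # x_i >= 0, v unrestricted

# linprog minimises, so negate c to maximise cx.
res = scipy.optimize.linprog(
    c=-c, A_ub=M_ub, b_ub=b_ub, A_eq=M_eq, b_eq=b_eq, bounds=bounds
)
row_strategy, value = res.x[:-1], res.x[-1]
print(row_strategy, value)
```

Solving the indifference conditions by hand for this matrix gives a value of 1/9 with row strategy (0, 2/9, 4/9, 1/3), so the solver output should agree with these up to numerical tolerance.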
.. _the-minimax-theorem: | ||
|
||
The minimax theorem | ||
------------------- | ||
|
||
Using Nashpy | ||
------------ | ||
|
||
See :ref:`how-to-use-minimax` for guidance on how to use Nashpy to | ||
find the Nash equilibria of zero sum games using the minimax theorem. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
""" | ||
This module contains functions to build and solve the linear program that | ||
corresponds to the minimax theorem. | ||
""" | ||
import numpy as np | ||
import numpy.typing as npt | ||
import scipy.optimize | ||
|
||
|
||
def get_c(number_of_rows: int) -> npt.NDArray: | ||
    """ | ||
    Return the coefficient vector for the objective function of the LP that | ||
    corresponds to the minimax theorem. | ||
    Parameters | ||
    ---------- | ||
    number_of_rows : int | ||
        The number of rows in the payoff matrix | ||
    Returns | ||
    ------- | ||
    array | ||
        A vector with m 0s followed by a single -1 where m is the number of | ||
        rows in the payoff matrix. The final entry is -1 rather than 1 because | ||
        scipy.optimize.linprog minimises its objective. | ||
    """ | ||
    c = np.zeros(shape=(number_of_rows + 1)) | ||
    c[-1] = -1 | ||
    return c | ||
|
||
|
||
def get_A_ub(row_player_payoff_matrix: npt.NDArray) -> npt.NDArray: | ||
    """ | ||
    Return the coefficient matrix of the upper bound (inequality) constraints | ||
    of the LP that corresponds to the minimax theorem. | ||
    Parameters | ||
    ---------- | ||
    row_player_payoff_matrix : array | ||
        The payoff matrix | ||
    Returns | ||
    ------- | ||
    array | ||
        The negated transpose of the payoff matrix with a column of 1s | ||
        appended. | ||
    """ | ||
    _, number_of_columns = row_player_payoff_matrix.shape | ||
    return np.hstack( | ||
        (-row_player_payoff_matrix.T, np.ones(shape=(number_of_columns, 1))) | ||
    ) | ||
|
||
|
||
def get_b_ub(number_of_columns: int) -> npt.NDArray: | ||
    """ | ||
    Return the upper bound vector for the LP that corresponds to the minimax | ||
    theorem. | ||
    Parameters | ||
    ---------- | ||
    number_of_columns : int | ||
        The number of columns in the payoff matrix | ||
    Returns | ||
    ------- | ||
    array | ||
        A vector of zeros | ||
    """ | ||
    return np.zeros(shape=(number_of_columns, 1)) | ||
|
||
|
||
def get_A_eq(number_of_rows: int) -> npt.NDArray: | ||
    """ | ||
    Return the equality linear coefficients for the LP that corresponds to the | ||
    minimax theorem. | ||
    Parameters | ||
    ---------- | ||
    number_of_rows : int | ||
        The number of rows in the payoff matrix | ||
    Returns | ||
    ------- | ||
    array | ||
        A vector with m 1s followed by a single 0 where m is the number of rows | ||
        in the payoff matrix. | ||
    """ | ||
    A_eq = np.ones(shape=(1, number_of_rows + 1)) | ||
    A_eq[0, -1] = 0 | ||
    return A_eq | ||
|
||
|
||
def get_bounds(number_of_rows: int) -> list: | ||
    """ | ||
    Return the bounds for each variable of the LP that corresponds to the | ||
    minimax theorem. | ||
    Parameters | ||
    ---------- | ||
    number_of_rows : int | ||
        The number of rows in the payoff matrix | ||
    Returns | ||
    ------- | ||
    list | ||
        A list of tuples, each tuple contains the lower and upper bound for each | ||
        variable. The first m variables are non-negative and the final | ||
        variable is unbounded. | ||
    """ | ||
    return [(0, None) for _ in range(number_of_rows)] + [(None, None)] | ||
|
||
|
||
def linear_program(row_player_payoff_matrix: npt.NDArray) -> npt.NDArray: | ||
    """ | ||
    The Linear Program that corresponds to the minimax theorem. This builds the | ||
    LP, solves it and returns the row player's strategy. | ||
    Parameters | ||
    ---------- | ||
    row_player_payoff_matrix : array | ||
        The payoff matrix | ||
    Returns | ||
    ------- | ||
    array | ||
        The row player maxmin strategy | ||
    """ | ||
    number_of_rows, number_of_columns = row_player_payoff_matrix.shape | ||
    c = get_c(number_of_rows=number_of_rows) | ||
    A_ub = get_A_ub(row_player_payoff_matrix=row_player_payoff_matrix) | ||
    b_ub = get_b_ub(number_of_columns=number_of_columns) | ||
    A_eq = get_A_eq(number_of_rows=number_of_rows) | ||
    b_eq = 1 | ||
    bounds = get_bounds(number_of_rows=number_of_rows) | ||
|
||
    res = scipy.optimize.linprog( | ||
        c=c, | ||
        A_ub=A_ub, | ||
        b_ub=b_ub, | ||
        A_eq=A_eq, | ||
        b_eq=b_eq, | ||
        bounds=bounds, | ||
    ) | ||
    # The final entry of res.x is the value of the game, so drop it. | ||
    return res.x[:-1] |
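The function above returns only the row player's strategy, while the how-to example shows a pair of strategies. One plausible way to obtain the column player's strategy, sketched below, is to solve the same LP for the matrix `-A.T`, in which the column player takes the role of the row player. This is an assumption for illustration: how this commit's `Game.linear_program` method combines the two strategies is not shown in this excerpt, and `maxmin_strategy` is a hypothetical, self-contained stand-in for the module's `linear_program`.

```python
import numpy as np
import scipy.optimize


def maxmin_strategy(payoff_matrix):
    """Solve the minimax LP for the player who chooses rows of payoff_matrix.

    Hypothetical stand-in for the module's linear_program function.
    """
    m, n = payoff_matrix.shape
    c = np.zeros(m + 1)
    c[-1] = -1  # linprog minimises, so minimise -v to maximise v
    A_ub = np.hstack((-payoff_matrix.T, np.ones((n, 1))))
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0  # v is not part of the probability vector
    bounds = [(0, None)] * m + [(None, None)]
    res = scipy.optimize.linprog(
        c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1], bounds=bounds
    )
    return res.x[:-1]


A = np.array([[1, -1], [-1, 1]])  # matching pennies
row_strategy = maxmin_strategy(A)
# The column player's payoffs are -A; transpose so their choices index rows.
column_strategy = maxmin_strategy(-A.T)
print(row_strategy, column_strategy)
```

For matching pennies both strategies should be close to (0.5, 0.5), which agrees with the output shown in the how-to documentation.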