Imitation Dynamics and Regret Minimization models added to the repo (#229)

* Imitation Dynamics and Regret Minimization models added to the repo

* reformatted the code using black

* adding the documentation for imitation dynamics and regret minimization

* removed the main() calls from the test files

* recommended changes are made into the code and the documentation

* changed the max_iterations and num_generations into iterations within the code and the docs

* reformatted with black version 24.3.0

* removed the reference of greedy algorithm and the note from imitation docs

* ran tox to fix the documentation issues and moved the functions to learning

* edited the how-to files as recommended
sandeepvshenoy committed Mar 25, 2024
1 parent 335a8ed commit a3fb697
Showing 10 changed files with 634 additions and 0 deletions.
36 changes: 36 additions & 0 deletions docs/how-to/solve-with-imitation-dynamics.rst
@@ -0,0 +1,36 @@
.. _how-to-use-imitation-dynamics:

Solve with Imitation Dynamics
==============================

One of the algorithms implemented in :code:`Nashpy` is called
:code:`imitation_dynamics()`. It is implemented as a method on the :code:`Game`
class::

>>> import nashpy as nash
>>> import numpy as np
>>> A = np.array([[3, -1], [-1, 3]])
>>> B = np.array([[-3, 1], [1, -3]])
>>> rps = nash.Game(A,B)

This :code:`imitation_dynamics` method returns a generator of the outcomes
of the imitation dynamics algorithm::

>>> ne_imitation_dynamics = rps.imitation_dynamics()
>>> print(list(ne_imitation_dynamics))
[(array([0., 1.]), array([1., 0.]))]

:code:`imitation_dynamics` takes the following parameters: :code:`population_size`, :code:`iterations`, :code:`random_seed` and :code:`threshold`.

>>> import nashpy as nash
>>> import numpy as np
>>> A = np.array([[3, -1,3], [-1, 3,6], [-1, 1,2]])
>>> B = np.array([[-3, 1,4], [1, -3,3], [-1, 3,4]])
>>> rps = nash.Game(A,B)
>>> population_size=200
>>> iterations=100
>>> random_seed=30
>>> threshold=0.3
>>> ne_imitation_dynamics = rps.imitation_dynamics(population_size=population_size, iterations=iterations, random_seed=random_seed, threshold=threshold)
>>> list(ne_imitation_dynamics)
[(array([0., 1., 0.]), array([1., 0., 1.]))]
33 changes: 33 additions & 0 deletions docs/how-to/solve-with-regret-minimization.rst
@@ -0,0 +1,33 @@
.. _how-to-use-regret-minimization:

Solve with Regret Minimization
==============================

One of the algorithms implemented in :code:`Nashpy` is called
:code:`regret_minimization()`. It is implemented as a method on the :code:`Game`
class::

>>> import nashpy as nash
>>> import numpy as np
>>> A = np.array([[3, -1], [-1, 3]])
>>> B = np.array([[-3, 1], [1, -3]])
>>> rps = nash.Game(A,B)

This :code:`regret_minimization` method returns a generator of the outcomes
of the regret minimization algorithm::

>>> ne_regret_mini = rps.regret_minimization()
>>> print(list(ne_regret_mini))
[([0.5, 0.5], [0.5, 0.5])]

:code:`regret_minimization` takes the following parameters: :code:`learning_rate` and :code:`iterations`.

>>> A = np.array([[3, -1,3], [-1, 3,6], [-1, 1,2]])
>>> B = np.array([[-3, 1,4], [1, -3,3], [-1, 3,4]])
>>> rps = nash.Game(A,B)
>>> learning_rate = 0.2
>>> iterations = 1000
>>> ne_regret_mini = rps.regret_minimization(learning_rate=learning_rate, iterations=iterations)
>>> print(list(ne_regret_mini))
[([0.0, 1.0, 0.0], [0.0, 0.0, 1.0])]
65 changes: 65 additions & 0 deletions docs/text-book/imitation-dynamics.rst
@@ -0,0 +1,65 @@
Detailed Mathematical Model of Imitation Dynamics
==================================================

Introduction
------------

The mathematical model of imitation dynamics describes how individuals in a population adapt their strategies over time based on observing the strategies of others. This document provides a detailed mathematical formulation for understanding and simulating imitation dynamics in strategic games.

Initialization
---------------

- Let `N` denote the number of individuals in the population.
- Let `M` denote the number of strategies available to each individual.
- Initialize the population as an :math:`N \times M` matrix `P`, where each row represents the strategy of an individual (a sketch of this step follows below).
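For illustration, such a population can be initialized with :code:`numpy` by sampling each row from a Dirichlet distribution, so that every row is a probability vector over the strategies (a minimal sketch; the names used are illustrative and not part of the :code:`Nashpy` API)::

    import numpy as np

    N, M = 5, 2                                   # population size and number of strategies
    P = np.random.dirichlet(np.ones(M), size=N)   # each row sums to 1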

Interaction and Payoff Calculation
-----------------------------------

- Define a payoff matrix `U` for each player, where :math:`U_{jk}` represents the payoff to the player when they choose strategy `j` and their opponent chooses strategy `k`.
- Calculate the payoff for each individual `n` given their strategy :math:`P_n` and the strategy :math:`Q_n` of their opponent (a sketch follows below):
  :math:`\text{Payoff}_n = P_n \cdot U \cdot Q_n^{T}`
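A rough sketch of this calculation with :code:`numpy`; the names :code:`P`, :code:`Q` and :code:`U` below are illustrative and not part of the :code:`Nashpy` API::

    import numpy as np

    U = np.array([[3, -1], [-1, 3]])              # payoff matrix of the first player
    P = np.random.dirichlet(np.ones(2), size=5)   # strategies of the first population
    Q = np.random.dirichlet(np.ones(2), size=5)   # strategies of the opposing population

    # Expected payoff of each individual n against their paired opponent
    payoffs = np.array([P[n] @ U @ Q[n] for n in range(len(P))])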

Imitation Mechanism
--------------------

- Identify the fittest individual based on their payoffs.
- Let `F` be the index (or indices) of the fittest individual.
- Update the strategies of all individuals to match the strategy of the fittest individual (a sketch of this step follows below):
  :math:`P_n = P_F` for all :math:`n = 1, 2, \ldots, N`
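A minimal sketch of this update step with :code:`numpy`, using illustrative values::

    import numpy as np

    payoffs = np.array([0.4, 1.2, 0.7])           # payoffs of each individual (illustrative)
    P = np.random.dirichlet(np.ones(2), size=3)   # current population of strategies

    fittest = np.argmax(payoffs)                  # index of the fittest individual
    P = np.tile(P[fittest], (len(P), 1))          # every individual copies that strategy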

Population Update
-----------------

- Repeat the interaction, payoff calculation, and imitation mechanism steps for a certain number of generations or until convergence.

Convergence and Nash Equilibrium
---------------------------------

- Check for convergence by comparing strategies of successive generations.
- If strategies stabilize, it indicates a potential Nash equilibrium.

Thresholding (Optional)
------------------------

- Apply a thresholding mechanism to discretize strategies (see the sketch below); the threshold value must lie between 0 and 1 and defaults to 0.5.
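For example, the mean population strategy can be discretized with a simple :code:`numpy` comparison (a sketch assuming the default threshold of 0.5)::

    import numpy as np

    P = np.random.dirichlet(np.ones(2), size=3)      # converged population (illustrative)
    strategy = np.mean(P, axis=0)                    # mean strategy of the population
    strategy = np.where(strategy >= 0.5, 1.0, 0.0)   # discretize against the threshold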

Comparison with Fictitious Play
-------------------------------

Even though the Imitation Dynamics method of finding an equilibrium looks similar to the Fictitious Play method, with strategies updated adaptively over time and players adjusting their strategies based on observations of past interactions or outcomes, there are a few differences between them, which are listed below.

**Key Differences between Imitation Dynamics and Fictitious Play**


**Strategy Update Mechanism**

- In :code:`imitation_dynamics`, players copy the strategy of the most successful individual. This means that at each iteration, players directly mimic the strategy of the individual who achieved the highest payoff.

- In :code:`fictitious_play`, on the other hand, players update their strategies based on observed play counts of opponents' strategies. This involves players selecting their next move based on the cumulative history of their opponents' strategies rather than directly imitating successful players.

Using Nashpy
------------

See :ref:`how-to-use-imitation-dynamics` for guidance on how to use Nashpy to
simulate Imitation Dynamics.
43 changes: 43 additions & 0 deletions docs/text-book/regret_minimization.rst
@@ -0,0 +1,43 @@
Regret Minimization in Game Theory
==================================

Introduction
------------

In the context of game theory, "regret" refers to the difference between a player's actual payoff and the payoff they would have received by playing a different strategy. By minimizing regret, players converge towards a Nash Equilibrium, where no player has an incentive to deviate unilaterally from their chosen strategy.

Regret minimization is a concept used in game theory to model how players learn and adapt their strategies over time. It measures the "regret" experienced by a player for not choosing a different strategy that could have yielded a better outcome in hindsight.

Mathematically, let's consider a sequential game where player :math:`i` selects a strategy from a set :math:`S_i` at each time step. The regret :math:`R_i(t)` of player :math:`i` at time :math:`t` with respect to strategy :math:`s` is defined as:

.. math::

   R_i(t) = \max_{s' \in S_i} \sum_{\tau=1}^{t} u_i(s', s_{-i}^\tau) - \sum_{\tau=1}^{t} u_i(s, s_{-i}^\tau)

where:

- :math:`s_{-i}^\tau` represents the joint strategy profile of all players except :math:`i` at time :math:`\tau`.
- :math:`u_i(s, s_{-i}^\tau)` is the utility or payoff obtained by player :math:`i` when playing strategy :math:`s` against the joint strategy :math:`s_{-i}^\tau`.

The regret measures how much payoff player :math:`i` could have gained if they had chosen a different strategy instead of :math:`s` in each time step up to time :math:`t`.
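As a small numerical illustration of this definition (the payoff history below is made up purely for illustration and is not part of :code:`Nashpy`)::

    import numpy as np

    # u[tau, j]: the payoff player i would have received at time tau by playing
    # strategy j against the opponents' observed play (illustrative values)
    u = np.array([[1.0, 0.0],
                  [0.0, 3.0],
                  [1.0, 0.0]])

    s = 0                                          # the strategy whose regret is measured
    regret = u.sum(axis=0).max() - u[:, s].sum()   # R_i(t) from the formula above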

Regret Minimization Implementation
----------------------------------

An algorithm can be used to minimize "regret" by iteratively updating strategies based on past regrets. At each time step, a player selects the strategy that minimizes their regret. This approach aims to converge towards a strategy profile with low regret.

Mathematically, the algorithm can be summarized as follows:

- Initialize strategies for all players.
- At each time step :math:`t`:

  - Calculate the regret for each player based on their current strategy.
  - Update strategies by selecting the option that minimizes regret for each player.

- Repeat until convergence or a stopping condition is met.

By iteratively updating strategies to minimize regret, players can learn to make better decisions over time and potentially converge towards a Nash equilibrium in the game.
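To make these steps concrete, the following is a hedged sketch of a regret-matching style update for a single player against a fixed opponent strategy; it is illustrative only and is not the :code:`Nashpy` implementation::

    import numpy as np

    A = np.array([[3, -1], [-1, 3]])   # payoff matrix of the player being updated
    opponent = np.array([0.5, 0.5])    # fixed opponent strategy (an assumption of this sketch)

    cumulative_regret = np.zeros(2)
    strategy = np.ones(2) / 2          # start from the uniform strategy

    for _ in range(1000):
        expected = A @ opponent                    # payoff of each pure strategy this step
        realised = strategy @ expected             # payoff of the current mixed strategy
        cumulative_regret += expected - realised   # accumulate regret per pure strategy
        positive = np.maximum(cumulative_regret, 0)
        if positive.sum() > 0:
            # Regret matching: play each strategy in proportion to its positive regret
            strategy = positive / positive.sum()
        else:
            strategy = np.ones(2) / 2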


Using Nashpy
------------

See :ref:`how-to-use-regret-minimization` for guidance on how to use Regret
Minimization with Nashpy.
66 changes: 66 additions & 0 deletions src/nashpy/game.py
@@ -15,6 +15,8 @@
)
from .learning.stochastic_fictitious_play import stochastic_fictitious_play
from .utils.is_best_response import is_best_response
from .learning.regret_minimization import regret_minimization
from .learning.imitation_dynamics import imitation_dynamics


class Game:
@@ -423,3 +425,67 @@ def linear_program(self):
row_strategy = linear_program(row_player_payoff_matrix=A)
column_strategy = linear_program(row_player_payoff_matrix=B.T)
return row_strategy, column_strategy

def regret_minimization(self, learning_rate=0.1, iterations=100):
"""
Obtain the Nash equilibria using the regret minimization method over a given number of iterations.
The code provided is based on the concept of regret matching,
with the fixed learning rate.
The algorithm implemented here is Algorithm 4.3 (Theorem 4.4) of [Nisan2007]_.
1. Build the strategy probabilities of both players
Parameters
----------
learning_rate : float
The learning_rate determines the magnitude of the update towards the regrets
iterations : Integer
This value defaults to 100 iterations; it can be increased or decreased based on the shape of the utility/payoff matrices
Returns
-------
Generator
The equilibria.
"""
A, B = self.payoff_matrices
return regret_minimization(
A=A, B=B, learning_rate=learning_rate, iterations=iterations
)

def imitation_dynamics(
self,
population_size=100,
iterations=1000,
random_seed=None,
threshold=0.5,
):
"""
Simulate the imitation dynamics for a given game represented by payoff matrices A and B.
Parameters
----------
population_size : number
number of individuals in the population of the group (default: 100)
iterations : number
number of generations to simulate (default: 1000)
random_seed : number
seed for reproducibility (default: None)
threshold : float
threshold value for representing strategies as 0 or 1 (default: 0.5)
Returns
-------
Generator
The equilibria.
"""
A, B = self.payoff_matrices
return imitation_dynamics(
A=A,
B=B,
population_size=population_size,
iterations=iterations,
random_seed=random_seed,
threshold=threshold,
)
103 changes: 103 additions & 0 deletions src/nashpy/learning/imitation_dynamics.py
@@ -0,0 +1,103 @@
"""A function for a Imitation Dynamics algorithm"""

import numpy as np
from typing import Generator, Tuple, Any
import numpy.typing as npt


def payoff(player_strategy, opponent_strategy, player_payoff_matrix):
"""
Calculate the payoff of a player given their strategy and the opponent's strategy.
Parameters
----------
player_strategy: numpy array
representing the strategy of the player
opponent_strategy: numpy array
representing the strategy of the opponent
player_payoff_matrix: numpy matrix
representing the payoff matrix for the player
Returns
-------
return_value: scalar giving the expected payoff of the player for the given strategies and payoff matrix
"""
return_value = np.dot(
player_strategy, np.dot(player_payoff_matrix, opponent_strategy)
)
return return_value


def imitation_dynamics(
A: npt.NDArray,
B: npt.NDArray,
population_size=100,
iterations=1000,
random_seed=None,
threshold=0.5,
) -> Generator[Tuple[float, float], Any, None]:
"""
Simulate the imitation dynamics for a given game represented by payoff matrices A and B.
Parameters
----------
A : numpy matrix
representing the payoff matrix for Player 1
B : numpy matrix
representing the payoff matrix for Player 2
population_size : number
number of individuals in the population of the group (default: 100)
iterations : number
number of generations to simulate (default: 1000)
random_seed : number
seed for reproducibility (default: None)
threshold : float
threshold value for representing strategies as 0 or 1 (default: 0.5)
Yields
-------
Generator
The equilibria.
"""
num_strategies = len(A)

# Initialize population
if random_seed is not None:
np.random.seed(random_seed) # Set random seed for reproducibility

population_A = np.random.dirichlet(np.ones(num_strategies), size=population_size)
population_B = np.random.dirichlet(np.ones(num_strategies), size=population_size)

for generation in range(iterations):
# Play the game
payoffs_A = np.array(
[
payoff(population_A[i], population_B[i], A)
for i in range(population_size)
]
)
payoffs_B = np.array(
[
payoff(population_B[i], population_A[i], B)
for i in range(population_size)
]
)

# Update population based on payoffs
# Imitation dynamics update: players copy the strategy of the most successful individual
fittest_A_index = np.argmax(payoffs_A)
fittest_B_index = np.argmax(payoffs_B)
population_A = np.tile(population_A[fittest_A_index], (population_size, 1))
population_B = np.tile(population_B[fittest_B_index], (population_size, 1))

# Calculate Nash equilibrium strategies
nash_equilibrium_A = np.mean(population_A, axis=0)
nash_equilibrium_B = np.mean(population_B, axis=0)

# Threshold the strategies
nash_equilibrium_A[nash_equilibrium_A >= threshold] = 1
nash_equilibrium_A[nash_equilibrium_A < threshold] = 0
nash_equilibrium_B[nash_equilibrium_B >= threshold] = 1
nash_equilibrium_B[nash_equilibrium_B < threshold] = 0

yield nash_equilibrium_A, nash_equilibrium_B
