In [1]:
import numpy as np
import matplotlib.pyplot as plt

## 7.2 Fictitious play

Fictitious play is a very simple learning rule. As with many learning rules, each agent maintains a model of the opponent's strategy. At each iteration the agent best-responds to its current model, then updates the model based on the action the opponent actually played. In effect, each player tracks the empirical distribution of the opponent's actions and best-responds to it. Note that agents need not know anything about their opponent's payoffs.
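The rule can be written compactly as follows. The notation is an assumption introduced here for illustration: $w(a)$ is the number of times the opponent has been observed to play action $a$ (possibly starting from a positive initial count), $A$ is the opponent's action set, and $u(s, a)$ is the agent's own payoff for playing $s$ against $a$.

$$
P(a) = \frac{w(a)}{\sum_{a' \in A} w(a')},
\qquad
s^{*} \in \arg\max_{s} \sum_{a \in A} P(a)\, u(s, a)
$$

The agent plays $s^{*}$, observes the opponent's action $a$, and increments $w(a)$ by one.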

For example, consider the (repeated) game of Rock-Paper-Scissors:

$$
\begin{array}{c|ccc}
\text{} & \text{R} & \text{P} & \text{S} \\
\hline
\text{R} & 0,0 & -1,1 & 1,-1 \\
\text{P} & 1,-1 & 0,0 & -1,1 \\
\text{S} & -1,1 & 1,-1 & 0,0 \\
\end{array}
$$

In [7]:
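def RPS_best_response(strategy):
    """Return the index (0=R, 1=P, 2=S) of a best response to a mixed strategy."""
    r, p, s = strategy
    # expected payoffs of playing rock, paper, scissors against the opponent's mix;
    # np.argmax breaks ties in favour of the lowest index
    return np.argmax([s - p, r - s, p - r])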

# each player's own action counts, initialised to one per action (a uniform prior)
player1 = np.array([1., 1., 1.])
player2 = np.array([1., 1., 1.])

steps = [1, 2, 5, 10, 20, 50, 100, 200, 500,
         1000, 2000, 5000, 10000]  # roughly log-spaced checkpoints

for t in range(1, max(steps) + 1):
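    # fictitious play: each player best-responds to the opponent's empirical
    # distribution (player 2 already sees player 1's move from this round)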
    p2_strat = player2 / player2.sum()
    player1[RPS_best_response(p2_strat)] += 1

    p1_strat = player1 / player1.sum()
    player2[RPS_best_response(p1_strat)] += 1

    if t in steps:
        print(f"Iteration {t}")
        print("  P1 dist:", player1.astype(int))
        print("  P2 dist:", player2.astype(int))

Iteration 1
  P1 dist: [2 1 1]
  P2 dist: [1 2 1]
Iteration 2
  P1 dist: [2 1 2]
  P2 dist: [2 2 1]
Iteration 5
  P1 dist: [2 4 2]
  P2 dist: [3 2 3]
Iteration 10
  P1 dist: [4 4 5]
  P2 dist: [5 5 3]
Iteration 20
  P1 dist: [7 9 7]
  P2 dist: [8 7 8]
Iteration 50
  P1 dist: [16 19 18]
  P2 dist: [19 17 17]
Iteration 100
  P1 dist: [34 30 39]
  P2 dist: [35 36 32]
Iteration 200
  P1 dist: [65 68 70]
  P2 dist: [72 68 63]
Iteration 500
  P1 dist: [165 164 174]
  P2 dist: [172 170 161]
Iteration 1000
  P1 dist: [328 334 341]
  P2 dist: [349 333 321]
Iteration 2000
  P1 dist: [658 666 679]
  P2 dist: [686 667 650]
Iteration 5000
  P1 dist: [1638 1698 1667]
  P2 dist: [1683 1653 1667]
Iteration 10000
  P1 dist: [3375 3337 3291]
  P2 dist: [3312 3336 3355]


Over time the empirical distributions converge to the uniform strategy for both players, as you would expect, since uniform play is the unique Nash equilibrium of Rock-Paper-Scissors. There are a couple of subtleties. First, the outcome can depend on how ties are broken. Second, it can depend strongly on the starting distribution.
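To explore these two subtleties, the loop above can be wrapped in a function that takes the initial counts and a tie-breaking rule. This is only a sketch: the names `RPS`, `best_response`, `run_fictitious_play`, the `"first"`/`"random"` tie-break options, and the skewed starting counts `[10, 1, 1]` are all assumptions made for illustration, and ties are detected with a floating-point tolerance rather than exact equality.

```python
import numpy as np

# Row player's payoffs for Rock-Paper-Scissors, read off the table above
# (rows: own action R, P, S; columns: opponent's action R, P, S).
RPS = np.array([[0, -1, 1],
                [1, 0, -1],
                [-1, 1, 0]])

def best_response(opponent_counts, tie_break, rng):
    """Pick a best response to the opponent's empirical distribution."""
    belief = opponent_counts / opponent_counts.sum()
    payoffs = RPS @ belief
    best = np.flatnonzero(np.isclose(payoffs, payoffs.max()))  # all tied maximisers
    if tie_break == "first":
        return best[0]        # lowest index wins, like np.argmax
    return rng.choice(best)   # otherwise pick uniformly among the ties

def run_fictitious_play(init1, init2, n_steps, tie_break="first", seed=0):
    """Return both players' empirical frequencies after n_steps rounds."""
    rng = np.random.default_rng(seed)
    counts1 = np.array(init1, dtype=float)
    counts2 = np.array(init2, dtype=float)
    for _ in range(n_steps):
        # same sequential update order as the loop above
        counts1[best_response(counts2, tie_break, rng)] += 1
        counts2[best_response(counts1, tie_break, rng)] += 1
    return counts1 / counts1.sum(), counts2 / counts2.sum()

# Compare a uniform and a skewed start under both tie-breaking rules.
for init in ([1, 1, 1], [10, 1, 1]):
    for tie_break in ("first", "random"):
        for n_steps in (20, 5000):
            f1, f2 = run_fictitious_play(init, init, n_steps, tie_break)
            print(init, tie_break, n_steps, f1.round(3), f2.round(3))
```

Comparing the short and long runs gives a rough feel for how much the early trajectory depends on these choices.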

A steady state (or absorbing state) of fictitious play is an action profile that, once played in one iteration, is played again in the next, and hence in every later iteration.

A couple of nice properties of steady states:

1. If a pure-strategy profile is a strict Nash equilibrium of the original (stage) game, then it is a steady state of fictitious play in the repeated game.
2. If a pure-strategy profile is a steady state of fictitious play in the repeated game, then it is a (possibly weak) Nash equilibrium of the original game.
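A small experiment can illustrate the steady-state idea. The sketch below uses a hypothetical 2x2 coordination game that is not from the text, and a simultaneous-update variant of fictitious play (the loop earlier updates the players one after the other). The names `is_strict_nash` and `fictitious_play` are made up for this example.

```python
import numpy as np

# Hypothetical coordination game: both players get 2 for coordinating on
# action 0, 1 for coordinating on action 1, and 0 otherwise.
A = np.array([[2, 0], [0, 1]])  # row player's payoffs
B = np.array([[2, 0], [0, 1]])  # column player's payoffs

def is_strict_nash(A, B, i, j):
    """True if (i, j) is a pure profile where each action is the unique best reply."""
    row_ok = all(A[i, j] > A[k, j] for k in range(A.shape[0]) if k != i)
    col_ok = all(B[i, j] > B[i, k] for k in range(B.shape[1]) if k != j)
    return row_ok and col_ok

def fictitious_play(A, B, belief_about_col, belief_about_row, n_steps):
    """Simultaneous-update fictitious play; returns the list of action profiles played."""
    belief_about_col = np.array(belief_about_col, dtype=float)  # row player's counts
    belief_about_row = np.array(belief_about_row, dtype=float)  # column player's counts
    history = []
    for _ in range(n_steps):
        # each player best-responds to the empirical distribution of the other
        i = int(np.argmax(A @ (belief_about_col / belief_about_col.sum())))
        j = int(np.argmax((belief_about_row / belief_about_row.sum()) @ B))
        history.append((i, j))
        belief_about_col[j] += 1
        belief_about_row[i] += 1
    return history

history = fictitious_play(A, B, [1, 1], [1, 1], 50)
print(set(history), is_strict_nash(A, B, *history[-1]))
```

In this run, once the players land on a strict equilibrium, each round of play only reinforces the beliefs that made it a best response, so the profile repeats. Starting from beliefs that favour action 1, for example `[1, 5]` for both players, leads to the other equilibrium instead.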

In addition, if the empirical distribution of each player's actions converges, then it converges to a Nash equilibrium. The Rock-Paper-Scissors game above is an example: it has no steady state, but the empirical distributions do converge.
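As a quick check on the Rock-Paper-Scissors run, the printed counts can be normalised into frequencies and compared with the uniform strategy. The snippet below simply copies three of player 1's rows from the output above; the variable names are chosen here for illustration.

```python
import numpy as np

# Player 1's counts copied from the printed output above
# (they include the initial count of 1 per action).
p1_counts = {
    100: [34, 30, 39],
    1000: [328, 334, 341],
    10000: [3375, 3337, 3291],
}

gaps = {}
for t, counts in p1_counts.items():
    freq = np.array(counts) / np.sum(counts)          # empirical distribution
    gaps[t] = float(np.abs(freq - 1 / 3).max())       # largest deviation from uniform
    print(t, freq.round(4), round(gaps[t], 4))
```

The largest deviation from $1/3$ shrinks as the number of iterations grows, which is what convergence of the empirical distribution looks like in practice.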

There are several classes of games where convergence of the empirical distribution is guaranteed:
1. Zero-sum games
2. Games that are solvable by iterated elimination of strictly dominated strategies
3. Potential games (which include games where the players have identical interests)
4. $2 \times n$ games with generic (non-degenerate) payoffs

Fictitious play is very simple, but limited. We will see more complicated alternatives next.