In [1]:
import numpy as np
import matplotlib.pyplot as plt

## 7.7 Evolutionary learning and other large-population models

### 7.7.1 The replicator dynamic

Say we have a population whose members repeatedly play a normal-form game against one another. Let $\theta_t(a)$ be the share of players who play action $a$ at time $t$. Then the expected payoff at time $t$ to an agent playing $a$ is:

$$u_t(a) = \sum_{a^\prime}\theta_t(a^\prime)u(a,a^\prime)$$

That is, the payoff to $a$ against each opposing action, weighted by the share of the population playing that action. The share of the population taking a particular action increases if that action earns more than the population average, and decreases if it earns less. Formally, the rate of change of $\theta_t(a)$ is $\theta_t(a)(u_t(a)-u_t^*)$, where $u_t^* = \sum_{a}\theta_t(a)u_t(a)$ is the mean payoff across the population.

One nice feature of this model is that the population shares can loosely be read as the mixed strategy of a single agent.
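The update rule above can be sketched for any symmetric payoff matrix. This is a minimal discrete-time version, not a definitive implementation: the function name `replicator_step`, the step size `dt`, and the example matrix (payoff 1 for matching the opponent, 0 otherwise) are assumptions made for illustration.

```python
import numpy as np

def replicator_step(theta, payoff, dt=1.0):
    """One discrete replicator update (hypothetical helper).

    theta  : population shares, one entry per action
    payoff : payoff[a, b] = u(a, b), row player's payoff for a against b
    dt     : step size (an assumption; dt=1 adds the full change each round)
    """
    theta = np.asarray(theta, dtype=float)
    u = payoff @ theta        # u_t(a) = sum_a' theta_t(a') u(a, a')
    mean_u = theta @ u        # u_t^* = sum_a theta_t(a) u_t(a)
    return theta + dt * theta * (u - mean_u)

# Example matrix: 1 if both players choose the same action, 0 otherwise.
matching = np.eye(2)
theta = replicator_step([0.45, 0.55], matching)
print(theta.round(3))  # [0.425 0.575]
```

Because the changes sum to zero across actions, the shares still sum to one after each step.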

As an example, consider the simple coordination game:

$$
\begin{array}{c|cc}
\text{} & \text{H} & \text{T} \\
\hline
\text{H} & 1,1 & 0,0 \\
\text{T} & 0,0 & 1,1 \\
\end{array}
$$

But let's start out with a slight majority playing T.

In [9]:
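theta = np.array([0.45, 0.55])  # initial shares playing H and T
for t in range(10):
    # Expected payoffs: an action earns 1 only when the opponent matches it
    u_H = theta[0]
    u_T = theta[1]
    mean_u = theta[0]*u_H + theta[1]*u_T  # population-average payoff
    # Replicator update: each share changes in proportion to its payoff advantage
    delta_H = theta[0]*(u_H-mean_u)
    delta_T = theta[1]*(u_T-mean_u)
    print("mean_u",round(mean_u,3),"u_H",round(u_H,3),"u_T",round(u_T,3),"delta_H",round(delta_H,3),"delta_T",round(delta_T,3))
    theta[0] += delta_H
    theta[1] += delta_T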

mean_u 0.505 u_H 0.45 u_T 0.55 delta_H -0.025 delta_T 0.025
mean_u 0.511 u_H 0.425 u_T 0.575 delta_H -0.037 delta_T 0.037
mean_u 0.525 u_H 0.389 u_T 0.611 delta_H -0.053 delta_T 0.053
mean_u 0.554 u_H 0.336 u_T 0.664 delta_H -0.073 delta_T 0.073
mean_u 0.613 u_H 0.263 u_T 0.737 delta_H -0.092 delta_T 0.092
mean_u 0.717 u_H 0.171 u_T 0.829 delta_H -0.093 delta_T 0.093
mean_u 0.857 u_H 0.077 u_T 0.923 delta_H -0.06 delta_T 0.06
mean_u 0.966 u_H 0.017 u_T 0.983 delta_H -0.016 delta_T 0.016
mean_u 0.998 u_H 0.001 u_T 0.999 delta_H -0.001 delta_T 0.001
mean_u 1.0 u_H 0.0 u_T 1.0 delta_H -0.0 delta_T 0.0


As expected, everyone eventually just plays T. This evolutionary model is not very different from the repeated-game model. There are several important concepts that are useful to define:
1. A steady state. As before, this is a state where the population shares don't change from round to round.
2. A stable steady state. Same as above, but if you perturb the state a little, the system never moves far away from it.
3. An asymptotically stable state. A stable steady state where, after a small perturbation, the system also evolves back to the steady state.

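In the previous example, the all-T state satisfies all three: a small share of H players earns less than the T players and dies out. The even split $\theta = (0.5, 0.5)$ is a steady state too, but it is not stable, since any small majority for one action keeps growing.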

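Unsurprisingly, steady states are closely tied to Nash equilibria: every symmetric Nash equilibrium $(s,s)$ gives a steady state, and every stable steady state corresponds to a Nash equilibrium. Also, when 3 is the case, the equilibrium turns out to be trembling-hand perfect and isolated.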
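A rough numerical probe of these definitions for the coordination game: start at or near a candidate state, iterate the discrete replicator update, and see where the shares end up. The helper `run_replicator` and the choice of 200 steps are assumptions for illustration, and a simulation like this only suggests stability; it does not prove it.

```python
import numpy as np

def run_replicator(theta0, payoff, steps=200):
    """Iterate the discrete replicator update (hypothetical helper)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        u = payoff @ theta                       # expected payoff per action
        theta = theta + theta * (u - theta @ u)  # grow above-average actions
    return theta

coordination = np.eye(2)  # rows/columns ordered as (H, T)

# Exactly at the even split, slightly off it, and close to all-T.
for start in ([0.5, 0.5], [0.49, 0.51], [0.01, 0.99]):
    print(start, "->", run_replicator(start, coordination).round(3))
```

In this sketch the even split stays put only when hit exactly, the nearby start drifts away from it, and the start close to all-T returns to all-T.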

### 7.7.2 Evolutionarily stable strategies (ESS)

A strategy is an ESS if it isn't 'invadable' by any other strategy. What this means is that when a small share of the population switches to a new strategy, the original strategy earns a strictly higher payoff against the resulting mixed population than the new strategy does. A weak ESS is where the original strategy's payoff is at least as high, not necessarily higher (so the original population won't shrink).

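In a symmetric 2-player game, every ESS $s$ gives a Nash equilibrium $(s,s)$. The converse does not hold in general: not every equilibrium strategy is an ESS. It does hold for strict equilibria, i.e., if $(s,s)$ is a strict symmetric Nash equilibrium then $s$ is an ESS.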

Unsurprisingly, every ESS is an asymptotically stable state of the replicator dynamic.
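The invasion condition can be checked numerically for one candidate strategy and one mutant. This is only a sketch: the function name `resists_invasion` and the mutant share `eps` are assumptions, and a real ESS test would need the inequality to hold for every mutant and for all sufficiently small `eps`, not just the single pair tried here.

```python
import numpy as np

def resists_invasion(payoff, s, mutant, eps=0.01):
    """Does s earn strictly more than the mutant once the mutant holds share eps?"""
    s = np.asarray(s, dtype=float)
    mutant = np.asarray(mutant, dtype=float)
    # Population after invasion: mostly s, with a small share of the mutant.
    population = (1 - eps) * s + eps * mutant
    incumbent_payoff = s @ payoff @ population
    mutant_payoff = mutant @ payoff @ population
    return bool(incumbent_payoff > mutant_payoff)

coordination = np.eye(2)  # rows/columns ordered as (H, T)

print(resists_invasion(coordination, [0, 1], [1, 0]))      # True
print(resists_invasion(coordination, [0.5, 0.5], [1, 0]))  # False
```

In the coordination game, pure T holds off an H mutant, while the even mix does not: against the post-invasion population the H mutant earns 0.505 and the mixed incumbent only 0.5.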

### 7.7.3 Agent-based simulation and emergent convention

While replicator dynamics are nice for their mathematical simplicity, it is also worth noting that you can model each agent independently. A good rule for doing this, e.g., for modelling social norms, is to allow each agent to update their behaviour given their own history. For example, two agents might meet each other at random every iteration, and each switches to a new strategy if it looks better than their current one over the past $n$ iterations. In a social convention model, following this kind of rule eventually leads to a payoff that is at least the maxmin value.
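A minimal agent-based sketch of this idea for the coordination game. The specific update rule here is an assumption: each agent remembers the opponents' actions from its last `memory` meetings and switches only if another action would have earned strictly more against them. The function name, parameters, and matching scheme are all hypothetical, and other readings of "looks better" are possible.

```python
import numpy as np
from collections import deque

def simulate_convention(payoff, init_actions, memory=5, rounds=2000, seed=0):
    """Random pairwise meetings with a memory-based switching rule (hypothetical)."""
    payoff = np.asarray(payoff, dtype=float)
    rng = np.random.default_rng(seed)
    actions = np.array(init_actions, dtype=int)
    n_agents = len(actions)
    # Each agent remembers the opponent actions from its last `memory` meetings.
    seen = [deque(maxlen=memory) for _ in range(n_agents)]
    for _ in range(rounds):
        i, j = rng.choice(n_agents, size=2, replace=False)  # two distinct agents meet
        a_i, a_j = actions[i], actions[j]  # actions played in this meeting
        for me, other_action in ((i, a_j), (j, a_i)):
            seen[me].append(other_action)
            # Payoff each action would have earned against the remembered opponents.
            totals = payoff[:, list(seen[me])].sum(axis=1)
            # Switch only if some action did strictly better than the current one.
            if totals.max() > totals[actions[me]]:
                actions[me] = int(totals.argmax())
    return actions

coordination = np.eye(2)       # action 0 = H, action 1 = T
start = [0] * 9 + [1] * 11     # slight majority playing T
final = simulate_convention(coordination, start)
print("share playing T:", final.mean())
```

Under this rule a population that already agrees on one action never leaves it, since no remembered meeting makes the other action look better. Which convention a mixed population settles on, and how quickly, depends on the random matching.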