import numpy as np
import matplotlib.pyplot as plt

# Learning and Teaching

In this section we look at learning from a game-theoretic perspective.

## 7.1 Why the subject of "learning" is complex

We start with a brief overview of some core themes.

### 7.1.1 The interaction between learning and teaching

When multiple agents are learning, one source of complexity is that they are generally learning from one another, creating a dynamical system that evolves over time. Such systems can produce a wide range of results, even for very simple learning rules. A second, conceptual complication is that agents are no longer just learning on their own: an intelligent agent will also be teaching. This can be shown with a simple example.

Imagine we have a repeated game with these payoffs:

$$
\begin{array}{c|cc}
\text{} & \text{L} & \text{R} \\
\hline
\text{U} & 1,0 & 3,2 \\
\text{D} & 2,1 & 4,0 \\
\end{array}
$$

In any single stage game, D is the dominant strategy for the Row player: it pays more than U whichever column is chosen. Column's best response to D is L, giving Row a payoff of 2. However, if Row plays U, Column's best response is R, which gives Row a payoff of 3 instead. Playing U goes against Row's short-term instincts. A sensible Column will play L until Row plays U with some consistency, so Row may need to teach Column that it will play U and that Column can therefore play R. Row then needs to resist switching to D once Column is playing R, since D against R would pay 4 in that round.

The key point is that players teach and learn from each other which moves they will play and under what conditions.
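The payoff reasoning in this example can be checked with a few lines of NumPy. This is only a minimal sketch: the names `row_payoff`, `col_payoff`, `column_best_response` and `stage_outcome` are hypothetical helpers introduced here, and Column is assumed to best-respond myopically to whatever Row plays.

```python
import numpy as np

# Payoff matrices for the example game (rows: U, D; columns: L, R).
row_payoff = np.array([[1, 3],
                       [2, 4]])
col_payoff = np.array([[0, 2],
                       [1, 0]])
row_actions = ["U", "D"]
col_actions = ["L", "R"]


def column_best_response(row_action):
    """Index of Column's best reply to a fixed Row action (assumed myopic)."""
    return int(np.argmax(col_payoff[row_action]))


def stage_outcome(row_action):
    """Return (Row action, Column reply, Row payoff) for a fixed Row action."""
    c = column_best_response(row_action)
    return row_actions[row_action], col_actions[c], int(row_payoff[row_action, c])


# D strictly dominates U for Row if it pays more in every column.
d_dominates_u = bool(np.all(row_payoff[1] > row_payoff[0]))

outcome_if_d = stage_outcome(1)  # Row plays the dominant action D
outcome_if_u = stage_outcome(0)  # Row commits to U instead

print(d_dominates_u, outcome_if_d, outcome_if_u)
```

Under these assumptions, committing to U leaves Row better off than playing the dominant action D, which is the tension that makes teaching worthwhile.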

### 7.1.2 What constitutes learning?

Learning in game theory generally takes place in repeated or stochastic games, where agents have the opportunity to adapt over time. An example of a simple rule is 'tit-for-tat', but more complicated rules are possible.
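As a rough illustration of such a rule, here is a sketch of tit-for-tat in a repeated game. The text does not say which game is being played, so a Prisoner's Dilemma with illustrative payoffs is assumed here; `PAYOFFS`, `tit_for_tat`, `always_defect` and `play` are hypothetical names.

```python
# Actions: "C" (cooperate) and "D" (defect).
# The payoff numbers are assumptions chosen for illustration only.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous action."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """A fixed, non-adaptive strategy used as a comparison."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Run a repeated game; each strategy sees only the other's past actions."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_b)
        b = strategy_b(hist_a)
        pay_a, pay_b = PAYOFFS[(a, b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return hist_a, hist_b, total_a, total_b


print(play(tit_for_tat, always_defect, 5))
print(play(tit_for_tat, tit_for_tat, 5))
```

Tit-for-tat adapts only to the opponent's last move, which is about the simplest way a rule can depend on observed history.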

Learning can involve not just determining the strategies of other players, but also the nature of the game itself. It is possible for agents to converge to an equilibrium without knowing the actual game being played. 

A slight distinction can be made between evolutionary games, where members of a population play against each other, and repeated games, but the two settings are largely analogous.

### 7.1.3 If learning is the answer, what is the question?

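Learning is not necessarily a good thing. In certain games, such as Chicken, it might even be beneficial to be incapable of learning, since an agent that cannot adapt is effectively committed to its action and the opponent has to give way. So it is worth asking why an agent should learn at all.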

**Descriptive theories:**

Descriptive theories are about how agents actually learn. The aim is to show that, in a particular scenario, agents learn to adopt a set of behaviours that can be seen in the real world. There are different measures of the quality of the convergence:

1. Nash equilibrium
2. Empirical frequency equilibrium (e.g., in rock-paper-scissors the empirical frequencies of each player's actions converge to 1/3, 1/3, 1/3).
3. Correlated equilibrium
4. None of the above, but the policies converge to an interesting state
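To make the second measure concrete, the sketch below computes empirical action frequencies for a hypothetical rock-paper-scissors play path in which a player cycles R, P, S deterministically. The cycling path is an assumption for illustration, not the output of any particular learning rule, and `empirical_frequencies` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction

ACTIONS = ("R", "P", "S")


def empirical_frequencies(history):
    """Fraction of rounds in which each action was played."""
    counts = Counter(history)
    n = len(history)
    return {a: Fraction(counts[a], n) for a in ACTIONS}


# Hypothetical play path: the player cycles R, P, S over 300 rounds.
history = [ACTIONS[t % 3] for t in range(300)]

freqs = empirical_frequencies(history)          # long-run frequencies
freqs_early = empirical_frequencies(history[:4])  # after only 4 rounds

print(freqs)
print(freqs_early)
```

In every round the player uses a pure action, so the strategy itself never settles on the mixed equilibrium; only the long-run frequencies match it. That gap is what separates the first measure from the second.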

**Prescriptive theories:**

Prescriptive theories are about how agents *should* learn. We can study whether their learning strategies are in equilibrium, or ask whether agents learn well enough with a given strategy. We might look at requirements such as:

1. Safety. Does it give at least the maxmin payoff?
2. Rationality. If the opponent plays a particular strategy, does the learning algorithm converge to a best response?
3. No-regret. The payoff obtained with the learning rule is at least as good as the payoff any fixed pure strategy would have obtained. More on this later.
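Two of these requirements can be sketched numerically using Row's payoffs from the example game in 7.1.1. The sketch makes two assumptions: the security level is computed over pure strategies only, which gives a lower bound on the maxmin payoff over mixed strategies, and regret is measured against the best fixed pure action with the opponent's observed actions held fixed. The action histories and the helper names `pure_security_level` and `external_regret` are hypothetical.

```python
import numpy as np

# Row's payoffs from the example game (rows: U, D; columns: L, R).
row_payoff = np.array([[1, 3],
                       [2, 4]])


def pure_security_level(payoff):
    """Best worst-case payoff over pure actions (lower bound on maxmin)."""
    return int(payoff.min(axis=1).max())


def external_regret(payoff, own_actions, opp_actions):
    """Best fixed pure action's total payoff minus the realized total,
    holding the opponent's observed action sequence fixed."""
    own = np.asarray(own_actions)
    opp = np.asarray(opp_actions)
    realized = payoff[own, opp].sum()
    fixed_totals = payoff[:, opp].sum(axis=1)  # one total per fixed action
    return int(fixed_totals.max() - realized)


# Hypothetical histories: Row mostly plays U (0); Column moves to R (1).
own_history = [0, 0, 1, 0]
opp_history = [0, 1, 1, 1]

security = pure_security_level(row_payoff)
regret = external_regret(row_payoff, own_history, opp_history)

print(security, regret)
```

Note that this regret calculation treats Column's actions as given, so it does not account for how Column might have reacted had Row played differently.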

To answer these questions we can look at how the agent learns against itself (self-play) or against an opponent who is also learning.