In [4]:
import numpy as np
import matplotlib.pyplot as plt

# 6. Richer Representations: Beyond the Normal and Extensive Forms

There are several reasons to explore other forms of game. First, so far we have assumed that everything is finite: the number of decisions (or time steps), the number of players, and the number of actions. We may want to consider what happens with infinitely many agents, or with games that are repeated forever. Second, we have assumed that agents know each other's payoffs, which is often unrealistic. Third, we would like more compact ways of describing games, for the sake of efficiency.

## 6.1 Repeated games

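Consider a game like the prisoner's dilemma, in normal form, which is played multiple times. If agents have no information about previous rounds, the analysis is trivial: strategies cannot depend on the past, so each round is effectively an independent copy of the one-shot game. However, if agents can observe what happened before, things become more complicated, because actions can now depend on the history of play.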

### 6.1.1 Finitely repeated games

With a finitely repeated game we simply need a bigger table in the normal form (or a bigger tree in the extensive form) to capture the strategies and payoffs for both players. We assume that within each round agents do not know what each other will play, but find out afterwards. One simple option is to play the same strategy in every round, which we call a *stationary strategy*. In general, though, the action can depend on what was played before. Consider the prisoner's dilemma repeated 100 times. Backward induction shows that rational players defect in every round: defecting is dominant in the final round, so nothing played in round 99 can change what happens in round 100, which makes defecting the best choice in round 99 as well, and so on back to the first round.

### 6.1.2 Infinitely repeated games

If a game is repeated infinitely we get an infinite tree of decisions. To quantify a player's reward we can use either the average reward over all rounds, or a discounted sum of future rewards (i.e., the agent cares more about the present, or the game might end at some random point after each round).
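
The two reward criteria can be illustrated numerically. The sketch below is only an approximation: it truncates the infinite reward stream to a finite list, and the names `average_reward`, `discounted_reward` and the discount factor `beta` are illustrative choices, not notation from the text above.

```python
import numpy as np

def average_reward(rewards):
    """Mean per-round reward over a (truncated) stream of rewards."""
    return float(np.mean(rewards))

def discounted_reward(rewards, beta):
    """Sum of beta**j * r_j over a (truncated) stream, with 0 < beta < 1."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = beta ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))

# Assumed example: a player who receives reward 3 in every round.
rewards = [3.0] * 200
avg = average_reward(rewards)
disc = discounted_reward(rewards, beta=0.9)
print(avg, disc)
```

For a constant reward $r$ the discounted sum approaches $r/(1-\beta)$ as the number of rounds grows, which is why the truncated value here is close to 30.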

In infinitely repeated games there are strategies other than the stationary ones. For instance, in Tit-for-Tat (TfT) a player starts by cooperating and then, in each later round, repeats whatever action their opponent played in the previous round.
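
A minimal simulation sketch of Tit-for-Tat follows. The payoff numbers (3 for mutual cooperation, 1 for mutual defection, 5 and 0 for a lone defector and a lone cooperator) are an assumed standard choice for the prisoner's dilemma and are not taken from these notes; the helper names are hypothetical, and the game is truncated to a finite number of rounds.

```python
import numpy as np

C, D = 0, 1  # action indices: cooperate, defect

# Assumed payoffs: U1[a1, a2] is player 1's reward when player 1 plays a1
# and player 2 plays a2. The game is symmetric, so U2 is the transpose.
U1 = np.array([[3, 0],
               [5, 1]])
U2 = U1.T

def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous action."""
    return C if not opp_history else opp_history[-1]

def always_defect(own_history, opp_history):
    """A stationary strategy: defect in every round."""
    return D

def play(strategy1, strategy2, rounds):
    """Play the stage game repeatedly; return histories and average rewards."""
    h1, h2 = [], []
    for _ in range(rounds):
        # Both players choose simultaneously, seeing only past rounds.
        a1 = strategy1(h1, h2)
        a2 = strategy2(h2, h1)
        h1.append(a1)
        h2.append(a2)
    r1 = float(np.mean([U1[a, b] for a, b in zip(h1, h2)]))
    r2 = float(np.mean([U2[a, b] for a, b in zip(h1, h2)]))
    return h1, h2, r1, r2

print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_defect, 10))
```

Against itself, TfT cooperates forever. Against a constant defector it is exploited only in the first round and defects from then on.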

Ideally we want to be able to calculate the equilibrium strategies in infinitely repeated games. A good place to start is **The Folk Theorem**, which, roughly, states that the reward profiles the players can obtain in an equilibrium of the infinitely repeated game (using average rewards) are exactly the reward profiles that can be produced from the outcomes of the original game, as long as every player receives at least their minmax value.

Recall we define the minmax reward for player $1$ as

$$v_1=\min_{s_2}\max_{s_1}u_1(s_1,s_2)$$

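This corresponds to player 1 moving after player 2 in a single play of the two-player game: player 2 picks the strategy that holds player 1's reward as low as possible, and player 1 best-responds to it. (We use two players here, but the definition extends to more: the minimisation is then taken over the actions of all the other players.)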
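
As a small worked example, the minmax value can be computed directly from a payoff matrix. This sketch makes two assumptions that are not in the text above: it restricts both players to pure strategies (in general the minimisation and maximisation range over mixed strategies), and it uses an assumed set of prisoner's dilemma payoffs. The function name `pure_minmax` is hypothetical.

```python
import numpy as np

# Assumed payoffs: U1[s1, s2] is player 1's reward, with index 0 = cooperate
# and index 1 = defect.
U1 = np.array([[3, 0],
               [5, 1]])

def pure_minmax(u):
    """min over the opponent's column of the max over our own row of u."""
    best_response_values = u.max(axis=0)  # our best reward for each s2
    return best_response_values.min()     # opponent picks the worst for us

v1 = pure_minmax(U1)
print(v1)
```

With these payoffs player 2 punishes most effectively by defecting, after which the best player 1 can do is defect as well and receive 1.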

In order to prove the theorem we need to introduce a couple of concepts. Below $r_i$ is the average reward for player $i$ in the infinite game.

1. **Enforceability:** $r_i$ is enforceable if $r_i\geq v_i$.
2. **Feasibility:** $r_i$ is feasible if it can be written as a convex combination of the rewards in $u_i$ (i.e., a mixture of outcome rewards with non-negative weights that sum to 1, using the same weights for every player).
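
The two definitions above can be sketched in code for a specific game. The payoffs and the minmax values `v = (1, 1)` below are assumptions for a standard prisoner's dilemma, and `rewards_from_weights` and `is_enforceable` are hypothetical helper names. The sketch only evaluates a given set of weights; it does not search for weights that produce a target reward profile.

```python
import numpy as np

# Assumed prisoner's dilemma payoffs (index 0 = cooperate, 1 = defect).
U1 = np.array([[3, 0],
               [5, 1]])
U2 = U1.T
v = (1.0, 1.0)  # assumed minmax values for this game

def rewards_from_weights(alpha):
    """Reward profile obtained by mixing outcomes with weights alpha."""
    alpha = np.asarray(alpha, dtype=float)
    if alpha.shape != U1.shape:
        raise ValueError("alpha needs one weight per outcome")
    if np.any(alpha < 0) or not np.isclose(alpha.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # The same weights are applied to both players' rewards.
    return float((alpha * U1).sum()), float((alpha * U2).sum())

def is_enforceable(r, v):
    """True if every player receives at least their minmax value."""
    return all(ri >= vi for ri, vi in zip(r, v))

# Half the weight on (C, C), a quarter each on (C, D) and (D, C).
r_mixed = rewards_from_weights([[0.5, 0.25], [0.25, 0.0]])
# All weight on (C, D): player 1 is exploited in every round.
r_exploit = rewards_from_weights([[0.0, 1.0], [0.0, 0.0]])
print(r_mixed, is_enforceable(r_mixed, v))
print(r_exploit, is_enforceable(r_exploit, v))
```

Both profiles are feasible by construction, but only the first is enforceable, since in the second player 1 receives less than the minmax value.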

The theorem is proved in two parts. First, we note that no player will accept less than their minmax value at equilibrium. Second, we construct an equilibrium in which all players go through an agreed sequence of outcomes together, and punish any player who deviates by playing minmax against them.

**Part 1:**

Consider that there is a player for whom $r_i<v_i$. Then 