# Correlated-Q Learning

## Key ideas

- Generalizes Nash-Q and Friend-or-Foe Q-learning (FFQ)
- Shows empirical convergence to equilibrium policies on general-sum Markov games

## Introduction

- Goal: learn equilibrium policies in general-sum Markov games
- Nash-Q: converges only on certain classes of games, under restrictive conditions
- FF-Q: likewise restricted
- Correlated-Q:
  - In general-sum games, the set of correlated equilibria contains the Nash equilibria
  - In constant-sum games, the set of correlated equilibria contains the minimax equilibria
- Nash equilibrium
  - A vector of independent probability distributions over actions, one per agent
  - All agents optimize with respect to one another's probabilities
  - At equilibrium, no agent can gain by unilaterally deviating
- Correlated equilibrium (more general)
  - Allows dependencies among the agents' probability distributions
  - Can be computed using linear programming (see the LP sketch after this list)
- Difficulties for learning equilibrium policies
  - Multiple equilibria yield multiple payoff values
  - Equilibrium selection is handled via four selection functions (utilitarian, egalitarian, republican, libertarian)
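
Since the notes state that a correlated equilibrium can be computed by linear programming, here is a minimal sketch assuming a 2-player bimatrix game and the utilitarian selection function; the payoff matrices (a game of chicken) and all names are illustrative, not from the paper:

```python
# A minimal sketch: utilitarian correlated equilibrium of a 2-player
# bimatrix game via scipy's linprog. Payoffs here are a game of chicken.
import numpy as np
from scipy.optimize import linprog

R1 = np.array([[6.0, 2.0],   # row player's payoffs
               [7.0, 0.0]])
R2 = R1.T                    # chicken is symmetric

m, n = R1.shape              # action counts for row / column player
# Decision variable: p(a1, a2) >= 0, flattened row-major to length m*n.

# Rationality constraints: no player gains by deviating from its
# recommended action. For the row player, for each pair (a, a_dev):
#   sum_{a2} p(a, a2) * (R1[a_dev, a2] - R1[a, a2]) <= 0
A_ub, b_ub = [], []
for a in range(m):
    for a_dev in range(m):
        if a == a_dev:
            continue
        row = np.zeros((m, n))
        row[a, :] = R1[a_dev, :] - R1[a, :]
        A_ub.append(row.ravel())
        b_ub.append(0.0)
for a in range(n):
    for a_dev in range(n):
        if a == a_dev:
            continue
        row = np.zeros((m, n))
        row[:, a] = R2[:, a_dev] - R2[:, a]
        A_ub.append(row.ravel())
        b_ub.append(0.0)

# Probabilities sum to 1; nonnegativity is enforced via bounds.
A_eq, b_eq = [np.ones(m * n)], [1.0]

# Utilitarian selection: maximize total reward = minimize its negation.
c = -(R1 + R2).ravel()

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (m * n))
print(res.x.reshape(m, n))   # joint distribution over action pairs
```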

## Markov games

- Generalization of both repeated games and MDPs (also called stochastic games)
- A stochastic game is a tuple:
  - I, the set of players
  - S, the set of states
  - Ai(s), the actions available to player i ∈ I in state s ∈ S
  - P, the state-transition probability function
  - Ri, the reward function for the ith player
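
As a minimal sketch, the tuple above could be represented as a Python structure (field names are illustrative, not from the paper):

```python
# Illustrative container for a Markov (stochastic) game tuple.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class MarkovGame:
    players: List[int]                           # I, the set of players
    states: List[int]                            # S, the set of states
    actions: Dict[Tuple[int, int], List[int]]    # Ai(s): (i, s) -> actions
    P: Callable[[int, Tuple[int, ...]], Dict[int, float]]  # P(s' | s, joint a)
    R: Callable[[int, int, Tuple[int, ...]], float]        # Ri(s, joint a)
```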

## Q in Markov games

- Q-values are indexed by state and joint action vector, not state-action pairs
- Qi(s, a) = (1 - gamma) * Ri(s, a) + gamma * sum over s' of P(s' | s, a) * Vi(s'), where a is the joint action
- Each player independently maximizing its own rewards is not an adequate strategy in general-sum games; the algorithms below differ in how Vi(s) is defined
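
A minimal sketch of the shared tabular Q-update implied by this definition, assuming Q[i][s][a] is indexed by joint action a and V[i][s'] is supplied by whichever value operator (Friend-Q, Nash-Q, CE-Q) is in use; all names are illustrative:

```python
# One-step multiagent Q-learning update with the (1 - gamma)-normalized
# target used in the Q definition above.
def q_update(Q, V, i, s, a, r_i, s_next, alpha=0.1, gamma=0.9):
    # target blends immediate reward with the equilibrium value of s'
    target = (1 - gamma) * r_i + gamma * V[i][s_next]
    Q[i][s][a] = (1 - alpha) * Q[i][s][a] + alpha * target
```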

## Friend-Q

- Assumes all players' reward functions are the same (fully cooperative), 2-player case
- Vi(s) = max over joint actions a of Qi(s, a)
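
Concretely, with a Q-matrix over joint actions the Friend-Q value is just its maximum entry; the matrix below is an illustrative example, not from the paper:

```python
# Friend-Q value at state s: maximize Qi over the joint action space.
import numpy as np

Q = np.array([[1.0, 3.0],
              [0.0, 2.0]])   # Qi(s, (a1, a2)) for each joint action
V = Q.max()                  # Vi(s) = max over joint actions (a1, a2)
```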

## Nash-Q

- For n-player, general-sum games
- Vi(s) = Nashi(Q1(s), ..., Qn(s))
- Nashi denotes player i's expected value under a Nash equilibrium of the stage game whose reward matrices are the players' Q-values at state s
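
A hedged sketch of this value computation at one state for the 2-player case, using the third-party nashpy library (not part of the paper) to find an equilibrium of the stage game; the Q-matrices are illustrative:

```python
# Nash-Q value at state s via support enumeration on the stage game
# defined by the players' Q-matrices.
import numpy as np
import nashpy as nash

Q1 = np.array([[6.0, 2.0], [7.0, 0.0]])   # player 1's Q-values at s
Q2 = Q1.T                                 # player 2's Q-values at s

game = nash.Game(Q1, Q2)
sigma1, sigma2 = next(game.support_enumeration())  # one Nash equilibrium
V1, V2 = game[sigma1, sigma2]             # equilibrium values Vi(s)
```

Note that when the stage game has multiple equilibria, different learners may select different ones, which is exactly the equilibrium-selection difficulty noted in the introduction.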

## CE-Q

- A correlated equilibrium allows for dependencies in the agents' randomizations:
  - A single probability distribution over the joint action space; each agent optimizes with respect to the others' behavior, conditioned on its own part of the joint action
- Four selection functions pick among the (possibly many) correlated equilibria; a sketch of the corresponding LP objectives follows this list:
  - Utilitarian: maximize the sum of all players' rewards (argmax of the sum)
  - Egalitarian: maximize the minimum of all players' rewards (argmax of the min)
  - Republican: maximize the maximum of all players' rewards (argmax of the max)
  - Libertarian: each player i maximizes its own reward, with the argmax taken subject to the result being a correlated equilibrium
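
A hedged sketch of how the utilitarian and libertarian selections map onto cost vectors for the LP given earlier (scipy's linprog minimizes, hence the negations); egalitarian and republican are not linear in p alone, so only the construction is noted in comments. All names are illustrative:

```python
# Selection-function objectives over the CE feasible set from the LP
# sketch above (same constraints, different cost vector c).
import numpy as np

def utilitarian_c(R1, R2):
    # maximize the sum of all players' expected rewards
    return -(R1 + R2).ravel()

def libertarian_c(Ri):
    # player i maximizes its own expected reward, subject to the CE
    # constraints; each player induces its own equilibrium distribution
    return -Ri.ravel()

# Egalitarian (max-min): add a scalar t with constraints
#   t <= sum_a p(a) * Ri(a) for every player i, and maximize t.
# Republican (max-max): solve one LP per player, maximizing that player's
# expected reward, and keep the best of the n solutions.
```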