CFR Key Ideas:
The maximum regret over all actions, averaged over iterations, approaches zero.
The rate at which it goes to zero is governed by a regret bound, which is algorithm dependent.
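To make the two CFR points concrete, here is a minimal sketch of regret matching, the per-information-set learner that CFR is usually built on. It is only an illustration under assumptions: a single decision point, utilities in [0, 1], and a made-up cycling utility sequence. The names `regret_matching_strategy` and `max_average_regret` are hypothetical. Under these assumptions the standard regret matching bound gives a maximum average regret of at most sqrt(|A| / T).

```python
import math


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / n] * n
    return [p / total for p in positive]


def max_average_regret(utility_cycle, num_rounds):
    """Run regret matching and return the maximum regret over actions, divided by T.

    utility_cycle is an illustrative (assumed) list of per-action utility
    vectors that is repeated for num_rounds rounds.
    """
    num_actions = len(utility_cycle[0])
    cum_regret = [0.0] * num_actions
    for t in range(num_rounds):
        utils = utility_cycle[t % len(utility_cycle)]
        strategy = regret_matching_strategy(cum_regret)
        # Expected utility of the current mixed strategy.
        expected = sum(p * u for p, u in zip(strategy, utils))
        for a in range(num_actions):
            cum_regret[a] += utils[a] - expected
    return max(cum_regret) / num_rounds


if __name__ == "__main__":
    cycle = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    for T in (100, 10_000):
        bound = math.sqrt(len(cycle[0]) / T)
        print(T, max_average_regret(cycle, T), "<=", bound)
```

The bound shrinks like 1/sqrt(T) here; other learners (for example CFR+ style updates) come with different bounds, which is the "algorithm dependent" part.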
EFR Key ideas:
Time selection regret minimisation
Time selection functions (experts deciding who to listen to)
Deviations (take different actions depending on the current information set, proceeding information set or previous information set)
Hindsight rationality
Observable hindsight rationality (keep some observations hidden to limit computational complexity; given this, allow the learner to be rational to the best of their observations).
Partial deviation sequences (allow three distinct phases: correlated play, deviated play and re-correlated play; this is shown to improve strategic power).
Mediated equilibrium (an equilibrium strategy profile where each player is rational with respect to a deviation set).
Memory probability function with respect to a deviation (generalises counterfactual reach probability to account for memory states and, additionally, for playouts according to a given deviation)
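As a rough illustration of the time selection idea in the list above, the sketch below computes the regret for a single deviation when each round is weighted by a time selection function. It assumes the usual form, a sum over rounds of w^t times the gain from deviating, with w^t in [0, 1]. The function name `time_selection_regret` and the toy data are hypothetical, and this is a one-shot decision rather than a full extensive-form game.

```python
def time_selection_regret(utilities, played, deviation, weights):
    """Weighted regret for applying `deviation` to every played action.

    utilities[t][a] is the utility of action a at round t (assumed known in
    hindsight), played[t] is the action actually taken, deviation maps an
    action to the action it would be swapped for, and weights[t] in [0, 1]
    is the time selection function's value at round t.
    """
    total = 0.0
    for utils, action, weight in zip(utilities, played, weights):
        total += weight * (utils[deviation(action)] - utils[action])
    return total


if __name__ == "__main__":
    utilities = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    played = [1, 1, 0]

    def always_zero(action):
        # External deviation: always switch to action 0.
        return 0

    # Weighting every round equally recovers ordinary regret.
    print(time_selection_regret(utilities, played, always_zero, [1.0, 1.0, 1.0]))
    # Ignoring round 2 changes which deviation looks attractive.
    print(time_selection_regret(utilities, played, always_zero, [1.0, 0.0, 1.0]))
```

With all weights equal to one this reduces to ordinary regret for that deviation; a time selection function that switches off some rounds can expose regret that the unweighted sum hides.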
Why am I using EFR as opposed to CFR? Why haven't more people done this? What are the advantages and disadvantages?
All are good questions; look more into the Morrill paper.