<a href="https://colab.research.google.com/github/2zilli/kuhn-poker-cfr/blob/main/kuhn_poker_cfr.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Counterfactual Regret Minimization (CFR) for Kuhn Poker

This notebook explores the application of Counterfactual Regret Minimization (CFR) to the game of Kuhn Poker, a simplified version of poker that is often used as a benchmark in game theory and AI research. We will implement the game logic, develop a CFR agent, and simulate games to evaluate the agent's effectiveness.


## Abstract Base Class: PokerGame

The `PokerGame` class serves as an abstract base class for poker-style games. This class defines the structure and required methods that any specific poker game implementation must provide. Here's an overview of its responsibilities:

- **Card Representation**: Converts card values to human-readable characters.
- **Game State Management**: Includes methods to reset the game, check if the game state is terminal, and replicate the game state.
- **Actions and Payoffs**: Defines methods to get available actions, perform actions, determine if the current state is a showdown, and calculate payoffs.

This abstraction allows us to implement any specific poker game rules by extending this base class and providing specific implementations for these abstract methods.
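
As a small illustration of what the abstract base class enforces, the sketch below uses a trimmed-down, hypothetical interface (`MiniGame`, `Incomplete`, `CoinFlip`, and `can_instantiate` are illustrative names, not part of this notebook). It shows that Python refuses to instantiate a subclass that leaves any abstract method unimplemented.

```python
from abc import ABC, abstractmethod


class MiniGame(ABC):
    """Hypothetical, trimmed-down stand-in for the PokerGame interface."""

    @abstractmethod
    def is_terminal(self):
        """Return True when no further actions are possible."""

    @abstractmethod
    def get_available_actions(self):
        """Return the list of legal actions in the current state."""


class Incomplete(MiniGame):
    """Implements only one of the two abstract methods."""

    def is_terminal(self):
        return True


class CoinFlip(MiniGame):
    """Implements the full interface, so it can be instantiated."""

    def __init__(self):
        self.done = False

    def is_terminal(self):
        return self.done

    def get_available_actions(self):
        return [] if self.done else ["heads", "tails"]


def can_instantiate(cls):
    """Return True if cls() succeeds, False if ABC enforcement raises TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

Here `can_instantiate(CoinFlip)` succeeds, while `MiniGame` and `Incomplete` are both rejected at construction time, which is how a missing method in a new game class would surface early.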


In [None]:
from abc import ABC, abstractmethod

class PokerGame(ABC):
    """
    Abstract base class for poker-style games, providing common interface and functionality.
    """

    @staticmethod
    def get_card_char(card_value):
        """Returns the character representation of a card based on its integer value."""
        return {11: 'J', 12: 'Q', 13: 'K'}.get(card_value, '?')

    @abstractmethod
    def reset(self):
        """Resets the game to its initial state, shuffles and deals cards."""
        pass

    @abstractmethod
    def is_terminal(self):
        """Checks if the game is at a terminal state."""
        pass

    @abstractmethod
    def get_available_actions(self):
        """Returns the list of available actions for the current game state."""
        pass

    @abstractmethod
    def perform_action(self, action):
        """Performs an action and updates the game state accordingly."""
        pass

    @abstractmethod
    def is_showdown(self):
        """Determines if the current state is a showdown."""
        pass

    @abstractmethod
    def get_payoff(self):
        """Calculates and returns the payoff based on the current game state without altering the game's state."""
        pass

    @abstractmethod
    def clone(self):
        """Creates a deep copy of the current game state."""
        pass

    def __repr__(self):
        """Returns a string representation of the game state."""
        return super().__repr__()

## Kuhn Poker Game Implementation

`KuhnGame` extends the abstract `PokerGame` class to provide a specific implementation for Kuhn Poker, a simplified poker variant perfect for theoretical analysis and algorithm testing. Here's what this class entails:

- **Initialization and Resetting**: Sets up the game with initial configurations and shuffles the cards.
- **Game Progression**: Manages the flow of the game by updating the state based on player actions and switching turns.
- **Terminal State Check**: Determines if the game has reached a conclusion.
- **Available Actions**: Lists possible actions (bet or pass) depending on the game state.
- **Action Effects**: Updates the game state according to the chosen action, managing bets and turns.
- **Showdown Determination**: Identifies if the game ends in a showdown where cards are compared.
- **Payoff Calculation**: Computes the monetary outcome for each player at the game's end.
- **State Cloning**: Supports creating a deep copy of the game state, useful for simulations and AI decision-making.

This class encapsulates all the rules of Kuhn Poker and interacts with player agents to simulate gameplay.
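
To make the payoff rules concrete, here is a minimal standalone sketch that computes the payoffs for each of the five terminal histories. The function name `kuhn_payoff` and the `invested` bookkeeping are assumptions made for this illustration; the class below tracks the same amounts as negative `bets` instead.

```python
TERMINAL_HISTORIES = ("pp", "bp", "pbp", "bb", "pbb")


def kuhn_payoff(player_cards, history):
    """Return [payoff_p0, payoff_p1] for a terminal Kuhn Poker history.

    player_cards: pair of integer card ranks (higher wins), e.g. (13, 11).
    history: string of 'b' (bet) and 'p' (pass) actions, player 0 acting first.
    """
    if history not in TERMINAL_HISTORIES:
        raise ValueError(f"{history!r} is not a terminal history")

    # Each player antes 1; every 'b' costs the acting player 1 more.
    invested = [1, 1]
    for turn, action in enumerate(history):
        if action == "b":
            invested[turn % 2] += 1

    if history[-1] == history[-2]:
        # Showdown (pp, bb, pbb): the higher card wins.
        winner = 0 if player_cards[0] > player_cards[1] else 1
    else:
        # Fold (bp, pbp): the last actor passed facing a bet, so the other player wins.
        winner = len(history) % 2
    loser = 1 - winner

    # Zero-sum: the winner gains exactly what the loser put in.
    payoff = [0, 0]
    payoff[winner] = invested[loser]
    payoff[loser] = -invested[loser]
    return payoff
```

For example, with a King against a Jack, `kuhn_payoff((13, 11), "bb")` gives `[2, -2]`, while the same hand after `"pbp"` gives `[-1, 1]` because the King folded to a bet.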


In [None]:
import random

class KuhnGame(PokerGame):
    """Class representing the state and logic of a Kuhn Poker game."""

    def __init__(self):
        """Initializes the KuhnGame with default settings."""
        self.reset()

    def reset(self):
        """Resets the game to its initial state, shuffles and deals cards."""
        self.current_player = 0
        self.history = ""
        self.bets = [-1, -1]  # Each player antes 1 unit
        self.cards = [11, 12, 13]  # Using integers for easier comparison; 11 = Jack, 12 = Queen, 13 = King
        random.shuffle(self.cards)
        self.player_cards = self.cards[:2]

    def is_terminal(self):
        """Checks if the game is at a terminal state."""
        return self.history in ['pp', 'bp', 'pbp', 'bb', 'pbb']

    def get_available_actions(self):
        """Returns the list of available actions for the current game state."""
        if self.is_terminal():
            return []  # No actions available if the game has ended
        return ['b', 'p']

    def perform_action(self, action):
        """Performs an action and updates the game state accordingly."""
        if self.is_terminal():  # Raise an error if an action is attempted when the game is terminal
            raise ValueError("Cannot perform actions in a terminal game state.")

        self.history += action
        if action == 'b':
            self.bets[self.current_player] -= 1  # Deduct an additional unit for a bet
        self.current_player = 1 - self.current_player  # Switch turns

    def is_showdown(self):
        """Determines if the current state is a showdown."""
        if not self.is_terminal():
            raise ValueError("Showdown status requested but the game is not in a terminal state.")
        return self.history[-1] == self.history[-2]

    def get_payoff(self):
        """Calculates and returns the payoff based on the current game state without altering the game's state."""
        # Initialize a copy of the bets to calculate the payoff
        payoff = self.bets[:]
        if self.is_showdown():
            winner = 0 if self.player_cards[0] > self.player_cards[1] else 1
            loser = 1 - winner
            payoff[winner] = -self.bets[winner]  # Invert the winner's negative bets to reflect the gain
            payoff[loser] = self.bets[loser]     # Loser's bets stay negative
        else:
            winner = self.current_player
            loser = 1 - winner
            payoff[winner] = -self.bets[loser]   # Winner gets positive the amount lost by loser
            payoff[loser] = self.bets[loser]     # Loser's bet remains the same (negative)
        return payoff

    def clone(self):
        """Creates a deep copy of the current game state."""
        cloned_game = KuhnGame()
        cloned_game.cards = self.cards[:]
        cloned_game.player_cards = self.player_cards[:]
        cloned_game.current_player = self.current_player
        cloned_game.history = self.history[:]
        cloned_game.bets = self.bets[:]
        return cloned_game

    def __repr__(self):
        """Returns a string representation of the game state."""
        display_cards = [self.get_card_char(card) for card in self.player_cards]
        return (f"Game(cards={display_cards}, next_player_to_act={self.current_player + 1}, "
                f"history='{self.history}', bets={self.bets})")

## Abstract Base Class: Agent

The `Agent` class serves as an abstract base class for game-playing agents. It defines the core functionalities that any agent must implement to interact with a game environment. The key responsibilities of this class include:

- **Action Selection**: Defines how an agent chooses an action based on the current state of the game.

This structure allows for the creation of various types of game-playing agents, providing flexibility in implementing different strategies or AI algorithms.
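
As a sketch of the simplest agent this interface admits, the example below picks uniformly among the legal actions. `UniformRandomAgent` and `stub_game` are hypothetical names for this illustration; in the notebook such an agent would subclass `Agent`, which is omitted here so the snippet runs on its own.

```python
import random
from types import SimpleNamespace


class UniformRandomAgent:
    """Hypothetical baseline agent: picks uniformly among the legal actions."""

    def __init__(self, seed=None):
        # A private generator keeps the agent reproducible without touching
        # the global random state used for shuffling cards.
        self.rng = random.Random(seed)

    def select_action(self, game):
        actions = game.get_available_actions()
        if not actions:
            raise ValueError("No actions available in a terminal state.")
        return self.rng.choice(actions)


# Stand-in for a game object; only the method the agent calls is provided.
stub_game = SimpleNamespace(get_available_actions=lambda: ["b", "p"])
agent = UniformRandomAgent(seed=0)
sampled = [agent.select_action(stub_game) for _ in range(200)]
```

A baseline like this can be a useful opponent when checking that a learned strategy at least beats uninformed play.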


In [None]:
from abc import ABC, abstractmethod

class Agent(ABC):
    @abstractmethod
    def select_action(self, game):
        """
        Select an action to take based on the current game state.

        Parameters:
            game (KuhnGame): The Kuhn Poker game in its current state.

        Returns:
            str: The action chosen by the agent. Possible actions are 'b' for bet and 'p' for pass.
        """
        pass

## Learning Agent

Building upon the base `Agent` class, the `LearningAgent` class introduces additional functionalities pertinent to agents that can learn from their experiences. This class emphasizes methods that support learning and adaptation over time, essential for developing sophisticated AI players. Key features include:

- **Experience Recording**: Captures the outcomes of actions taken in specific game states, facilitating learning mechanisms such as reinforcement learning or Monte Carlo methods.
- **Training**: Iteratively adjusts the agent's strategy based on accumulated data, improving decision-making over time.
- **Model Management**: Provides methods for saving and loading learned models, enabling persistence of learning across sessions.

These enhancements make `LearningAgent` suitable for creating agents capable of evolving their strategies through gameplay.


In [None]:
from abc import abstractmethod

class LearningAgent(Agent):
    @abstractmethod
    def record_experience(self, infoset, rewards):
        """
        Record the experience associated with a specific game state to learn from it. Each possible action at this state
        has an associated reward.

        Parameters:
            infoset (str): The representation of the game state, encapsulating the player's card and action history.
            rewards (dict): A dictionary of rewards where keys are actions and values are the associated rewards for these actions.
        """
        pass

    @abstractmethod
    def train(self, iterations):
        """
        Train the agent based on accumulated data.

        Parameters:
            iterations (int): The number of training cycles to perform.

        This method could be implemented to periodically adjust the agent's strategy, such as after a set number of games or rounds, or in response to performance metrics.
        """
        pass

    @abstractmethod
    def save_model(self, filepath):
        """
        Save the model or strategy parameters to a file.

        Parameters:
            filepath (str): The path to the file where the model should be saved.
        """
        pass

    @abstractmethod
    def load_model(self, filepath):
        """
        Load the model or strategy parameters from a file.

        Parameters:
            filepath (str): The path to the file from which the model should be loaded.
        """
        pass

### Chance Sampling CFR Agent

The `CFRAgent` implements the Chance Sampling method of the Counterfactual Regret Minimization (CFR) algorithm. This method is designed to reduce computational complexity by sampling outcomes at chance nodes during the traversal of the game tree. Despite the sampling at chance nodes, the agent fully explores all possible actions available to the players at decision nodes, making it a comprehensive approach to learning optimal strategies in two-player zero-sum games.

#### Key Features of the `CFRAgent`:
- **Chance Sampling CFR**: Utilizes a chance-sampled game tree to minimize computation time per iteration while maintaining effective exploration of strategic options.
- **Exhaustive Player Action Exploration**: At each decision node, all potential actions are considered, ensuring thorough evaluation of possible strategies.
- **Adaptive Strategy Improvement**: Continuously refines strategies based on accumulated regret values for each action. In two-player zero-sum games, the average strategy over all iterations converges toward a Nash equilibrium.
- **Broad Applicability**: The convergence guarantee applies to two-player zero-sum games. The same regret-minimization procedure can be run on other sequential games with imperfect information, but without that guarantee.

#### Operation Details:
- **Strategy and Regret Management**: The agent tracks a cumulative regret value for each action at every information set, plus a running sum of the strategies it has played there.
- **Iterative Strategy Optimization**: On each iteration, regret matching sets every action's probability in proportion to its positive cumulative regret, falling back to a uniform strategy when no regret is positive. The average of these per-iteration strategies is what approaches the Nash equilibrium.
- **Efficiency**: Chance Sampling evaluates one sampled deal per iteration instead of all deals, which lowers the cost of each iteration while still exploring every player action.
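
The regret-based strategy update described above can be sketched in isolation. The function name `regret_matching` is an assumption for this illustration; the agent below performs the same computation inside `get_current_strategy`.

```python
def regret_matching(cumulative_regrets):
    """Map cumulative regrets {action: regret} to a strategy {action: probability}."""
    # Only positive regret counts: negative regret means the action did worse
    # than the current mix, so it gets no probability mass.
    positive = {a: max(r, 0.0) for a, r in cumulative_regrets.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: p / total for a, p in positive.items()}
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regrets)
    return {a: 1.0 / n for a in cumulative_regrets}
```

With regrets `{'b': 3.0, 'p': 1.0}` this yields `{'b': 0.75, 'p': 0.25}`, and with all regrets negative it returns the uniform strategy.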

This implementation of Chance Sampling CFR offers a blend of computational efficiency and strategic depth, enabling the development of robust game strategies in environments characterized by uncertainty and multiple decision stages.


In [None]:
import random
import pickle
import os

class CFRAgent(LearningAgent):
    def __init__(self, game, delay=1000):
        super().__init__()
        self.game = game
        self.regret_sum = {}
        self.strategy_sum = {}
        self.delay = delay
        self.iteration_count = 0

    def save_model(self, filename):
        # Ensure the 'models' directory exists
        models_dir = 'models'
        if not os.path.exists(models_dir):
            os.makedirs(models_dir)
        # Construct the full filepath
        filepath = os.path.join(models_dir, filename)

        data = {
            'regret_sum': self.regret_sum,
            'strategy_sum': self.strategy_sum,
            'iteration_count': self.iteration_count
        }
        with open(filepath, 'wb') as f:
            pickle.dump(data, f)
        print(f"Model saved to {filepath}")

    def load_model(self, filename):
        # Construct the full filepath
        filepath = os.path.join('models', filename)
        if not os.path.exists(filepath):
            print(f"No model file found at {filepath}. Starting from scratch.")
            return False
        try:
            with open(filepath, 'rb') as f:
                data = pickle.load(f)
            self.regret_sum = data['regret_sum']
            self.strategy_sum = data['strategy_sum']
            self.iteration_count = data['iteration_count']
            print(f"Model loaded from {filepath}")
            return True
        except Exception as e:
            print(f"Failed to load model from {filepath}: {e}")
            return False

    def select_action(self, game_state):
        info_set = self.get_information_set(game_state)
        available_actions = game_state.get_available_actions()
        average_strategy = self.get_average_strategy(info_set, available_actions)
        actions, weights = zip(*average_strategy.items())
        action = random.choices(actions, weights=weights)[0]
        return action

    def get_average_strategy(self, information_set, available_actions):
        strategy_sum = self.strategy_sum.get(information_set, {action: 1.0 for action in available_actions})
        sum_of_sums = sum(strategy_sum.values())
        return {action: strategy_sum[action] / sum_of_sums if sum_of_sums > 0 else 1.0 / len(available_actions) for action in available_actions}

    def get_current_strategy(self, information_set, available_actions):
        if information_set not in self.regret_sum:
            self.regret_sum[information_set] = {action: 0 for action in available_actions}
        regrets = self.regret_sum[information_set]
        total_positive_regrets = sum(max(regret, 0) for regret in regrets.values())
        return {action: (max(regrets[action], 0) / total_positive_regrets if total_positive_regrets > 0 else 1.0 / len(available_actions)) for action in available_actions}

    def update_strategy_sums(self, information_set, strategy, reach_prob):
        if self.iteration_count > self.delay:  # Only update strategy sums after the delay period
            if information_set not in self.strategy_sum:
                self.strategy_sum[information_set] = {action: 0 for action in strategy}
            for action in strategy:
                self.strategy_sum[information_set][action] += strategy[action] * reach_prob

    def cfr(self, game_state, reach_prob):
        if game_state.is_terminal():
            return game_state.get_payoff()

        hero = game_state.current_player
        villain = 1 - hero
        info_set = self.get_information_set(game_state)
        available_actions = game_state.get_available_actions()
        current_strategy = self.get_current_strategy(info_set, available_actions)
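        # Average-strategy weights use the acting player's own reach probability;
        # the opponent's reach probability is used only for the regret update below.
        self.update_strategy_sums(info_set, current_strategy, reach_prob[hero])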

        node_utility = [0.0, 0.0]
        action_utility = {}
        for action in available_actions:
            next_game_state = game_state.clone()
            next_game_state.perform_action(action)
            next_reach_prob = reach_prob[:]
            next_reach_prob[hero] *= current_strategy[action]
            action_utility[action] = self.cfr(next_game_state, next_reach_prob)[hero]
            node_utility[hero] += current_strategy[action] * action_utility[action]

        for action in available_actions:
            regret = action_utility[action] - node_utility[hero]
            self.regret_sum[info_set][action] += regret * reach_prob[villain]

        node_utility[villain] = -node_utility[hero]

        return node_utility

    def train(self, iterations):
        for i in range(iterations):
            self.game.reset()
            self.cfr(self.game, [1.0, 1.0])
            self.iteration_count += 1  # Increment the iteration count after each training game

    def get_information_set(self, game_state):
        card_int = game_state.player_cards[game_state.current_player]
        info_set = f"{card_int}{game_state.history}"
        return info_set

    def print_pretty_strategy(self):
        print("Information Set | Action Probabilities")
        print("-" * 41)
        sorted_info_sets = sorted(self.strategy_sum.items(), key=lambda x: (int(x[0][:2]), x[0][2:]))
        for info_set, strategies in sorted_info_sets:
            card_int = int(info_set[:2])
            card_char = PokerGame.get_card_char(card_int)
            history = info_set[2:]
            sum_of_sums = sum(strategies.values())
            formatted_strategies = " | ".join(f"{action}: {strategies[action] / sum_of_sums:.5f}" if sum_of_sums > 0 else f"{action}: {1.0 / len(strategies):.5f}" for action in strategies)
            print(f"{card_char}{history:<14} | {formatted_strategies}")

    def record_experience(self, infoset, rewards):
        # Placeholder function, not used in CFR directly
        pass

### Training the Counterfactual Regret Minimization (CFR) Agent for Kuhn Poker

In this section, we initiate and train a `CFRAgent` specifically tailored for Kuhn Poker. The CFR agent utilizes the Chance Sampling technique to optimize its strategy towards a Nash equilibrium over iterative self-play. Below are the steps followed in training and evaluating the agent:

1. **Initialize the Agent**:
   - An instance of `CFRAgent` is created with Kuhn Poker as the game environment.

2. **Model Loading**:
   - The system attempts to load a pre-existing trained model from `kuhn_cfr_model.pkl`. If no model is found, training starts from scratch. This approach allows for continuous improvement of the agent as more training iterations can be built upon previously learned strategies.

3. **Training Process**:
   - The agent undergoes a specified number of training iterations; in this case, 1,000,000. Each iteration deals a fresh random hand (the sampled chance outcome) and traverses every player action below it, updating the regrets and strategy sums for both players.

4. **Strategy Visualization**:
   - After training, the agent's strategy is displayed. This showcases how the agent's decision-making has evolved over the training iterations, reflecting a move towards optimal play.

5. **Model Saving**:
   - Post training, the current state of the agent (including its strategy sums and regret values) is saved back to `kuhn_cfr_model.pkl`. This model saving enables the preservation of strategic developments and provides a checkpoint that can be reloaded in future sessions.

6. **Performance Reflection**:
   - The total number of iterations completed during training is printed, providing a metric of the agent's experience and the computational effort involved.

This methodology ensures a systematic approach to developing a robust CFR agent capable of approaching theoretical optimal strategies in Kuhn Poker through computational learning and iterative self-improvement.
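
The strategy displayed in step 4 is the average strategy, obtained by normalizing the accumulated strategy sums. The sketch below shows that accumulation and normalization for a single information set, including a warm-up delay like the agent's `delay` parameter. The helper names `accumulate` and `average_strategy` are assumptions for this illustration.

```python
def accumulate(strategy_sum, strategy, reach_prob, iteration, delay=0):
    """Add one iteration's strategy, weighted by a reach probability.

    Iterations up to and including `delay` are skipped, so early, noisy
    strategies do not dilute the average.
    """
    if iteration > delay:
        for action, prob in strategy.items():
            strategy_sum[action] = strategy_sum.get(action, 0.0) + prob * reach_prob
    return strategy_sum


def average_strategy(strategy_sum):
    """Normalize accumulated weights {action: weight} into probabilities."""
    total = sum(strategy_sum.values())
    if total > 0:
        return {a: w / total for a, w in strategy_sum.items()}
    # Nothing accumulated yet: report a uniform strategy.
    return {a: 1.0 / len(strategy_sum) for a in strategy_sum}


sums = {}
accumulate(sums, {"b": 1.0, "p": 0.0}, reach_prob=1.0, iteration=1, delay=1)  # skipped
accumulate(sums, {"b": 0.5, "p": 0.5}, reach_prob=1.0, iteration=2, delay=1)
accumulate(sums, {"b": 0.0, "p": 1.0}, reach_prob=0.5, iteration=3, delay=1)
averaged = average_strategy(sums)
```

In this example the first iteration falls inside the delay window and is ignored, so `averaged` assigns one third to `'b'` and two thirds to `'p'`.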


In [None]:
kuhn_cfr_agent = CFRAgent(KuhnGame())

# Try to load a pre-existing model
if not kuhn_cfr_agent.load_model('kuhn_cfr_model.pkl'):
    print("Starting training from scratch.")

# Train with a suitable number of iterations
kuhn_cfr_agent.train(1000000)

# Print strategies to see how they have evolved
kuhn_cfr_agent.print_pretty_strategy()
print("Iterations:", kuhn_cfr_agent.iteration_count)

# Save the model after training
kuhn_cfr_agent.save_model('kuhn_cfr_model.pkl')


Model loaded from models/kuhn_cfr_model.pkl
Information Set | Action Probabilities
-----------------------------------------
J               | b: 0.30828 | p: 0.69172
Jb              | b: 0.00000 | p: 1.00000
Jp              | b: 0.33348 | p: 0.66652
Jpb             | b: 0.00000 | p: 1.00000
Q               | b: 0.00000 | p: 1.00000
Qb              | b: 0.33292 | p: 0.66708
Qp              | b: 0.00000 | p: 1.00000
Qpb             | b: 0.64086 | p: 0.35914
K               | b: 0.92315 | p: 0.07685
Kb              | b: 1.00000 | p: 0.00000
Kp              | b: 1.00000 | p: 0.00000
Kpb             | b: 1.00000 | p: 0.00000
Iterations: 51000000
Model saved to models/kuhn_cfr_model.pkl
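
One way to sanity-check the strategy printed above is to compare it with the known closed-form equilibria of Kuhn Poker. The first player's equilibrium strategies form a one-parameter family: bet the Jack with probability α in [0, 1/3], bet the King with probability 3α, and call with the Queen after pass-bet with probability α + 1/3. The second player's equilibrium strategy is unique. The sketch below is a hypothetical helper (`kuhn_equilibrium_gaps` is not part of the notebook) that takes α from the learned Jack bet frequency and measures how far every other entry is from its target.

```python
def kuhn_equilibrium_gaps(bet_prob):
    """Compare learned bet/call probabilities with the Kuhn equilibrium family.

    bet_prob maps an information set (card letter + history) to the
    probability of action 'b' (a bet, or a call when facing a bet).
    Returns {information set: |observed - expected|}.
    """
    alpha = bet_prob["J"]  # free parameter of the family, should lie in [0, 1/3]
    expected = {
        # First player, opening action
        "Q": 0.0,
        "K": 3 * alpha,
        # First player, facing a bet after passing
        "Jpb": 0.0,
        "Qpb": alpha + 1 / 3,
        "Kpb": 1.0,
        # Second player, facing a bet
        "Jb": 0.0,
        "Qb": 1 / 3,
        "Kb": 1.0,
        # Second player, after a pass
        "Jp": 1 / 3,
        "Qp": 0.0,
        "Kp": 1.0,
    }
    return {infoset: abs(bet_prob[infoset] - target)
            for infoset, target in expected.items()}


# Probabilities of 'b' copied from the output above.
printed = {
    "J": 0.30828, "Jb": 0.00000, "Jp": 0.33348, "Jpb": 0.00000,
    "Q": 0.00000, "Qb": 0.33292, "Qp": 0.00000, "Qpb": 0.64086,
    "K": 0.92315, "Kb": 1.00000, "Kp": 1.00000, "Kpb": 1.00000,
}
gaps = kuhn_equilibrium_gaps(printed)
```

For the values shown, the learned α is about 0.308 and every gap is below 0.005, which is consistent with the average strategy having settled close to one member of the equilibrium family.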


### Best Response Agent for Testing CFR Strategies

The `BestResponseAgent` estimates a best response to a fixed opponent strategy, such as one trained via Counterfactual Regret Minimization (CFR). If the opponent's strategy is close to a Nash equilibrium, even a best response should gain little against it, so this agent serves as a check on how exploitable a learned strategy is.

#### Key Features:

- **Strategy Profile Maintenance**: Stores the total reward and visit count for each action at each information set, from which average rewards are computed. This allows the agent to choose the action that has historically yielded the highest average reward.

- **Action Selection**:
  - For each decision point, the agent selects the action with the highest average reward based on past encounters within the same information set.
  - If no historical data is available for the current state, it defaults to a random action selection, ensuring variability in play and exploration of new strategies.

- **Experience Recording**:
  - At each of its decision points, the agent receives the reward observed for every available action and adds it to its strategy profile.
  - As more games are simulated, these averages become better estimates of each action's value against the opponent.

- **Model Persistence**:
  - **Saving Models**: `save_model` writes the current strategy profile as JSON under the `models/` directory, allowing long-term retention of learned values and easy reload for further refinement or evaluation.
  - **Loading Models**: `load_model` reads a previously saved profile from the `models/` directory. It must be called explicitly after construction, since the constructor does not load anything. If no file is found, the agent keeps an empty profile and starts fresh.

- **Pretty Print Strategy**:
  - Provides a formatted output of the learned strategy profile, detailing the average rewards for each action at different information sets. This output is invaluable for analysis and debugging of the agent's decision-making process.

#### Usage Scenario:

The `BestResponseAgent` is typically used in a testing or simulation environment where it is tasked with playing against a pre-trained agent (like a CFR-based agent). By rigorously challenging the strategies of these trained agents, it helps in verifying their strength and identifying potential areas for improvement.

#### Implementation Note:

This agent is part of a broader framework where it interacts with various game types and other agents. It is crucial for the testing phase of strategy development, providing insights into the effectiveness of different strategies under adversarial conditions.
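
The bookkeeping behind the agent's action selection can be sketched on its own: keep a total reward and a count per action, then pick the action with the highest mean. `ActionValueTable` is a hypothetical helper written for this illustration; the agent below stores the same data in its `strategy_profile` dictionary.

```python
class ActionValueTable:
    """Hypothetical helper: running average reward per (information set, action)."""

    def __init__(self):
        self.stats = {}  # infoset -> action -> [total_reward, count]

    def record(self, infoset, rewards):
        """Add one observed reward for every action in `rewards`."""
        actions = self.stats.setdefault(infoset, {})
        for action, reward in rewards.items():
            entry = actions.setdefault(action, [0.0, 0])
            entry[0] += reward
            entry[1] += 1

    def mean(self, infoset, action):
        """Average reward seen so far for a recorded action."""
        total, count = self.stats[infoset][action]
        return total / count

    def greedy_action(self, infoset):
        """Return the action with the highest average reward, or None if unseen."""
        if infoset not in self.stats:
            return None
        return max(self.stats[infoset], key=lambda a: self.mean(infoset, a))


table = ActionValueTable()
# Queen (12) facing a bet: calling ('b') lost twice and won once, folding ('p') always loses 1.
table.record("12b", {"b": -2, "p": -1})
table.record("12b", {"b": 2, "p": -1})
table.record("12b", {"b": -2, "p": -1})
```

In this example calling averages about -0.67 against -1.0 for folding, so `table.greedy_action("12b")` returns `'b'`. An unseen information set returns `None`, which is where the agent falls back to a random action.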


In [None]:
import json
import os
import random

class BestResponseAgent(LearningAgent):
    def __init__(self, game):
        self.game = game
        self.strategy_profile = {}  # Store the strategy profiles with counters for averaging

    def select_action(self, game):
        hero_card = game.player_cards[game.current_player]
        infoset = f"{hero_card}{''.join(game.history)}"

        # Check if we have history and the strategy profile has this history recorded
        if infoset in self.strategy_profile:
            # Select the action with the highest average reward
            best_action = max(self.strategy_profile[infoset], key=lambda a: self.strategy_profile[infoset][a]['total_reward'] / self.strategy_profile[infoset][a]['count'])
            return best_action

        # If no history or no data for this history, choose randomly
        return random.choice(game.get_available_actions())

    def record_experience(self, infoset, rewards):
        if infoset not in self.strategy_profile:
            self.strategy_profile[infoset] = {}

        for action, reward in rewards.items():
            if action not in self.strategy_profile[infoset]:
                self.strategy_profile[infoset][action] = {'total_reward': 0, 'count': 0}
            self.strategy_profile[infoset][action]['total_reward'] += reward
            self.strategy_profile[infoset][action]['count'] += 1

    def train(self, iterations=1):
        # Training method to potentially adjust internal strategies based on observed gameplay
        pass

    def save_model(self, filename):
        # Ensure the 'models' directory exists
        models_dir = 'models'
        if not os.path.exists(models_dir):
            os.makedirs(models_dir)
        # Construct the full filepath
        filepath = os.path.join(models_dir, filename)

        try:
            with open(filepath, 'w') as f:
                json.dump(self.strategy_profile, f, ensure_ascii=False, indent=4)
            print("Model saved successfully.")
        except Exception as e:
            print(f"Failed to save model: {e}")

    def load_model(self, filename):
        # Construct the full filepath
        filepath = os.path.join('models', filename)

        try:
            if os.path.exists(filepath):
                with open(filepath, 'r') as f:
                    self.strategy_profile = json.load(f)
                print("Model loaded successfully.")
                return True
            else:
                print(f"No model found at {filepath}. Starting fresh.")
                return False
        except Exception as e:
            print(f"Failed to load model: {e}")
            return False

    def print_pretty_strategy(self):
        print("Information Set | Action Values")
        print("-" * 38)
        # Sorting the strategy profile by card integer value and then by the history
        sorted_info_sets = sorted(self.strategy_profile.items(), key=lambda x: (int(x[0][:2]), x[0][2:]))
        for info_set, actions in sorted_info_sets:
            card_int = int(info_set[:2])  # Extract the card integer directly
            card_char = PokerGame.get_card_char(card_int)  # Convert card integer to character
            history = info_set[2:]  # Extract history part of the information set
            actions_str = ' | '.join([f"{a}: {data['total_reward'] / data['count']:.2f}" for a, data in actions.items() if data['count'] > 0])
            print(f"{card_char}{history:<14} | {actions_str}")


### Gameplay Simulator

The `GameplaySimulator` is designed to simulate gameplay between two agents, referred to as `hero` and `villain`, within a defined game setting like Kuhn or Leduc Poker. This class facilitates the interaction of strategies and tactics between competing agents, allowing for the dynamic training and assessment of agent behaviors.

#### Key Features:

- **Agent Interaction**:
  - The simulator controls the sequence of gameplay, determining which agent takes action based on the game state and alternating control between the `hero` and `villain`.
  
- **Exploratory Game Play**:
  - Each game is played to completion by recursively deciding actions for each agent based on the current game state until a terminal condition is met (e.g., all rounds are complete or a player folds).
  
- **Information Set Construction**:
  - During gameplay, an information set for the `hero` is constructed that encapsulates the current state from the `hero`'s perspective. This typically includes the hero's own cards and the action history, forming the basis for decision-making.

- **Adaptability Note**:
  - The current implementation of the `GameplaySimulator` uses only one hero card for constructing the information set, which is adequate for games like Kuhn Poker. For games with more complex or multiple hole cards, this approach may need to be adapted to ensure a more comprehensive representation of the game state is considered.

- **Action Reward Mapping**:
  - For every action available at a particular state, the simulator clones the game state, performs the action, and recursively evaluates the outcome. This approach captures the direct consequences of actions, informing the hero's strategy adjustments.
  
- **Strategy Updates**:
  - After exploring all possible actions and their outcomes, the simulator updates the `hero`'s strategy based on the observed rewards. This continuous feedback loop enhances the `hero`'s decision-making process over multiple simulations.

- **Reward Return**:
  - The simulator returns the reward of the chosen action, aiding in the assessment of strategy efficacy and decision-making processes over successive games.

#### Usage Scenario:

The `GameplaySimulator` is integral to training scenarios where a `hero` agent is tested against a `villain` to refine strategies or to evaluate the robustness of a learned strategy against a predefined or adaptive adversary. It is particularly useful in environments where exhaustive exploration of game states is feasible and informative for agent learning.

#### Implementation Note:

While designed to be generic enough to handle any two-player card game, the `GameplaySimulator` is best suited to games like Kuhn and Leduc Poker, whose game trees are small enough to explore every hero action at every decision point. This class is a cornerstone of the training framework, letting agents adapt their strategies in a controlled yet competitive setting.
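
The exploration scheme can be sketched without any classes. In the simplified version below, the game state is just the dealt cards plus an immutable history string, so no cloning is needed. The names `payoff`, `explore`, and `villain_policy` are assumptions for this illustration. One deliberate simplification is that it returns the best action's reward at hero nodes, whereas the simulator below returns the reward of the action the hero's own policy selects.

```python
TERMINAL = {"pp", "bp", "pbp", "bb", "pbb"}


def payoff(cards, history, player):
    """Payoff to `player` at a terminal Kuhn history (ante 1, bet size 1)."""
    invested = [1, 1]
    for turn, action in enumerate(history):
        if action == "b":
            invested[turn % 2] += 1
    if history[-1] == history[-2]:
        winner = 0 if cards[0] > cards[1] else 1  # showdown: higher card wins
    else:
        winner = len(history) % 2                 # last actor folded
    loser = 1 - winner
    return invested[loser] if player == winner else -invested[loser]


def explore(cards, history, hero, villain_policy, log):
    """Evaluate every hero action; follow the villain's policy at villain nodes."""
    if history in TERMINAL:
        return payoff(cards, history, hero)
    current = len(history) % 2  # players alternate, player 0 acts first
    if current != hero:
        action = villain_policy(cards[current], history)
        return explore(cards, history + action, hero, villain_policy, log)
    # Hero node: try both actions and record the reward of each.
    rewards = {a: explore(cards, history + a, hero, villain_policy, log) for a in "bp"}
    log[f"{cards[hero]}{history}"] = rewards
    return max(rewards.values())


def always_pass(card, history):
    """Toy villain that never bets and never calls."""
    return "p"


log = {}
value = explore((11, 13), "", hero=0, villain_policy=always_pass, log=log)
```

Against a villain that always passes, the hero holding a Jack against a King records `{'b': 1, 'p': -1}` at the root: betting wins the ante because the villain folds, while passing leads to a lost showdown.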


In [None]:
class GameplaySimulator:
    def __init__(self, hero, villain):
        self.hero = hero
        self.villain = villain

    def explore_game(self, game, hero_player):
        if game.is_terminal():
            return game.get_payoff()[hero_player]

        current_player = game.current_player

        if current_player == hero_player:
            hero_card = game.player_cards[hero_player]  # The hero's card is part of the infoset
            infoset = f"{hero_card}{''.join(game.history)}"  # Create an infoset string from the integer card value and the action history

            available_actions = game.get_available_actions()
            action_rewards = {}

            for action in available_actions:
                game_clone = game.clone()
                game_clone.perform_action(action)
                reward = self.explore_game(game_clone, hero_player)
                action_rewards[action] = reward

            self.hero.record_experience(infoset, action_rewards)

            # Return the reward for the action chosen by the hero's strategy
            return action_rewards[self.hero.select_action(game)]
        else:
            action = self.villain.select_action(game)
            game.perform_action(action)
            return self.explore_game(game, hero_player)

### Training the Best Response Agent

The Best Response Agent is designed to optimize its strategy against a pre-trained opponent, in this case, a Counterfactual Regret Minimization (CFR) agent. This section outlines the steps involved in training, loading, and saving the Best Response Agent, ensuring it adapts effectively to the strategies employed by the CFR agent.

#### Initial Setup

1. **Agent Initialization**:
   - A new instance of the Best Response Agent is created, tailored for the game of Kuhn Poker.

2. **Model Loading**:
   - The agent attempts to load a pre-existing model from a file (`best_response_model.pkl`). If the model exists, it is loaded into the agent, allowing it to continue refining its strategy from previous training sessions. If no model is found, training starts from scratch.

#### Training Process

3. **Gameplay Simulation**:
   - A `GameplaySimulator` is set up with the Best Response Agent as the hero and the pre-trained CFR agent as the villain. This setup is crucial for ensuring that the Best Response Agent learns to counter strategies that the CFR agent might employ.
   - The agent undergoes training through repeated game sessions, where each session involves:
     - Resetting the game to a clean state.
     - Simulating a complete game from the perspective of the Best Response Agent, allowing it to explore different actions and learn from the outcomes.

4. **Exploring Game States**:
   - For each training iteration, the game is explored twice: once with the Best Response Agent starting the game (as Player 0) and once with the CFR agent starting. Alternating seats exposes the Best Response Agent to the information sets of both positions.

#### Strategy Evaluation and Saving

5. **Strategy Output**:
   - Post-training, the strategy of the Best Response Agent is printed, showcasing how it decides to act in various situations based on the learned experiences.

6. **Model Saving**:
   - The newly trained model is saved back to `best_response_model.pkl`, preserving the learning and improvements made during the training sessions for future use.

This training approach lets the Best Response Agent estimate, at every information set it can reach, the value of each action against the CFR agent's strategy. Because the opponent stays fixed throughout training, the learned strategy targets that specific CFR strategy in Kuhn Poker rather than opponents in general.
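
The load-or-start-fresh and save steps can be sketched with the standard `pickle` module. This is a minimal sketch under assumptions: the agent's own `load_model` and `save_model` methods are not shown in this section, so the standalone functions, the dictionary layout of the model, and the file name `example_model.pkl` below are all hypothetical.

```python
import os
import pickle
import tempfile


def save_model(model, filename):
    """Serialize the model (here a plain dict) to disk."""
    with open(filename, "wb") as f:
        pickle.dump(model, f)


def load_model(filename):
    """Return the stored model, or None if the file does not exist."""
    if not os.path.exists(filename):
        return None
    with open(filename, "rb") as f:
        return pickle.load(f)


# Round trip in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_model.pkl")
    missing = load_model(path)  # no file yet, so this is None
    save_model({"13b": {"b": 2.0, "p": -1.0}}, path)
    restored = load_model(path)
```

Returning `None` for a missing file lets the caller decide whether to start fresh. Note that `pickle` can execute arbitrary code while loading, so only load model files you created yourself.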


In [None]:
# Assuming CFR agent is already trained and ready
best_response_agent = BestResponseAgent(KuhnGame())  # Initialize Best Response agent for training

# Load the best response model if available
model_filename = 'best_response_model.pkl'
if best_response_agent.load_model(model_filename):
    print("Loaded existing best response model.")
else:
    print("No existing model found, starting fresh training.")

gameplay_simulator = GameplaySimulator(best_response_agent, kuhn_cfr_agent)

# Train the Best Response Agent by playing it against the CFR Agent
game = KuhnGame()  # Create a new game instance for the training session
for _ in range(1000000):
    game.reset()
    gameplay_simulator.explore_game(game, 0)
    game.reset()
    gameplay_simulator.explore_game(game, 1)

# Save the newly trained best response model
best_response_agent.save_model(model_filename)

# Print strategy of the Best Response Agent
best_response_agent.print_pretty_strategy()

Model loaded successfully.
Loaded existing best response model.
Model saved successfully.
Information Set | Action Values
--------------------------------------
J               | b: -1.00 | p: -1.00
Jb              | b: -2.00 | p: -1.00
Jp              | b: -1.00 | p: -1.00
Jpb             | b: -2.00 | p: -1.00
Q               | b: -0.50 | p: -0.33
Qb              | b: -1.00 | p: -1.00
Qp              | b: 0.70 | p: 0.80
Qpb             | b: -1.00 | p: -1.00
K               | b: 1.17 | p: 1.17
Kb              | b: 2.00 | p: -1.00
Kp              | b: 1.38 | p: 1.00
Kpb             | b: 2.00 | p: -1.00
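
To read the table: at each information set, a best response plays the action with the higher value, and equal values (as for `J`) mean the agent is indifferent. The sketch below copies three rows from the output above into a plain dictionary and picks the greedy action for each. The variable and function names are illustrative and not part of the notebook.

```python
# Three rows copied from the printed table: infoset -> {action: value}
action_values = {
    "Jb": {"b": -2.00, "p": -1.00},
    "Kb": {"b": 2.00, "p": -1.00},
    "Kp": {"b": 1.38, "p": 1.00},
}


def greedy_action(values):
    """Pick the action with the highest value."""
    return max(values, key=values.get)


greedy = {infoset: greedy_action(v) for infoset, v in action_values.items()}
print(greedy)  # {'Jb': 'p', 'Kb': 'b', 'Kp': 'b'}
```

In words: facing a bet with a Jack the agent passes (folds), facing a bet with a King it bets (calls), and after an opponent's pass it bets with a King.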


## Simulation of CFR and Best Response Agents in Kuhn Poker

In this section, we simulate a large number of Kuhn Poker games between the pre-trained Counterfactual Regret Minimization (CFR) agent and the Best Response agent. If the CFR strategy is close to a Nash equilibrium, even an opponent trained specifically against it should not push its average payoff noticeably below the game's theoretical value. To check this from both seats, the CFR agent plays alternately as Player 0 and as Player 1.

### Setup
- **Game Initialization**: We use the `KuhnGame` class to create instances of the game.
- **Agent Initialization**: The CFR agent is assumed to be pre-trained and ready for testing. The Best Response agent is the one trained against it in the previous section.
- **Simulation Parameters**:
  - `num_games`: The number of games played in each seat, set to 1,000,000 (2,000,000 games in total) to keep sampling noise small.
  - `payoff_records`: A two-element list tracking the CFR agent's cumulative payoff as Player 0 and as Player 1.

### Simulation Process
1. **CFR as Player 0**:
   - For each game instance, the CFR agent acts as Player 0. The game progresses turn by turn between the CFR and Best Response agents until a terminal state (end of the game) is reached.
   - Payoffs are recorded for the CFR agent at the end of each game.
   
2. **CFR as Player 1**:
   - The roles are reversed, and the process is repeated with the CFR agent acting as Player 1. This setup tests the agent's strategies from both player perspectives.

### Calculation of Results
- **Average Payoff**:
  - The average payoff for the CFR agent when it plays as Player 0 and Player 1 is calculated by dividing the total payoffs by the number of games. This measure evaluates the agent's overall performance against the Best Response agent.
- **Theoretical Comparison**:
  - According to game theory and prior analysis of Kuhn Poker, the expected average utility for Player 0 is -1/18 and for Player 1 is 1/18 under optimal play. Our results are compared against these benchmarks to assess the CFR agent's adherence to theoretically optimal strategies. For further reading on Kuhn Poker strategy and Nash equilibrium, visit [Wikipedia](https://en.wikipedia.org/wiki/Kuhn_poker).
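
As a rough sanity check on how close the results need to be, one can bound the sampling noise. Assuming every Kuhn Poker payoff lies between -2 and +2 (an ante of 1 plus at most one bet of 1, consistent with the action values printed earlier), the variance of a single game's payoff is at most 4, so the standard error of the mean over n games is at most 2/sqrt(n). The sketch below computes that bound. The function name is illustrative and not part of the notebook.

```python
import math


def standard_error_bound(num_games, max_abs_payoff=2.0):
    """Upper bound on the standard error of the mean payoff.

    Assumes payoffs are independent across games and bounded by
    max_abs_payoff in absolute value, so the per-game variance is at
    most max_abs_payoff ** 2.
    """
    return max_abs_payoff / math.sqrt(num_games)


theoretical_p0 = -1 / 18
bound = standard_error_bound(1_000_000)
print(f"theoretical value for Player 0: {theoretical_p0:.5f}")  # -0.05556
print(f"standard error bound over 1,000,000 games: {bound:.5f}")  # 0.00200
```

An observed average within a few multiples of this bound from the theoretical value is consistent with sampling noise alone. The bound covers sampling noise only and says nothing about how far the trained strategy is from equilibrium.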

The results from this simulation provide valuable insights into the strategic consistency and effectiveness of the CFR agent when confronted with a strategically optimized opponent. It also serves as a practical validation of the CFR algorithm's capability to approximate Nash equilibrium strategies in two-player zero-sum games like Kuhn Poker.

### Implementation Code
Here is the code snippet that sets up and runs the simulation, evaluates the outcomes, and prints the average payoffs:


In [None]:
# Initialize the game and simulation parameters
game = KuhnGame()
num_games = 1000000
payoff_records = [0, 0]  # CFR agent's cumulative payoff as Player 0 and as Player 1


def play_game(game, agents):
    """Plays one game to completion; agents[i] acts for Player i. Returns the payoffs."""
    game.reset()
    while not game.is_terminal():
        action = agents[game.current_player].select_action(game)
        game.perform_action(action)
    return game.get_payoff()


# Simulation loop: each iteration plays one game per seat
for _ in range(num_games):
    # Simulate game where CFR is Player 0
    payoff_records[0] += play_game(game, (kuhn_cfr_agent, best_response_agent))[0]

    # Simulate game where CFR is Player 1
    payoff_records[1] += play_game(game, (best_response_agent, kuhn_cfr_agent))[1]

# Calculate average payoffs
average_payoff_0 = payoff_records[0] / num_games
average_payoff_1 = payoff_records[1] / num_games
print(f"Average Payoff over {num_games} games: CFR as Player 0 -> {average_payoff_0:.5f}, CFR as Player 1 -> {average_payoff_1:.5f}")
print(f"Expected theoretical payoff for Player 0: {-1/18:.5f}, for Player 1: +{1/18:.5f}")


Average Payoff over 1000000 games: CFR as Player 0 -> -0.05415, CFR as Player 1 -> 0.05602
Expected theoretical payoff for Player 0: -0.05556, for Player 1: +0.05556


## Conclusion

The simulation of 1,000,000 games of Kuhn Poker per seat evaluated the Counterfactual Regret Minimization (CFR) agent against a Best Response agent. Here's what we've learned and what's next:

### Observations:
- **Performance Consistency**: The CFR agent's average payoffs (-0.05415 as Player 0 and 0.05602 as Player 1) are within about 0.0015 of the theoretical values of -1/18 ≈ -0.05556 and 1/18 ≈ 0.05556.
- **Effectiveness of CFR**: The Best Response agent, trained specifically against the CFR agent, did not push its average payoff below the theoretical game value in either seat. This supports the claim that the CFR strategy closely approximates a Nash equilibrium for Kuhn Poker.

### Future Work:
- **Exploring More Complex Games**: To further validate the algorithm's effectiveness, the next step involves applying CFR to Leduc Hold'em, a larger poker variant with more cards and a second betting round. [Placeholder for future Leduc Hold'em Notebook](#).
- **Integration with Neural Networks**: There is also an interest in integrating neural networks with CFR strategies. This initiative will explore whether combining these technologies can enhance or even surpass the strategic decision-making capabilities currently achieved by traditional CFR methods.

These initiatives aim to deepen our understanding of game theory applications in AI, expanding their practical utility across a wider range of strategic games and decision-making environments.
