# Poker Bot

Poker Bot built using TensorFlow. Trained with reinforcement learning using the PyPokerEngine library.

## TO DO:
### Q Learning
Implement a Q-learning training system to update the model after each round with the following:
- Observation: Game state observed before action (hole_card, round_state)
- Action: Output of declare_action
- Reward: The net change in chips from the round

Note that all actions taken during a single round receive the same reward but are taken from different game states, so each (observation, action) pair is treated as an independent training sample.
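As a minimal sketch of how one round could be turned into training samples, the snippet below pairs every recorded observation and action with the round's net chip change. The names `ACTION_INDEX` and `build_round_samples` are hypothetical and not part of the notebook, and observations are shown as plain lists rather than the NumPy arrays the bot actually stores.

```python
# Hypothetical mapping from PyPokerEngine action names to model output indices.
ACTION_INDEX = {"fold": 0, "call": 1, "raise": 2}


def build_round_samples(observations, actions, stack_before, stack_after):
    """Pair each (observation, action) from one round with the round's net chip change."""
    reward = stack_after - stack_before  # same reward for every action in the round
    samples = []
    for observation, (action_name, _amount) in zip(observations, actions):
        samples.append((observation, ACTION_INDEX[action_name], reward))
    return samples


# Two decisions in one round, each seen from a different game state.
observations = [[1, 2, 3], [1, 2, 4]]
actions = [["raise", 89], ["fold", 0]]
samples = build_round_samples(observations, actions, stack_before=100, stack_after=1)
print(samples)  # [([1, 2, 3], 2, -99), ([1, 2, 4], 0, -99)]
```

Each tuple can then be fed to whatever update rule is chosen, independently of the other tuples from the same round.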

### Error: Raises
When raising is not allowed, the raise entry in valid_actions reports -1 in its "amount" field. If the model chooses to raise anyway, it declares a raise of -1.
- The model needs to fall back to its next-best valid action when a raise is invalid
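One possible fallback, sketched here under the assumption that `valid_actions` has the shape used elsewhere in this notebook (fold, call, raise, with the raise amount given as a `{"min", "max"}` dict), is to drop illegal actions before taking the best score. `pick_valid_action` is a hypothetical helper, not an existing function.

```python
def pick_valid_action(action_scores, valid_actions):
    """Return the index of the highest-scoring action that is currently legal."""

    def is_legal(index):
        amount = valid_actions[index]["amount"]
        # A raise carries a {"min", "max"} dict; -1 marks it as unavailable.
        if isinstance(amount, dict):
            return amount["max"] != -1
        return True

    legal = [i for i in range(len(valid_actions)) if is_legal(i)]
    return max(legal, key=lambda i: action_scores[i])


blocked = [
    {"action": "fold", "amount": 0},
    {"action": "call", "amount": 10},
    {"action": "raise", "amount": {"min": -1, "max": -1}},
]
# Raise has the top score but is illegal, so the next-best action (call) wins.
choice = pick_valid_action([0.1, 0.3, 0.6], blocked)
print(blocked[choice]["action"])  # call
```

Unlike slicing the score vector, this keeps the original indices, so the returned index can be used directly against `valid_actions`.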

### Building the Poker AI

In [9]:
# Packages to install
# pip install PyPokerEngine
# pip install pyyaml h5py  # Required to save models in HDF5 format

In [53]:
import os
import math
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import collections
import collections.abc
import h5py
from tensorflow import keras


from tensorflow.keras import layers, losses
from tensorflow.keras.layers import Dense, Flatten, Reshape, LeakyReLU
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import Adam, RMSprop
from collections import Counter
from datetime import datetime
import keras
import keras.callbacks
from keras.callbacks import TensorBoard

In [3]:
from pypokerengine.players import BasePokerPlayer
from pypokerengine.api.emulator import Emulator
from pypokerengine.utils.game_state_utils import restore_game_state

In [4]:
# Notes:

# use tf.keras.callbacks.ModelCheckpoint to continually 
# save the model both during and at the end of training.
# https://www.tensorflow.org/tutorials/keras/save_and_load

In this implementation, the feature vector's length is constant for every game state. The community cards are represented as 5 pairs of suit and rank features, with placeholders for missing cards. The action histories are aggregated into a fixed number of features.
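The padding scheme for community cards can be illustrated with a small standalone sketch. It assumes PyPokerEngine's two-character card strings (suit first, then rank, as in `'H8'`); `encode_community_cards`, `SUITS`, and `RANKS` are illustrative names rather than the notebook's own helpers.

```python
SUITS = {"C": 1, "D": 2, "H": 3, "S": 4}
RANKS = {"2": 2, "3": 3, "4": 4, "5": 5, "6": 6, "7": 7, "8": 8,
         "9": 9, "T": 10, "J": 11, "Q": 12, "K": 13, "A": 14}


def encode_community_cards(cards, slots=5):
    """Encode up to `slots` cards as (suit, rank) pairs, zero-padding missing cards."""
    padded = list(cards) + [None] * (slots - len(cards))
    features = []
    for card in padded:
        if card is None:
            features.extend([0, 0])  # placeholder for a card not yet dealt
        else:
            features.extend([SUITS[card[0]], RANKS[card[1]]])
    return features


flop = encode_community_cards(["H8", "HT", "C9"])
print(flop)       # [3, 8, 3, 10, 1, 9, 0, 0, 0, 0]
print(len(flop))  # 10
```

The output always has 10 entries, whether the call happens preflop (all zeros) or on the river (no padding).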

#### Model Settings

In [127]:
# Program settings
DEBUG = False

# Reinforcement learning active
train_model = True

# Load saved model or create new
save_model = False
load_saved_model = False
num_saves = 0
model_pathname = 'pathname'

# Model info
model_input_size = 34

#### Q Learning

In [132]:
# Q(S, A)
# Q value a function of the states and actions (game state and actions taken)
# AKA how good it is to take action A at state S

# TD-Update (Temporal Difference)
# Q(S,A) <-- Q(S,A) + alpha * (R + gamma * Q(S',A') - Q(S,A))

# S :Current State of the agent.
# A  : Current Action Picked according to some policy.
# S'  : Next State where the agent ends up.
# A'  : Next best action to be picked using current Q-value estimation, i.e. pick the action with the maximum Q-value in the next state.
# R  : Current Reward observed from the environment in Response of current action.

# (>0 and <=1) : Discounting factor for future rewards
gamma = 0.2 

# Step length taken to update the estimation of Q(S, A)
alpha = 1

# Epsilon-greedy policy
# Probability of choosing an action at random (vs. the action with the highest Q value)
epsilon = 0.1
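To make the update rule and the epsilon-greedy choice concrete, here is a small tabular sketch. It is not the notebook's neural-network training loop: the dictionary-based Q-table, the state labels `"s0"`/`"s1"`, and the helper names `td_update` and `epsilon_greedy` are all assumptions chosen for illustration.

```python
import random


def td_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply Q(S,A) <- Q(S,A) + alpha * (R + gamma * max_a' Q(S',a') - Q(S,A))."""
    # A terminal or unseen next state contributes no future value.
    best_next = max(q[next_state].values()) if next_state in q else 0.0
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


q = {
    "s0": {"fold": 0.0, "call": 0.0, "raise": 0.0},
    "s1": {"fold": 0.0, "call": 4.0, "raise": 1.0},
}
# 0.0 + 0.5 * (10 + 0.2 * 4.0 - 0.0), i.e. approximately 5.4
updated = td_update(q, "s0", "call", reward=10, next_state="s1", alpha=0.5, gamma=0.2)
greedy_choice = epsilon_greedy(q["s1"], epsilon=0.0)  # epsilon 0 is purely greedy
```

With a neural network in place of the table, the bracketed term would typically serve as the regression target for the chosen action's output.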


In [64]:
# Define neural network architecture
def create_model(input_shape):
    if load_saved_model and os.path.isfile(model_pathname):
        model = load_model(model_pathname)
    else:
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation='relu', input_shape=input_shape),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(32, activation='relu'),
            tf.keras.layers.Dense(3, activation='softmax')  # Assuming simple output (fold, call, or raise)
        ])
        model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [79]:
# Helper methods
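def flatten(x):
    # Recursively flatten nested iterables (lists, tuples, arrays) into one flat list.
    # Strings are treated as leaves to avoid infinite recursion on single characters.
    if isinstance(x, collections.abc.Iterable) and not isinstance(x, (str, bytes)):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]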

In [130]:
class PokerBot(BasePokerPlayer):
    
    # TODO: Read some saved state of the model to allow reinforcement learning over time
    def __init__(self):
        # Initialize model
        self.model = create_model((model_input_size,))  # Input shape, adjust based on features
        # Save game states and actions in input output matrices for Q learning
        self.x = [] # np.empty((0, model_input_size))
        self.Y = [] # np.empty((0, 2)) # action, amount
    
    # TODO: Look at Emulator implementation to make the model work with reinforcement learning
    # Make an action based on the model output
    def declare_action(self, valid_actions, hole_card, round_state):
        # Prepare feature vector based on the game state
        feature_vector = self._extract_features(hole_card, round_state)
        if DEBUG:
            print("input size: " + str(len(feature_vector)))
            print("input shape: " + str(feature_vector.shape))
        
        
        # Use the model to predict the action
        action_probs = self.model.predict(feature_vector).flatten()
        # If raising is not allowed, restrict the choice to fold and call
        if valid_actions[2]["amount"]["max"] == -1:
            action_probs = action_probs[:2]
        outcome = np.argmax(action_probs)
        
        if DEBUG:
            print("action_probs length: " + str(len(action_probs)))
            print(action_probs)
            print("action argmax: " + str(outcome))
        
        action_info = valid_actions[outcome]
        action = action_info["action"]
        if outcome == 2:
            # Scale raise to the confidence of the model
            amount = action_info["amount"]["min"] + math.floor((action_info["amount"]["max"] - action_info["amount"]["min"]) * action_probs[outcome])
            if DEBUG:
                print(valid_actions)
                print(str((action_info["amount"]["max"] - action_info["amount"]["min"]) * action_probs[outcome]))
        else:
            amount = action_info["amount"]
        
        # Update Q learning input, output (observation, action)
        # np.append(self.x, feature_vector, axis=1)
        # np.append(self.Y, [action, amount], axis=1)
        self.x.append(feature_vector)
        self.Y.append([action, amount])
        
        return action, amount
    
    # Setup Emulator object by registering game information
    def receive_game_start_message(self, game_info):
        return
        
        # Emulator skeleton code
        player_num = game_info["player_num"]
        max_round = game_info["rule"]["max_round"]
        small_blind_amount = game_info["rule"]["small_blind_amount"]
        ante_amount = game_info["rule"]["ante"]
        blind_structure = game_info["rule"]["blind_structure"]
        
        self.emulator = Emulator()
        self.emulator.set_game_rule(player_num, max_round, small_blind_amount, ante_amount)
        self.emulator.set_blind_structure(blind_structure)
        
        # Register algorithm of each player which used in the simulation.
        for player_info in game_info["seats"]["players"]:
            self.emulator.register_player(player_info["uuid"], PokerBot())

    # Not necessarily useful
    def receive_round_start_message(self, round_count, hole_card, seats):
        # Reset Round info for Q learning
        # NOTE: Only for debugging purposes. Optimal practices may vary
        self.x = []
        self.Y = []
        pass

    # Not necessarily useful
    def receive_street_start_message(self, street, round_state):
        pass

    # Can incorporate player observation in model updated with each move
    def receive_game_update_message(self, new_action, round_state):
        pass
    
    # Update model with each round result
    def receive_round_result_message(self, winners, hand_info, round_state):
        if train_model:
            # Update model with round results
            pass
        if save_model:
            # Save model to file
            self.model.save(model_pathname + "_V" + str(num_saves))
        
        # print(self.x)
        print("Round actions: player " + self.uuid)
        self._print_hole_cards(self.x)
        print(self.Y)
        
        pass
    
    # Additional methods
    
    # Produce a feature vector of length 34
    def _extract_features(self, hole_card, round_state):
        # 4 Features from hole cards
        hole_card_features = [self._card_to_feature(card) for card in hole_card]

        # 10 Features from community cards (always represent 5 cards)
        community_cards = round_state['community_card'] + [None] * (5 - len(round_state['community_card']))
        community_card_features = [self._card_to_feature(card) if card else [0, 0] for card in community_cards]

        # 8 Standard features
        standard_features = [
            round_state['round_count'],
            round_state['pot']['main']['amount'],
            sum([side_pot['amount'] for side_pot in round_state['pot']['side']]),
            round_state['dealer_btn'],
            round_state['small_blind_pos'],
            round_state['big_blind_pos'],
            round_state['small_blind_amount'],
            self._street_to_feature(round_state['street'])
        ]

        # 12 Action history features (3 {# raises, # calls, # folds} for each betting stage: preflop, flop, turn, river)
        action_history_features = self._aggregate_action_histories(round_state['action_histories'])

        # Combine all features into a single fixed-size feature vector of length 34
        # Flatten the list of lists
        features = flatten(hole_card_features + community_card_features + standard_features + action_history_features)
        features = np.array(features)
        features = features.reshape(1, -1)
        return features
    
    def _card_to_feature(self, card):
        # Convert card to a numerical feature
        suits = {'C': 1, 'D': 2, 'H': 3, 'S': 4, 'None': 0}
        ranks = {'2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9, 'T': 10, 'J': 11, 'Q': 12, 'K': 13, 'A': 14, 'None': 0}
        suit = suits.get(card[0], 0) if card else 0
        rank = ranks.get(card[1], 0) if card else 0
        return [suit, rank]
    
    def _feature_to_card(self, card):
        # Reverse mappings
        suits_reverse = {1: 'C', 2: 'D', 3: 'H', 4: 'S', 0: 'None'}
        ranks_reverse = {2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7', 8: '8', 9: '9', 10: 'T', 11: 'J', 12: 'Q', 13: 'K', 14: 'A', 0: 'None'}
        
        # Map features back to card representation
        suit = suits_reverse.get(int(card[0]), 'None')
        rank = ranks_reverse.get(int(card[1]), 'None')

        # Combine suit and rank to form card representation
        # If either suit or rank is 'None', the card is considered invalid or not present
        if suit == 'None' or rank == 'None':
            return 'None'
        else:
            return suit + rank
    
    def _street_to_feature(self, street):
        # Convert street to a numerical feature
        streets = {'preflop': 1, 'flop': 2, 'turn': 3, 'river': 4, 'showdown': 5}
        return streets.get(street, 0)

    def _aggregate_action_histories(self, action_histories):
        """Aggregate action histories into a fixed-length vector.

        Returns 12 counts: raises, calls, and folds for each of the four
        streets (preflop, flop, turn, river).
        """
        
        # Initialize counts
        raise_count = [0, 0, 0, 0]  # Preflop, Flop, Turn, River
        call_count = [0, 0, 0, 0]
        fold_count = [0, 0, 0, 0]

        # Define streets (betting stages)
        streets = ['preflop', 'flop', 'turn', 'river']

        # Count actions on each street
        for i, street in enumerate(streets):
            for action in action_histories.get(street, []):
                if action['action'] == 'raise':
                    raise_count[i] += 1
                elif action['action'] == 'call':
                    call_count[i] += 1
                elif action['action'] == 'fold':
                    fold_count[i] += 1

        # Flatten and return
        return raise_count + call_count + fold_count
    
    def _print_hole_cards(self, feature_list):
        for features in feature_list:
            # Assuming the first four features are for the two hole cards
            card1_features = features[0][0:2]  # First two features for the first card
            card2_features = features[0][2:4]  # Next two features for the second card
            
            # Convert features to card representations
            card1 = self._feature_to_card(card1_features)
            card2 = self._feature_to_card(card2_features)

            # Print the hole cards for this round
            print(f"Hole Cards: {card1}, {card2}")

### Simulating Games

In [131]:
from pypokerengine.api.game import setup_config, start_poker

# Declare game setup parameters
config = setup_config(max_round=10, initial_stack=100, small_blind_amount=5)
config.register_player(name="p1", algorithm=PokerBot())
config.register_player(name="p2", algorithm=PokerBot())
config.register_player(name="p3", algorithm=PokerBot())
game_result = start_poker(config, verbose=1)

Started the round 1
Street "preflop" started. (community card = [])
"p1" declared "raise:89"
"p2" declared "call:89"
"p3" declared "fold:0"
Street "flop" started. (community card = ['H8', 'HT', 'C9'])
"p2" declared "call:0"
"p1" declared "raise:10"
"p2" declared "call:10"
Street "turn" started. (community card = ['H8', 'HT', 'C9', 'H7'])
"p2" declared "call:0"
"p1" declared "fold:0"
"['p2']" won the round 1 (stack = {'p1': 1, 'p2': 209, 'p3': 90})
Round actions: player hdagfeczuamlwbaxajkgvm
Hole Cards: C7, D9
Hole Cards: C7, D9
Hole Cards: C7, D9
[['raise', 89], ['raise', 10], ['fold', 0]]
Round actions: player mhvwceygwkyxlvnothorxl
Hole Cards: CA, DT
Hole Cards: CA, DT
Hole Cards: CA, DT
Hole Cards: CA, DT
[['call', 89], ['call', 0], ['call', 10], ['call', 0]]
Round actions: player ptvagydvgxrtffupjlllpc
Hole Cards: C8, S5
[['fold', 0]]
Started the round 2
Street "preflop" started. (community card = [])
"p3" declared "fold:0"
"['p2']" won the round 2 (stack = {'p1': 0, 'p2': 214, 'p

### Save Model to File