# No Limit Texas Hold'em Deep Q-learning

This notebook uses modern deep learning libraries to try to solve No Limit Hold'em (NLH). AIs have already been developed that beat world-class players in heads-up (2-player) NLH. We still have a long way to go here.

To run the notebook you need to install the https://github.com/VinQbator/holdem fork of the holdem library. A lot of bugfixes and changes were needed to effectively run the environment for deep learning.

Also keras-rl should be installed from https://github.com/VinQbator/keras-rl. Sorry for the inconvenience.

The rest of the libraries can be installed from pip, as listed in the following imports section.

A lot of the heavy lifting is done in .py files adjacent to the notebook, to keep the notebook clean.

Most of the effort here is put into building a framework to enable more serious development in the future.

# Imports

In [None]:
from players.atm import ATM
from players.ai_player import AIPlayer
from players.random_player import RandomPlayer
from training_env import TrainingEnv
from agents import build_dqn_agent, fit_agent, train_loop, load_agent_weights
from models import simple_model, complex_model, test_model
from util import visualize_history, use_jupyter, set_on_demand_memory_allocation
from helpers.poker_history import PokerHistory

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
set_on_demand_memory_allocation()

In [None]:
use_jupyter()

# Feature Engineering

### Some of the processing that happens in the wrapper layer around the holdem gym environment
* Positions are one-hot encoded
* Pot and bet sizes are normalized to 100 big blinds
* Hand ranking is either normalized or one-hot encoded
* Cards are one-hot encoded
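
The encodings listed above could look roughly like the sketch below. This is only an illustration, not the code from the wrapper: all function names are hypothetical, the card ordering is arbitrary, "normalized to 100 big blinds" is read as dividing chip amounts by 100 big blinds, and the hand rank is assumed to run from 1 (strongest) to 7462 (weakest), as in deuces-style evaluators.

```python
# Hypothetical helpers for illustration; the real wrapper may encode things differently.
RANKS = "23456789TJQKA"
SUITS = "cdhs"
N_HAND_RANKS = 7462  # number of distinct 5-card hand strength classes


def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_card(card):
    """One-hot encode a card such as 'As' into a 52-element vector."""
    rank, suit = card[0], card[1]
    return one_hot(SUITS.index(suit) * len(RANKS) + RANKS.index(rank), 52)


def encode_position(seat, n_seats):
    """One-hot encode a seat index."""
    return one_hot(seat, n_seats)


def normalize_chips(amount, big_blind, scale_bb=100.0):
    """Express a pot or bet size in units of 100 big blinds (assumed reading)."""
    return amount / (big_blind * scale_bb)


def normalize_hand_rank(rank):
    """Map a rank in 1..7462 (assumed: 1 is strongest) to 0..1, with 1.0 strongest."""
    return 1.0 - (rank - 1) / (N_HAND_RANKS - 1)
```

One-hot encoding the hand rank instead would add 7462 inputs, which is why the normalized form is the cheaper option for a small network.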

# Action space

### The essentially infinite (or at least very large) action space is split up into

* Base moves: FOLD/CALL/CHECK

* A few common bet/raise sizes relative to the pot size: 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1, 4/3, 5/3, 2, 3, 5, 10, 15, 20, 30, 50, 75, 100
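
A minimal sketch of how such a discrete action index might be decoded into a concrete move is shown here. The index ordering, the `decode_action` name, and the rule of capping a raise at the remaining stack are assumptions made for illustration; the actual mapping in the environment may differ.

```python
from fractions import Fraction

# Base moves come first, then one action per pot-relative bet size (assumed ordering).
BASE_MOVES = ["FOLD", "CALL", "CHECK"]
POT_FRACTIONS = [
    Fraction(1, 5), Fraction(1, 4), Fraction(1, 3), Fraction(2, 5), Fraction(1, 2),
    Fraction(3, 5), Fraction(2, 3), Fraction(3, 4), Fraction(4, 5), Fraction(1),
    Fraction(4, 3), Fraction(5, 3), Fraction(2), Fraction(3), Fraction(5),
    Fraction(10), Fraction(15), Fraction(20), Fraction(30), Fraction(50),
    Fraction(75), Fraction(100),
]
N_ACTIONS = len(BASE_MOVES) + len(POT_FRACTIONS)


def decode_action(index, pot, stack):
    """Turn a discrete action index into a (move, amount) pair."""
    if index < len(BASE_MOVES):
        return BASE_MOVES[index], 0
    fraction = POT_FRACTIONS[index - len(BASE_MOVES)]
    # Bet a fraction of the pot, but never more than the player has left.
    amount = min(int(fraction * pot), stack)
    return "RAISE", amount
```

With a pot of 300 and a stack of 1000, `decode_action(7, 300, 1000)` selects the half-pot size and yields a raise of 150, while the largest multipliers simply collapse into an all-in.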

In [None]:
# Number of players at the table
NUMBER_OF_SEATS = 2
# Max betsize in simulation environment (shouldn't really matter with discrete relative to pot sizing)
MAX_BET = 100000
# 'norm' (normalized) or 'one-hot': how to encode the player's hand ranking (one of 7462 unique values)
RANK_ENCODING = 'norm'

WINDOW = 1
MODEL = simple_model

FIRST_RUN_STEPS = 200
SECOND_RUN_STEPS = 200
THIRD_RUN_STEPS = 200
THIRD_RUN_ITERATIONS = 10

BENCHMARK_EPISODES = 200

In [None]:
# Let's start by playing against a player that always calls or checks, whichever is currently a valid move
# Hopefully this will teach the agent something about hand strength at least
env = TrainingEnv.build_environment(ATM(), NUMBER_OF_SEATS, debug=False)

##### First let's train a simple model with 1-step sequences against an opponent who always calls or checks, whichever move is valid

No need to train against it for long - we just want to learn some basics

In [None]:
model = MODEL(WINDOW, env.n_observation_dimensions, env.n_actions)
print(model.summary())
# window_length - how many timesteps to look into the past (multiplies the observation space by this, be careful)
# target_model_update - if >= 1, copy weights to the target network every that many steps; if in the 0...1 range, the soft update weight
# enable_double_dqn - https://arxiv.org/pdf/1509.06461.pdf
# enable_dueling_network - https://arxiv.org/pdf/1511.06581.pdf
# dueling_type - how the value and advantage streams are combined ('avg' subtracts the mean advantage)
# train_interval - run a training cycle every this many steps
# n_warmup_steps - how many steps to run without training
# batch_size - number of (s, a, r, s') transitions to train on in one training cycle (as a batch)
# gamma - discount factor for future rewards
# memory_interval - how often to add the last step to the memory buffer (steps in between are discarded)

agent = build_dqn_agent(model, env.n_actions, window_length=WINDOW, target_model_update=0.001, 
                        enable_double_dqn=True, enable_dueling_network=True, dueling_type='avg', 
                        train_interval=100, n_warmup_steps=50, batch_size=32, gamma=.99, memory_interval=1)
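
The three options used above (dueling network with `'avg'` aggregation, double DQN, and a soft target update with weight 0.001) can be sketched in plain Python as follows. This is a simplified illustration of the standard formulas on lists of numbers, not the keras-rl implementation; the function names are hypothetical.

```python
def dueling_avg(value, advantages):
    """Combine state value V and advantages A into Q = V + A - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


def soft_update(target_weights, online_weights, tau):
    """Move each target weight a fraction tau of the way toward the online weight."""
    return [(1 - tau) * t + tau * o for t, o in zip(target_weights, online_weights)]
```

Separating action selection from action evaluation is what reduces the overestimation bias of plain DQN, and a small `tau` keeps the target network changing slowly between training cycles.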

In [None]:
# Let's play for FIRST_RUN_STEPS steps (decisions made by the AI)
agent, hist = fit_agent(agent, env, FIRST_RUN_STEPS, debug=False) 

## As we can see, the simple network was able to learn a bit and achieve a positive winrate

NB! The Winrate and Winnings plots show $, not big blinds. (This has been fixed in the code, but the notebook was not rerun.)
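
For reference, converting per-hand dollar winnings into the usual poker metric of big blinds per 100 hands could be done as in this sketch. The function name and the big blind of 50 in the usage example are hypothetical; how `visualize_history` actually computes its plots is not shown in this notebook.

```python
def winrate_bb_per_100(hand_winnings, big_blind):
    """Average winnings expressed in big blinds per 100 hands."""
    if not hand_winnings:
        return 0.0
    total_bb = sum(hand_winnings) / big_blind
    return 100.0 * total_bb / len(hand_winnings)


# Example: four hands with a hypothetical big blind of 50.
example_winrate = winrate_bb_per_100([50, -25, 75, 0], big_blind=50)
```

Expressing results in big blinds makes winrates comparable across different stake levels.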

In [None]:
# Some plots of how the training session went
visualize_history(hist)

### Hand history rendering is still a bit wonky, but it's clear that the bot is not making very smart choices

In [None]:
# Let's evaluate our agent for 5 episodes (hands).
agent.test(env, nb_episodes=5, visualize=True)

# Let's now play against an opponent who makes totally random moves

In [None]:
# Let's now play against a bot that makes totally random moves
# Hopefully it teaches the agent at least something about how to act in a wide range of situations
env = TrainingEnv.build_environment(RandomPlayer(), NUMBER_OF_SEATS, debug=False)
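
The two baseline opponents used in this notebook follow very simple policies. The sketch below shows the idea only; the function names and the `to_call` / `valid_moves` arguments are hypothetical and do not reflect the actual player interface in the `players` package.

```python
import random


def atm_move(to_call):
    """Always-call opponent: check when there is nothing to call, otherwise call."""
    return "CHECK" if to_call == 0 else "CALL"


def random_move(valid_moves, rng=random):
    """Random opponent: pick uniformly among the currently valid moves."""
    return rng.choice(valid_moves)
```

The first opponent rewards betting strong hands for value, while the second exposes the agent to a much wider variety of situations.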

In [None]:
# Train for playing against RandomPlayer
agent, hist = fit_agent(agent, env, SECOND_RUN_STEPS, False, hist)

# Our winrate has increased even though the opponent is not totally predictable

In [None]:
visualize_history(hist)

# Hand history shows that the AI is still making quite random moves

In [None]:
agent.test(env, nb_episodes=5, visualize=True)

In [None]:
agent, hist = train_loop(agent, MODEL, env, steps_in_iteration=THIRD_RUN_STEPS, 
                         n_iterations=THIRD_RUN_ITERATIONS, window_length=WINDOW, verbose=1, debug=False)

In [None]:
agent.test(env, nb_episodes=50, visualize=True)

In [None]:
load_agent_weights(agent)

In [None]:
# Let's benchmark against ATM
env = TrainingEnv.build_environment(ATM(), n_seats=NUMBER_OF_SEATS)
hist = agent.test(env, nb_episodes=BENCHMARK_EPISODES, visualize=False, verbose=0, history=PokerHistory())
visualize_history(hist)

In [None]:
# Let's benchmark against RandomPlayer
env = TrainingEnv.build_environment(RandomPlayer(), n_seats=NUMBER_OF_SEATS)
hist = agent.test(env, nb_episodes=BENCHMARK_EPISODES, visualize=False, verbose=0, history=PokerHistory())
visualize_history(hist)