This program implements the Q-learning reinforcement learning algorithm for the combinatorial game Nim. The implementation was largely informed and inspired by Erik Jarleberg's undergraduate thesis at the Royal Institute of Technology, "Reinforcement learning on the combinatorial game of Nim".
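For readers new to Q-learning, here is a minimal sketch of the standard tabular update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), applied to Nim states. It is not this project's code. The names `Q`, `legal_actions`, `q_update`, `ALPHA` and `GAMMA`, the hyperparameter values, and the +1 reward for taking the last item are all assumptions made for illustration. A state is a tuple of pile sizes and an action is a `(pile_index, items_removed)` pair.

```python
from collections import defaultdict

# Hypothetical hyperparameters; the project's actual values are not stated here.
ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor

# Q-table mapping (state, action) -> estimated value, defaulting to 0.0.
Q = defaultdict(float)

def legal_actions(state):
    """All (pile_index, count) moves that remove at least one item from one pile."""
    return [(i, k) for i, size in enumerate(state) for k in range(1, size + 1)]

def q_update(state, action, reward, next_state):
    """One standard Q-learning step.

    next_state is assumed to be the agent's next decision point. In two-player
    Nim that is usually the position after the opponent has replied. A terminal
    next_state (no legal actions) contributes a future value of 0.
    """
    best_next = max((Q[(next_state, a)] for a in legal_actions(next_state)),
                    default=0.0)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Example: the agent takes the last item from (0, 0, 1) and wins (assumed reward +1).
q_update((0, 0, 1), (2, 1), 1.0, (0, 0, 0))
print(Q[((0, 0, 1), (2, 1))])  # 0.5
```

Repeating the same update moves the estimate toward the target of 1.0 (0.5, then 0.75, and so on), which is how the table gradually learns which moves win.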
Nim is a two-player mathematical game with the following setup. The game initially starts with three piles of items, and the two players take turns removing items. On each turn, a player must choose one of the three piles and remove at least one item from that pile. The player who removes the last item wins.
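The rules above can be captured in a few lines. This is a hypothetical sketch rather than the project's game engine: `apply_move`, `is_over` and `play` are illustrative names, and piles are represented as a tuple of sizes.

```python
def apply_move(piles, pile_index, count):
    """Return the new pile sizes after removing `count` items from one pile."""
    if not 0 <= pile_index < len(piles):
        raise ValueError("no such pile")
    if not 1 <= count <= piles[pile_index]:
        raise ValueError("must remove between 1 and the pile's size")
    new = list(piles)
    new[pile_index] -= count
    return tuple(new)

def is_over(piles):
    """The game ends when every pile is empty."""
    return all(p == 0 for p in piles)

def play(piles, moves):
    """Play moves alternately for players 0 and 1.

    Returns the player who removed the last item (the winner),
    or None if the moves run out before the game ends.
    """
    player = 0
    for pile_index, count in moves:
        piles = apply_move(piles, pile_index, count)
        if is_over(piles):
            return player  # taking the last item wins
        player = 1 - player
    return None

print(play((1, 1, 1), [(0, 1), (1, 1), (2, 1)]))  # 0
```

In the example, player 0 empties the third pile on the third move and therefore wins.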
There is a known optimal strategy for the game, credited to Charles L. Bouton of Harvard, which can serve as a benchmark for evaluating the performance of our Q-learning agent.
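Bouton's strategy rests on the nim-sum, the bitwise XOR of all pile sizes. In normal play (last item wins), the player to move loses against perfect play exactly when the nim-sum is 0. Otherwise there is a move that restores a nim-sum of 0. The sketch below is an illustration, not this project's benchmark code. The helper names `nim_sum`, `optimal_move` and `is_winning_move` are hypothetical, and how the project actually scores its agent against this strategy is not specified here.

```python
from functools import reduce
from operator import xor

def nim_sum(piles):
    """Bitwise XOR of all pile sizes."""
    return reduce(xor, piles, 0)

def optimal_move(piles):
    """Return a (pile_index, count) move leaving a nim-sum of 0,
    or None if the position is already lost against perfect play."""
    s = nim_sum(piles)
    if s == 0:
        return None
    for i, p in enumerate(piles):
        target = p ^ s
        if target < p:  # this pile can be reduced to `target`
            return (i, p - target)

def is_winning_move(piles, pile_index, count):
    """True if the move leaves a nim-sum of 0. Such a move may not be unique,
    so this check can be used to score an agent's move."""
    new = list(piles)
    new[pile_index] -= count
    return nim_sum(new) == 0

print(optimal_move((3, 4, 5)))  # (0, 2): leaves (1, 4, 5), whose nim-sum is 0
```

Checking each agent move with `is_winning_move`, rather than comparing it against the single move returned by `optimal_move`, avoids penalising the agent when several winning moves exist.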
Erik Jarleberg, "Reinforcement learning on the combinatorial game of Nim" (undergraduate thesis, Royal Institute of Technology)
Nim (Wikipedia)