Tic-tac-toe experiments
This is my version of the neural network from https://github.com/AxiomaticUncertainty/Deep-Q-Learning-for-Tic-Tac-Toe.
After attending a talk by a Google employee on neural networks, I decided that the network could probably be shrunk considerably, which I expected to lead to faster training times. The original code trains to perfect play in about 30 minutes on my PC, and the smaller network also becomes unbeatable in the same time frame. I expected it to train faster, but that did not appear to be the case: after 15-20 minutes its play was still quite weak and it was still losing games.
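To give a rough idea of what "shrinking" means in numbers, here is a minimal sketch that counts the trainable parameters of a fully connected network from its layer sizes. The layer sizes below are hypothetical and are not taken from either repository. The only assumption is a board of 9 cells as input and one value per move as output.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # One weight per input/output pair, plus one bias per output unit.
        total += n_in * n_out + n_out
    return total


# Hypothetical layer sizes for illustration only (NOT the real networks):
# 9 board cells in, 9 move values out.
larger = [9, 128, 128, 9]
smaller = [9, 32, 9]

print("larger network parameters:", count_parameters(larger))
print("smaller network parameters:", count_parameters(smaller))
```

Fewer parameters means less arithmetic per training step, but it does not by itself reduce the number of steps needed to learn good play, which may be why the wall-clock training time here did not improve.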
I commented the code heavily while reading through it, to make sure I understood everything that was going on. The comments should help a beginner understand how it all works.