# [WIP] Playing Connect4 with Reinforcement Learning & Deep Q-Learning

This notebook is adapted from Kaggle's [Connect X Simulation Competition](https://www.kaggle.com/c/connectx); its scaffold is based on the [ConnectX Getting Started](https://www.kaggle.com/ajeffries/connectx-getting-started) kernel. The overall notebook and code were created with the help of the following resources:
- https://www.kaggle.com/phunghieu/connectx-with-deep-q-learning
- https://towardsdatascience.com/deep-reinforcement-learning-build-a-deep-q-network-dqn-to-play-cartpole-with-tensorflow-2-and-gym-8e105744b998
- https://medium.com/applied-data-science/how-to-build-your-own-alphazero-ai-using-python-and-keras-7f664945c188
- https://towardsdatascience.com/from-scratch-implementation-of-alphazero-for-connect4-f73d4554002a
- ...

Author: Alex Erfurt, Customer Engineer at Google
- http://github.com/alexerfurt/
- [@alex_erfurt](https://twitter.com/alex_erfurt) on Twitter
- [Linkedin](https://www.linkedin.com/in/alexander-erfurt/)

## Install Kaggle Environment and other helpful libraries

In [5]:
# The ConnectX environment was defined in kaggle-environments v0.1.6
!pip install numpy matplotlib tensorflow kaggle-environments



In [None]:
!pip install tensorflow==2.1.0rc0

In [1]:
pip install tqdm

Note: you may need to restart the kernel to use updated packages.


In [2]:
!pip install "kaggle-environments>=0.1.6"



In [5]:
import sys
sys.version

'3.7.3 | packaged by conda-forge | (default, Dec  6 2019, 08:36:57) \n[Clang 9.0.0 (tags/RELEASE_900/final)]'

In [3]:
pip freeze

absl-py==0.8.1
apache-beam==2.16.0
appdirs==1.4.3
appnope==0.1.0
astor==0.8.1
atari-py==0.2.6
attrs==19.3.0
avro-python3==1.9.1
backcall==0.1.0
bleach==3.1.0
cachetools==3.1.1
certifi==2019.11.28
cffi==1.13.2
chardet==3.0.4
Click==7.0
cloudpickle==1.2.2
colorama==0.4.3
crcmod==1.7
cryptography==2.8
cycler==0.10.0
cytoolz==0.10.1
dask==2.9.0
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.1.1
docopt==0.6.2
entrypoints==0.3
fastavro==0.21.24
fasteners==0.15
future==0.18.2
gast==0.2.2
gin-config==0.1.3
gitdb2==2.0.6
GitPython==3.0.5
google-api-core==1.14.3
google-api-python-client==1.7.11
google-apitools==0.5.28
google-auth==1.8.2
google-auth-httplib2==0.0.3
google-cloud-bigquery==1.17.1
google-cloud-bigtable==1.0.0
google-cloud-core==1.1.0
google-cloud-datastore==1.7.4
google-cloud-pubsub==1.0.2
google-pasta==0.1.8
google-resumable-media==0.4.1
googleapis-common-protos==1.6.0
graphviz==0.10.1
grpc-google-iam-v1==0.12.3
grpcio==1.25.0
gym==0.15.3
h5py==2.10.0
hdfs==2.5.8
httplib2==0.12.0
idn

In [1]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm

### Setting up a tic-tac-toe environment to play around

In [4]:
from kaggle_environments import evaluate, make, utils

In [None]:
env = make("tictactoe")

In [None]:
# Basic agent which marks the first available cell.
def my_agent(obs):
    return [c for c in range(len(obs.board)) if obs.board[c] == 0][0]

In [None]:
# Run the basic agent against a default agent which chooses a "random" move.
env.run([my_agent, "random"])

In [None]:
env.render(mode="ipython")

## Creating the Connect4 Environment

In [2]:
env = make("connectx", debug=True)
env.render(mode="ipython")

## Create an Agent

This first sample agent, provided by Kaggle, randomly chooses a column that is not yet full (i.e., whose top cell is still empty).

In [None]:
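def my_agent(observation, configuration):
    from random import choice
    # observation.board is a flat, row-major list whose first `columns` entries are the top row;
    # a 0 there means the column still has room for another checker
    return choice([c for c in range(configuration.columns) if observation.board[c] == 0])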
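A Deep Q-Learning agent eventually replaces the random choice above with "pick the column with the highest estimated Q-value", while still exploring now and then and never choosing a full column. The sketch below shows one common way to do that (epsilon-greedy selection with invalid-move masking). The function names and the `q_values` input are hypothetical; it assumes the same flat, row-major board where a 0 in the top row marks a playable column.

```python
import random

import numpy as np


def valid_columns(board, columns):
    """Columns whose top cell (the first `columns` entries of the flat board) is empty."""
    return [c for c in range(columns) if board[c] == 0]


def epsilon_greedy_action(q_values, board, columns, epsilon):
    """Hypothetical helper: random valid column with probability epsilon,
    otherwise the valid column with the highest estimated Q-value."""
    valid = valid_columns(board, columns)
    if random.random() < epsilon:
        return random.choice(valid)
    q = np.asarray(q_values, dtype=float)
    # Full columns get -inf so argmax can never select them
    masked = np.full(columns, -np.inf)
    masked[valid] = q[valid]
    return int(np.argmax(masked))
```

With `epsilon=1.0` this behaves like the Kaggle random agent; during training epsilon is typically decayed toward a small value so the agent increasingly trusts its Q-value estimates.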

In [None]:
# Testing the agent; start with new env
env.reset()
# Play as the first agent against default "random" agent.
env.run([my_agent, "random"])
env.render(mode="ipython", width=500, height=450)

## Train / Debug your Agent

In [None]:
# Play as first position against random agent.
trainer = env.train([None, "random"])

obs = trainer.reset()

while not env.done:
    my_action = my_agent(obs, env.configuration)
    print("My action (column 0-6): ", my_action)
    obs, reward, done, info = trainer.step(my_action)

env.render(mode="ipython", width=200, height=180, header=False, controls=False)
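To train a DQN, each step of the loop above would typically be stored as a `(state, action, reward, next_state, done)` transition in an experience-replay buffer, from which random minibatches are drawn later. The `ReplayBuffer` class below is a hypothetical, minimal sketch of that idea and is not part of kaggle-environments.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size FIFO store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10_000):
        # Once full, appending a new transition silently drops the oldest one
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch (without replacement), regrouped field by field
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return list(states), list(actions), list(rewards), list(next_states), list(dones)

    def __len__(self):
        return len(self.buffer)
```

Inside the training loop you would keep the previous observation, call `trainer.step(my_action)`, and then `add(prev_obs.board, my_action, reward, obs.board, done)`. Sampling at random breaks the strong correlation between consecutive moves of the same game.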

## Evaluate your Agent

In [None]:
# Share of the total reward earned by the first agent across all episodes:
def mean_reward(rewards):
    return sum(r[0] for r in rewards) / sum(r[0] + r[1] for r in rewards)

# Run multiple episodes to estimate its performance:
print("My Agent vs Random Agent:", mean_reward(evaluate("connectx", [my_agent, "random"], num_episodes=10)))
print("My Agent vs Negamax Agent:", mean_reward(evaluate("connectx", [my_agent, "negamax"], num_episodes=10)))
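`mean_reward` divides the first agent's total reward by the total reward of both agents, so it reads as a win rate only when rewards are non-negative. The worked example below uses made-up reward pairs (not real results), assuming 1 for a win, 0 for a loss and 0.5 each for a draw. If your kaggle-environments version reports a loss as -1 instead, the denominator can become zero; the hypothetical `mean_reward_per_episode` variant, which simply averages the first agent's reward, avoids that.

```python
def mean_reward(rewards):
    # Share of the total reward that went to the first agent
    return sum(r[0] for r in rewards) / sum(r[0] + r[1] for r in rewards)


def mean_reward_per_episode(rewards):
    # Hypothetical alternative: average reward of the first agent per episode
    return sum(r[0] for r in rewards) / len(rewards)


# Illustrative [first agent, second agent] rewards for four episodes: win, win, loss, draw
illustrative = [[1, 0], [1, 0], [0, 1], [0.5, 0.5]]
print(mean_reward(illustrative))  # 2.5 / 4 = 0.625
```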

## Play your Agent

Just click on any column to place a checker there.

In [None]:
# None marks the seat you play manually; here you are the second player
env.play([my_agent, None], width=500, height=450)

In [None]:
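# Inspect the raw board: a flat, row-major list of rows * columns cells (0 = empty, 1 and 2 = player marks)
env.state[0]['observation'].get('board')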
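The board above is a flat list, which is convenient for the network input but hard to read. A small sketch for viewing it as a grid, assuming the default ConnectX configuration of 6 rows by 7 columns with row 0 at the top (`board_to_grid` is a hypothetical helper):

```python
import numpy as np


def board_to_grid(board, rows=6, columns=7):
    """Reshape ConnectX's flat, row-major board into a rows x columns array.
    Row 0 is the top of the board; 0 = empty, 1/2 = player marks."""
    return np.asarray(board, dtype=np.int8).reshape(rows, columns)


# Illustrative board: player 1 has dropped a checker into column 3, landing in the bottom row
board = [0] * 42
board[5 * 7 + 3] = 1
print(board_to_grid(board))
```

In the notebook you could pass `env.state[0]['observation'].get('board')` directly to `board_to_grid`.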

## Define classes & functions

In [None]:
# Environment class; will be used to instantiate the environment and run each step the agent takes
class Connect4:
    # TODO: implement the environment wrapper
    pass
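One possible way to fill in the environment class is a small, self-contained board that handles gravity, legal moves and win detection. The sketch below is hypothetical (named `Connect4Board` to avoid clashing with the placeholder) and is not Kaggle's API. It mirrors the ConnectX layout (row 0 at the top, 0 = empty, marks 1 and 2), and its reward scheme (1 for the winning move, 0 otherwise, `ValueError` on an illegal move) is an assumption.

```python
import numpy as np


class Connect4Board:
    """Hypothetical minimal Connect4 environment; layout mirrors ConnectX."""

    def __init__(self, rows=6, columns=7, inarow=4):
        self.rows, self.columns, self.inarow = rows, columns, inarow
        self.reset()

    def reset(self):
        self.board = np.zeros((self.rows, self.columns), dtype=np.int8)
        self.mark = 1  # player to move
        self.done = False
        return self.board.copy()

    def valid_actions(self):
        # A column is playable while its top cell is empty
        return [c for c in range(self.columns) if self.board[0, c] == 0]

    def step(self, column):
        """Drop the current player's checker; return (board, reward, done).
        reward is 1 if this move wins, 0 otherwise (from the mover's perspective)."""
        if self.done or column not in self.valid_actions():
            raise ValueError(f"Illegal move: column {column}")
        # Gravity: the checker lands in the lowest empty row of the column
        row = max(r for r in range(self.rows) if self.board[r, column] == 0)
        self.board[row, column] = self.mark
        won = self._is_win(row, column)
        self.done = won or not self.valid_actions()  # a full board ends in a draw
        self.mark = 3 - self.mark  # switch between players 1 and 2
        return self.board.copy(), (1 if won else 0), self.done

    def _is_win(self, row, column):
        mark = self.board[row, column]
        # Count matching checkers through the new one: horizontal, vertical, both diagonals
        for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
            count = 1
            for sign in (1, -1):
                r, c = row + sign * dr, column + sign * dc
                while 0 <= r < self.rows and 0 <= c < self.columns and self.board[r, c] == mark:
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= self.inarow:
                return True
        return False
```

Checking only the lines through the most recent checker keeps the win test cheap, since no other four-in-a-row can have been created by that move.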

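In Deep Q-Learning, the model's per-action outputs are trained toward the one-step target $y = r + \gamma \max_{a'} Q(s', a')$, or simply $y = r$ when the episode ended after the move. When playing through `env.train` against a fixed opponent, $s'$ is the board after the opponent has replied. A minimal NumPy sketch of that target computation for a batch, assuming `next_q_values` come from the model (or a separate target network) evaluated on the next states; `td_targets` is a hypothetical helper and it does not mask full columns in $s'$:

```python
import numpy as np


def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets for a batch of transitions.

    rewards:       shape (batch,)
    next_q_values: shape (batch, num_actions), estimates of Q(s', a')
    dones:         shape (batch,), True where the episode ended after the transition
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    dones = np.asarray(dones, dtype=bool)
    max_next_q = np.max(np.asarray(next_q_values, dtype=np.float32), axis=1)
    # Terminal transitions have no future value, so their target is just the reward
    return np.where(dones, rewards, rewards + gamma * max_next_q)


targets = td_targets([0.0, 1.0], [[0.2, 0.5], [0.9, 0.1]], [False, True], gamma=0.9)
# first target: 0 + 0.9 * 0.5 = 0.45; second transition is terminal, so its target is 1.0
```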
In [None]:
# Build a tf.keras model class that implements a deep neural network in TensorFlow & Keras
# The model's forward pass takes the state as input and outputs an estimated Q-value for each possible action in that state
class myModel(tf.keras.Model):
    def __init__(self, num_states, hidden_units, num_actions):