# AI Learns to Play Connect 4
#### Jordan Yeomans - 2018

## Part 6 - Implement AI Version 1

### What's Next?

At this point we have trained a Neural Network to predict the move of the winning player with 50% accuracy. Let's see if the bot can use what it has learnt to play (and hopefully win) against the same bot it was playing before.

We need to do two things:

1. Update the State object to include a few parameters
2. Update the make_move function

#### Step 1 - Add Parameters To The State() Object

For simplicity I am using the State object to hold the placeholders and the model. We can then call on this object each iteration.

1. Include the filepath to the saved model
2. Create the x placeholder
3. Create the network (it must have the same architecture as the one that was trained)

```python
# Requires: import os, import tensorflow as tf (TensorFlow 1.x API),
# plus the Settings and Field classes from the complete code below.
class State(object):
    def __init__(self):
        self.settings = Settings()
        self.field = Field()
        self.round = 0

        # Where this game's boards are recorded (one numbered .npy file per game)
        folder = 'C:/Users/Jordan Yeomans/Documents/GitHub/RiddlesIO/four_in_a_row/Data/Raw_Data/AI-v1_vs_Random/Red_AI-v1/'
        num_files = os.listdir(folder)
        self.name = folder + str(len(num_files)+1) + '.npy'
        self.first = None

        # 1. Filepath to the saved model
        self.model_folder = 'C:/Users/Jordan Yeomans/Documents/GitHub/RiddlesIO/four_in_a_row/NeuralNetworks/AI_Bot_Version_1/'

        # 2. Input placeholder: a batch of 6x7 boards
        self.x = tf.placeholder(tf.float32, shape=[None, 6, 7], name='input_placeholder')

        # 3. Network: must match the architecture that was trained
        nn = tf.layers.flatten(self.x, name='input')
        nn = tf.layers.dense(nn, 256, activation=tf.nn.relu)
        nn = tf.layers.dense(nn, 256, activation=tf.nn.relu)
        nn = tf.layers.dense(nn, 256, activation=tf.nn.relu)
        self.last_layer = tf.layers.dense(nn, 7, name='output')  # one score per column
```
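Restoring a checkpoint only works if the graph has variables of exactly the same shapes as the ones that were saved, which is why the network has to be rebuilt identically. As a rough sanity check, here is a NumPy mirror of the layer shapes above. It uses random weights (not the trained model), and the helper names `init_weights` and `forward` are hypothetical, chosen for this sketch only:

```python
import numpy as np

# Layer widths mirroring the State() graph: flatten(6x7) -> 256 -> 256 -> 256 -> 7
LAYER_SIZES = [6 * 7, 256, 256, 256, 7]


def init_weights(sizes, seed=0):
    """Random (untrained) kernel/bias pairs, used only to check shapes."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(boards, weights):
    """boards: (batch, 6, 7) -> scores: (batch, 7)."""
    h = boards.reshape(boards.shape[0], -1)  # same role as tf.layers.flatten
    for i, (w, b) in enumerate(weights):
        h = h @ w + b
        if i < len(weights) - 1:  # ReLU on hidden layers; the output layer stays linear
            h = np.maximum(h, 0.0)
    return h


weights = init_weights(LAYER_SIZES)
n_params = sum(w.size + b.size for w, b in weights)
logits = forward(np.zeros((1, 6, 7)), weights)
print(n_params)      # 144391
print(logits.shape)  # (1, 7)
```

If the trained network had, say, a different hidden width, the kernel shapes would no longer line up and the restore step would fail rather than silently loading the wrong weights.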

#### Step 2 - Update make_move() function

Unlike the first bot, which had some long logic to figure out whose turn it was and what a winning move might be, this version scraps all of that.

Now the simple question is:

- Given this board, what column should we place the token in?
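To the network, "this board" is just a 6×7 array of numbers. The `make_move` code below marks yellow discs (`'1'`) as 1, red discs (`'0'`) as -1 and leaves empty cells at 0. A vectorised sketch of the same mapping follows. The position is made up, `encode_board` is a hypothetical helper name, and it assumes row 0 of the field is the top row:

```python
import numpy as np


def encode_board(field_state):
    """Map field strings to numbers: '1' (yellow) -> 1, '0' (red) -> -1, anything else (empty) -> 0."""
    field = np.array(field_state)
    board = np.zeros(field.shape)
    board[field == '1'] = 1
    board[field == '0'] = -1
    return board


# Hypothetical position: 6 rows x 7 columns, '.' for empty cells, row 0 assumed to be the top
field_state = [['.'] * 7 for _ in range(6)]
field_state[5][3] = '0'  # red disc at the bottom of column 3
field_state[4][3] = '1'  # yellow disc stacked on top of it

board = encode_board(field_state)
print(board[4:])  # the two bottom rows
```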

The opponent is the original bot, which includes the logic to calculate winning moves. This means our AI-v1 is at a bit of a disadvantage: to win, our network needs to have learnt what a winning move is, while the opponent is explicitly told!

To implement the network:

1. Make a saver object so we can restore the saved model
2. Take the most recent board as the input data and reshape it to (1, 6, 7). The placeholder expects shape (None, 6, 7): None means the batch size can be anything, but the array we feed still needs an explicit batch dimension, here of size 1
3. Start a session
4. Restore the model
5. Run the model (note that state.last_layer is the output layer of the network stored in the State object; likewise, state.x is the x placeholder stored there)
6. The move is the argmax of the output
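Steps 2 and 6 are plain NumPy, so they can be tried without TensorFlow. In this sketch the scores in `output` are made up and simply stand in for what `sess.run(state.last_layer, ...)` would return:

```python
import numpy as np

board = np.zeros((6, 7))  # one encoded board, shape (6, 7)

# Step 2: the placeholder has shape (None, 6, 7), so add a leading batch axis of size 1
input_data = board.reshape(1, 6, 7)  # equivalent to board[np.newaxis]

# Stand-in for the network output: one row of 7 column scores (made-up values)
output = np.array([[0.1, -0.3, 0.2, 1.5, 0.4, -1.0, 0.0]])

# Step 6: with a batch of one, the flat argmax index is also the column index
move = int(np.argmax(output))
print(input_data.shape)           # (1, 6, 7)
print('place_disc {}'.format(move))  # place_disc 3
```

Because the batch holds a single board, `np.argmax(output)` over the flattened (1, 7) array gives the same answer as `np.argmax(output, axis=1)[0]`.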

How simple is that!

We still save the board on every move so we can keep learning from these games later!
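The recording works by keeping the whole game in a (30, 6, 7) array, one slot per move we make, and reloading and re-saving that file on every turn. Thirty slots are plenty, since one player makes at most 21 moves on a 6×7 board. Here is a stand-alone sketch of that round trip. It uses a temporary folder instead of the data folder above, the file name is hypothetical, and a single fake disc stands in for the real encoded board:

```python
import os
import tempfile

import numpy as np

MAX_TURNS = 30  # same buffer size as make_move
path = os.path.join(tempfile.mkdtemp(), '1.npy')  # hypothetical game file

for turn in range(3):
    # First move: start an empty buffer; later moves: reload what was saved last time
    record = np.zeros((MAX_TURNS, 6, 7)) if turn == 0 else np.load(path)
    record[turn, 5, turn] = 1  # pretend a disc appeared; the real code writes the encoded board
    np.save(path, record)

final = np.load(path)
turns_recorded = int((final != 0).any(axis=(1, 2)).sum())
print(final.shape)     # (30, 6, 7)
print(turns_recorded)  # 3
```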

```python
def make_move(state):

    field_state = np.array(state.field.field_state)

    # Recording Data: start a fresh (30, 6, 7) buffer on our first move, otherwise reload it
    if state.round == 0:
        current_board = np.zeros((30, 6, 7))
    else:
        current_board = np.load(state.name)

    # Get Most Recent Board: yellow ('1') -> 1, red ('0') -> -1, empty cells stay 0
    for row in range(6):
        yellow_idx = np.where(field_state[row] == '1')[0]
        red_idx = np.where(field_state[row] == '0')[0]

        current_board[state.round][row][yellow_idx] = 1
        current_board[state.round][row][red_idx] = -1

    # Make Move from NN
    saver = tf.train.Saver()

    # Add the batch dimension: (6, 7) -> (1, 6, 7)
    input_data = current_board[state.round]
    input_data = input_data.reshape([1, input_data.shape[0], input_data.shape[1]])

    with tf.Session() as sess:

        # Load the trained weights into the graph built in State()
        saver.restore(sess, state.model_folder)

        # One score per column; play the highest-scoring column
        output = sess.run(state.last_layer, feed_dict={state.x: input_data})
        move = np.argmax(output)

    # Save the updated game record
    np.save(state.name, current_board)

    return 'place_disc {}'.format(move)
```
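A plain argmax knows nothing about the rules, so the network can still pick a column that is already full. One possible refinement, which is not part of the bot as written, is to mask full columns before taking the argmax. The sketch below assumes the encoded board has row 0 as the top row, so a column is full when its top cell is non-zero. `legal_argmax` is a hypothetical helper name, and the scores are made up:

```python
import numpy as np


def legal_argmax(logits, board):
    """Return the highest-scoring column whose top cell (row 0) is still empty.

    logits: shape (1, 7) or (7,), raw network output
    board:  shape (6, 7), encoded board with 0 for empty cells
    """
    scores = np.asarray(logits, dtype=float).reshape(-1).copy()
    full = board[0] != 0    # top cell occupied -> column is full
    scores[full] = -np.inf  # a full column can never win the argmax
    return int(np.argmax(scores))


board = np.zeros((6, 7))
board[:, 3] = [1, -1, 1, -1, 1, -1]  # column 3 filled to the top
logits = np.array([[0.1, -0.3, 0.2, 1.5, 0.4, -1.0, 0.0]])

print(int(np.argmax(logits)))       # 3 (illegal: column 3 is full)
print(legal_argmax(logits, board))  # 4 (next-best legal column)
```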

#### Step 3 - Run Some Games!

Success! We are winning around 75% of games!

Remember, Random vs Random splits 50% - 50%, so this is definitely an improvement!

That makes this a good checkpoint for us. We know we can:

1. Run Games
2. Store Games
3. Organise Data
4. Build and Train a NN
5. Implement an NN
6. Build an AI that has learnt something!

Let's see how far we can go!

The complete code is below.

![](https://i.imgur.com/oUsfFws.png)

```python
#!/usr/bin/env python3
import sys
import numpy as np
import os
import time
import tensorflow as tf

class Settings(object):
    def __init__(self):
        self.timebank = None
        self.time_per_move = None
        self.player_names = None
        self.your_bot = None
        self.your_botid = None
        self.field_width = None
        self.field_height = None


class Field(object):
    def __init__(self):
        self.field_state = None

    def update_field(self, celltypes, settings):
        self.field_state = [[] for _ in range(settings.field_height)]
        n_cols = settings.field_width
        for idx, cell in enumerate(celltypes):
            row_idx = idx // n_cols
            self.field_state[row_idx].append(cell)

class State(object):
    def __init__(self):
        self.settings = Settings()
        self.field = Field()
        self.round = 0

        folder = 'C:/Users/Jordan Yeomans/Documents/GitHub/RiddlesIO/four_in_a_row/Data/Raw_Data/AI-v1_vs_Random/Red_AI-v1/'
        num_files = os.listdir(folder)
        self.name = folder + str(len(num_files)+1) + '.npy'
        self.first = None

        self.model_folder = 'C:/Users/Jordan Yeomans/Documents/GitHub/RiddlesIO/four_in_a_row/NeuralNetworks/AI_Bot_Version_1/'

        self.x = tf.placeholder(tf.float32, shape=[None, 6, 7], name='input_placeholder')
        self.y = tf.placeholder(tf.float32, shape=[None, 7], name='output_placeholder')

        nn = tf.layers.flatten(self.x, name='input')
        nn = tf.layers.dense(nn, 256, activation=tf.nn.relu)
        nn = tf.layers.dense(nn, 256, activation=tf.nn.relu)
        nn = tf.layers.dense(nn, 256, activation=tf.nn.relu)
        self.last_layer = tf.layers.dense(nn, 7, name='output')

def parse_communication(text):
    """ Return the first word of the communication - that's the command """
    return text.strip().split()[0]


def settings(text, state):
    """ Handle communication intended to update game settings """
    tokens = text.strip().split()[1:]  # Ignore token 0, it's the string "settings".
    cmd = tokens[0]
    if cmd in ('timebank', 'time_per_move', 'your_botid', 'field_height', 'field_width'):
        # Handle setting integer settings.
        setattr(state.settings, cmd, int(tokens[1]))
    elif cmd in ('your_bot',):
        # Handle setting string settings.
        setattr(state.settings, cmd, tokens[1])
    elif cmd in ('player_names',):
        # Handle setting lists of strings.
        setattr(state.settings, cmd, tokens[1:])
    else:
        raise NotImplementedError('Settings command "{}" not recognized'.format(text))


def update(text, state):
    """ Handle communication intended to update the game """
    tokens = text.strip().split()[2:] # Ignore tokens 0 and 1, those are "update" and "game" respectively.
    cmd = tokens[0]
    if cmd in ('round',):
        # Handle setting integer settings.
        setattr(state.settings, 'round', int(tokens[1]))
    if cmd in ('field',):
        # Handle setting the game board.
        celltypes = tokens[1].split(',')
        state.field.update_field(celltypes, state.settings)


def action(text, state):
    """ Handle communication intended to prompt the bot to take an action """
    tokens = text.strip().split()[1:] # Ignore token 0, it's the string "action".
    cmd = tokens[0]
    if cmd in ('move',):
        move = make_move(state)
        state.round += 1
        return move
    else:
        raise NotImplementedError('Action command "{}" not recognized'.format(text))

def make_move(state):

    field_state = np.array(state.field.field_state)

    # Recording Data
    if state.round == 0:
        current_board = np.zeros((30, 6, 7))
    else:
        current_board = np.load(state.name)

    # Get Most Recent Board
    for row in range(6):
        yellow_idx = np.where(field_state[row] == '1')[0]
        red_idx = np.where(field_state[row] == '0')[0]

        current_board[state.round][row][yellow_idx] = 1
        current_board[state.round][row][red_idx] = -1

    # Make Move from NN
    saver = tf.train.Saver()

    input_data = current_board[state.round]
    input_data = input_data.reshape([1, input_data.shape[0], input_data.shape[1]])

    with tf.Session() as sess:

        saver.restore(sess, state.model_folder)

        output = sess.run(state.last_layer, feed_dict={state.x: input_data})
        move = np.argmax(output)

    np.save(state.name, current_board)

    return 'place_disc {}'.format(move)

def main():
    command_lookup = { 'settings': settings, 'update': update, 'action': action }
    state = State()


    for input_msg in sys.stdin:
        cmd_type = parse_communication(input_msg)
        command = command_lookup[cmd_type]

        # Call the correct command.
        res = command(input_msg, state)

        # Assume if the command generates a string as output, that we need
        # to "respond" by printing it to stdout.
        if isinstance(res, str):
            print(res)
            sys.stdout.flush()

if __name__ == '__main__':
    main()
```


