# Simple Inference

In this notebook we experiment with training a simple CNN which replicates the moves of the greedy defending agent.


Our plan is as follows:

1. Make the greedy defending agent play many times (against random, greedy and itself)
2. Collect data on the current state of the board, and the move made by the greedy defender
3. Create a CNN which takes board state as input and outputs the best move
4. Use the data collected to train the CNN to move like the greedy defending agent

We try out the training process with a 5x5 board first.

## Setup

Requirements:

- numpy
- torch
- torchvision
- pandas and tqdm (only for the baseline evaluation at the end)

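First we generate the training data: 100 games against random, 100 games against greedy, and 100 games against itself. The games against random and greedy are split evenly between the greedy defender playing first and playing second.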

Run the following from the root of the project:
```
python main.py -1 greedy_defender -2 random -o -p data/s5_gd_random -s 5 -r 50 -q
python main.py -1 random -2 greedy_defender -o -p data/s5_random_gd -s 5 -r 50 -q
python main.py -1 greedy_defender -2 greedy -o -p data/s5_gd_greedy -s 5 -r 50 -q
python main.py -1 greedy -2 greedy_defender -o -p data/s5_greedy_gd -s 5 -r 50 -q
python main.py -1 greedy_defender -2 greedy_defender -o -p data/s5_gd_gd -s 5 -r 100 -q
```

In [1]:
import csv
import glob
import numpy as np
import os
import random
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms.functional as TF

## Model definition

The input is the current state of the board, represented as a 2xNxN array (2x10x10 for a 10x10 board, 2x5x5 for the 5x5 board used in this notebook). The first channel marks the coordinates of the agent's own pieces and the second channel marks the opponent's pieces; when the agent plays black, these are the black and white pieces respectively. The model outputs a 1xNxN array of probabilities that the corresponding coordinate is the best move.
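As a small illustration of this encoding, the sketch below builds the two channels with NumPy boolean masks. It assumes, as in the rest of this notebook, that the board stores 0 for empty, 1 for black and 2 for white. The function name `encode_board` is made up for this example and is not part of the project.

```python
import numpy as np

def encode_board(board: np.ndarray, side_self: int = 1) -> np.ndarray:
    """Return a 2xNxN float array: channel 0 = own pieces, channel 1 = opponent's."""
    side_opposition = 2 if side_self == 1 else 1
    return np.stack([board == side_self, board == side_opposition]).astype(np.float32)

board = np.zeros((5, 5), dtype=np.int8)
board[2, 2] = 1   # black piece
board[4, 0] = 2   # white piece

encoded = encode_board(board, side_self=1)
print(encoded.shape)   # (2, 5, 5)
```

Calling `encode_board(board, side_self=2)` swaps the two channels, which is how a position can be presented from white's point of view.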

Given that we are looking for five-in-a-row combinations, the kernel size of the first convolution should be at least 5.

The output has the same spatial size as the input, so we add padding to the convolutions to make sure the size does not change.
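The arithmetic behind these two choices can be checked with a few lines of plain Python. This is a sketch using the standard output-size formula for a convolution with dilation 1; the helper names are hypothetical.

```python
from typing import Sequence

def conv_output_size(n: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Spatial output size of a convolution (dilation 1) on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1

def receptive_field(kernels: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 convolutions with the given kernel sizes."""
    return 1 + sum(k - 1 for k in kernels)

for n in (5, 10):
    print(n, conv_output_size(n, kernel=5, padding=2))   # size is unchanged
print(conv_output_size(10, kernel=5, padding=0))          # 6: shrinks without padding
print(receptive_field([5, 5, 5]))                         # 13
```

With a 5x5 kernel, a padding of 2 keeps the board size, and three such layers give each output cell a 13x13 receptive field, which is larger than both the 5x5 and the 10x10 board.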

In [2]:
# convert board to 2xNxN input array (pieces of self in 0th channel, opposition in 1st channel)
def board_to_input_array(board: np.array, side_self=1):
    side_opposition = 2 if side_self == 1 else 1

    input_array = torch.zeros((2,) + board.shape)
    for a, b in zip(*np.where(board == side_self)):
        input_array[0, a, b] = 1
    for a, b in zip(*np.where(board == side_opposition)):
        input_array[1, a, b] = 1
    return input_array

In [3]:
class ConvNet3(nn.Module):
    def __init__(self):
        super(ConvNet3, self).__init__()

        self.net = nn.Sequential(
            # 2xNxN -> 10xNxN
            nn.Conv2d(2, 10, 5, padding=(2,2)),
            nn.ReLU(),
            # 10xNxN -> 10xNxN
            nn.Conv2d(10, 10, 5, padding=(2,2)),
            nn.ReLU(),
            # 10xNxN -> 1xNxN
            nn.Conv2d(10, 1, 5, padding=(2,2))
        )

    def forward(self, x):
        n = x.size()[-1]
        x = self.net(x).view((-1, n*n))
        x = F.softmax(x, dim=1)
        x = x.view((-1, 1, n, n))
        return x

    # return coordinate of maximum as numpy array
    def get_move(self, x):
        input_array = board_to_input_array(x).unsqueeze(dim=0)
        with torch.set_grad_enabled(False):
            probs = self.forward(input_array).squeeze()
        return random.choice((probs == torch.max(probs)).nonzero().numpy())

In [4]:
ConvNet3()

ConvNet3(
  (net): Sequential(
    (0): Conv2d(2, 10, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): Conv2d(10, 10, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (3): ReLU()
    (4): Conv2d(10, 1, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  )
)

In [5]:
# test run to make sure output size is correct
ConvNet3()(torch.zeros((1,2,10,10))).size()

torch.Size([1, 1, 10, 10])

## Dataloader

We define a dataset that reads the CSV files output by gomoku and returns a pair: the board state before a move, and the move made by the greedy defender. The model always sees the board from the mover's point of view, i.e. as if it were playing black. For white's moves we therefore swap the two sides before passing the board to the model.
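The row arithmetic used by the dataset can be sketched on its own. The layout assumed here is inferred from the indexing in `GomokuMoveDataset` rather than from a format specification: one header row starting with the board size, then a block of `size + 2` rows per move (the `side, row, column` row, `size` board rows, and one further row that the loader skips), with black and white alternating. The helper name is hypothetical.

```python
def move_row_index(move_idx: int, side: int, size: int) -> int:
    """Row of the `move_idx`-th move made by `side` (1 = black, 2 = white)."""
    base = 1 if side == 1 else size + 3      # first move row of that side
    stride = 2 * (size + 2)                  # one black block plus one white block
    return base + move_idx * stride

size = 5
print([move_row_index(i, 1, size) for i in range(3)])  # [1, 15, 29]
print([move_row_index(i, 2, size) for i in range(3)])  # [8, 22, 36]
```

The board after a move then sits in the `size` rows directly below the returned index.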

In [6]:
class GomokuMoveDataset(Dataset):
    def __init__(self, size, files, transform=None):
        self.size = size
        self.files = files          # list of tuples: (filename, side)
        self.transform = transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        filename, side = self.files[idx]
        with open(filename, 'r') as f:
            reader = csv.reader(f)
            csv_input = [list(map(int, row)) for row in reader]

        assert csv_input[0][0] == self.size
        base = 1 if side == 1 else self.size + 3
        stride = 2 * (self.size + 2)
        num_moves = int(np.ceil((len(csv_input) - base) / stride))

        # choose random move from csv
        move_idx = random.randint(0, num_moves-1)

        # get move/board state from the chosen move
        start = move_idx * stride + base
        side, move_row, move_column = csv_input[start]
        board = np.array(csv_input[start + 1: start + self.size + 1], np.int8)

        # board is state after the move, so remove this piece
        board[move_row, move_column] = 0

        # convert board to input array
        input_array = board_to_input_array(board, side)

        # convert move to 1xNxN target array
        move_array = torch.zeros((1,) + board.shape)
        move_array[0, move_row, move_column] = 1

        # apply transforms
        if self.transform is not None:
            input_array, move_array = self.transform(input_array, move_array)

        return input_array, move_array

We consider two augmentations during training: flip and rotate (by multiples of 90 degrees).

(Translate could also be an option, but we must be careful given that a combination cannot be continued past the edge of the board, meaning that a cropped region at the center is not exactly the same as a region on the edge. A 10x10 board seems too small for this difference to be ignored.)
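Flips and 90-degree rotations together generate the eight symmetries of a square board. The NumPy sketch below (an illustration only, separate from the torchvision-based transforms used in this notebook) enumerates them and checks that applying the same transform to the board and to the target keeps the move on the same cell.

```python
import numpy as np

def symmetries(grid: np.ndarray) -> list:
    """All 8 flip/rotation variants: rotations of the grid and of its mirror image."""
    variants = []
    for base in (grid, np.fliplr(grid)):
        for k in range(4):
            variants.append(np.rot90(base, k))
    return variants

board = np.arange(25).reshape(5, 5)        # all cells differ, so all variants differ
distinct = {v.tobytes() for v in symmetries(board)}
print(len(distinct))                        # 8

target = np.zeros((5, 5), dtype=int)
target[1, 2] = 1                            # one-hot move at (1, 2)
aligned = all(
    b[t == 1][0] == board[1, 2]
    for b, t in zip(symmetries(board), symmetries(target))
)
print(aligned)                              # True
```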

In [7]:
class Compose(object):
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

class RandomHorizontalFlip(object):
    def __init__(self, prob=0.5):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            image = TF.hflip(image)
            target = TF.hflip(target)
        return image, target

class RandomVerticalFlip(object):
    def __init__(self, prob=0.5):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            image = TF.vflip(image)
            target = TF.vflip(target)
        return image, target

class RandomRotate(object):
    """Rotate by one of the given angles."""

    def __init__(self, angles=(0, 90, 180, 270)):
        self.angles = angles

    def __call__(self, image, target):
        angle = random.choice(self.angles)
        image = TF.rotate(image, angle)
        target = TF.rotate(target, angle)
        return image, target

We split the data into training and validation sets so that:

1. Each game type appears with the same ratio
2. Moves from the same game do not leak
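Both properties follow from splitting at the level of game files, separately for each matchup directory. A minimal sketch with made-up file names (the real files come from the commands in the Setup section):

```python
import math
import random

def split_games(files, train_ratio=0.8, seed=0):
    """Shuffle game files and split them so each game lands in exactly one set."""
    files = list(files)
    random.Random(seed).shuffle(files)
    num_train = math.ceil(len(files) * train_ratio)
    return files[:num_train], files[num_train:]

# Hypothetical file names, one CSV per game, grouped by matchup directory.
matchups = {
    "s5_gd_random": [f"s5_gd_random/{i}.csv" for i in range(50)],
    "s5_gd_gd": [f"s5_gd_gd/{i}.csv" for i in range(100)],
}

train, val = [], []
for name, files in matchups.items():
    t, v = split_games(files)
    train += t
    val += v

print(len(train), len(val))            # 120 30
print(set(train).isdisjoint(val))      # True
```

Because every matchup is split with the same ratio, each game type keeps the same share in both sets, and since a file is never in both lists, no game contributes moves to both training and validation.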

In [8]:
def train_val_split(directory, side=None, train_ratio=0.8, shuffle=True):

    files = glob.glob(os.path.join(directory, '*.csv'))
    if shuffle:
        random.shuffle(files)
    num_train = int(np.ceil(len(files) * train_ratio))

    if side is not None:
        tuples = [(x, side) for x in files]
        return tuples[:num_train], tuples[num_train:]

    # if side is None get both sides
    return (
        [(x, 1) for x in files[:num_train]] + [(x, 2) for x in files[:num_train]],
        [(x, 1) for x in files[num_train:]] + [(x, 2) for x in files[num_train:]]
    )

In [9]:
input_dirs = [
    ("../../data/s5_gd_random", 1),
    ("../../data/s5_gd_greedy", 1),
    ("../../data/s5_random_gd", 2),
    ("../../data/s5_greedy_gd", 2),
    ("../../data/s5_gd_gd", None)
]

train_files = []
val_files = []
for directory, side in input_dirs:
    new_train, new_val = train_val_split(directory, side)
    train_files += new_train
    val_files += new_val

len(train_files), len(val_files)

(320, 80)

In [10]:
train_transform = Compose([
    RandomHorizontalFlip(),
    RandomVerticalFlip(),
    RandomRotate()
])

train_set = GomokuMoveDataset(5, train_files, transform=train_transform)
train_loader = DataLoader(train_set, batch_size=1, shuffle=True)
val_set = GomokuMoveDataset(5, val_files)
val_loader = DataLoader(val_set, batch_size=1, shuffle=False)

Run the following cell a few times to make sure the data is loaded correctly, and check that each move looks like one the greedy defender would make.

In [11]:
next(iter(train_loader))

[tensor([[[[0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 1., 1., 0., 1.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 1., 0.]],
 
          [[0., 0., 0., 1., 0.],
           [0., 0., 0., 1., 0.],
           [0., 0., 0., 0., 0.],
           [0., 1., 0., 1., 0.],
           [0., 0., 1., 0., 0.]]]]),
 tensor([[[[0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [1., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.]]]])]

In [12]:
next(iter(train_loader))

[tensor([[[[0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 1., 0., 0.],
           [0., 1., 0., 0., 0.],
           [0., 0., 0., 0., 0.]],
 
          [[0., 0., 0., 0., 0.],
           [0., 1., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 1., 0.]]]]),
 tensor([[[[0., 0., 0., 0., 1.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.]]]])]

In [13]:
next(iter(train_loader))

[tensor([[[[0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.]],
 
          [[0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.]]]]),
 tensor([[[[0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 1., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.]]]])]

All moves look like the greedy defender!

## Training

In [14]:
def train(model, data_loaders, criterion, optimizer, scheduler, num_epochs, output_dir='data'):

    os.makedirs(output_dir, exist_ok=True)
    start_time = time.time()

    for epoch in range(1, num_epochs+1):
        print('=' * 20)
        print(f'Epoch {epoch}/{num_epochs}')

        for phase in ['train', 'val']:
            if phase == 'train':
                print('-' * 5 + 'Training' + '-' * 5)
                model.train()
            else:
                print('-' * 5 + 'Validation: ' + '-' * 5)
                model.eval()

            running_loss = 0.0
            num_inputs = 0
            for i, data in enumerate(data_loaders[phase], 1):
                inputs, targets = data
                num_inputs += inputs.size(0)

                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, targets)
                    if phase == 'train':
                        optimizer.zero_grad()
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                if i % 500 == 0:
                    cur_loss = running_loss / num_inputs
                    print(f"Iteration: {i}, Loss: {cur_loss:.8f}")

            epoch_loss = running_loss / num_inputs
            print(f'{phase} Loss: {epoch_loss:.8f}')

        # step the LR scheduler once per epoch, after the optimizer updates
        # (the order expected by PyTorch >= 1.1)
        scheduler.step()

        # save a checkpoint every 100 epochs (once per epoch, not once per phase)
        if epoch % 100 == 0:
            torch.save(model.state_dict(), os.path.join(output_dir, f"model_{epoch:04d}.pth"))

    print('=' * 20)
    print('Finished Training')
    print(f'Time: {int(time.time() - start_time)} seconds')

In [15]:
model = ConvNet3()

data_loaders = {"train": train_loader, "val": val_loader}
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=1)
scheduler = lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

train(
    model, data_loaders, criterion, optimizer, scheduler, num_epochs=200,
    output_dir='data/s5_gd_convnet3')

---Validation: -----
val Loss: 0.12956196
Epoch 36/200
-----Training-----
train Loss: 0.13466151
-----Validation: -----
val Loss: 0.13588313
Epoch 37/200
-----Training-----
train Loss: 0.13227578
-----Validation: -----
val Loss: 0.14387297
Epoch 38/200
-----Training-----
train Loss: 0.13430732
-----Validation: -----
val Loss: 0.12751177
Epoch 39/200
-----Training-----
train Loss: 0.13726495
-----Validation: -----
val Loss: 0.14312329
Epoch 40/200
-----Training-----
train Loss: 0.14004211
-----Validation: -----
val Loss: 0.13026581
Epoch 41/200
-----Training-----
train Loss: 0.13998818
-----Validation: -----
val Loss: 0.13256169
Epoch 42/200
-----Training-----
train Loss: 0.14060909
-----Validation: -----
val Loss: 0.13964979
Epoch 43/200
-----Training-----
train Loss: 0.13745467
-----Validation: -----
val Loss: 0.13315711
Epoch 44/200
-----Training-----
train Loss: 0.13889622
-----Validation: -----
val Loss: 0.13873381
Epoch 45/200
-----Training-----
train Loss: 0.13478337
-----Validat

## Inference

In [16]:
model = ConvNet3()
model.load_state_dict(torch.load('data/s5_gd_convnet3/model_0100.pth'))

<All keys matched successfully>

In [17]:
board = np.zeros((5, 5))
model.get_move(board)

array([2, 2])

In [18]:
board = np.zeros((5, 5))
board[2,2] = 1
board[4,0] = 2
board[0,0] = 1
board[4,1] = 2
board[3,4] = 1
board[4,2] = 2
board

array([[1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1.],
       [2., 2., 2., 0., 0.]])

In [19]:
model.get_move(board)

array([4, 3])

In [20]:
board = np.zeros((5, 5))
board[0,0] = 1
board[4,4] = 2
board[0,2] = 1
board[2,4] = 2
board[3,1] = 1
board[1,2] = 2
board

array([[1., 0., 1., 0., 0.],
       [0., 0., 2., 0., 0.],
       [0., 0., 0., 0., 2.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 2.]])

In [21]:
model.get_move(board)

array([1, 4])

Looks like the greedy defender!

## Test run against baseline

In [22]:
import pandas as pd
from tqdm import trange

from gomoku.agent.base import BaseAgent, RandomAgent
from gomoku.agent.greedy import GreedyAgent, GreedyDefendingAgent
from gomoku.manager import GameManager

class SimpleInferenceAgent(BaseAgent):
    def __init__(self, quiet=False):
        super().__init__()
        self.__quiet = quiet

    def move(self, board):
        move = model.get_move(board.get_board())
        if board.is_valid_move(move):
            return move

        if not self.__quiet:
            print(f'Model chose invalid move ({move[0]}, {move[1]}): make random move')
        return random.choice(board.get_valid_moves())

In [23]:
game = GameManager(size=5)
agent1 = SimpleInferenceAgent()
agent2 = RandomAgent()
game.run_game_custom(agent1, agent2)

[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Placed BLACK at [2 2]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 1 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Placed WHITE at [0 3]
[[0 0 0 2 0]
 [0 0 0 0 0]
 [0 0 1 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Placed BLACK at [0 4]
[[0 0 0 2 1]
 [0 0 0 0 0]
 [0 0 1 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Placed WHITE at [1 3]
[[0 0 0 2 1]
 [0 0 0 2 0]
 [0 0 1 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Placed BLACK at [2 3]
[[0 0 0 2 1]
 [0 0 0 2 0]
 [0 0 1 1 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Placed WHITE at [4 3]
[[0 0 0 2 1]
 [0 0 0 2 0]
 [0 0 1 1 0]
 [0 0 0 0 0]
 [0 0 0 2 0]]
Placed BLACK at [2 1]
[[0 0 0 2 1]
 [0 0 0 2 0]
 [0 1 1 1 0]
 [0 0 0 0 0]
 [0 0 0 2 0]]
Placed WHITE at [3 0]
[[0 0 0 2 1]
 [0 0 0 2 0]
 [0 1 1 1 0]
 [2 0 0 0 0]
 [0 0 0 2 0]]
Placed BLACK at [2 0]
[[0 0 0 2 1]
 [0 0 0 2 0]
 [1 1 1 1 0]
 [2 0 0 0 0]
 [0 0 0 2 0]]
Placed WHITE at [3 1]
[[0 0 0 2 1]
 [0 0 0 2 0]
 [1 1 1 1 0]
 [2 2 0 0 0]
 [

<Side.BLACK: 1>

In [24]:
game.run_game_custom(agent1, agent2)

[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Placed BLACK at [2 2]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 1 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Placed WHITE at [3 3]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 1 0 0]
 [0 0 0 2 0]
 [0 0 0 0 0]]
Placed BLACK at [2 3]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 1 1 0]
 [0 0 0 2 0]
 [0 0 0 0 0]]
Placed WHITE at [1 4]
[[0 0 0 0 0]
 [0 0 0 0 2]
 [0 0 1 1 0]
 [0 0 0 2 0]
 [0 0 0 0 0]]
Placed BLACK at [2 1]
[[0 0 0 0 0]
 [0 0 0 0 2]
 [0 1 1 1 0]
 [0 0 0 2 0]
 [0 0 0 0 0]]
Placed WHITE at [3 1]
[[0 0 0 0 0]
 [0 0 0 0 2]
 [0 1 1 1 0]
 [0 2 0 2 0]
 [0 0 0 0 0]]
Placed BLACK at [2 0]
[[0 0 0 0 0]
 [0 0 0 0 2]
 [1 1 1 1 0]
 [0 2 0 2 0]
 [0 0 0 0 0]]
Placed WHITE at [2 4]
[[0 0 0 0 0]
 [0 0 0 0 2]
 [1 1 1 1 2]
 [0 2 0 2 0]
 [0 0 0 0 0]]
Model chose invalid move (2, 2): make random move
Placed BLACK at [0 3]
[[0 0 0 1 0]
 [0 0 0 0 2]
 [1 1 1 1 2]
 [0 2 0 2 0]
 [0 0 0 0 0]]
Placed WHITE at [1 3]
[[0 0 0 1 0]
 [0 0 0 2 2]
 [1 1 1 1 2]
 [0 2 0 2 0]
 [0 0 0 0 0]]
Mode

<Side.BLACK: 1>

In [25]:
baselines = {
    "random": RandomAgent(),
    "greedy": GreedyAgent(),
    "greedy_defender": GreedyDefendingAgent()
}

def evaluate_agent(game: GameManager, agent: BaseAgent, agent_name: str, runs: int = 100):

    results_df = pd.DataFrame(columns=['black', 'white', 'wins (black)', 'wins (white)', 'ties'])

    def run_agents(agent1, name1, agent2, name2):
        print(f"Running {name1} against {name2}")
        time.sleep(0.5)

        # array containing results: dummy, agent1 (1), agent2 (2), tie (-1)
        results = [0] * 4
        for _ in trange(runs):
            winner = game.run_game_custom(agent1, agent2)
            results[winner.value] += 1

        print(f"Results: black({name1}): {results[1]}, white({name2}): {results[2]}, ties: {results[3]}")
        return [name1, name2] + results[1:]

    for baseline_name, baseline in baselines.items():
        results_df.loc[len(results_df)] = run_agents(baseline, baseline_name, agent, agent_name)
    for baseline_name, baseline in baselines.items():
        results_df.loc[len(results_df)] = run_agents(agent, agent_name, baseline, baseline_name)

    return results_df

In [26]:
df = evaluate_agent(GameManager(size=5, quiet=True), SimpleInferenceAgent(quiet=True), "inference")

Running random against inference
100%|██████████| 100/100 [00:01<00:00, 74.54it/s]
Results: black(random): 3, white(inference): 52, ties: 45
Running greedy against inference
100%|██████████| 100/100 [00:04<00:00, 21.80it/s]
Results: black(greedy): 33, white(inference): 39, ties: 28
Running greedy_defender against inference
100%|██████████| 100/100 [00:07<00:00, 12.72it/s]
Results: black(greedy_defender): 32, white(inference): 0, ties: 68
Running inference against random
100%|██████████| 100/100 [00:01<00:00, 95.70it/s]
Results: black(inference): 92, white(random): 0, ties: 8
Running inference against greedy
100%|██████████| 100/100 [00:03<00:00, 28.05it/s]
Results: black(inference): 90, white(greedy): 7, ties: 3
Running inference against greedy_defender
100%|██████████| 100/100 [00:08<00:00, 12.33it/s]
Results: black(inference): 0, white(greedy_defender): 29, ties: 71



In [27]:
df

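|   | black | white | wins (black) | wins (white) | ties |
|---|-------|-------|--------------|--------------|------|
| 0 | random | inference | 3 | 52 | 45 |
| 1 | greedy | inference | 33 | 39 | 28 |
| 2 | greedy_defender | inference | 32 | 0 | 68 |
| 3 | inference | random | 92 | 0 | 8 |
| 4 | inference | greedy | 90 | 7 | 3 |
| 5 | inference | greedy_defender | 0 | 29 | 71 |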


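On the 5x5 board the agent does better than greedy (90 wins to 7 as black, 39 to 33 as white), which is a good sign. It never beats the greedy defender, though: most of those games end in ties (71 as black, 68 as white) and the rest are losses.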
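To read the results from the inference agent's point of view, the following sketch converts each row into wins, losses and ties for that agent. The numbers are copied from the 5x5 results above; the helper name `inference_record` is only for illustration.

```python
# (black, white, wins_black, wins_white, ties), copied from the 5x5 results
results = [
    ("random", "inference", 3, 52, 45),
    ("greedy", "inference", 33, 39, 28),
    ("greedy_defender", "inference", 32, 0, 68),
    ("inference", "random", 92, 0, 8),
    ("inference", "greedy", 90, 7, 3),
    ("inference", "greedy_defender", 0, 29, 71),
]

def inference_record(row, agent="inference"):
    """Return (opponent, colour, wins, losses, ties) from the agent's point of view."""
    black, white, wins_black, wins_white, ties = row
    if black == agent:
        return white, "black", wins_black, wins_white, ties
    return black, "white", wins_white, wins_black, ties

for row in results:
    opponent, colour, wins, losses, ties = inference_record(row)
    total = wins + losses + ties
    print(f"vs {opponent:<16} as {colour}: "
          f"win rate {wins / total:.0%}, loss rate {losses / total:.0%}")
```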

Since the model is fully convolutional, we can apply the model to boards of a different size. (This may not work well if the agent was not trained for the size.)
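One way to see why this is possible: the number of parameters of a convolution depends only on the channel counts and the kernel size, never on the board size N. A short sketch of that count for the three layers of `ConvNet3` (the helper name is hypothetical):

```python
def conv2d_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Number of weights and biases in a square-kernel 2D convolution."""
    return out_channels * in_channels * kernel * kernel + out_channels

# (in_channels, out_channels, kernel_size) for the three layers of ConvNet3
layers = [(2, 10, 5), (10, 10, 5), (10, 1, 5)]
total = sum(conv2d_params(*layer) for layer in layers)
print(total)   # 3271
```

The same 3271 parameters are reused at every board position, so the weights trained on 5x5 boards can be evaluated on a 10x10 board without any change to the model.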

In [28]:
df_10 = evaluate_agent(GameManager(size=10, quiet=True), SimpleInferenceAgent(quiet=True), "inference")

Running random against inference
100%|██████████| 100/100 [00:02<00:00, 48.83it/s]
Results: black(random): 1, white(inference): 99, ties: 0
Running greedy against inference
100%|██████████| 100/100 [00:18<00:00,  5.31it/s]
Results: black(greedy): 99, white(inference): 1, ties: 0
Running greedy_defender against inference
100%|██████████| 100/100 [00:40<00:00,  2.47it/s]
Results: black(greedy_defender): 100, white(inference): 0, ties: 0
Running inference against random
100%|██████████| 100/100 [00:01<00:00, 84.70it/s]
Results: black(inference): 100, white(random): 0, ties: 0
Running inference against greedy
100%|██████████| 100/100 [00:13<00:00,  7.32it/s]
Results: black(inference): 0, white(greedy): 100, ties: 0
Running inference against greedy_defender
100%|██████████| 100/100 [00:26<00:00,  3.76it/s]
Results: black(inference): 0, white(greedy_defender): 100, ties: 0



In [29]:
df_10

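|   | black | white | wins (black) | wins (white) | ties |
|---|-------|-------|--------------|--------------|------|
| 0 | random | inference | 1 | 99 | 0 |
| 1 | greedy | inference | 99 | 1 | 0 |
| 2 | greedy_defender | inference | 100 | 0 | 0 |
| 3 | inference | random | 100 | 0 | 0 |
| 4 | inference | greedy | 0 | 100 | 0 |
| 5 | inference | greedy_defender | 0 | 100 | 0 |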
