I almost always lose in games. If I am playing a game and you bet that I am going to lose, well, the odds are in your favour. So, I decided to build an AI that almost never loses in a game of Connect 4. Now, if you're like me, you probably knew the game but did not know the name. So, here goes:
* There are two players in a game with alternating turns.
* Each turn consists of a player dropping a disc into the game.
* The first player to line up four discs in a row, whether horizontally, vertically, or diagonally, wins.

In [None]:
import numpy as np
import random

In [None]:
from kaggle_environments import make, evaluate
env = make("connectx", debug=True)

print(list(env.agents))

Now, if you still don't know how the game works (understandable), let's make two random agents play each other. Just know that these agents are completely random and will play the game extremely poorly. They have no presence of mind, and whichever of the two wins does so by fluke. This is just to get acquainted with the game.

In [None]:
env.run(["random", "random"])

# Show the game
env.render(mode="ipython")

So now that we have the hang of the game, let's go ahead and make some other agents:
* one will be random as before
* one will always choose the middle column in the grid
* one will always choose the leftmost available column in the grid.

In [None]:
def agent_random(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    return random.choice(valid_moves)

def agent_middle(obs, config):
    return config.columns//2

def agent_leftmost(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    return valid_moves[0]
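
To see why `obs.board[col] == 0` is enough to find the playable columns, here is a small sketch. It assumes the board is a flat list stored row by row with the top row first, which is how the code later in this notebook reshapes it. The `obs` and `config` objects here are hypothetical stand-ins built with `SimpleNamespace`, not the ones the environment passes in.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the objects the environment passes to an agent.
config = SimpleNamespace(rows=6, columns=7)
board = [0] * (config.rows * config.columns)

# Fill column 3 completely: one disc in every row of that column.
for row in range(config.rows):
    board[row * config.columns + 3] = 1 if row % 2 == 0 else 2

obs = SimpleNamespace(board=board, mark=1)

# Same test the agents use: the first `columns` entries are the top row,
# so a column is playable while its top cell is still empty.
valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
print(valid_moves)          # [0, 1, 2, 4, 5, 6]
print(config.columns // 2)  # 3, the column agent_middle would still pick
```

Note that `agent_middle` never checks this list, so once the middle column fills up it keeps returning a move that is no longer legal.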

Left-most agent versus random agent:

In [None]:
env.run([agent_leftmost, agent_random])
env.render(mode = "ipython")

Left-most agent versus middle agent:

In [None]:
env.run([agent_leftmost, agent_middle])
env.render(mode = "ipython")

And finally, middle agent versus random agent:

In [None]:
env.run([agent_middle, agent_random])
env.render(mode = "ipython")

So, clearly, all the agents are, as a matter of fact, quite dumb. So it's safe to say that they play the game exactly like me. You'll see how badly I play later in the notebook; it isn't much better than these agents. But we need to make these agents better. So, I'll first use a rudimentary set of rules as a manœuvre.

### Making the Agent Make Sense

Before this, the agent did not know what to do at all. It just dropped the discs where it was told to. Now the agent will be programmed in such a way that, if there is a move that wins the game on the spot, it selects that move. Otherwise, it keeps things random. If several moves would win, any of them will do, so the code below simply takes the first one it finds.

In [None]:
def my_agent(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    for col in valid_moves:
        if check_winning_move(obs, config, col, obs.mark):
            return col
    for col in valid_moves:
        if check_winning_move(obs, config, col, obs.mark%2+1):
            return col
    return random.choice(valid_moves)

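However, clearly just doing that will not suffice. The agent must also prevent the other player (me) from winning, which shouldn't be a difficult task for a human, but the AI is dumb. So, whenever the opponent has three discs lined up, it should block them by placing its own disc where the fourth would go. That is what the second loop in `my_agent` does: it runs the same check with the opponent's mark, `obs.mark%2+1`. The helper functions it relies on are defined next.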

In [None]:
# Gets board at next step if agent drops piece in selected column
def drop_piece(grid, col, piece, config):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1):
        if next_grid[row][col] == 0:
            break
    next_grid[row][col] = piece
    return next_grid

# Returns True if dropping piece in column results in game win
def check_winning_move(obs, config, col, piece):
    # Convert the board to a 2D grid
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)
    next_grid = drop_piece(grid, col, piece, config)
    # horizontal
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(next_grid[row,col:col+config.inarow])
            if window.count(piece) == config.inarow:
                return True
    # vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(next_grid[row:row+config.inarow,col])
            if window.count(piece) == config.inarow:
                return True
    # positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(next_grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if window.count(piece) == config.inarow:
                return True
    # negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(next_grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if window.count(piece) == config.inarow:
                return True
    return False
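
To make the win-then-block logic concrete, here is a condensed sketch on a hand-built position. It is not the notebook's `check_winning_move`: the helpers `drop` and `has_line` are hypothetical, shorter stand-ins written for this example, and the 6x7 board with four in a row is an assumption about the default settings.

```python
import numpy as np

ROWS, COLS, INAROW = 6, 7, 4  # assumed default ConnectX dimensions

def drop(grid, col, piece):
    """Return a copy of grid with piece placed in the lowest empty row of col."""
    nxt = grid.copy()
    for row in range(ROWS - 1, -1, -1):
        if nxt[row][col] == 0:
            nxt[row][col] = piece
            break
    return nxt

def has_line(grid, piece):
    """True if piece fills INAROW consecutive cells in any of the four directions."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (-1, 1)):
                cells = [(r + i * dr, c + i * dc) for i in range(INAROW)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and grid[rr][cc] == piece
                       for rr, cc in cells):
                    return True
    return False

grid = np.zeros((ROWS, COLS), dtype=int)
grid[5, 0:3] = 2                          # opponent holds three bottom-row cells
grid[4, 0] = grid[4, 1] = grid[5, 6] = 1  # our three discs so far

me = 1
opponent = me % 2 + 1  # maps 1 -> 2 and 2 -> 1

wins = [c for c in range(COLS) if has_line(drop(grid, c, me), me)]
blocks = [c for c in range(COLS) if has_line(drop(grid, c, opponent), opponent)]
print(wins, blocks)  # [] [3]
```

With no winning move available, the rule-based agent would fall through to the second check and play column 3 to stop the opponent's row.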

Let's play! The turquoise discs belong to the agent; mine are the white discs.

In [None]:
from kaggle_environments import evaluate, make, utils

env = make("connectx", debug=True)
env.play([my_agent, None], width=500, height=450)

In [None]:
env.render(mode = "ipython", width=500, height=450)

So while the intelligence of the AI has increased a lot with just some rudimentary rules, I still won, which means it is a bad AI. However, it has mastered blocking me from winning: when I have three discs together, it puts its own disc where the fourth would go. The only reason it lost was that it had no other choice. So, not bad.

### One-Step Look-Ahead Heuristics

Now, what if we make the AI look at the consequences of each and every move before it plays? These are known as look-ahead heuristics, since the AI is essentially looking ahead of the present state: it assesses the future possibilities and how they can affect its gameplay. While that seems pretty far-fetched, it is not impossible. But to keep my sanity intact, I will use a one-step look-ahead, i.e. for every move it could make right now, the agent looks one step ahead at the board that move would produce and scores it.

In [None]:
def score_move(grid, col, mark, config):
    next_grid = drop_piece(grid, col, mark, config)
    score = get_heuristic(next_grid, mark, config)
    return score

def drop_piece(grid, col, mark, config):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1):
        if next_grid[row][col] == 0:
            break
    next_grid[row][col] = mark
    return next_grid

def get_heuristic(grid, mark, config):
    num_threes = count_windows(grid, 3, mark, config)
    num_fours = count_windows(grid, 4, mark, config)
    num_threes_opp = count_windows(grid, 3, mark%2+1, config)
    score = num_threes - 1e2*num_threes_opp + 1e6*num_fours
    return score

def check_window(window, num_discs, piece, config):
    return (window.count(piece) == num_discs and window.count(0) == config.inarow-num_discs)
    
def count_windows(grid, num_discs, piece, config):
    num_windows = 0
    # horizontal
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row, col:col+config.inarow])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(grid[row:row+config.inarow, col])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    return num_windows

At this point, I start to treat my agent more or less like a pet dog: "Good dog" when it does the trick, "bad dog", when it doesn't. What this means is that the agent gets 1 million points if it gets all four of its discs together in a row, or a column or a diagonal; it gets 1 point if it gets three discs together like so and needs only one to win, and -100 if I can manage to get three of my discs together. That gives it enough incentive to make me lose.
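
Here is a small worked example of that scoring, applied to a handful of hand-written windows rather than a full board. The `score` helper and the sample windows are hypothetical and exist only for this illustration; `check_window` follows the same rule as the notebook's version, with the window length fixed at four.

```python
INAROW = 4  # assumed window length

def check_window(window, num_discs, piece):
    # A window counts only if the remaining cells are empty, not blocked.
    return window.count(piece) == num_discs and window.count(0) == INAROW - num_discs

def score(windows, mark):
    opp = mark % 2 + 1
    num_threes = sum(check_window(w, 3, mark) for w in windows)
    num_fours = sum(check_window(w, 4, mark) for w in windows)
    num_threes_opp = sum(check_window(w, 3, opp) for w in windows)
    return num_threes - 1e2 * num_threes_opp + 1e6 * num_fours

windows = [
    [1, 1, 1, 0],  # our three with an open cell: +1
    [1, 1, 1, 2],  # our three, but blocked: counts for nothing
    [2, 2, 0, 2],  # opponent's three with a gap: -100
]
print(score(windows, mark=1))                   # -99.0
print(score(windows + [[1, 1, 1, 1]], mark=1))  # 999901.0
```

Two things stand out: a blocked three is worth nothing, and a three does not have to be contiguous to count, as long as the fourth cell of the window is empty.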

In [None]:
def agent(obs, config):
    valid_moves = [c for c in range(config.columns) if obs.board[c] == 0]
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)
    scores = dict(zip(valid_moves, [score_move(grid, col, obs.mark, config) for col in valid_moves]))
    max_cols = [key for key in scores.keys() if scores[key] == max(scores.values())]
    return random.choice(max_cols)
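
The last two lines of the agent pick randomly among the best-scoring columns. A minimal sketch of that step in isolation, with the maximum computed once up front; `pick_best` is a hypothetical name and the scores are made up for the example.

```python
import random

def pick_best(scores):
    """Return a random column among those tied for the highest score."""
    best = max(scores.values())
    best_cols = [col for col, s in scores.items() if s == best]
    return random.choice(best_cols)

# Hypothetical scores for three valid columns; columns 3 and 4 tie for best.
scores = {0: 1.0, 3: 5.0, 4: 5.0}
picks = {pick_best(scores) for _ in range(50)}
print(picks <= {3, 4})  # True
```

Breaking ties randomly matters early in the game, when no windows of three exist yet and every column scores the same.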

In [None]:
env = make("connectx", debug=True)
env.play([agent, None], width=500, height=450)

White discs are mine in the game.

In [None]:
env.render(mode = "ipython", width=500, height=450)

So, the AI did trap me quite well and won. But I'm not satisfied. I think this was more telling of how badly I played than of how good the AI got. I want the AI to play mind games with me and make me question everything.

Hence, a little change in heuristics. The agent gets 1 million points if it has four discs together, 100 points if it has three discs together, 1 point if it has two discs together, -1 point if I have two discs together and -1000 points if I have three discs together. This way, the stakes are high and the AI has more to lose if I win.

In [None]:
def get_heuristic_extra(grid, col, mark, config):
    num_twos = count_windows(grid, 2, mark, config)
    num_threes = count_windows(grid, 3, mark, config)
    num_fours = count_windows(grid, 4, mark, config)
    num_twos_opp = count_windows(grid, 2, mark%2+1, config)
    num_threes_opp = count_windows(grid, 3, mark%2+1, config)
    score = A*num_fours + B*num_threes + C*num_twos + D*num_twos_opp + E*num_threes_opp
    return score

In [None]:
def score_move_extra(grid, col, mark, config):
    next_grid = drop_piece(grid, col, mark, config)
    score = get_heuristic_extra(next_grid, col, mark, config)
    return score

In [None]:
A = 1e6
B = 1e2
C = 1
D = -1
E = -1e3
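
To get a feel for how these weights trade off, here is a short sketch that scores two made-up tallies of windows. The dictionary keys, the `weighted_score` helper, and the counts are all hypothetical; only the five weight values come from the cell above.

```python
# Same values as A, B, C, D, E above, keyed by what they reward or punish.
WEIGHTS = {"fours": 1e6, "threes": 1e2, "twos": 1, "twos_opp": -1, "threes_opp": -1e3}

def weighted_score(counts):
    """Weighted sum of window counts; missing keys count as zero."""
    return sum(WEIGHTS[key] * counts.get(key, 0) for key in WEIGHTS)

# Hypothetical tallies for the boards two different moves would produce.
quiet_move = {"twos": 3, "twos_opp": 1}
greedy_move = {"threes": 1, "threes_opp": 1}

print(weighted_score(quiet_move))   # 2.0
print(weighted_score(greedy_move))  # -900.0
```

Because an open three for the opponent costs ten times what one of our own threes earns, a move that builds a three but leaves the opponent's three open scores worse than a quiet move that only adds a few twos.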

In [None]:
def agent_extra(obs, config):
    valid_moves = [c for c in range(config.columns) if obs.board[c] == 0]
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)
    scores = dict(zip(valid_moves, [score_move_extra(grid, col, obs.mark, config) for col in valid_moves]))
    max_cols = [key for key in scores.keys() if scores[key] == max(scores.values())]
    return random.choice(max_cols)

In [None]:
env = make("connectx", debug=True)
env.play([agent_extra, None], width=500, height=450)

In [None]:
env.render(mode = "ipython", width=500, height=450)

And there you go. The AI trapped me, played mind games with me, and won. And I'm still the loser.