## Connect Three 
### Skeleton Code
Here is an example that adapts the minimax code from the noughts and crosses demo to work with the provided connect 3 code. This is not a “model” solution; it was put together quickly as a demo. You may wish to use it as a base for some of the aforementioned extensions.

Using minimax with alpha-beta pruning is sufficient to play quite quickly when the AI is playing second. If you ask it to play first, it will take a long time, but it is playable. Warning: the game is a forced win for the first player when playing optimally – prepare to lose if you play second!

If you increase the size of the grid, or make the number of pieces to connect higher than 3, the search becomes extremely slow, even when the AI plays second.

The code would benefit from many relatively simple optimisations. Caching the result of each move (memoisation) would be a good start, though this is really just a programming trick, not a new AI technique.
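As a rough illustration of the memoisation idea, the sketch below solves a connect-N position with a plain dictionary as the cache. It does not use `connect.py`: the tuple-of-strings board and the names `wins` and `solve` are all hypothetical, invented for this example, and the search is written in negamax form (one value from the mover's point of view) rather than with the `get_min` flag used in the code below.

```python
def wins(cols, col, n):
    """True if the top piece in column `col` completes a line of n pieces."""
    row = len(cols[col]) - 1
    piece = cols[col][row]

    def at(c, r):
        # is there a matching piece at column c, height r?
        return 0 <= c < len(cols) and 0 <= r < len(cols[c]) and cols[c][r] == piece

    # horizontal, vertical and both diagonals through the new piece
    for dc, dr in ((1, 0), (0, 1), (1, 1), (1, -1)):
        run = 1
        for sign in (1, -1):
            c, r = col + sign * dc, row + sign * dr
            while at(c, r):
                run += 1
                c, r = c + sign * dc, r + sign * dr
        if run >= n:
            return True
    return False


def solve(cols, rows, n, player="x", cache=None):
    """Value for the player to move: 1 = win, 0 = draw, -1 = loss."""
    if cache is None:
        cache = {}
    key = (cols, player)  # tuples of strings are hashable
    if key in cache:
        return cache[key]  # position already solved via another move order

    moves = [c for c in range(len(cols)) if len(cols[c]) < rows]
    if not moves:
        best = 0  # board is full with no winner: draw
    else:
        best = -1
        other = "o" if player == "x" else "x"
        for c in moves:
            # build the child position; each column string is stacked bottom-up
            nxt = cols[:c] + (cols[c] + player,) + cols[c + 1:]
            value = 1 if wins(nxt, c, n) else -solve(nxt, rows, n, other, cache)
            best = max(best, value)
            if best == 1:
                break  # cannot do better than a win
    cache[key] = best
    return best
```

Calling `solve(("",) * 5, rows=3, n=3)` would evaluate the empty 5 by 3 board. Plain minimax is used here so that every cached value is exact; caching values returned by a pruned alpha-beta search would need extra care, because those values can be bounds rather than true values.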

Similarly, one reason it runs slowly is that it has to clone the original game every time it tests a move. If the AI used its own representation of the board and rules, it could optimise the depth-first search to save a lot of memory. Copying memory also takes time, so this would speed up execution too.
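One way to avoid the copying, sketched under the assumption that the AI keeps its own board, is to make each move in place and undo it once the recursive call returns. `ColumnBoard` and `count_positions` are hypothetical names, and the sketch ignores wins entirely; it only shows the make/undo pattern that would replace `copy.deepcopy`.

```python
class ColumnBoard:
    """Hypothetical minimal board: one list per column, pieces stacked bottom-up."""

    def __init__(self, num_cols, num_rows):
        self.num_rows = num_rows
        self.columns = [[] for _ in range(num_cols)]

    def available_actions(self):
        # a column is playable while it is not full
        return [c for c, col in enumerate(self.columns) if len(col) < self.num_rows]

    def drop(self, col, piece):
        self.columns[col].append(piece)

    def undo(self, col):
        self.columns[col].pop()


def count_positions(board, depth, piece="x"):
    """Walk the move tree depth-first for `depth` plies, reusing a single board."""
    if depth == 0:
        return 1
    total = 0
    other = "o" if piece == "x" else "x"
    for col in board.available_actions():
        board.drop(col, piece)  # make the move in place
        total += count_positions(board, depth - 1, other)
        board.undo(col)  # restore the board before trying the next column
    return total
```

Because every `drop` is paired with an `undo`, the board is back in its starting state when the search returns, and the only extra memory used is the recursion stack.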

But the fun part of this activity would be to try one of the other optimisations, such as a heuristic function that evaluates the board so the recursion depth can be limited, or a lookup table of endgame positions that is computed before the game even starts. Have fun if you decide to try!
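As a starting point for the heuristic idea, here is a minimal sketch of one possible evaluation function. It assumes the board is available as nested lists of `' '`, `'x'` and `'o'` with the top row first (mirroring the boards printed in this notebook); that layout, and the names `windows` and `heuristic`, are assumptions and not part of `connect.py`. The scoring rule is just one simple choice: every line of `n` cells that only one player occupies counts in that player's favour.

```python
def windows(board, n):
    """Yield every horizontal, vertical and diagonal line of n cells."""
    rows, cols = len(board), len(board[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (n - 1) * dr, c + (n - 1) * dc
                # keep the line only if its last cell is still on the board
                if 0 <= end_r < rows and 0 <= end_c < cols:
                    yield [board[r + i * dr][c + i * dc] for i in range(n)]


def heuristic(board, n=3, me="o", opponent="x"):
    """Positive when `me` has more promising lines, negative when `opponent` does."""
    score = 0
    for cells in windows(board, n):
        mine, theirs = cells.count(me), cells.count(opponent)
        if mine and not theirs:
            score += mine  # a line only I occupy is still winnable for me
        elif theirs and not mine:
            score -= theirs  # a line only the opponent occupies is a threat
    return score
```

A depth-limited search would call something like `heuristic` when it runs out of depth, in place of recursing further. The result would need scaling so that it always stays strictly between the true win and loss values.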

### Explanation
There is a lot of code below, but it is hardly any more complicated than the alpha-beta pruning code from before. Make sure you understand the alpha-beta code before you continue.

The first new thing is a set of helper functions that let you play the game interactively when you run the cell. As with any program, it is easiest to work out what is going on if you read the code in the order it is executed, so in this case start at the bottom and work up. The helpers just present a minimal text-based UI.

A few modifications are made to the minimax code as well.

As mentioned, the line `copy.deepcopy(game)` is added so that the AI can test each action without modifying the original game.

The `connect.py` code returns `reward = -1` if player one wins, and `reward = 1` if player two wins – but the minimax code always assumes it is the "max" player. We abuse the fact that whenever the AI makes a move and the game ends, we know that move resulted in a win or a draw for that player. By taking `abs(reward)` we force it to be `1`, then multiply it by `-1` if we were actually simulating a "min" player move at that point in time.
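The sign handling can be isolated in a small helper to make the trick easier to see. `terminal_value` is a hypothetical name used only for this sketch; the code below does the same thing inline and also scales the result by the turn number.

```python
def terminal_value(reward, get_min):
    """Value of a game-ending move from the AI's point of view.

    `reward` is whatever the game returned (-1, 0 or 1). The player who just
    moved either won or drew, so only the magnitude matters.
    """
    value = abs(reward)  # 1 for a win by whoever just moved, 0 for a draw
    # a win found while simulating the "min" player is bad for the AI
    return -value if get_min else value
```

For example, `terminal_value(-1, get_min=False)` is `1` (the AI, playing as player one, just won), while `terminal_value(-1, get_min=True)` is `-1` (the simulated opponent just won).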

Another interesting thing about connect 3 on this small board is that if player 1 goes in the middle space then they are guaranteed to win. The minimax code simply generates a value for each action, so for player 2 following this move, every action will return a reward of `-1`: if player 1 plays optimally they *will* win. If moves are tied for the best value, the AI chooses randomly between them. But this leads to odd-looking play, because the AI won't block 2 in a row, even if it hasn't lost yet.

In this situation, it looks like the `o` player should play in column 3 (counting from 0, the gap between the two `x` pieces), but all moves will lead to losses against an optimal `x` player.
```
[[' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ']
 ['o' ' ' 'x' ' ' 'x']]
```

To make the AI play in a more human-like way, we modify the reward from the game by dividing it by the turn number, counted from the move the AI is currently choosing. In the code below the AI's own move is turn 1, so a move that lets the opponent win on their very next move (anything other than column 3 on the board above) scores `-1/2 = -0.5`. A move that delays the loss until the opponent's following move scores `-1/4 = -0.25`, which is better! This simple modification means the AI tries to hang on as long as possible, and if player 1 plays badly, it could even turn the tables.

Even so, if the opponent can win in multiple ways on their next move, every option scores the same, and the AI might still choose not to block any of them!

```
[[' ' ' ' 'o' ' ' ' ']
 [' ' 'x' 'x' ' ' ' ']
 ['o' 'x' 'x' 'o' ' ']]
```

At this point all columns are equally instant-lose moves for the AI playing as `o`, so it might move in the far right column even though it looks like a silly move – consider it a resignation! (Maybe it would be interesting to try to "improve" this behaviour through another modification of the reward?)
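To make the turn-number scaling described above concrete, here is a small sketch of the calculation on its own. `scaled_value` is a hypothetical helper that mirrors the arithmetic in `get_value` below, where the AI's own move counts as turn 1 and the opponent's reply as turn 2.

```python
def scaled_value(reward, turn, get_min):
    """Terminal value divided by how many turns away the game ends."""
    value = abs(reward) / turn  # later endings have a smaller magnitude
    return -value if get_min else value


# Two hypothetical lines of play that both end with the opponent winning:
quick_loss = scaled_value(reward=-1, turn=2, get_min=True)  # opponent wins on their next move
slow_loss = scaled_value(reward=-1, turn=4, get_min=True)   # opponent wins one round later

# The AI maximises, so it prefers the line where the loss comes later.
preferred = max(quick_loss, slow_loss)
```

The same scaling makes a quick win worth more than a slow one, since `1/1` is larger than `1/3`. When every available move loses on the opponent's next turn, all of them get the same value, which is why the moves are tied on the board above.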

### Code

In [None]:
import connect
import math
import copy
import random
from abc import ABC, abstractmethod


class Agent(ABC):
    @abstractmethod
    def next_move(self, state):
        pass


class HumanAgent(Agent):
    def next_move(self, state):
        while True:
            try:
                print("What's your next move? Available columns:")
                print(state.available_actions)
                move = input("> ").strip()
                move = int(move)
                if move not in state.available_actions:
                    print("Invalid column.")
                else:
                    return move
            except ValueError:
                print(f"Please enter valid column from: {state.available_actions}")


class ConnectAgent(Agent):
    def __init__(self, verbose=False):
        self.verbose = verbose

    def next_move(self, game):
        print("\nThe AI is thinking.", end="")

        best_action = []
        best_value = -1 * math.inf
        for action in game.available_actions:
            # important: take a copy of the entire game so we don't break things
            # memory intensive, but saves us rewriting the game
            game_copy = copy.deepcopy(game)
            game_copy._verbose = False

            reward, game_over = game_copy.act(action)

            if game_over:
                # this code always tries to maximise the score
                # but the game returns reward for player 2 ('o')
                # if our move ended the game, then it was either win or draw
                # so absolute value will ensure we get 0 or 1 no matter if we are p1 or p2
                action_value = abs(reward)
            else:
                # our move didn't end the game, so recurse
                action_value = self.get_value(game_copy, get_min=True)
            print(".", end="", flush=True)

            if self.verbose:
                print(action_value, end=" ")
            if action_value > best_value:
                best_action = [action]
                best_value = action_value
            elif action_value == best_value:
                best_action.append(action)

        if self.verbose:
            print()
        print("\n")

        return random.choice(best_action)

    def get_value(self, game, get_min, alpha=-math.inf, beta=math.inf, turn=2):
        """If get_min is set to true, returns the minimum value, otherwise the maximum value"""

        best_value = math.inf
        if not get_min:
            best_value *= -1

        for action in game.available_actions:
            game_copy = copy.deepcopy(game)

            reward, game_over = game_copy.act(action)

            if game_over:
                # there is an explanation for dividing by the turn number above!
                action_value = abs(reward)/turn
                if get_min:
                    # force the reward to be -1 if this was a winning move while looking to minimise
                    # but keep it as zero if it was a draw
                    action_value *= -1
            else:
                action_value = self.get_value(game_copy, get_min=not get_min, alpha=alpha, beta=beta, turn=turn+1)

            if not get_min:
                alpha = max(alpha, action_value)
                if action_value >= beta:
                    return action_value
            else:
                beta = min(beta, action_value)
                if action_value <= alpha:
                    return action_value

            if not get_min and action_value > best_value \
                    or get_min and action_value < best_value:
                best_value = action_value

        return best_value


def run_game(game, player1=HumanAgent(), player2=ConnectAgent()):
    game_over = False
    while not game_over:
        move = player1.next_move(game)
        reward, game_over = game.act(move)

        # reward is in terms of player 2, 'o'
        if game_over and reward == -1:
            print("Player one wins!")
            return
        elif game_over and reward == 0:
            print("It's a draw.")
            return

        move = player2.next_move(game)
        reward, game_over = game.act(move)

        if game_over and reward == 1:
            print("Player two wins!")
            return
        elif game_over and reward == 0:
            print("It's a draw.")
            return


def yes_no_input(text, prompt="> "):
    print(text + " (y/n)")
    response = input(prompt).strip()
    while response not in ['y', 'n']:
        print("Please enter y for yes or n for no.")
        print(text)
        response = input(prompt).strip()
    return response == "y"


def play():
    cols = 5
    rows = 3
    n = 3

    print(f"Let's play connect {n}!")
    game = connect.Connect(num_cols=cols, num_rows=rows, num_connect=n, verbose=True)
    again = True
    while again:
        response = yes_no_input("Would you like to play first?")
        if response:
            run_game(game=game, player1=HumanAgent(), player2=ConnectAgent())
        else:
            run_game(game=game, player1=ConnectAgent(), player2=HumanAgent())

        again = yes_no_input("Would you like to play again?")
        if again:
            game.reset()


play()