# Connect 4

In [None]:
import copy
import itertools
import random
from collections import namedtuple
import numpy as np
import math

## Goals

In this assignment, you will design an AI bot that plays Connect 4. We have two goals:

1. Complete the code for depth-limited alpha-beta minimax search.
2. Write a better evaluation function.

## Step 1: Build the class of connect 4

Read the code and comments in this step, but do not change the code unless you want to insert debugging output.

### Step 1.1: Define state

In the code cell below, we define the state using a namedtuple instead of a class. A namedtuple behaves like a lightweight, immutable class whose fields are read by name. To learn more, see [this namedtuple tutorial](https://www.geeksforgeeks.org/namedtuple-in-python/).
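
As a quick illustration of how such a state is built and read, here is a minimal sketch. It uses the same field layout as the GameState defined below; the board contents and move lists are made up for the example.

```python
from collections import namedtuple

# Same field layout as the GameState used in this notebook.
GameState = namedtuple('GameState', 'to_move, utility, board, moves')

s = GameState(to_move='X', utility=0, board={}, moves=[(1, 1), (1, 2)])

# Fields are read by name, like attributes on a class instance.
next_board = dict(s.board)
next_board[(1, 1)] = s.to_move

# A namedtuple is immutable; _replace returns a modified copy instead.
s2 = s._replace(to_move='O', board=next_board, moves=[(1, 2)])
```

Because the original state is never modified, a search can keep exploring other moves from `s` after trying one of them.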

In [None]:
"""
- to_move is the player who is going to make a move in the current state. 
  to_move is initialized as X, because X moves first. 

  Note that to_move is not the same as player in the minimax code below,
  where player is the to_move at the root of the game tree. 
  
- utility is computed when a new state is created. It is 1 for a win, -1 for a loss,
  and 0 for a tie or any non-terminal state. 
  
  Note that the +1 or -1 value of state.utility is always from X's point of view. 
  
- board is a dictionary mapping (row, col) to the player on that square. 

- moves are the available moves (actions) in the current state. 
"""
GameState = namedtuple('GameState', 'to_move, utility, board, moves')

### Step 1.2: Define Connect 4

We define the class of connect 4 following this inheritance path:

Game -> TicTacToe -> ConnectFour

We view Connect 4 as a generalized game based on TicTacToe.
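
The main rule Connect 4 adds on top of TicTacToe is gravity: a square is playable only if it is on row 1 or sits directly above an occupied square. The sketch below isolates that filter in a standalone helper; the name `playable` is hypothetical, and the class in the next cell applies the same check inside its `actions` method.

```python
def playable(moves, board):
    """Keep only squares on the bottom row or directly above a piece.

    moves: iterable of (row, col) squares that are still empty
    board: dict mapping (row, col) -> 'X' or 'O'
    """
    return [(r, c) for (r, c) in moves if r == 1 or (r - 1, c) in board]

# 6 rows x 7 columns with nothing played yet: only the bottom row is open.
empty = [(r, c) for r in range(1, 7) for c in range(1, 8)]
first = playable(empty, {})

# After X drops a piece into column 3, the square above it becomes playable.
board = {(1, 3): 'X'}
remaining = [m for m in empty if m not in board]
second = playable(remaining, board)
```

Either way there are at most 7 legal actions, one per column, which is why the branching factor stays small.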

In [None]:
class Game:
    """A game is similar to a searching problem, but it has a utility for each
    state and a terminal test instead of a path cost and a goal
    test. To create a game, subclass this class and implement actions,
    result, utility, and terminal_test. You may override display and
    successors or you can inherit their default methods. You will also
    need to set the .initial attribute to the initial state; this can
    be done in the constructor."""

    def actions(self, state):
        """Return a list of the allowable moves at this point."""
        raise NotImplementedError

    def result(self, state, move):
        """Return the state that results from making a move from a state."""
        raise NotImplementedError

    def utility(self, state, player):
        """Return the value of this final state to player."""
        raise NotImplementedError

    def terminal_test(self, state):
        """Return True if this is a final state for the game."""
        return not self.actions(state)

    def to_move(self, state): # return the player of current state 
        """Return the player whose move it is in this state."""
        return state.to_move

    def display(self, state):
        """Print or otherwise display the state."""
        print(state)

    def __repr__(self):
        return '<{}>'.format(self.__class__.__name__)


class TicTacToe(Game):
    """Play TicTacToe on an h x v board, with Max (first player) playing 'X'.
    A state has the player to move, a cached utility, a list of moves in
    the form of a list of (x, y) positions, and a board, in the form of
    a dict of {(x, y): Player} entries, where Player is 'X' or 'O'."""

    def __init__(self, h=3, v=3, k=3):
        self.h = h
        self.v = v
        self.k = k
        # Initially, all coordinates are available moves/actions. 
        moves = [(x, y) for x in range(1, h + 1)
                 for y in range(1, v + 1)]
        # Initially, X moves first, board dictionary is empty, and utility is 0. 
        self.initial = GameState(to_move='X', utility=0, board={}, moves=moves)

    def actions(self, state):
        """Legal moves are any square not yet taken."""
        return state.moves

    def result(self, state, move):
        if move not in state.moves:
            return state  # Illegal move has no effect
        board = state.board.copy()
        board[move] = state.to_move
        moves = list(state.moves)
        moves.remove(move) # delete the next move from action list. 
        return GameState(to_move=('O' if state.to_move == 'X' else 'X'), # switch player for deciding the next move
                         utility=self.compute_utility(board, move, state.to_move),
                         board=board, moves=moves)

    def utility(self, state, player):
        """Return the value to player; 1 for win, -1 for loss, 0 otherwise.
        If player is X (goes first), return the state utility because it is calculated based on X.
        If player is O, return the opposite value. 
        """
        return state.utility if player == 'X' else -state.utility

    def terminal_test(self, state):
        """A state is terminal if it is won or there are no empty squares."""
        return state.utility != 0 or len(state.moves) == 0

    def display(self, state):
        board = state.board
        if self.__class__.__name__ == 'ConnectFour': # print last row at first
            for x in range(self.h, 0 , -1):
                for y in range(1, self.v + 1):
                    print(board.get((x, y), '.'), end=' ')
                print()
        else: # print for tic-tac-toe
            for x in range(1, self.h + 1):
                for y in range(1, self.v + 1):
                    print(board.get((x, y), '.'), end=' ')
                print()

    def compute_utility(self, board, move, player):
        """If 'X' wins with this move, return 1; if 'O' wins return -1; else return 0."""
        if (self.k_in_row(board, move, player, (0, 1)) or
                self.k_in_row(board, move, player, (1, 0)) or
                self.k_in_row(board, move, player, (1, -1)) or
                self.k_in_row(board, move, player, (1, 1))):
            return +1 if player == 'X' else -1
        else:
            return 0

    def k_in_row(self, board, move, player, delta_x_y):
        """Return true if there is a line through move on board for player."""
        (delta_x, delta_y) = delta_x_y
        x, y = move
        n = 0  # n is number of moves in row
        while board.get((x, y)) == player:
            n += 1
            x, y = x + delta_x, y + delta_y
        x, y = move
        while board.get((x, y)) == player:
            n += 1
            x, y = x - delta_x, y - delta_y
        n -= 1  # Because we counted move itself twice
        return n >= self.k


class ConnectFour(TicTacToe):
    """A TicTacToe-like game in which you can only make a move on the bottom
    row, or in a square directly above an occupied square.  Traditionally
    played on a 7x6 board and requiring 4 in a row."""

    def __init__(self, h=6, v=7, k=4):
        TicTacToe.__init__(self, h, v, k)

    def actions(self, state): 
        # x is row, y is column
        # up to 7 actions on 7 columns. 
        return [(x, y) for (x, y) in state.moves
                if x == 1 or (x-1, y) in state.board]


## Step 2: Play the Game

The cell below defines the game loop. You do not need to change this function; we call it to play one bot against another. 

In [None]:
"""
strategies are the AI bot functions, one per player.
*strategies means the function accepts a variable number of bots.

Set verbose to True if you want to see the step-by-step board updates.

kwargs_list holds one dict of keyword arguments per bot; these override the bots' default values. 
"""
def play_game(game, *strategies, verbose = True, kwargs_list=None): 
        """Play an n-person, move-alternating game."""
        state = game.initial
        rounds = 0
        while True:
            rounds += 1
            if verbose: print("\nRound ", rounds, ":")

            # Each bot has its own evaluation function, passed in as eval_fn. 
            # Rename the first bot's to eval_X and the second bot's to eval_O.
            # The elif guards keep a custom function from being overwritten by
            # default_eval on later rounds, after eval_fn has been removed.
            if kwargs_list:
                if 'eval_fn' in kwargs_list[0]:
                    kwargs_list[0]['eval_X']  = kwargs_list[0]['eval_fn']
                    del kwargs_list[0]['eval_fn']
                elif 'eval_X' not in kwargs_list[0]:
                    kwargs_list[0]['eval_X']  = default_eval

                if 'eval_fn' in kwargs_list[1]:
                    kwargs_list[1]['eval_O']  = kwargs_list[1]['eval_fn']
                    del kwargs_list[1]['eval_fn']
                elif 'eval_O' not in kwargs_list[1]:
                    kwargs_list[1]['eval_O']  = default_eval
                    
            strategy_id = 0
            for strategy in strategies:
                # call bot to choose the next move. 
                if kwargs_list: # if the bot has non-default arguments 
                    move = strategy(game, state, **kwargs_list[strategy_id]) 
                    strategy_id += 1
                else:
                    move = strategy(game, state) 
                
                if verbose: 
                    print(state.to_move, " moves on ", move)
                
                state = game.result(state, move) 
                
                if verbose: 
                    game.display(state)
                    
                if game.terminal_test(state): # if game is over
                    # game.to_move(game.initial) is the player who moves first; it is set to X in __init__. 
                    player = game.to_move(game.initial)
                    result = game.utility(state, player)
                    print("Number of Rounds: ", rounds)
                    if result == 1:
                        print(player, " wins the game.")
                    elif result == 0:
                        print("Tie")
                    else:
                        print(player, " loses the game.")
                    return rounds # end the game

Here we define two toy bots. The first one is not really a bot: a human types in each move.

The second one is a random player, which chooses a legal action at random. 
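
Any function that takes `(game, state, **kwargs)` and returns a legal move can serve as a bot. As an illustration only, here is a hypothetical `leftmost_player` that always picks the legal move in the lowest-numbered column. The `StubGame` class is an assumption made so the sketch runs on its own; it is not part of the assignment code.

```python
class StubGame:
    """Stand-in that exposes only actions(); not the real ConnectFour."""

    def actions(self, state):
        # In this stub the 'state' is simply the list of legal moves.
        return state


def leftmost_player(game, state, **kwargs):
    """Pick the legal (row, col) move with the smallest column, or None."""
    legal = game.actions(state)
    return min(legal, key=lambda rc: rc[1]) if legal else None


choice = leftmost_player(StubGame(), [(1, 4), (2, 2), (1, 7)])
```

The `**kwargs` parameter matters even when it is unused: the game loop may pass extra keyword arguments such as an evaluation function to every bot.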

In [5]:

# A human player who types the action at each turn
def human_player(game, state, **kwargs):
    while True:
        pos_raw = input("Please input a new position as row, column like 1,1: ")
        x,y = [int(i) for i in pos_raw.split(',')]
        if (x,y) not in game.actions(state):
            print("Please input an available position!")
        else:
            return (x,y)

def random_player(game, state, **kwargs):
    """A player that chooses a legal move at random."""
    return random.choice(game.actions(state)) if game.actions(state) else None


Now we can play the game! The code below plays two random players against each other. To play yourself and go first, replace the first random_player with human_player.

In [6]:
game = ConnectFour()
play_game(game, random_player, random_player)


Round  1 :
X  moves on  (1, 2)
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . . . . . . 
. X . . . . . 
O  moves on  (1, 3)
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . . . . . . 
. X O . . . . 

Round  2 :
X  moves on  (2, 3)
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . X . . . . 
. X O . . . . 
O  moves on  (2, 2)
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . . . . . . 
. O X . . . . 
. X O . . . . 

Round  3 :
X  moves on  (3, 2)
. . . . . . . 
. . . . . . . 
. . . . . . . 
. X . . . . . 
. O X . . . . 
. X O . . . . 
O  moves on  (1, 4)
. . . . . . . 
. . . . . . . 
. . . . . . . 
. X . . . . . 
. O X . . . . 
. X O O . . . 

Round  4 :
X  moves on  (2, 4)
. . . . . . . 
. . . . . . . 
. . . . . . . 
. X . . . . . 
. O X X . . . 
. X O O . . . 
O  moves on  (1, 7)
. . . . . . . 
. . . . . . . 
. . . . . . . 
. X . . . . . 
. O X X . . . 
. X O O . . O 

Round  5 :
X  moves on  (3, 4)
. . . . . . . 
. . . . . . . 
. . . . . 

7

## Step 3: alpha beta cutoff search

In this step, you finish the code below to complete alpha-beta cutoff search. It adds one feature to the classical alpha-beta pruning code you worked on in the tic-tac-toe assignment:

It cuts off the tree at a predefined depth limit or at a terminal state, and then returns an evaluation (heuristic) value.

**Hint**
- Each player has its own evaluation function: eval_X and eval_O. Within one game tree, call only one of them, chosen by the player at the root of the tree.
- player is constant within a game tree, while state.to_move changes at every turn. 
- Unlike the depth in the Pacman project, one level of the tree here is a single move by either X or O, **NOT** a full round of moves by both, because the branching factor is 7 in most states. The diagram below illustrates this: 

```
      MAX --> d = 0
MIN   MIN   MIN  --> d = 1
     ....        --> d = 2
     ....
```
- You may use a cache in the function, but its key must include state, depth, and player, so it differs from the cache1 we used in the tic-tac-toe assignment. I did not use one in my solution. I suggest adding a cache only after you complete this assignment, in case its setup is wrong. 
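
To see the cutoff idea in isolation, here is a minimal sketch of depth-limited alpha-beta on a toy tree of nested lists rather than on the Connect 4 game. The function name `alpha_beta_toy`, the tree, and the constant evaluation function are all assumptions made for the example.

```python
import math


def alpha_beta_toy(node, depth, depth_limit, alpha, beta, maximizing, evaluate):
    """Depth-limited alpha-beta over a tree of nested lists.

    A leaf is a number; an internal node is a list of children.
    evaluate(node) supplies the heuristic value when the search is cut off.
    """
    if not isinstance(node, list):
        return node                      # terminal: exact value
    if depth >= depth_limit:
        return evaluate(node)            # cutoff: heuristic value
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alpha_beta_toy(child, depth + 1, depth_limit,
                                      alpha, beta, False, evaluate))
            if v >= beta:                # MIN above will never allow this
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alpha_beta_toy(child, depth + 1, depth_limit,
                                  alpha, beta, True, evaluate))
        if v <= alpha:                   # MAX above already has better
            return v
        beta = min(beta, v)
    return v


tree = [[3, 5], [2, 9], [0, 7]]          # MAX at the root, MIN at d = 1
full = alpha_beta_toy(tree, 0, 2, -math.inf, math.inf, True, lambda n: 0)
shallow = alpha_beta_toy(tree, 0, 1, -math.inf, math.inf, True, lambda n: 0)
```

With a depth limit of 2 the search reaches the leaves and returns the true minimax value 3. With a limit of 1 every child is cut off and scored by the heuristic, so the result is only as good as the evaluation function.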

In [9]:
infinity = math.inf

# The default depth limit is 4.
# The default evaluation function for both X and O is default_eval, defined in the
# code cell below; run that cell first so the name exists when this cell runs.
def alpha_beta_cutoff_search(game, state, depth_limit=4, eval_X=default_eval, eval_O=default_eval):
    """Search game to determine best action; use alpha-beta pruning.
    This version cuts off search and uses an evaluation function."""

    # player at the root of the game tree.
    player = game.to_move(state)

    # Evaluation function of the player at the root; it is used for the whole tree.
    eval_fn = eval_X if player == 'X' else eval_O

    # Maximizer
    def max_value(state, alpha, beta, depth):
        # call evaluation if cut off condition met
        # Your code goes here:
        if game.terminal_test(state) or depth >= depth_limit:
            return eval_fn(state, player, depth)

        v = -infinity

        for action in game.actions(state):
            # Keep the best value seen so far, not just the latest one.
            v = max(v, min_value(game.result(state, action), alpha, beta, depth + 1))

            if v >= beta:
                return v

            alpha = max(alpha, v)

        return v

    # Minimizer
    def min_value(state, alpha, beta, depth):
        # Your code goes here:
        if game.terminal_test(state) or depth >= depth_limit:
            return eval_fn(state, player, depth)

        v = infinity

        for action in game.actions(state):
            # Keep the lowest value seen so far, not just the latest one.
            v = min(v, max_value(game.result(state, action), alpha, beta, depth + 1))

            if v <= alpha:
                return v

            beta = min(beta, v)

        return v

    # Body of alpha_beta_cutoff_search starts here:
    best_score = -infinity
    beta = infinity
    best_action = None
    for a in game.actions(state):
        v = min_value(game.result(state, a), best_score, beta, 1)
        if v > best_score:
            best_score = v
            best_action = a
    return best_action


Here is the default evaluation function. It returns 1 for a win, -1 for a loss, and 0 for a tie or any non-terminal state. 

This simple evaluation is not bad in practice. The search can still look a few moves ahead and does its best to avoid actions that lose the game within the depth limit. 
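
The sketch below shows what this function returns on two hand-made states. The states are assumptions for illustration (their boards are left empty), and the function is repeated so the example runs on its own.

```python
from collections import namedtuple

GameState = namedtuple('GameState', 'to_move, utility, board, moves')


def default_eval(state, player, *args):
    # state.utility is stored from X's point of view; flip the sign for O.
    return state.utility if player == 'X' else -state.utility


# Hypothetical states: only the utility field matters to default_eval.
x_won = GameState(to_move='O', utility=1, board={}, moves=[])
midgame = GameState(to_move='X', utility=0, board={}, moves=[(1, 1)])

for_x = default_eval(x_won, 'X')        # 1: X is the root player and X won
for_o = default_eval(x_won, 'O')        # -1: the same state is a loss for O
unknown = default_eval(midgame, 'X', 3) # 0: the extra depth argument is ignored
```

Every unfinished position scores 0, so this evaluation cannot tell a promising position from a poor one unless a win or loss falls inside the depth limit.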

In [8]:
# *args means it could accept more arguments.
def default_eval(state, player, *args):
    if player == 'X':
        return state.utility
    else: 
        return -state.utility
  

### Step 3.1: Test your code.

Once your alpha-beta cutoff search code is done, run the code below to test it.

The code plays your bot against the random player. Your bot should win the game. 

In [1]:
game = ConnectFour()
result = play_game(game, alpha_beta_cutoff_search, random_player);


Run your code again with non-default depth limits. We set verbose to False to keep the output short. 

In [10]:
game = ConnectFour()
play_game(game, alpha_beta_cutoff_search, alpha_beta_cutoff_search, 
               verbose = False, kwargs_list = [{'depth_limit':2}, {'depth_limit':4}]); # different cut off depth 

Number of Rounds:  9
X  loses the game.


## Step 4: Design your own evaluation function

You can use state, player, and/or depth to design a better evaluation function than the default one. 

**Hint:**

- Use what we learned in the class activity to guide your design, but you can go further.
- player is the player at the root of the game tree. 
- Use state.board to get the board dictionary and then use board.get((r,c)) to get the value at position (r,c).
    - The value is X or O for an occupied square and None for an empty one. The dot . appears only when the board is displayed.
- Design the evaluation function not just to win, but to win as quickly as possible. 
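
One possible starting point, offered as a sketch rather than a reference solution, is to count "windows" of four consecutive squares that only one player still occupies. The names `window_eval`, `WIN_SCORE`, and `WEIGHTS`, the weight values, and the 6 x 7 board size are all assumptions made for this example.

```python
from collections import namedtuple

GameState = namedtuple('GameState', 'to_move, utility, board, moves')

ROWS, COLS, K = 6, 7, 4          # assumed board size and line length
WIN_SCORE = 10_000               # hypothetical; larger than any heuristic sum
WEIGHTS = {1: 1, 2: 4, 3: 16}    # hypothetical weight per number of pieces


def windows():
    """Yield every run of K squares in a line that fits on the board."""
    for r in range(1, ROWS + 1):
        for c in range(1, COLS + 1):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(K)]
                if all(1 <= x <= ROWS and 1 <= y <= COLS for x, y in cells):
                    yield cells


def window_eval(state, player, depth=0):
    """Score a state from the point of view of player (the root player)."""
    sign = 1 if player == 'X' else -1
    if state.utility != 0:
        # Decided game: a win found at a shallower depth scores higher.
        return sign * state.utility * (WIN_SCORE - depth)
    opponent = 'O' if player == 'X' else 'X'
    score = 0
    for cells in windows():
        marks = [state.board.get(cell) for cell in cells]
        mine, theirs = marks.count(player), marks.count(opponent)
        if mine and not theirs:          # window still winnable by player
            score += WEIGHTS.get(mine, 0)
        elif theirs and not mine:        # window still winnable by opponent
            score -= WEIGHTS.get(theirs, 0)
    return score


one = GameState(to_move='O', utility=0, board={(1, 4): 'X'}, moves=[])
won = GameState(to_move='O', utility=1, board={}, moves=[])
```

Subtracting depth from the win score is one simple way to prefer quicker wins. The weights are untuned, so treat them as placeholders to experiment with.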

In [None]:
def better_eval(state, player, depth):
    # Your code goes here:
    raise NotImplementedError("better_eval is not implemented yet")





### Step 4.1: Testing your better evaluation function

**Test A:** Run the code below to play the bot you just built against the random player. To play against the bot yourself, replace random_player with human_player. It would be a surprise if you win. 

In [None]:
game = ConnectFour()
play_game(game, alpha_beta_cutoff_search, random_player,
               verbose = True, kwargs_list = [{'depth_limit':4, 'eval_fn':better_eval}, {}]); 

**Test B:** Run the code below to play two better bots against each other at different depth limits. Usually, the bot that searches deeper wins the game. 

In [None]:
game = ConnectFour()
play_game(game, alpha_beta_cutoff_search, alpha_beta_cutoff_search, 
               verbose = False, kwargs_list = [{'depth_limit':4, 'eval_fn':better_eval}, 
                                               {'depth_limit':3, 'eval_fn':better_eval}]); # different cut off depth 

**Test C:** Run the code below to play the game between your better bot and the default bot.

In [None]:
game = ConnectFour()
play_game(game, alpha_beta_cutoff_search, alpha_beta_cutoff_search, 
               verbose = False, kwargs_list = [{'depth_limit':4, 'eval_fn':better_eval}, {}]); 

**Test D:** Run the code below to play the game between your better bot and the random bot 10 times.

In [None]:
rounds = 0
for i in range(10):
    game = ConnectFour()
    rounds += play_game(game, alpha_beta_cutoff_search, random_player, 
               verbose = False, kwargs_list = [{'depth_limit':4, 'eval_fn':better_eval}, {}]); 
print("The average number of rounds is ", rounds/10)

## Grading Note

- You will lose at least 40 points if your bot cannot beat the default bot in Test C and the random bot in Test D. 
- You will lose 30 points if your bot cannot beat the random bot in fewer than 10 rounds on average in Test D. 
- You will lose 20 points if your bot at the deeper level cannot beat the bot at the shallower level in Test B. 
- You will lose 10 points if your bot needs 15 or more rounds to beat the default bot in Test C. 
- **Additional** up to 5 points if your bot can beat my bot when I grade it.