# Assignment 4: Negamax with Alpha-Beta Pruning and Iterative Deepening

Bradley Pospeck

# Table of Contents
* [Assignment 4: Negamax with Alpha-Beta Pruning and Iterative Deepening](#Assignment-4:-Negamax-with-Alpha-Beta-Pruning-and-Iterative-Deepening)
	* [Negamax](#Negamax)
	* [Game Setup](#Game-Setup)
	* [NegamaxIDS](#NegamaxIDS)
	* [NegamaxIDSab](#NegamaxIDSab)
	* [playGames](#playGames)
	* [Grading](#Grading)
	* [Extra Credit](#Extra-Credit)


For this assignment, alpha-beta pruning is applied to negamax with iterative deepening search and compared against plain negamax and negamax with iterative deepening. The `negamax` algorithm maximizes a player's score by exploring the possible paths of a game tree down to a specified depth. It only works for two-player zero-sum games. In this assignment, tic-tac-toe is the game of choice: the winner gets a value of 1, the loser gets -1, and a draw gives both players 0. The two players' scores always sum to zero, which makes it a zero-sum game. The algorithm assumes it is playing against an optimal opponent. Both `negamaxIDS` and `negamaxIDSab` are explained briefly below.
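
The negation trick behind negamax can be seen on a tiny hand-built tree. The sketch below is not part of the assignment code: `negamax_tree` and the nested-list tree are hypothetical, and each leaf is assumed to hold the utility from the point of view of the player who would move next at that leaf.

```python
def negamax_tree(node):
    """Value of `node` for the player to move there (hypothetical toy helper)."""
    # A leaf is a number: the utility for the player to move at that leaf.
    if isinstance(node, (int, float)):
        return node
    # A child's value is from the opponent's point of view, so negate it.
    return max(-negamax_tree(child) for child in node)

# Root: player A to move. Each inner list is a position where B moves,
# and each number is the outcome for A (1 = A wins, -1 = A loses, 0 = draw).
tree = [[1, -1],   # B can answer with the move that makes A lose
        [0, 0]]    # every reply leads to a draw
root_value = negamax_tree(tree)
print(root_value)
```

Against an optimal opponent the first branch is a loss for A, so the root value is 0 and the drawing branch is the one to choose.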

## Negamax

Below is the implementation of `negamax`. This base function was also provided with the assignment because the main goal of the assignment is to compare the base `negamax` with `negamaxIDS` and `negamaxIDSab`.

In [1]:
def negamax(game, depthLeft):
    # If at terminal state or depth limit, return utility value and move None
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None # call to negamax knows the move
    # Find best move and its value from current state
    bestValue, bestMove = None, None
    for move in game.getMoves():
        # Apply a move to current state
        game.makeMove(move)
        # Use depth-first search to find eventual utility value and back it up.
        #  Negate it because it will come back in context of next player
        value, _ = negamax(game, depthLeft-1)
        # Remove the move from current state, to prepare for trying a different move
        game.unmakeMove(move)
        if value is None:
            continue
        value = - value
        if bestValue is None or value > bestValue:
            # Value for this move is better than moves tried so far from this state.
            bestValue, bestMove = value, move
    return bestValue, bestMove

## Game Setup

Below is a class that implements tic-tac-toe. Most of the code was provided by Dr. Anderson, but a couple of minor methods needed to be provided by me to complete it. These are `getNumberMovesExplored()`, `getWinningValue()`, `getFilledSquares()`, and `getNumX()`.

In [2]:
class TTT(object):

    def __init__(self):
        self.board = [' ']*9
        self.player = 'X'
        if False:
            self.board = ['X', 'X', ' ', 'X', 'O', 'O', ' ', ' ', ' ']
            self.player = 'O'
        self.playerLookAHead = self.player
        self.movesExplored = 0

    def locations(self, c):
        return [i for i, mark in enumerate(self.board) if mark == c]

    def getMoves(self):
        moves = self.locations(' ')
        return moves

    def getNumberMovesExplored(self):
        return self.movesExplored
    
    def getUtility(self):
        whereX = self.locations('X')
        whereO = self.locations('O')
        wins = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
                [0, 3, 6], [1, 4, 7], [2, 5, 8],
                [0, 4, 8], [2, 4, 6]]
        isXWon = any([all([wi in whereX for wi in w]) for w in wins])
        isOWon = any([all([wi in whereO for wi in w]) for w in wins])
        if isXWon:
            return 1 if self.playerLookAHead == 'X' else -1
        elif isOWon:
            return 1 if self.playerLookAHead == 'O' else -1
        elif ' ' not in self.board:
            return 0
        else:
            return None 

    def isOver(self):
        return self.getUtility() is not None

    def makeMove(self, move):
        self.board[move] = self.playerLookAHead
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'
        self.movesExplored +=1

    def changePlayer(self):
        self.player = 'X' if self.player == 'O' else 'O'
        self.playerLookAHead = self.player

    def unmakeMove(self, move):
        self.board[move] = ' '
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'
        
    def getWinningValue(self):
        return 1

    def getFilledSquares(self):
        return 9-self.board.count(' ')
    
    def getNumX(self):
        return self.board.count('X')
    
    def __str__(self):
        s = '{}|{}|{}\n-----\n{}|{}|{}\n-----\n{}|{}|{}'.format(*self.board)
        return s

The next 2 functions define an opponent and a general function that plays a game using one of the `negamax` functions. The opponent is specific to tic-tac-toe and simply plays in the first open square. Apart from a couple of minor changes to `playGame`, both functions were provided with the assignment.

In [3]:
def opponent(board):
    return board.index(' ')

def playGame(game,opponent,depthLimit, negamaxF = negamax):
    print(game)
    while not game.isOver():
        score,move = negamaxF(game,depthLimit)
        if move is None:
            print('move is None. Stopping.')
            break
        game.makeMove(move)
        print('Player', game.player, 'to', move, 'for score' ,score)
        print(game)
        if not game.isOver():
            game.changePlayer()
            opponentMove = opponent(game.board)
            game.makeMove(opponentMove)
            print('Player', game.player, 'to', opponentMove)   ### FIXED ERROR IN THIS LINE!
            print(game)
            game.changePlayer()
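
`playGame` accepts any function that maps a board list to an open index, so other simple opponents can be swapped in. As a sketch, the hypothetical `opponentHighest` below (not part of the provided code) plays in the highest-numbered open square instead of the lowest. It assumes the same 9-element board list with `' '` marking open squares.

```python
def opponentHighest(board):
    """Return the index of the last open square (hypothetical alternative opponent)."""
    # Find the first blank in the reversed board, then map back to the original index.
    return len(board) - 1 - board[::-1].index(' ')

board = ['X', ' ', ' ', ' ', ' ', ' ', ' ', 'O', ' ']
print(opponentHighest(board))  # 8
board[8] = 'O'
print(opponentHighest(board))  # 6
```

Like `opponent`, this raises a `ValueError` if the board has no open squares, so it should only be called while the game is not over.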

Below is a quick test run to check that everything is implemented and works correctly. Note that `negamax` is not used for both players: it plays X, while player O uses the simple `opponent` function defined above.

In [4]:
g1 = TTT()
playGame(g1,opponent,20)
print('Number of moves explored:',g1.getNumberMovesExplored())
print('Number of squares filled:',g1.getFilledSquares())
print('Number of X on the board:',g1.getNumX())

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 4 for score 1
X|O|O
-----
X|X| 
-----
 | | 
Player O to 5
X|O|O
-----
X|X|O
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X|X|O
-----
X| | 
Number of moves explored: 558334
Number of squares filled: 7
Number of X on the board: 4


## NegamaxIDS 

Applying iterative deepening search to `negamax` gives `negamaxIDS`. It is essentially the same algorithm, except that it searches the game tree with increasing depth limits: first depth 0, then 1, 2, and so on, until it either finds a winning move or reaches the maximum depth. Because the search starts shallow and goes steadily deeper, it finds the shallowest win first, and when a win exists near the top of the tree it can do so with fewer explored moves than a single `negamax` call to the full depth.

There needs to be a stopping point. A tic-tac-toe game tree is at most 9 moves deep, so what should happen if the provided depth limit is greater than 9?

In tic-tac-toe, the search can stop as soon as a winning move is returned, which is a value of 1. If the maximum depth is reached and no winning move has been found, the best move found across all depth limits is returned. Generally this means that if there is no win to be found, a draw is the next best thing.
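
The trade-off can be sketched with a rough cost model. The helpers below are hypothetical and assume an idealized tree in which every position has exactly `b` legal moves, which is not true of tic-tac-toe. They only illustrate why stopping at the first winning depth can outweigh the cost of repeating the shallow levels.

```python
def moves_to_depth(b, depth):
    """Moves explored by one depth-limited search of a uniform tree."""
    return sum(b**level for level in range(1, depth + 1))

def moves_iterative(b, stop_depth):
    """Moves explored by searching depths 0, 1, ..., stop_depth in turn."""
    return sum(moves_to_depth(b, depth) for depth in range(stop_depth + 1))

b = 3
full = moves_to_depth(b, 5)      # one search straight to depth 5
early = moves_iterative(b, 2)    # deepening that finds a win at depth 2
repeat = moves_iterative(b, 5)   # deepening that has to reach depth 5
print(full, early, repeat)
```

In this model the early stop explores 15 moves instead of 363, while deepening all the way to depth 5 costs 537 moves, roughly 1.5 times the single full-depth search.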

In [5]:
def negamaxIDS(game, depthLimit):
    # Start below any real utility so the first non-None result replaces it
    bestValue, bestMove = -100, None
    for depth in range(depthLimit+1):
        gameValue, move = negamax(game, depth)
        # Stop as soon as a winning move is found
        if gameValue == game.getWinningValue():
            return gameValue, move
        if gameValue is not None and bestValue < gameValue:
            bestValue, bestMove = gameValue, move
    return bestValue, bestMove

Another test run, but with `negamaxIDS` this time. 

In [6]:
g2 = TTT()
playGame(g2,opponent,20,negamaxIDS)
print('Number of moves explored:',g2.getNumberMovesExplored())
print('Number of squares filled:',g2.getFilledSquares())
print('Number of X on the board:',g2.getNumX())

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 
Number of moves explored: 23338
Number of squares filled: 5
Number of X on the board: 3


## NegamaxIDSab

Alpha-beta pruning can further reduce the number of explored moves, which gives `negamaxIDSab`. It adds 2 new parameters, alpha and beta. Alpha is the best value found so far for the current player, and beta is the best value found so far for the other player, expressed from the current player's point of view. As each move is evaluated, alpha is updated to the maximum of its current value and the best value found so far. When searching a branch of the game tree, if the best value is ever greater than or equal to beta, the search of that branch can stop immediately, because the opponent already has a better alternative higher in the tree and will never allow play to reach this position. In the recursive call to `negamaxab`, the alpha and beta values are swapped and negated, since the roles are reversed for the opponent.
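
To see a cutoff happen, here is a minimal sketch on a hand-built tree. `negamax_ab_tree` and the nested-list tree are hypothetical and not part of the assignment code. Leaves hold utilities for the player to move at the leaf, which at depth 2 is the root player.

```python
import math

def negamax_ab_tree(node, alpha=-math.inf, beta=math.inf, visited=None):
    """Alpha-beta negamax over a nested-list tree (hypothetical toy helper)."""
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)   # record each leaf that is actually evaluated
        return node
    best = -math.inf
    for child in node:
        # Swap and negate the bounds for the opponent, then negate the result.
        value = -negamax_ab_tree(child, -beta, -alpha, visited)
        best = max(best, value)
        alpha = max(alpha, best)
        if alpha >= beta:
            break                  # the opponent will never allow this line
    return best

leaves = []
value = negamax_ab_tree([[3, 5], [2, 9]], visited=leaves)
print(value, leaves)
```

The first branch guarantees the root player 3. In the second branch the opponent's reply worth 2 is already worse than that, so the leaf 9 is never evaluated: the result is 3 after visiting only the leaves 3, 5, and 2.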

In [7]:
import math

def negamaxab(game, depthLeft, alpha=-math.inf, beta=math.inf):
    # If at terminal state or depth limit, return utility value and move None
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None # call to negamax knows the move
    # Find best move and its value from current state
    bestValue, bestMove = None, None
    for move in game.getMoves():
        # Apply a move to current state
        game.makeMove(move)
        # Use depth-first search to find eventual utility value and back it up.
        #  Negate it because it will come back in context of next player
        value, _ = negamaxab(game, depthLeft-1,-beta,-alpha)
        # Remove the move from current state, to prepare for trying a different move
        game.unmakeMove(move)
        if value is None:
            continue
        value = - value
        if bestValue is None or value > bestValue:
            # Value for this move is better than moves tried so far from this state.
            bestValue, bestMove = value, move
            # Raise the lower bound, and prune once it meets the opponent's bound
            alpha = max(bestValue, alpha)
            if bestValue >= beta:
                break
    return bestValue, bestMove

`negamaxIDSab` is identical to `negamaxIDS` above except that it instead calls `negamaxab`.

In [8]:
def negamaxIDSab(game, depthLimit):
    # Start below any real utility so the first non-None result replaces it
    bestValue, bestMove = -100, None
    for depth in range(depthLimit+1):
        gameValue, move = negamaxab(game, depth)
        # Stop as soon as a winning move is found
        if gameValue == game.getWinningValue():
            return gameValue, move
        if gameValue is not None and bestValue < gameValue:
            bestValue, bestMove = gameValue, move
    return bestValue, bestMove
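
Since the two iterative deepening functions differ only in which search they call, the shared loop could be factored out. The sketch below is one possible way to do that, not what the assignment asked for: `makeIDS`, `StubGame`, and `stubSearch` are hypothetical names, and unlike the functions above it returns `None` rather than -100 when no depth produces a value.

```python
def makeIDS(searchF):
    """Wrap a depth-limited search in iterative deepening (hypothetical helper)."""
    def ids(game, depthLimit):
        bestValue, bestMove = None, None
        for depth in range(depthLimit + 1):
            value, move = searchF(game, depth)
            if value is None:
                continue                      # nothing decided at this depth
            if value == game.getWinningValue():
                return value, move            # stop at the shallowest win
            if bestValue is None or value > bestValue:
                bestValue, bestMove = value, move
        return bestValue, bestMove
    return ids

# Stand-ins so the sketch runs on its own; with the notebook's definitions
# one would instead write negamaxIDSab = makeIDS(negamaxab).
class StubGame:
    def getWinningValue(self):
        return 1

def stubSearch(game, depth):
    # Pretend a draw is visible from depth 1 and a win from depth 3.
    if depth == 0:
        return None, None
    return (0, 'a') if depth < 3 else (1, 'b')

idsStub = makeIDS(stubSearch)
print(idsStub(StubGame(), 2), idsStub(StubGame(), 5))
```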

One final individual game using `negamaxIDSab`.

In [9]:
g3 = TTT()
playGame(g3,opponent,10,negamaxIDSab)
print('Number of moves explored:',g3.getNumberMovesExplored())
print('Number of squares filled:',g3.getFilledSquares())
print('Number of X on the board:',g3.getNumX())

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 
Number of moves explored: 6053
Number of squares filled: 5
Number of X on the board: 3


## playGames

Now it's time to run all 3 algorithms at once and compare the results. It would also be useful to compare the effective branching factor, as in assignment 3, so that should be defined first. My `ebf` function worked just fine in assignment 3, but I felt the professor's provided `ebf` for this assignment looked much more elegant, so I decided to use his.

In [10]:
def ebf(nNodes, depth, precision=0.01):
    if nNodes == 0:
        return 0

    def ebfRec(low, high):
        mid = (low + high) * 0.5
        if mid == 1:
            estimate = 1 + depth
        else:
            estimate = (1 - mid**(depth + 1)) / (1 - mid)
        if abs(estimate - nNodes) < precision:
            return mid
        if estimate > nNodes:
            return ebfRec(low, mid)
        else:
            return ebfRec(mid, high)

    return ebfRec(1, nNodes)
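
The `estimate` inside `ebfRec` is the closed form of the node count of a uniform tree, 1 + b + b^2 + ... + b^depth. As a quick sanity check, the hypothetical helpers below (not part of the provided code) compare the closed form with the explicit sum, including the special case `b == 1`, where the closed form would otherwise divide by zero.

```python
def nodes_closed_form(b, depth):
    """Node count of a uniform tree, using the same formula as ebfRec."""
    if b == 1:
        return 1 + depth
    return (1 - b**(depth + 1)) / (1 - b)

def nodes_by_sum(b, depth):
    """The same count, added up level by level."""
    return sum(b**level for level in range(depth + 1))

for b, depth in [(1, 4), (2, 3), (3, 2), (2.5, 4)]:
    print(b, depth, nodes_closed_form(b, depth), nodes_by_sum(b, depth))
```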

Below is the `playGames` function. It takes an opponent and depth input and uses those to run `playGame` from above on all 3 negamax algorithms.

In [11]:
def playGames(opponent, depth):
    # negamax on Tic Tac Toe
    print('negamax:')
    nm = TTT()
    playGame(nm,opponent,depth,negamax)
    xMove1 = nm.getNumX()
    explored1 = nm.getNumberMovesExplored()
    xo1 = nm.getFilledSquares()
    branch1 = ebf(explored1,xo1)
    # negamaxIDS on Tic Tac Toe
    print('\nnegamaxIDS:')
    nmIDS = TTT()
    playGame(nmIDS,opponent,depth,negamaxIDS)
    xMove2 = nmIDS.getNumX()
    explored2 = nmIDS.getNumberMovesExplored()
    xo2 = nmIDS.getFilledSquares()
    branch2 = ebf(explored2,xo2)
    # negamaxIDSab on Tic Tac Toe
    print('\nnegamaxIDSab:')
    nmIDSab = TTT()
    playGame(nmIDSab,opponent,depth,negamaxIDSab)
    xMove3 = nmIDSab.getNumX()
    explored3 = nmIDSab.getNumberMovesExplored()
    xo3 = nmIDSab.getFilledSquares()
    branch3 = ebf(explored3,xo3)
    
    print('negamax made {0} moves. {1} moves explored for ebf({1}, {2}) of {3}'.format(xMove1,explored1,xo1,branch1))
    print('negamaxIDS made {0} moves. {1} moves explored for ebf({1}, {2}) of {3}'.format(xMove2,explored2,xo2,branch2))
    print('negamaxIDSab made {0} moves. {1} moves explored for ebf({1}, {2}) of {3}'.format(xMove3,explored3,xo3,branch3))   

In [12]:
playGames(opponent, 10)

negamax:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 4 for score 1
X|O|O
-----
X|X| 
-----
 | | 
Player O to 5
X|O|O
-----
X|X|O
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X|X|O
-----
X| | 

negamaxIDS:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 

negamaxIDSab:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 
negam

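With these results, it's interesting to see that although `negamaxIDS` explored far fewer moves than `negamax`, it has a larger effective branching factor. Part of the reason is that each `ebf` is computed over the number of filled squares, and the `negamaxIDS` game ended after 5 filled squares while the `negamax` game took 7, so its explored moves are spread over a shallower depth. It may also mean that `negamaxIDS` is actually less efficient and explores a larger search space when the game trees are large and deep enough, since it repeats the shallow levels on every iteration.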

Not surprisingly, alpha-beta pruning yielded the best results. It found the same quick path to victory as `negamaxIDS` while exploring far fewer moves than either of the other 2 algorithms. Its effective branching factor is also the smallest, by roughly 1 or more compared to the other 2 algorithms. All this evidence points towards alpha-beta pruning being a particularly effective method for reducing search time and space.
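
The effect of depth on the effective branching factor can be checked with the counts printed by the three single-game runs earlier (558334 moves over 7 filled squares, 23338 over 5, and 6053 over 5). The sketch below uses a hypothetical iterative bisection, `ebf_bisect`, rather than the provided recursive `ebf`. It solves the same uniform-tree equation, but the tolerance is on the branching factor and is an assumption of this sketch.

```python
def ebf_bisect(nNodes, depth, tol=1e-6):
    """Effective branching factor by iterative bisection (hypothetical helper)."""
    def nodes(b):
        return 1 + depth if b == 1 else (1 - b**(depth + 1)) / (1 - b)
    low, high = 1.0, float(nNodes)
    while high - low > tol:
        mid = (low + high) / 2
        if nodes(mid) > nNodes:
            high = mid
        else:
            low = mid
    return (low + high) / 2

# Move counts and filled squares as printed by the single-game runs above.
runs = {'negamax': (558334, 7), 'negamaxIDS': (23338, 5), 'negamaxIDSab': (6053, 5)}
for name, (explored, filled) in runs.items():
    print(name, round(ebf_bisect(explored, filled), 2))

# The same negamaxIDS count spread over 7 levels instead of 5.
print('negamaxIDS at depth 7:', round(ebf_bisect(23338, 7), 2))
```

The last line shows how much of the `negamaxIDS` figure comes from the shorter game: spread over 7 levels, the same move count gives a smaller factor than the `negamax` run.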

## Grading

In [13]:
%run -i A4grader.py


Testing negamax starting from ['O', 'X', ' ', 'O', ' ', ' ', ' ', 'X', ' ']

--- 10/10 points. negamax correctly returns value of 1

--- 10/10 points. negamax correctly explored 124 states.

Testing negamax starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 10/10 points. negamax correctly returns value of -1 and move of 5

Testing negamaxIDS with max depth of 5, starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 10/10 points. negamaxIDS correctly returns value of -1 and move of 5

Testing negamaxIDSab starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 20/20 points. negamaxIDSab correctly returns value of -1 and move of 5

Testing playGame with opponent that always plays in highest numbered position.
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 8
X| | 
-----
 | | 
-----
 | |O
Player X to 2 for score 1
X| |X
-----
 | | 
-----
 | |O
Player O to 7
X| |X
-----
 | | 
-----
 |O|O
Player X to 1 for 

## Extra Credit 

I really want to implement at least one of the assignment's extra credits. This class is cool and I like playing around with different things, particularly games. Unfortunately it seems I've bitten off a little more than I maybe should've this semester and I only seem to have time to do the base assignment.