![alt text](https://zewailcity.edu.eg/main/images/logo3.png)

_Prepared by_  [**Muhammad Hamdy AlAref**](mailto:malaref@zewailcity.edu.eg)

# Adversarial Search

Adversarial search algorithms solve problems in which other agents exist and may plan against us. Such problems are also referred to as *games*!

## Game Formulation

Game formulation is very similar to the *problem formulation* that we discussed before, but this time with multiple agents (players)!

In [2]:
class Game:
    '''
    Abstract game class for game formulation.
    It declares the expected methods to be used by an adversarial search algorithm.
    All the methods declared are just placeholders that throw errors if not overridden by child "concrete" classes!
    '''
    
    def __init__(self):
        '''Constructor that initializes the game. Typically used to setup the initial state, number of players and, if applicable, the terminal states and their utilities.'''
        self.init_state = None
    
    def player(self, state):
        '''Returns the player whose turn it is.'''
        raise NotImplementedError
    
    def actions(self, state):
        '''Returns an iterable with the applicable actions to the given state.'''
        raise NotImplementedError
    
    def result(self, state, action):
        '''Returns the resulting state from applying the given action to the given state.'''
        raise NotImplementedError
    
    def terminal_test(self, state):
        '''Returns whether or not the given state is a terminal state.'''
        raise NotImplementedError
    
    def utility(self, state, player):
        '''Returns the utility of the given state for the given player, if possible (usually, it has to be a terminal state).'''
        raise NotImplementedError

## Example: Tic-Tac-Toe

Let's try formulating the quite well-known game [tic-tac-toe](https://en.wikipedia.org/wiki/Tic-tac-toe)!

In [3]:
class TicTacToe(Game):
    '''Tic-tac-toe game formulation.'''
    
    PLAYERS = ('X', 'O')
    
    def _won(self, state, player):
        '''Auxiliary function for checking if a player has won.'''
        # Any row has 3 symbols of the player's
        # (symbols are compared with != because identity checks on strings are unreliable)
        for row in range(3):
            won = True
            for column in range(3):
                if state[row][column] != player:
                    won = False
                    break
            if won: return True
        # Any column has 3 symbols of the player's
        for column in range(3):
            won = True
            for row in range(3):
                if state[row][column] != player:
                    won = False
                    break
            if won: return True
        # Main diagonal has 3 symbols of the player's
        won = True
        for i in range(3):
            if state[i][i] != player:
                won = False
                break
        if won: return True
        # Anti-diagonal has 3 symbols of the player's
        won = True
        for i in range(3):
            if state[i][2-i] != player:
                won = False
                break
        if won: return True
        # If none of the above
        return False
    
    def __init__(self):
        # The state is a 3x3 matrix
        self.init_state = ((None,) * 3,) * 3
    
    def player(self, state):
        Xs, Os = 0, 0
        for row in state:
            Xs += row.count(TicTacToe.PLAYERS[0])
            Os += row.count(TicTacToe.PLAYERS[1])
        if Xs == Os:  return TicTacToe.PLAYERS[0]
        elif Xs > Os: return TicTacToe.PLAYERS[1]
    
    def actions(self, state):
        actions_list = []
        for i, row in enumerate(state):
            for j, cell in enumerate(row):
                if not cell:
                    actions_list.append((i, j))
        return actions_list
    
    def result(self, state, action):
        mutable_grid = list(state)
        mutable_row = list(mutable_grid[action[0]])
        mutable_row[action[1]] = self.player(state)
        mutable_grid[action[0]] = tuple(mutable_row)
        return tuple(mutable_grid)
    
    def terminal_test(self, state):
        all_filled = True
        for row in range(3):
            if not all_filled: break
            for column in range(3):
                if state[row][column] is None:
                    all_filled = False
                    break
        if all_filled: return True
        for player in TicTacToe.PLAYERS:
            if self._won(state, player):
                return True
        return False
    
    def utility(self, state, player):
        for p in TicTacToe.PLAYERS:
            if self._won(state, p):
                if p == player: return 1
                else: return -1
        return 0
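
The formulation above stores the board as an immutable tuple of tuples, so `result` has to build a new state instead of editing the old one. The following is a minimal standalone sketch of that idea; the helper name `place` is hypothetical and not part of the `TicTacToe` class.

```python
def place(state, action, symbol):
    '''Return a new 3x3 state with `symbol` written at `action` = (row, column).

    `place` is a hypothetical helper for illustration; the original state is left untouched.
    '''
    row, column = action
    new_row = state[row][:column] + (symbol,) + state[row][column + 1:]
    return state[:row] + (new_row,) + state[row + 1:]

# Sharing one row tuple three times is safe because tuples cannot be mutated
empty = ((None,) * 3,) * 3
after_x = place(empty, (0, 0), 'X')
after_o = place(after_x, (1, 1), 'O')

print(after_o)
print(empty)  # still all None
```

Because states are never modified in place, a search algorithm can explore one branch and then reuse the parent state for the next branch without copying or undoing moves.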

## Minimax Algorithm

The minimax algorithm is a recursive algorithm that returns the optimal move, assuming both players play *optimally*, by performing a depth-first search (DFS) of the game tree.

In [4]:
from math import inf

def minimax(game, state):
    '''Minimax implementation.'''
    player = game.player(state)
    
    def max_value(state):
        if game.terminal_test(state): return game.utility(state, player)
        maxi = -inf
        for action in game.actions(state):
            maxi = max(maxi, min_value(game.result(state, action)))
        return maxi
    
    def min_value(state):
        if game.terminal_test(state): return game.utility(state, player)
        mini = +inf
        for action in game.actions(state):
            mini = min(mini, max_value(game.result(state, action)))
        return mini
    
    best_action, best_value = None, None
    for action in game.actions(state):
        action_value = min_value(game.result(state, action))
        if best_value is None or best_value < action_value:
            best_action = action
            best_value = action_value
    return best_action
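
To see the recursion at a smaller scale than a full tic-tac-toe tree, here is a minimal sketch of the same idea on a hand-built toy tree. The nested-list encoding (a number is a terminal utility for the maximizing player, a list is an internal node) and the name `minimax_value` are assumptions made for this illustration only.

```python
def minimax_value(node, maximizing=True):
    '''Return the minimax value of a toy tree given as nested lists.'''
    if not isinstance(node, list):
        return node  # terminal node: its utility for the maximizing player
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX chooses one of three branches, then MIN chooses a leaf inside that branch
toy_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

# Value of each branch once MIN has replied optimally
branch_values = [minimax_value(branch, maximizing=False) for branch in toy_tree]
best_branch = max(range(len(toy_tree)), key=lambda i: branch_values[i])

print(branch_values, best_branch)
```

MIN reduces the three branches to 3, 2 and 2, so MAX picks the first branch, which mirrors how `minimax` above compares `min_value` over the root actions.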

## Alpha-Beta Pruning

Alpha-beta pruning is a modification of the minimax algorithm that prunes sub-trees that cannot affect the final decision, which gives a great speed-up!

In [5]:
from math import inf

def alpha_beta(game, state):
    '''Alpha-Beta Pruning implementation.'''
    player = game.player(state)
    
    def max_value(state, alpha, beta):
        if game.terminal_test(state): return game.utility(state, player)
        maxi = -inf
        for action in game.actions(state):
            maxi = max(maxi, min_value(game.result(state, action), alpha, beta))
            alpha = max(alpha, maxi)
            if alpha >= beta: return maxi
        return maxi
    
    def min_value(state, alpha, beta):
        if game.terminal_test(state): return game.utility(state, player)
        mini = +inf
        for action in game.actions(state):
            mini = min(mini, max_value(game.result(state, action), alpha, beta))
            beta = min(beta, mini)
            if alpha >= beta: return mini
        return mini
    
    best_action, best_value = None, None
    for action in game.actions(state):
        action_value = min_value(game.result(state, action), -inf, +inf)
        if best_value is None or best_value < action_value:
            best_action = action
            best_value = action_value
    return best_action
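
The effect of pruning is easier to observe on a small tree where the visited leaves can be listed. The sketch below assumes a toy nested-list tree (numbers are terminal utilities for the maximizing player) and a hypothetical `visited` list used only to record which leaves get evaluated.

```python
from math import inf

def alpha_beta_value(node, alpha=-inf, beta=inf, maximizing=True, visited=None):
    '''Return the minimax value of a nested-list tree, recording evaluated leaves.'''
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)  # a leaf was actually evaluated
        return node
    if maximizing:
        best = -inf
        for child in node:
            best = max(best, alpha_beta_value(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta: break  # MIN above will never allow this branch
        return best
    best = inf
    for child in node:
        best = min(best, alpha_beta_value(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta: break  # MAX above already has a better option
    return best

toy_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited_leaves = []
root_value = alpha_beta_value(toy_tree, visited=visited_leaves)

print(root_value, visited_leaves)
```

On this tree the root value is unchanged, but the leaves 4 and 6 are never evaluated: once MIN finds 2 in the second branch, that branch cannot beat the 3 that MAX already has.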

In [6]:
game = TicTacToe()

In [7]:
def visualize(state):
    for row in state:
        for cell in row:
            print(cell, end='\t')
        print()
    print('--------------------')

state = game.init_state
while(not game.terminal_test(state)):
    action = minimax(game, state)
    assert action == alpha_beta(game, state)
    state = game.result(state, action)
    visualize(state)

X	None	None	
None	None	None	
None	None	None	
--------------------
X	None	None	
None	O	None	
None	None	None	
--------------------
X	X	None	
None	O	None	
None	None	None	
--------------------
X	X	O	
None	O	None	
None	None	None	
--------------------
X	X	O	
None	O	None	
X	None	None	
--------------------
X	X	O	
O	O	None	
X	None	None	
--------------------
X	X	O	
O	O	X	
X	None	None	
--------------------
X	X	O	
O	O	X	
X	O	None	
--------------------
X	X	O	
O	O	X	
X	O	X	
--------------------


## Performance Comparison

Now, let's compare the performance of alpha-beta pruning against the original minimax algorithm!

In [None]:
%prun print(minimax(game, game.init_state))

(0, 0)
 

In [None]:
%prun print(alpha_beta(game, game.init_state))

(0, 0)
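
`%prun` is an IPython magic, so it is only available inside a notebook or IPython session. As a rough sketch of how a similar profile could be collected in a plain Python script, the standard `cProfile` and `pstats` modules can be used. The helper `profile_call` and the stand-in workload `fib` are hypothetical names for this illustration; in the notebook one would pass `minimax` or `alpha_beta` with `game` and `game.init_state` instead.

```python
import cProfile
import io
import pstats

def profile_call(func, *args):
    '''Run func(*args) under cProfile and return (result, text report).'''
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(5)  # show the 5 most expensive entries
    return result, stream.getvalue()

def fib(n):
    '''Stand-in recursive workload (assumption: replaces the search call here).'''
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result, report = profile_call(fib, 15)
print(result)
print(report)
```

The number of function calls in the report is usually the most telling figure for this comparison, since pruning shows up directly as fewer calls to `max_value` and `min_value`.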
 

## Requirement

Let's re-solve the tic-tac-toe game with heuristics!

You are required to write Python code that implements the H-minimax (or H-alpha-beta) algorithm, applies it to the tic-tac-toe game, and compares its performance with the regular implementation!

**HINT:** You will need to edit (or inherit from) the `TicTacToe` class. You may use the heuristic evaluation function from exercise 3! $$Eval(s) = 3X_2(s)+X_1(s)-(3O_2(s) + O_1(s))$$

**Estimated time for this exercise is 30 minutes!**

In [8]:
def calc_Xn_and_On(state, PLAYERS):
    '''
    Calculating X2, X1, O2, O1 used in the heuristic function
    '''
    # Creating a list of strings for the rows
    s = ["".join(str(p) for p in _tuple).replace('None', ' ') for _tuple in state]
    # Transposing the grid
    sT = list(map("".join, zip(*s)))
    # Creating a list for the 2 diagonals
    sD = ["".join([s[0][0], s[1][1], s[2][2]]), "".join([s[0][2], s[1][1], s[2][0]])]
    
    def calc_Xn_or_On(player):
        player2 = "".join(PLAYERS).replace(player, "")
        # Calculate X2/O2 and X1/O1 on rows
        X2_rows = sum([1 for i in range(3) if s[i].count(player)==2 and s[i].count(player2)==0])
        X1_rows = sum([1 for i in range(3) if s[i].count(player)==1 and s[i].count(player2)==0])
        # Calculate X2/O2 and X1/O1 on columns
        X2_cols = sum([1 for i in range(3) if sT[i].count(player)==2 and sT[i].count(player2)==0])
        X1_cols = sum([1 for i in range(3) if sT[i].count(player)==1 and sT[i].count(player2)==0])
        # Calculate X2/O2 and X1/O1 on diagonals
        X2_diag = sum([1 for i in range(2) if sD[i].count(player)==2 and sD[i].count(player2)==0])
        X1_diag = sum([1 for i in range(2) if sD[i].count(player)==1 and sD[i].count(player2)==0])
        return (X2_rows + X2_cols + X2_diag, X1_rows + X1_cols + X1_diag)
    
    return calc_Xn_or_On(PLAYERS[0]), calc_Xn_or_On(PLAYERS[1])

# For testing
state = (('x','o','x'),('x',None, 'x'),('o',None,None))
(X2,X1),(O2,O1) = calc_Xn_and_On(state, ('x', 'o'))

visualize(state)
print('X2 = {}, X1 = {}, O2 = {}, O1 = {}'.format(X2,X1,O2,O1))

x	o	x	
x	None	x	
o	None	None	
--------------------
X2 = 2, X1 = 1, O2 = 0, O1 = 2
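
As a worked check, plugging the counts printed above into the evaluation function from the hint gives the following value, written from X's point of view:

$$Eval(s) = 3 \cdot 2 + 1 - (3 \cdot 0 + 2) = 5$$

The positive result indicates that, under this heuristic, the test board favors X.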


In [21]:
class TicTacToe_Heuristic(TicTacToe):
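    '''Inherits from the TicTacToe class with all its methods and adds a heuristic evaluation for non-terminal states.'''
    def utility(self, state, player, terminal_state = False):
        if terminal_state: return super().utility(state, player)
        (X2,X1),(O2,O1) = calc_Xn_and_On(state, TicTacToe.PLAYERS)
        # Eval(s) is written from X's point of view, so negate it when evaluating for O
        value = 3*X2 + X1 - (3*O2 + O1)
        return value if player == TicTacToe.PLAYERS[0] else -value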

In [15]:
def H_minimax(game, state, max_depth):
    '''H-Minimax implementation.'''
    
    player = game.player(state)
    def max_value(state, depth):
        terminal_check = game.terminal_test(state)
        if terminal_check or depth >= max_depth: return game.utility(state, player, terminal_check)
        depth += 1
        maxi = -inf
        for action in game.actions(state):
            maxi = max(maxi, min_value(game.result(state, action), depth))
        return maxi
    
    def min_value(state, depth):
        terminal_check = game.terminal_test(state)
        if terminal_check or depth >= max_depth: return game.utility(state, player, terminal_check)
        depth += 1
        mini = +inf
        for action in game.actions(state):
            mini = min(mini, max_value(game.result(state, action), depth))
        return mini
    
    best_action, best_value = None, None
    for action in game.actions(state):
        action_value = min_value(game.result(state, action), depth=1)
        if best_value is None or best_value < action_value:
            best_action = action
            best_value = action_value
    return best_action
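
The depth cutoff only changes the outcome when it is reached before a terminal state. The following minimal sketch shows this on a toy tree; the encoding is an assumption for illustration: a leaf is a number (true utility), and an internal node is a tuple `(estimate, children)` whose `estimate` plays the role of the heuristic value returned at the cutoff.

```python
def h_minimax_value(node, depth_left, maximizing=True):
    '''Depth-limited minimax on a toy tree with made-up heuristic estimates.'''
    if not isinstance(node, tuple):
        return node  # terminal node: true utility
    estimate, children = node
    if depth_left == 0:
        return estimate  # cutoff reached: fall back to the heuristic estimate
    values = [h_minimax_value(child, depth_left - 1, not maximizing) for child in children]
    return max(values) if maximizing else min(values)

# The estimates 5, 4 and 7 are invented; the true minimax values of the branches are 3, 2 and 2
toy_tree = (0, [(5, [3, 12, 8]), (4, [2, 4, 6]), (7, [14, 5, 2])])

shallow = h_minimax_value(toy_tree, depth_left=1)  # stops at the estimates
deep = h_minimax_value(toy_tree, depth_left=2)     # reaches every leaf

print(shallow, deep)
```

With a cutoff of one level the search trusts the estimates and prefers the third branch, while the full-depth search prefers the first. This is why a `max_depth` larger than the deepest tic-tac-toe game (9 moves) makes H-minimax behave exactly like regular minimax.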

In [22]:
game = TicTacToe_Heuristic()
state = game.init_state
while(not game.terminal_test(state)):
    action = H_minimax(game, state, max_depth=100)
    #assert action == alpha_beta(game, state)
    state = game.result(state, action)
    visualize(state)

X	None	None	
None	None	None	
None	None	None	
--------------------
X	None	None	
None	O	None	
None	None	None	
--------------------
X	X	None	
None	O	None	
None	None	None	
--------------------
X	X	O	
None	O	None	
None	None	None	
--------------------
X	X	O	
None	O	None	
X	None	None	
--------------------
X	X	O	
O	O	None	
X	None	None	
--------------------
X	X	O	
O	O	X	
X	None	None	
--------------------
X	X	O	
O	O	X	
X	O	None	
--------------------
X	X	O	
O	O	X	
X	O	X	
--------------------
