# PB016: Artificial intelligence I, lab 5 - Games and game strategies

This week's topics are games, game strategies, and basic algorithms for finding optimal game solutions using AI. We'll focus on the following:

1. __Minimax algorithm__
2. __Alpha-beta pruning__

---

## 1. [Minimax](https://en.wikipedia.org/wiki/Minimax) algorithm

__Basic facts__
- A concept originally based on [game theory](https://en.wikipedia.org/wiki/Game_theory).
- Designed for games of two or more alternating players, each with a set of strategies for each individual move in the game.
- The goal states of the game are evaluated by a valuation function, which assigns each player the gain they obtain in that state.
- The minimax algorithm recursively minimizes a player's possible loss in the worst-case scenario (i.e. assuming the opponent plays optimally and tries to maximize that loss in each turn).
- In simple games, a complete evaluation of the game is possible, but for more complex games a combinatorial explosion occurs quickly and the algorithm therefore searches only a few levels of the tree of possible moves at a time.

__Example__ - a sample of a general minimax tree:

![minimax tree](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/minimax.png)
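
The recursion described above can be sketched on a toy tree. This is a minimal illustration, not the lab's solution: the tree is written as nested Python lists whose leaves are already-evaluated goal states, and both the `minimax` helper and the leaf values are made up for this example (they are not taken from the picture above).

```python
def minimax(node, maximizing):
    """Return the minimax value of a tree given as nested lists.

    A leaf is a plain number (the value of a goal state); an inner node
    is a list of child nodes. The players alternate on every level.
    """
    if not isinstance(node, list):
        return node  # goal state: nothing left to search
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# hypothetical two-level tree: the maximizing player moves first,
# the minimizing player answers
tree = [[3, 5], [2, 9], [0, 7]]
root_value = minimax(tree, True)
print(root_value)
```

The minimizing player would answer the three possible first moves with 3, 2 and 0 respectively, so the maximizing player picks the first branch.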

### Game for these labs
- [Tic Tac Toe](https://en.wikipedia.org/wiki/Tic-tac-toe) on a 3x3 playing board.

An __example__ of an unlabeled game tree for Tic Tac Toe:

![tictactoe tree](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/tictactoe.png)
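
To get a feeling for how large even this small game tree is, the following sketch counts all complete games (root-to-leaf paths) on the 3x3 board. It is independent of the `Game` class used in the exercises: the flat list board and the helper names `winner` and `count_games` are assumptions made only for this illustration.

```python
# index triples of the eight winning lines on a flat 3x3 board
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player owns a full line, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board, player):
    """Count the distinct ways the game can be played out to its end."""
    if winner(board) is not None or '.' not in board:
        return 1  # finished game: exactly one leaf
    other = 'O' if player == 'X' else 'X'
    total = 0
    for i in range(9):
        if board[i] == '.':
            board[i] = player          # try the move
            total += count_games(board, other)
            board[i] = '.'             # take it back
    return total


total_games = count_games(['.'] * 9, 'X')
print(total_games)
```

Run from the empty board, this reports 255,168 complete games, which gives an idea of how much work an unpruned search has to do before the first move.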

### Basic game code
- a class representing the playing board with functions for:
  - initialization of the playing board,
  - verification that the current state of the playing area is a goal one,
  - verification of valid moves,
  - drawing the playing board.

In [40]:
# to measure the time required to evaluate the game using minimax and alpha-beta
# pruning
import time


'''
START OF MY CODE
'''
PLAYER_DUAL = {
    'X': 'O',
    'O': 'X',
    '.': '.'
}
'''
END OF MY CODE
'''

class Game:
    """A class representing the game."""

    def __init__(self):
        self.initialize_game()

    def initialize_game(self):
        """Initializes or resets the game."""
        self.current_state = [['.','.','.'],
                              ['.','.','.'],
                              ['.','.','.']]
        self.result = None
        # player X always plays first
        self.player_turn = 'X'

        '''
        START OF MY CODE
        '''
        # succession of moves
        self.moves = []

    def play(self, px, py):
        self.moves.append((px, py))
        self.current_state[px][py] = self.player_turn
        self.player_turn = PLAYER_DUAL[self.player_turn]

    def undo(self):
        px, py = self.moves.pop()
        self.current_state[px][py] = '.'
        self.player_turn = PLAYER_DUAL[self.player_turn]
    '''
    END OF MY CODE
    '''
    
    def get_result(self):
        """
        Tests whether the game is finished and, if it is, returns the mark
        of the winning player (or the draw indicator).

        Returns
        -------
        str (or NoneType, if the game is not finished yet)
            The indicator of the winner. Possible values: 'X' or 'O' for the
            corresponding winning player, '.' for draw, or `None` if not
            finished yet.
        """

        # vertical win
        for i in range(0, 3):
            if (self.current_state[0][i] != '.' and
                self.current_state[0][i] == self.current_state[1][i] and
                self.current_state[1][i] == self.current_state[2][i]):
                return self.current_state[0][i]

        # horizontal win
        for i in range(0, 3):
            if (self.current_state[i] == ['X', 'X', 'X']):
                return 'X'
            elif (self.current_state[i] == ['O', 'O', 'O']):
                return 'O'

        # main diagonal win
        if (self.current_state[0][0] != '.' and
            self.current_state[0][0] == self.current_state[1][1] and
            self.current_state[0][0] == self.current_state[2][2]):
            return self.current_state[0][0]

        # secondary diagonal win
        if (self.current_state[0][2] != '.' and
            self.current_state[0][2] == self.current_state[1][1] and
            self.current_state[0][2] == self.current_state[2][0]):
            return self.current_state[0][2]

        # testing whether the board is full
        for i in range(0, 3):
            for j in range(0, 3):
                # not full, no end yet
                if (self.current_state[i][j] == '.'):
                    return None

        # draw
        return '.'

    def is_valid(self, px, py):
        """Testing the move's validity.

        Parameters
        ----------
        px, py : int
            Coordinates of the move.

        Returns
        -------
        bool
            True if the move is valid, False otherwise.
        """

        if px < 0 or px > 2 or py < 0 or py > 2:
            return False # outside the playing field
        elif self.current_state[px][py] != '.':
            return False # the position is already taken
        else:
            return True

    def draw_board(self):
        """An auxiliary function for drawing the board."""

        for i in range(0, 3):
            print(' | '.join(self.current_state[i]))
        print()

### Game play functions

In [36]:
def play(game):
    """
    The main function for running the game. Alternates the moves of the 'X'
    and 'O' players until one of them wins, or until they draw. The function
    loads the user input and actually executes all the moves in alternating
    turns.

    Parameters
    ----------
    game : Game
        An object representing the game.
    """

    while True:
        game.draw_board()
        game.result = game.get_result()

        # printing the relevant message at the end of the game
        if game.result is not None:
            if game.result == 'X':
                print('The winner is X!')
            elif game.result == 'O':
                print('The winner is O!')
            elif game.result == '.':
                print('Draw!')

            game.initialize_game()
            return

        # human player's move
        if game.player_turn == 'X':

            while True:

                # calculate the optimal recommended move in a minimization step
                print('Calculating the recommended optimal move...')
                start = time.time()
                (m, qx, qy) = mini(game)
                end = time.time()
                print(f'Calculation took {1000*(end - start):.4f} ms')
                print(f'Recommended move: X = {qx}, Y = {qy}')

                correct_format, px, py = False, -1, -1
                while not correct_format:
                    try:
                        px = int(input('Enter the X coordinate: '))
                        py = int(input('Enter the Y coordinate: '))
                        correct_format = True
                    except ValueError:
                        print('Invalid format, please repeat.')
                        correct_format = False

                if game.is_valid(px, py):
                    game.current_state[px][py] = 'X'
                    game.player_turn = 'O'
                    break
                else:
                    print('Invalid move, please repeat.')

        # AI move
        else:
            (m, px, py) = maxi(game)
            game.current_state[px][py] = 'O'
            game.player_turn = 'X'

### __Exercise 1.1: AI move selection__
- Implement a function to select the optimal move for the AI (i.e. maximize the goal value of the game from the current position).

In [37]:
def maxi(game):
    """The part of the mini-max algorithm that selects the optimal move for the
    maximizing player 'O' (in this case it would be the AI).

    Parameters
    ----------
    game : Game
        An object representing the game.

    Returns
    -------
    tuple
        A tuple consisting of three elements:
        - int : the maximized value of the game using the suggested optimal move
        - int : the "x" coordinate of the suggested optimal move
        - int : the "y" coordinate of the suggested optimal move
    """

    # possible end-game values for maximum are:
    # -1 - defeat
    # 0 - draw
    # 1 - win

    # initial maximum set to -2 (worse than worst case)
    maxv = -2

    px = None
    py = None

    # testing whether the game's over
    result = game.get_result()

    # if the game ends, the function must return the evaluation of the given
    # state (-1 for a loss, 0 for a draw, 1 for a win)
    if result == 'X':
        return (-1, None, None)
    elif result == 'O':
        return (1, None, None)
    elif result == '.':
        return (0, None, None)

    # selection of the move coordinates for 'O' by testing the optimality of
    # possible moves (i.e. taking the mini function result into account)

    '''
    START OF MY CODE
    '''
    for nx in range(3):
        for ny in range(3):
            if game.is_valid(nx, ny):
                game.play(nx, ny)
                m, x, y = mini(game)
                if maxv < m:
                    maxv, px, py = m, nx, ny
                game.undo()
    '''
    END OF MY CODE
    '''

    # we return the value and coordinates of the optimal move
    return maxv, px, py

### __Exercise 1.2: Simulating the human move selection__
- Implement a function for selecting the optimal move for a human (i.e. minimizing the goal value of the game from the current position).

In [38]:
def mini(game):
    """The part of the mini-max algorithm that recommends the optimal move for
    the minimizing player 'X' (in this case it would be a human player).

    Parameters
    ----------
    game : Game
        An object representing the game.

    Returns
    -------
    tuple
        A tuple consisting of three elements:
        - int : the minimized value of the game using the suggested optimal move
        - int : the "x" coordinate of the suggested optimal move
        - int : the "y" coordinate of the suggested optimal move
    """

    # possible end-game values for minimum are:
    # -1 - win
    # 0 - draw
    # 1 - defeat

    # initial minimum set to 2 (worse than worst case)
    minv = 2

    qx = None
    qy = None

    # testing whether the game's over
    result = game.get_result()

    # if the game ends, the function must return the evaluation of the given
    # state (-1 for a win, 0 for a draw, 1 for a loss)
    if result == 'X':
        return (-1, None, None)
    elif result == 'O':
        return (1, None, None)
    elif result == '.':
        return (0, None, None)

    # selection of the move coordinates for 'X' by testing the optimality of
    # possible moves (i.e. taking the maxi function result into account)

    '''
    START OF MY CODE
    '''
    for nx in range(3):
        for ny in range(3):
            if game.is_valid(nx, ny):
                game.play(nx, ny)
                m, x, y = maxi(game)
                if minv > m:
                    minv, qx, qy = m, nx, ny
                game.undo()
    '''
    END OF MY CODE
    '''

    # we return the value and coordinates of the optimal move
    return minv, qx, qy

### __Let's play!__

In [39]:
g = Game()
play(g)

. | . | .
. | . | .
. | . | .

Calculating the recommended optimal move...
Calculation took 5026.1617 ms
Recommended move: X = 0, Y = 0


Enter the X coordinate:  0
Enter the Y coordinate:  0


X | . | .
. | . | .
. | . | .

X | . | .
. | O | .
. | . | .

Calculating the recommended optimal move...
Calculation took 118.1972 ms
Recommended move: X = 0, Y = 1


Enter the X coordinate:  0
Enter the Y coordinate:  1


X | X | .
. | O | .
. | . | .

X | X | O
. | O | .
. | . | .

Calculating the recommended optimal move...
Calculation took 4.2315 ms
Recommended move: X = 2, Y = 0


Enter the X coordinate:  2
Enter the Y coordinate:  0


X | X | O
. | O | .
X | . | .

X | X | O
O | O | .
X | . | .

Calculating the recommended optimal move...
Calculation took 0.3874 ms
Recommended move: X = 1, Y = 2


Enter the X coordinate:  1
Enter the Y coordinate:  2


X | X | O
O | O | X
X | . | .

X | X | O
O | O | X
X | O | .

Calculating the recommended optimal move...
Calculation took 0.0374 ms
Recommended move: X = 2, Y = 2


Enter the X coordinate:  2
Enter the Y coordinate:  2


X | X | O
O | O | X
X | O | X

Draw!


---

## 2. [Alpha-beta pruning](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning)

__Basic facts__
- An optimization of the minimax algorithm that identifies nodes (or moves) in the game tree that don't need to be searched further.
- The alpha-beta pruning algorithm stores two values, $\alpha$ and $\beta$, which represent:
  - $\alpha$: the minimum score that the maximizing player is guaranteed so far,
  - $\beta$: the maximum score that the minimizing player is guaranteed so far.
- At the beginning of the search, $\alpha = -\infty$ and $\beta = \infty$ (i.e. both players start with their worst possible score).
- Whenever the minimizing player's maximum guaranteed score ("beta") becomes less than or equal to the maximizing player's minimum guaranteed score ("alpha") (i.e. $\beta \leq \alpha$), the remaining moves from the current node don't have to be explored, because it's clear they are not part of the optimal strategy and will not be reached in the game.

__Example__ - a pruned minimax tree from the previous example:

![alfa-beta tree](https://www.fi.muni.cz/~novacek/courses/pb016/labs/img/alphabeta.png)
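
The cut-off rule $\beta \leq \alpha$ can be tried out on a toy tree before applying it to Tic Tac Toe. The sketch below is only an illustration under stated assumptions: the tree is a nested list with made-up leaf values (not the values from the picture above), and the `alphabeta` helper and its `visited` list are hypothetical names introduced here to show which leaves actually get evaluated.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a nested-list tree, skipping branches that cannot
    influence the result. Evaluated leaves are appended to `visited`."""
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)   # maximizer raises its guarantee
            if beta <= alpha:
                break                  # minimizer would never allow this node
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)         # minimizer lowers its guarantee
        if beta <= alpha:
            break                      # maximizer already has something better
    return best


tree = [[3, 5], [2, 9], [0, 7]]        # hypothetical leaf values
visited = []
value = alphabeta(tree, True, visited=visited)
print(value, visited)
```

On this tree the result is the same as with plain minimax (3), but only the leaves 3, 5, 2 and 0 are evaluated: once the first branch guarantees 3 to the maximizing player, a single leaf below 3 is enough to discard each of the other two branches.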

### Modified game play function
- A cycle of alternating moves of the `'X'` and `'O'` players until one of them wins, or until a draw occurs.
- The same procedure as before, except that it uses versions of the `mini` and `maxi` functions extended by alpha-beta pruning.
- Instead of initial infinite values for $\alpha, \beta$, we use $-2, 2$ (all goal values in this Tic Tac Toe game lie between $-1$ and $1$, so these bounds behave the same as $\mp\infty$).

In [41]:
def play_alpha_beta(game):
    """A new version of the function for running the game that utilizes
    alpha-beta pruning. Just like the plain mini-max version, the function
    alternates the moves of the 'X' and 'O' players until one of them wins,
    or until they draw. The function loads the user input and actually executes
    all the moves in alternating turns.

    Parameters
    ----------
    game : Game
        An object representing the game.
    """

    while True:
        game.draw_board()
        game.result = game.get_result()

        if game.result is not None:
            if game.result == 'X':
                print('The winner is X!')
            elif game.result == 'O':
                print('The winner is O!')
            elif game.result == '.':
                print('Draw!')


            game.initialize_game()
            return

        if game.player_turn == 'X':

            while True:
                print('Calculating the recommended optimal move...')
                start = time.time()
                # updated mini function with pruning
                (m, qx, qy) = mini_alpha_beta(game,-2, 2)
                end = time.time()
                print(f'Calculation took {1000*(end - start):.4} ms')
                print(f'Recommended move: X = {qx}, Y = {qy}')

                correct_format, px, py = False, -1, -1
                while not correct_format:
                    try:
                        px = int(input('Enter the X coordinate: '))
                        py = int(input('Enter the Y coordinate: '))
                        correct_format = True
                    except ValueError:
                        print('Invalid format, please repeat.')
                        correct_format = False

                if game.is_valid(px, py):
                    game.current_state[px][py] = 'X'
                    game.player_turn = 'O'
                    break
                else:
                    print('Invalid move, please repeat.')

        else:
            # updated maxi function with pruning
            (m, px, py) = maxi_alpha_beta(game, -2, 2)
            game.current_state[px][py] = 'O'
            game.player_turn = 'X'

### __Exercise 2.1: Pruned move selection for the AI__
- Implement a function to select the optimal move for the AI (i.e. maximize the end value of the game from the current position).
- This time, however, prune the tree of possible moves to explore.

In [46]:
def maxi_alpha_beta(game, alpha, beta):
    """The part of the mini-max algorithm with alpha-beta pruning that selects
    the optimal move for the maximizing player 'O' (in this case it would be
    the AI).

    Parameters
    ----------
    game : Game
        An object representing the game.
    alpha : int
        The minimum score the maximizing player is guaranteed so far.
    beta : int
        The maximum score the minimizing player is guaranteed so far.

    Returns
    -------
    tuple
        A tuple consisting of three elements:
        - int : the maximized value of the game using the suggested optimal move
        - int : the "x" coordinate of the suggested optimal move
        - int : the "y" coordinate of the suggested optimal move
    """

    maxv = -2
    px = None
    py = None

    result = game.get_result()

    if result == 'X':
        return (-1, None, None)
    elif result == 'O':
        return (1, None, None)
    elif result == '.':
        return (0, None, None)

    '''
    START OF MY CODE
    '''
    for nx in range(3):
        for ny in range(3):
            if game.is_valid(nx, ny):
                game.play(nx, ny)
                m, x, y = mini_alpha_beta(game, alpha, beta)
                if m >= beta:
                    game.undo()
                    return m, nx, ny
                if maxv < m:
                    maxv, px, py = m, nx, ny
                    alpha = max(alpha, m)
                game.undo()
    '''
    END OF MY CODE
    '''

    return maxv, px, py

### __Exercise 2.2: Simulation of human move selection with pruning__
- Implement a function for selecting the optimal move for the human (i.e. minimizing the goal value of the game from the current position).
- This time, however, prune the tree of possible moves to explore.

In [47]:
def mini_alpha_beta(game, alpha, beta):
    """The part of the mini-max algorithm with alpha-beta pruning that selects
    the optimal move for the minimizing player 'X' (in this case it would be a
    human).

    Parameters
    ----------
    game : Game
        An object representing the game.
    alpha : int
        The minimum score the maximizing player is guaranteed so far.
    beta : int
        The maximum score the minimizing player is guaranteed so far.

    Returns
    -------
    tuple
        A tuple consisting of three elements:
        - int : the minimized value of the game using the suggested optimal move
        - int : the "x" coordinate of the suggested optimal move
        - int : the "y" coordinate of the suggested optimal move
    """

    minv = 2

    qx = None
    qy = None

    result = game.get_result()

    if result == 'X':
        return (-1, None, None)
    elif result == 'O':
        return (1, None, None)
    elif result == '.':
        return (0, None, None)

    '''
    START OF MY CODE
    '''
    for nx in range(3):
        for ny in range(3):
            if game.is_valid(nx, ny):
                game.play(nx, ny)
                m, x, y = maxi_alpha_beta(game, alpha, beta)
                if m <= alpha:
                    game.undo()
                    return m, nx, ny
                if minv > m:
                    minv, qx, qy = m, nx, ny
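                    # the minimizing player tightens the upper bound (beta),
                    # which enables cut-offs in the maximizing calls below
                    beta = min(beta, m)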
                game.undo()
    '''
    END OF MY CODE
    '''

    return (minv, qx, qy)

### __Let's play!__

In [48]:
g = Game()
play_alpha_beta(g)

. | . | .
. | . | .
. | . | .

Calculating the recommended optimal move...
Calculation took 873.9 ms
Recommended move: X = 0, Y = 0


Enter the X coordinate:  0
Enter the Y coordinate:  0


X | . | .
. | . | .
. | . | .

X | . | .
. | O | .
. | . | .

Calculating the recommended optimal move...
Calculation took 23.59 ms
Recommended move: X = 0, Y = 1


Enter the X coordinate:  0
Enter the Y coordinate:  1


X | X | .
. | O | .
. | . | .

X | X | O
. | O | .
. | . | .

Calculating the recommended optimal move...
Calculation took 1.702 ms
Recommended move: X = 2, Y = 0


Enter the X coordinate:  2
Enter the Y coordinate:  0


X | X | O
. | O | .
X | . | .

X | X | O
O | O | .
X | . | .

Calculating the recommended optimal move...
Calculation took 0.4861 ms
Recommended move: X = 1, Y = 2


Enter the X coordinate:  1
Enter the Y coordinate:  2


X | X | O
O | O | X
X | . | .

X | X | O
O | O | X
X | O | .

Calculating the recommended optimal move...
Calculation took 0.08249 ms
Recommended move: X = 2, Y = 2


Enter the X coordinate:  2
Enter the Y coordinate:  2


X | X | O
O | O | X
X | O | X

Draw!


### __Food for thought__
- How would you modify the exercise code for larger playing boards, or for a more demanding winning condition (e.g. a requirement to place 4 or more marks in a row)?
- How will these changes affect the exploration efficiency?
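
As one possible starting point for the first question, the hard-coded checks in `get_result` could be replaced by a scan that works for any rectangular board and any required run length. This is only a sketch under assumptions: `find_winner` and its parameter `k` are hypothetical names, and the board is the same list-of-lists representation as `current_state`, with `'.'` marking empty fields.

```python
def find_winner(board, k):
    """Return the mark that has k fields in a row, or None.

    Rows, columns and both diagonal directions are checked by walking
    k steps from every occupied field.
    """
    rows, cols = len(board), len(board[0])
    # right, down, down-right, down-left
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(rows):
        for c in range(cols):
            mark = board[r][c]
            if mark == '.':
                continue
            for dr, dc in directions:
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue  # the run would leave the board
                if all(board[r + i * dr][c + i * dc] == mark
                       for i in range(k)):
                    return mark
    return None


# hypothetical 4x4 position with a full main diagonal of X marks
board = [list('X.O.'),
         list('.XO.'),
         list('..X.'),
         list('O..X')]
print(find_winner(board, 4))
```

The draw test and the move generation would need the same generalization (loops over `rows` and `cols` instead of `range(3)`), and on larger boards the search would probably have to be cut off at a fixed depth, because the number of positions to explore grows very quickly.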

---

#### _Final note_ - the materials used in this notebook were adapted from the following original works:
- Examples of game trees and code:
  - Based on materials from the [Stack Abuse](https://stackabuse.com/) site
  - Author: [Mina Krivokuća](mailto:mina.krivokuca@gmail.com)
  - License: N/A (adapted for internal use in PB016 at FI MU with the kind permission of the author and David Landup, the StackAbuse operator)