<html>
<div>
  <img src="https://www.engineersgarage.com/wp-content/uploads/2021/11/TCH36-01-scaled.jpg" width=360px width=auto style="vertical-align: middle;">
  <span style="font-family: Georgia; font-size:30px; color: white;"> <br/> University of Tehran <br/> AI_CA3 <br/> Spring 02 </span>
</div>
<span style="font-family: Georgia; font-size:15pt; color: white; vertical-align: middle;"> low_mist - std id: 810100186 </span>
</html>

In this notebook we study adversarial search and use the minimax algorithm to play a two-player game.

## Problem Description
In this problem we play [Othello](https://www.eothello.com) against a computer opponent that only makes random moves. We need a strategy that beats it, and we also look at optimizations such as alpha-beta pruning to reduce the running time of the algorithm.

In [7]:
from __future__ import annotations
import random
import time
import turtle
import math
from copy import deepcopy
from enum import Enum

### Othello UI
It draws the board so the game is easier to follow while it runs.

In [8]:
class OthelloUI:
    def __init__(self, board_size = 6, square_size = 60):
        self.board_size = board_size
        self.square_size = square_size
        self.screen = turtle.Screen()
        self.screen.setup(self.board_size * self.square_size + 50, self.board_size * self.square_size + 50)
        self.screen.bgcolor('white')
        self.screen.title('Othello low mist')
        self.pen = turtle.Turtle()
        self.pen.hideturtle()
        self.pen.speed(0)
        turtle.tracer(0, 0)

    def draw_board(self, board):
        self.pen.penup()
        x, y = -self.board_size / 2 * self.square_size, self.board_size / 2 * self.square_size
        for i in range(self.board_size):
            self.pen.penup()
            for j in range(self.board_size):
                self.pen.goto(x + j * self.square_size, y - i * self.square_size)
                self.pen.pendown()
                self.pen.fillcolor('green')
                self.pen.begin_fill()
                self.pen.setheading(0)
                for _ in range(4):
                    self.pen.forward(self.square_size)
                    self.pen.right(90)
                self.pen.penup()
                self.pen.end_fill()
                self.pen.goto(x + j * self.square_size + self.square_size / 2,
                              y - i * self.square_size - self.square_size + 5)
                if board[i][j] == 1:
                    self.pen.fillcolor('white')
                    self.pen.begin_fill()
                    self.pen.circle(self.square_size / 2 - 5)
                    self.pen.end_fill()
                elif board[i][j] == -1:
                    self.pen.fillcolor('black')
                    self.pen.begin_fill()
                    self.pen.circle(self.square_size / 2 - 5)
                    self.pen.end_fill()

        turtle.update()

## The Main Game Class
The main class `Othello` is defined here. It has the following methods:
- `constructor`: initializes the board, the weights, and the other values used later.

Utilities for determining the winner, listing valid moves, checking for a terminal state, and choosing moves:
- `get_winner`
- `get_valid_moves`
- `terminal_test`
- `get_cpu_move`
- `get_human_move`  
---
The main method is `play`, which runs a full game and returns the result: 1 means we won, 0 indicates a draw, and -1 means we lost.  
Since we want to run the algorithm in different modes (with and without pruning) and with different minimax depths to compare running times, there are two setters:
- `set_minimax_depth`
- `set_pruning`  
---
There are also some functions for evaluation:
- `count_corners`
- `count_borders`
- `count_total`
- `heuristic`: maps any state to a numeric score using the corner and total counts (the border term is currently commented out)  

In [9]:
HUMAN, COMPUTER = 1, -1
Move = tuple[int, int]

class Othello:
    def __init__(self, ui = False, minimax_depth = 1, prune = True):
        self.size = 6
        self.ui = OthelloUI(self.size) if ui else None
        self.board = [[0 for _ in range(self.size)] for _ in range(self.size)]
        self.board[int(self.size / 2) - 1][int(self.size / 2) - 1] = self.board[int(self.size / 2)][
            int(self.size / 2)] = 1
        self.board[int(self.size / 2) - 1][int(self.size / 2)] = self.board[int(self.size / 2)][
            int(self.size / 2) - 1] = -1
        self.current_turn = random.choice([1, -1])
        self.minimax_depth = minimax_depth
        self.prune = prune
        self.CORNER_WEIGHT = 10
        self.BORDER_WEIGHT = 2
        self.TOTAL_WEIGHT = 1
        self.WIN_HEURISTIC = 1000
        self.seen_nodes = 0
        
    def set_minimax_depth(self, depth: int):
        self.minimax_depth = depth
        
    def set_pruning(self, prune: bool):
        self.prune = prune

    def get_winner(self):
        white_count = sum([row.count(HUMAN) for row in self.board])
        black_count = sum([row.count(COMPUTER) for row in self.board])
        if white_count > black_count:
            return HUMAN
        elif white_count < black_count:
            return COMPUTER
        else:
            return 0

    def get_valid_moves(self, player):
        moves = set()
        for i in range(self.size):
            for j in range(self.size):
                if self.board[i][j] == 0:
                    for di in [-1, 0, 1]:
                        for dj in [-1, 0, 1]:
                            if di == 0 and dj == 0:
                                continue
                            x, y = i, j
                            captured = []
                            while 0 <= x + di < self.size and 0 <= y + dj < self.size and self.board[x + di][
                                    y + dj] == -player:
                                captured.append((x + di, y + dj))
                                x += di
                                y += dj
                            if 0 <= x + di < self.size and 0 <= y + dj < self.size and self.board[x + di][
                                    y + dj] == player and len(captured) > 0:
                                moves.add((i, j))
        return list(moves)

    def make_move(self, player, move):
        i, j = move
        self.board[i][j] = player
        for di in [-1, 0, 1]:
            for dj in [-1, 0, 1]:
                if di == 0 and dj == 0:
                    continue
                x, y = i, j
                captured = []
                while 0 <= x + di < self.size and 0 <= y + dj < self.size and self.board[x + di][y + dj] == -player:
                    captured.append((x + di, y + dj))
                    x += di
                    y += dj
                if 0 <= x + di < self.size and 0 <= y + dj < self.size and self.board[x + di][y + dj] == player:
                    for (cx, cy) in captured:
                        self.board[cx][cy] = player

    def get_cpu_move(self):
        moves = self.get_valid_moves(COMPUTER)
        if len(moves) == 0:
            return None
        return random.choice(moves)

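    def get_human_move(self):
        # Pass when we have no legal move, so we never play a move found for the opponent.
        if not self.get_valid_moves(HUMAN):
            return None
        value, move = self.minimax(self.minimax_depth, HUMAN)
        return move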
    
    def minimax(self, depth: int, turn: int, alpha: float = -math.inf, beta: float = math.inf) -> tuple[int, Move]:
        self.seen_nodes += 1
        if self.terminal_test():
            # get_winner() is 1, -1 or 0, so a draw scores 0 instead of counting as a loss.
            value = self.WIN_HEURISTIC * self.get_winner()
            return value, None
        
        if depth <= 0:
            return self.heuristic(), None
        
        backup_board = [[x for x in row] for row in self.board]
        optimal_move = None
        
        # A player with no legal move passes; the game is not over here, so the other player can move.
        if len(self.get_valid_moves(turn)) == 0:
            turn *= -1
        
        if turn == HUMAN:
            node_value = -math.inf
            for move in self.get_valid_moves(HUMAN):
                self.make_move(HUMAN, move)
                value, successor_move = self.minimax(depth - 1, COMPUTER, alpha, beta)
                self.board = [[x for x in row] for row in backup_board]
                if value > node_value:
                    optimal_move = move
                    node_value = value
                    if self.prune and node_value >= beta:
                        break
                    alpha = max(alpha, value)
                
            return node_value, optimal_move 
        
        elif turn == COMPUTER:
            node_value = math.inf
            for move in self.get_valid_moves(COMPUTER):
                self.make_move(COMPUTER, move)
                value, successor_move = self.minimax(depth - 1, HUMAN, alpha, beta)
                self.board = [[x for x in row] for row in backup_board]
                if value < node_value:
                    optimal_move = move
                    node_value = value
                    if self.prune and node_value <= alpha:
                        break
                    beta = min(beta, value)
                    
            return node_value, optimal_move 
        
    def heuristic(self) -> int:
        human_corners = self.count_corners(HUMAN)
        computer_corners = self.count_corners(COMPUTER)
        corners_coefficient = (human_corners - computer_corners) 
        
        human_total = self.count_total(HUMAN)
        computer_total = self.count_total(COMPUTER)
        total_coefficient = (human_total - computer_total)
        
        return self.CORNER_WEIGHT * corners_coefficient + self.TOTAL_WEIGHT * total_coefficient
            #    self.BORDER_WEIGHT * self.count_empty() * (self.count_borders(HUMAN) - self.count_borders(COMPUTER)) + \
    
    def count_corners(self, player: int) -> int:
        # Use `count` rather than `sum` so the built-in is not shadowed.
        count = 0
        count += self.board[0][0] == player
        count += self.board[0][-1] == player
        count += self.board[-1][0] == player
        count += self.board[-1][-1] == player
        return count

    def count_borders(self, player: int) -> int:
        # Note: each corner lies on two edges, so it is counted twice here.
        count = 0
        for i in range(self.size):
            count += self.board[0][i] == player
            count += self.board[-1][i] == player
            count += self.board[i][0] == player
            count += self.board[i][-1] == player
        return count
    
    def count_total(self, player: int) -> int:
        return sum(row.count(player) for row in self.board)
        
    def terminal_test(self):
        return len(self.get_valid_moves(HUMAN)) == 0 and len(self.get_valid_moves(COMPUTER)) == 0

    def play(self):
        winner = None
        while not self.terminal_test():
            if self.ui:
                self.ui.draw_board(self.board)
            if self.current_turn == HUMAN:
                move = self.get_human_move()
                if move:
                    self.make_move(self.current_turn, move)
            else:
                move = self.get_cpu_move()
                if move:
                    self.make_move(self.current_turn, move)
            self.current_turn = -self.current_turn
            if self.ui:
                self.ui.draw_board(self.board)
                time.sleep(1)
                
        winner = self.get_winner()
        return winner
    
    def reset(self):
        self.board = [[0 for _ in range(self.size)] for _ in range(self.size)]
        self.board[int(self.size / 2) - 1][int(self.size / 2) - 1] = self.board[int(self.size / 2)][
            int(self.size / 2)] = 1
        self.board[int(self.size / 2) - 1][int(self.size / 2)] = self.board[int(self.size / 2)][
            int(self.size / 2) - 1] = -1
        self.current_turn = random.choice([1, -1])
        self.seen_nodes = 0

In the class above, `minimax` is a recursive function that runs the algorithm.  
It takes 4 parameters:

- `depth`: The remaining search depth (it decreases every time we go deeper into the tree).
- `turn`: Specifies which player's turn it is.
- `alpha`: The best (highest) value the maximizing player can already guarantee along the path to the current node.
- `beta`: The best (lowest) value the minimizing player can already guarantee along the path to the current node.

If the `prune` flag is false, alpha and beta have no effect.

At each call we first check whether the game is over.
If we have reached the depth limit and the game on that path is not over yet, we use the `heuristic` function to score the current state.  
Next, if it is a player's turn but that player has no available move, the turn passes to the other player (the terminal test has already failed, so the other player must still have a move). Note that this is separate from the turn handling in the `play` method.  
Then, based on the current turn, we apply each move, call `minimax` recursively, and undo the change by restoring the saved board. alpha and beta are updated along the way. They are initialized to $-\infty$ and $+\infty$ respectively, and the initial `depth` is `othello.minimax_depth`.  
`minimax` returns a tuple of the score and the move.  
Minimax is meant for games in which the opponent plays well. In this game, however, the opponent moves randomly, so expectimax would have been a better choice: minimax assumes the worst reply every time and is therefore overly cautious.
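
To make the difference between minimax and expectimax concrete, here is a minimal sketch on a hand-made two-level tree. The tree, the payoffs, and the names `minimax_value`, `expectimax_value` and `best_move` are illustrative assumptions and are not part of the `Othello` class. The opponent is assumed to pick each reply with equal probability, like the random CPU player here.

```python
from statistics import mean

# A tiny game tree: the root is our (max) move, and each inner list holds the
# payoffs that the opponent's possible replies lead to. The values are made up.
TREE = [
    [3, 3, 3],     # safe move: every reply gives us 3
    [0, 10, 10],   # risky move: one bad reply, two very good ones
]

def minimax_value(tree):
    """Assume the opponent always picks the reply that is worst for us."""
    return max(min(replies) for replies in tree)

def expectimax_value(tree):
    """Assume the opponent picks a reply uniformly at random."""
    return max(mean(replies) for replies in tree)

def best_move(tree, reply_model):
    """Index of the root move with the best value under `reply_model`."""
    return max(range(len(tree)), key=lambda i: reply_model(tree[i]))

print(minimax_value(TREE), best_move(TREE, min))       # value 3, move 0
print(expectimax_value(TREE), best_move(TREE, mean))   # value about 6.67, move 1
```

Minimax settles for the safe move, while expectimax accepts the risky one because a random opponent only finds the bad reply one time in three.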

## Results 

In [10]:
TOTAL_TESTS = 100

def test(self, depth: int, prune: bool = True, num_of_test: int = TOTAL_TESTS) -> tuple[float, float, int]:
        ui = self.ui
        self.ui = None
        
        win = 0
        time_elapsed = 0
        seen_nodes = 0
        self.set_minimax_depth(depth)
        self.set_pruning(prune)
        
        for _ in range(num_of_test):
            start = time.time()
            win += (self.play() == HUMAN)
            time_elapsed += time.time() - start
            seen_nodes += self.seen_nodes
            self.reset()
            
        self.ui = ui
        
        return time_elapsed / num_of_test, win / num_of_test, seen_nodes // num_of_test       
    
Othello.test = test

### Results without pruning

In [11]:
without_prune_depth = [1, 3, 5]

othello = Othello()
for i in without_prune_depth:
    average_time, win_percentage, seen_nodes = othello.test(i, prune = False, num_of_test = TOTAL_TESTS // 10 if i == 5 else TOTAL_TESTS)
    print(f"for depth {i}:")
    print(f"average time for each game was: {average_time}")
    print(f"win percentage: {win_percentage}")
    print(f"average number of seen nodes: {seen_nodes}")
    print("------------------------------------------")

for depth 1:
average time for each game was: 0.013106818199157716
win percentage: 0.98
average number of seen nodes: 100
------------------------------------------
for depth 3:
average time for each game was: 0.41977289199829104
win percentage: 0.97
average number of seen nodes: 3491
------------------------------------------
for depth 5:
average time for each game was: 13.243526005744934
win percentage: 1.0
average number of seen nodes: 129499
------------------------------------------


### Results with pruning

In [None]:
with_prune_depth = [1, 3, 5, 7]

for i in with_prune_depth:
    average_time, win_percentage, seen_nodes  = othello.test(i, prune = True, num_of_test = TOTAL_TESTS // 2 if i == 7 else TOTAL_TESTS)
    print(f"for depth {i}:")
    print(f"average time for each game was: {average_time}")
    print(f"win percentage: {win_percentage}")
    print(f"average number of seen nodes: {seen_nodes}")
    print("------------------------------------------")

for depth 1:
average time for each game was: 0.011689422130584716
win percentage: 0.95
average number of seen nodes: 97
------------------------------------------
for depth 3:
average time for each game was: 0.21613157272338868
win percentage: 1.0
average number of seen nodes: 1562
------------------------------------------
for depth 5:
average time for each game was: 2.18675642490387
win percentage: 1.0
average number of seen nodes: 18509
------------------------------------------
for depth 7:
average time for each game was: 19.176905393600464
win percentage: 1.0
average number of seen nodes: 235853
------------------------------------------


## Questions

### 1. What did you take into consideration while writing the heuristic function?
  We weight the number of captured corners heavily, since a corner can never be flipped back by our rival once it is taken and it can be decisive at the end of the game. The heuristic also uses the difference in the total number of captured cells.
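
As a small worked example of this scoring, the sketch below recomputes the same weighted sum on a hand-made position. It is a standalone restatement for illustration only: the function name `evaluate` and the sample board are assumptions, while the weights 10 and 1 mirror `CORNER_WEIGHT` and `TOTAL_WEIGHT` from the class.

```python
HUMAN, COMPUTER = 1, -1
CORNER_WEIGHT, TOTAL_WEIGHT = 10, 1  # same weights as in the Othello class

def evaluate(board):
    """Weighted corner difference plus weighted disc difference, from HUMAN's view."""
    corners = [board[0][0], board[0][-1], board[-1][0], board[-1][-1]]
    corner_diff = corners.count(HUMAN) - corners.count(COMPUTER)
    total_diff = sum(row.count(HUMAN) - row.count(COMPUTER) for row in board)
    return CORNER_WEIGHT * corner_diff + TOTAL_WEIGHT * total_diff

# Hypothetical position: HUMAN holds one corner but has fewer discs overall.
board = [
    [1,  0,  0,  0, 0, 0],
    [0, -1,  0,  0, 0, 0],
    [0,  0, -1, -1, 0, 0],
    [0,  0, -1,  1, 0, 0],
    [0,  0,  0,  0, 0, 0],
    [0,  0,  0,  0, 0, 0],
]
print(evaluate(board))  # 10 * (1 - 0) + 1 * (2 - 4) = 8
```

Even though HUMAN is two discs behind, the single corner makes the position score positive, which is the intended effect of the large corner weight.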

### 2. What is the effect of depth on total seen nodes, time, and winning percentage?
  As expected, the deeper we search and the more moves we look ahead, the better our chance of winning. This comes at a price: more nodes are visited, which means more time and more memory. We have to balance the two based on our needs. In the results above, without pruning the average number of seen nodes per game grows from 100 at depth 1 to 3491 at depth 3 and 129499 at depth 5, and the average time per game grows from about 0.013 s to about 13.2 s, while the win rate is already 0.97 or higher at every depth.
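
The growth can be quantified from the numbers reported in the Results section. The sketch below takes the printed average seen-node counts and estimates by how much the count multiplies for each extra ply. The helper name `growth_per_ply` is an assumption, and reading the factor as an effective branching factor is only a rough interpretation, since the counts are per-game averages over random games.

```python
# Average seen nodes per game, copied from the Results section.
no_pruning = {1: 100, 3: 3491, 5: 129499}
with_pruning = {1: 97, 3: 1562, 5: 18509, 7: 235853}

def growth_per_ply(nodes_by_depth):
    """Estimate how much the node count multiplies for each extra ply."""
    depths = sorted(nodes_by_depth)
    factors = {}
    for shallow, deep in zip(depths, depths[1:]):
        ratio = nodes_by_depth[deep] / nodes_by_depth[shallow]
        # Spread the ratio evenly over the plies between the two depths.
        factors[(shallow, deep)] = ratio ** (1 / (deep - shallow))
    return factors

for label, data in [("no pruning", no_pruning), ("with pruning", with_pruning)]:
    for (shallow, deep), factor in growth_per_ply(data).items():
        print(f"{label}: depth {shallow} -> {deep}: x{factor:.1f} per ply")
```

On these numbers the factor is about 6 per ply without pruning and roughly between 3.4 and 4 with pruning.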

### 3. What is the effect of child ordering on performance when we use alpha-beta pruning?
   Worst ordering: In some cases alpha-beta pruning does not prune any leaves of the tree and behaves exactly like plain minimax. It even takes slightly more time because of the extra alpha and beta bookkeeping. This happens when the best moves are on the right side of the tree, so they are examined last. The time complexity for such an ordering is $O(b^m)$, where $b$ is the branching factor and $m$ is the search depth.

   Ideal ordering: The ideal ordering occurs when the best moves are on the left side of the tree, so a lot of pruning happens. Since the search is depth-first, it explores the left of the tree first, and it can then search twice as deep as minimax in the same amount of time. The complexity with ideal ordering is $O(b^{m/2})$. You can get more information from this [link](https://www.javatpoint.com/ai-alpha-beta-pruning).
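
The effect of ordering can be seen on a toy tree. The sketch below runs a generic alpha-beta search over nested lists and counts evaluated leaves for two orderings of the same leaves. The tree values and the names `alphabeta` and `leaves_visited` are illustrative assumptions, separate from the `minimax` method of the `Othello` class.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, counter=None):
    """Alpha-beta on a nested-list tree whose leaves are numbers.

    `counter` is a one-element list used to count evaluated leaves.
    """
    if not isinstance(node, list):
        if counter is not None:
            counter[0] += 1
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, counter)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # the remaining siblings cannot change the result
            break
    return best

def leaves_visited(tree):
    """Return (root value, number of evaluated leaves) for a max root."""
    counter = [0]
    value = alphabeta(tree, True, counter=counter)
    return value, counter[0]

# Same leaves in two orders. The root is a max node, its children are min nodes.
best_first = [[7, 8, 9], [2, 5, 6], [1, 3, 4]]
worst_first = [[4, 3, 1], [6, 5, 2], [9, 8, 7]]

print(leaves_visited(best_first))   # (7, 5)
print(leaves_visited(worst_first))  # (7, 9)
```

Both orderings give the same root value, but with the best moves first only 5 of the 9 leaves are evaluated, while the worst ordering evaluates all of them.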

### 4. How does the branching factor change during the game?
   The branching factor grows toward the middle of the game, since there are more options than at the beginning. As we get closer to the end it shrinks, because few empty cells remain. So over the course of a game it rises and then falls, roughly like a bell curve.
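
One way to check this is to record the number of legal moves at every turn of a game. The sketch below is a compact standalone 6x6 implementation written for this purpose, not the `Othello` class itself, and all names in it are assumptions. It plays a random-versus-random game, so the profile of a game played with minimax may differ.

```python
import random

SIZE = 6
DIRECTIONS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

def new_board():
    """Standard starting position with four discs in the centre."""
    board = [[0] * SIZE for _ in range(SIZE)]
    mid = SIZE // 2
    board[mid - 1][mid - 1] = board[mid][mid] = 1
    board[mid - 1][mid] = board[mid][mid - 1] = -1
    return board

def flips(board, player, i, j):
    """Discs that `player` would flip by playing at (i, j); empty if illegal."""
    if board[i][j] != 0:
        return []
    flipped = []
    for di, dj in DIRECTIONS:
        line, x, y = [], i + di, j + dj
        while 0 <= x < SIZE and 0 <= y < SIZE and board[x][y] == -player:
            line.append((x, y))
            x, y = x + di, y + dj
        if line and 0 <= x < SIZE and 0 <= y < SIZE and board[x][y] == player:
            flipped.extend(line)
    return flipped

def legal_moves(board, player):
    return [(i, j) for i in range(SIZE) for j in range(SIZE) if flips(board, player, i, j)]

def branching_factors(seed=0):
    """Play one random game and record the number of legal moves at each turn."""
    rng = random.Random(seed)
    board, player, history = new_board(), 1, []
    while True:
        moves = legal_moves(board, player)
        if not moves:
            if not legal_moves(board, -player):
                return history  # neither side can move: game over
            player = -player    # the current player passes
            continue
        history.append(len(moves))
        i, j = rng.choice(moves)
        for x, y in flips(board, player, i, j) + [(i, j)]:
            board[x][y] = player
        player = -player

print(branching_factors())
```

The first entry is always 4, the number of legal opening moves. Averaging the lists over many seeds would give a smoother picture of how the branching factor changes.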

### 5. How does alpha-beta pruning help to reduce time without dropping accuracy?
   Alpha-beta pruning only cuts branches that provably cannot affect the result. For instance, suppose we are a min node whose parent is a max node. If we have already seen a child value that is less than or equal to the best value the parent has found so far, our final value can only be that low or lower, so the parent will never choose us regardless of what we see next. Checking our remaining children is therefore useless.
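
This claim can be checked empirically. The sketch below compares plain minimax with alpha-beta on randomly generated trees and verifies that the root value is identical while the number of evaluated leaves never increases. The random trees and all function names are assumptions made for this illustration and are independent of the `Othello` class.

```python
import math
import random

def minimax(node, maximizing):
    """Plain minimax on a nested-list tree; returns (value, leaves evaluated)."""
    if not isinstance(node, list):
        return node, 1
    results = [minimax(child, not maximizing) for child in node]
    values = [value for value, _ in results]
    leaves = sum(count for _, count in results)
    return (max(values) if maximizing else min(values)), leaves

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Same search with alpha-beta cut-offs; returns (value, leaves evaluated)."""
    if not isinstance(node, list):
        return node, 1
    best, visited = (-math.inf if maximizing else math.inf), 0
    for child in node:
        value, count = alphabeta(child, not maximizing, alpha, beta)
        visited += count
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # an ancestor already has a better alternative
            break
    return best, visited

def random_tree(rng, depth, branching=3):
    """Complete tree of the given depth with random integer leaves."""
    if depth == 0:
        return rng.randint(-10, 10)
    return [random_tree(rng, depth - 1, branching) for _ in range(branching)]

rng = random.Random(0)
for _ in range(200):
    tree = random_tree(rng, depth=4)
    plain_value, plain_leaves = minimax(tree, True)
    pruned_value, pruned_leaves = alphabeta(tree, True)
    assert plain_value == pruned_value    # pruning never changes the root value
    assert pruned_leaves <= plain_leaves  # and never evaluates more leaves
print("alpha-beta matched plain minimax on all 200 random trees")
```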

### 6. Was this algorithm efficient for this project?
   Since our rival acts randomly, the better choice, as explained before, is expectimax: minimax is overly protective here, and a little risk can be beneficial.