# Implementation Ass_5

### Implement tree search 

Alternatives to Monte Carlo

There are several alternatives to Monte Carlo for tree search implementations to learn to win at Tic-tac-toe:

- Minimax algorithm: This is a classic algorithm for turn-based games. It works by recursively exploring all possible moves and their outcomes, and then selecting the move that maximizes the minimum outcome.

- Alpha-beta pruning: This is a variant of the minimax algorithm that eliminates branches of the search tree that are guaranteed to be worse than previously explored branches. This can significantly reduce the search space and speed up the algorithm.

- Monte Carlo Tree Search (MCTS) with a policy network: Instead of relying only on random rollouts, MCTS can be augmented with a neural network that scores each candidate move from the current board state. This allows the algorithm to focus its search on promising moves, improving its efficiency.

- Reinforcement learning: This approach involves training a neural network to predict the optimal move for any given board state, using a combination of supervised learning and self-play. The network is then used to guide the search during gameplay.

Each of these approaches has its own strengths and weaknesses, and the best choice depends on the specific requirements of the task at hand. For example, if computation time is limited, then MCTS with a policy network might be the best choice, while if accurate evaluation of all possible moves is critical, then minimax with alpha-beta pruning might be the way to go.
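
As a point of comparison with MCTS, minimax with alpha-beta pruning can be sketched in a few lines for a 3x3 board. This is a minimal sketch, not part of the implementation below: it assumes the same board encoding used later in this notebook (a flat list of 9 cells holding `1`, `-1` or `0`), and the names `WIN_LINES`, `winner` and `alphabeta` are introduced here for illustration only.

```python
import math

# The eight index triples that form a line on a flat 3x3 board
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 1 or -1 if that player has three in a row, 0 for a full board, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0 if all(v != 0 for v in board) else None

def alphabeta(board, player, alpha=-math.inf, beta=math.inf):
    """Return (value, move) from the point of view of `player` (negamax form)."""
    result = winner(board)
    if result is not None:
        return result * player, None  # +1 win, 0 draw, -1 loss for `player`
    best_value, best_move = -math.inf, None
    for move in [i for i, v in enumerate(board) if v == 0]:
        board[move] = player                      # try the move
        value = -alphabeta(board, -player, -beta, -alpha)[0]
        board[move] = 0                           # undo the move
        if value > best_value:
            best_value, best_move = value, move
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # prune: the opponent already has a better option elsewhere
    return best_value, best_move

# Player 1 can complete the top row by playing cell 2
print(alphabeta([1, 1, 0, -1, -1, 0, 0, 0, 0], player=1))  # → (1, 2)
```

On the empty board the returned value is `0`, which matches the known result that Tic-tac-toe is a draw under perfect play.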

Kai: I think we should go with MCTS (possibly with the addition of a neural network policy), since it requires less computation time and power, scales better, and is a somewhat cooler implementation that is also used in models for chess and Go 🤩 MCTS can also explore new strategies that may not be obvious to traditional search methods such as minimax with alpha-beta pruning. This comes at the cost of not guaranteeing the best move. For the 3x3 board specifically, we could just as well have guaranteed the best move.

In [16]:
import numpy as np
import math
import random
import copy

In [64]:
import numpy as np
import math
import random
import copy  # needed for copy.deepcopy in expand() and in the simulation step

class TicTacToeNode:
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move
        self.visits = 0
        self.wins = 0
        self.children = []
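        # A finished game has no moves left to try, even if some cells are still empty
        self.untried_moves = [] if state.winner is not None else state.get_available_moves()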

    def select_child(self):
        # Use UCT formula to select the best child
        C = 0.8
        log_N = math.log(self.visits)

        def uct(node):
            exploitation_term = node.wins / node.visits
            exploration_term = C * math.sqrt(log_N / node.visits)
            uct_score = exploitation_term + exploration_term
            return uct_score

        return max(self.children, key=uct)

    def expand(self):
        # Choose a random untried move and create a new child node with that move
        copy_state = copy.deepcopy(self.state)
        move = random.choice(self.untried_moves)
        copy_state.make_move(move)

        new_node = TicTacToeNode(copy_state, parent=self, move=move)
        self.children.append(new_node)
        self.untried_moves.remove(move)
        return new_node

    def update(self, result):
        # Update the node with the result of a simulation
        self.visits += 1
        self.wins += result

    def get_best_move(self):
        # Return the move that leads to the child with the highest number of visits
        children_visits = [(child.visits, child.move) for child in self.children]
        children_visits.sort(reverse=True)
        return children_visits[0][1]

class MCSTAgent:
    def __init__(self):
        self.root = None

    def get_move(self, state):
        # Create a new search tree from the current state
        self.root = TicTacToeNode(state)

        # Run the MCTS algorithm for a fixed number of iterations
        for i in range(10000):
            node = self.root

            # Selection: traverse the tree using UCB1 until a leaf node is reached
            while node.untried_moves == [] and node.children != []:
                node = node.select_child()

            # Expansion: if the node is not a terminal state, expand it by adding a new child node
            if node.untried_moves != []:
                node = node.expand()

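            # Simulation: play random moves on a copy of the state, so that the
            # state stored in the tree node is not modified by the rollout
            rollout_state = copy.deepcopy(node.state)
            while rollout_state.winner is None:
                move = random.choice(rollout_state.get_available_moves())
                rollout_state.make_move(move)

            # Backpropagation: update visits and wins for all nodes on the path from the new node to the root.
            # Each node is scored from the point of view of the player who made the move leading to it
            # (the opponent of node.state.current_player); a draw is counted as half a win.
            while node is not None:
                player_just_moved = -node.state.current_player
                if rollout_state.winner == player_just_moved:
                    result = 1
                elif rollout_state.winner == 0:
                    result = 0.5
                else:
                    result = 0
                node.update(result)
                node = node.parent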

        # Get the best move from the current state by choosing the child with the highest number of visits
        best_move = self.root.get_best_move()

        return best_move


In [61]:
class TicTacToe:
    def __init__(self):
        self.board = [0] * 9
        self.current_player = 1
        self.winner = None

    def get_available_moves(self):
        return [i for i, val in enumerate(self.board) if val == 0]

    def make_move(self, move):
        self.board[move] = self.current_player
        self.check_gameover()
        self.switch_player()

    def switch_player(self):
        self.current_player = - self.current_player

    def current_state(self):
        return np.array(self.board).reshape((3, 3)).tolist()

    def check_gameover(self):
        for i in range(3):
            # Row i occupies indices 3*i .. 3*i + 2
            if self.board[i * 3] == self.board[i * 3 + 1] == self.board[i * 3 + 2] != 0:
                self.winner = self.board[i * 3]
            # Column i occupies indices i, i + 3, i + 6
            if self.board[i] == self.board[i + 3] == self.board[i + 6] != 0:
                self.winner = self.board[i]
        if self.board[0] == self.board[4] == self.board[8] != 0:
            self.winner = self.board[0]
        if self.board[2] == self.board[4] == self.board[6] != 0:
            self.winner = self.board[2]
        # Only declare a tie if the board is full and nobody has won
        if self.winner is None and all(val != 0 for val in self.board):
            self.winner = 0

    def display(self):
        print("-------------")
        for i in range(3):
            print(f"| {self.board[i*3]} | {self.board[i*3+1]} | {self.board[i*3+2]} |")
            print("-------------")

In [62]:
def play_game():
    # Create a new TicTacToe game and an MCST agent
    game = TicTacToe()
    agent = MCSTAgent()

    # Main game loop
    while game.winner is None:
        # Display the current state of the game
        game.display()

        # If it's the player's turn, prompt them for a move and make the move
        if game.current_player == 1:
            valid_move = False
            while not valid_move:
                move = int(input("Enter your move (0-8): "))
                if move in game.get_available_moves():
                    valid_move = True
                    game.make_move(move)
                else:
                    print("Invalid move. Try again.")
        # If it's the agent's turn, get the agent's move and make the move
        else:
            game_copy = copy.deepcopy(game)
            move = agent.get_move(game_copy)
            print(f"Agent plays move {move}")
            game.make_move(move)

    # Display the final state of the game and the winner
    game.display()
    if game.winner == 0:
        print("Tie game!")
    elif game.winner == 1:
        print("Player wins!")
    else:
        print("Agent wins!")


In [65]:
play_game()

-------------
| 0 | 0 | 0 |
-------------
| 0 | 0 | 0 |
-------------
| 0 | 0 | 0 |
-------------
-------------
| 0 | 0 | 0 |
-------------
| 0 | 1 | 0 |
-------------
| 0 | 0 | 0 |
-------------
Agent plays move 7
-------------
| 0 | 0 | 0 |
-------------
| 0 | 1 | 0 |
-------------
| 0 | -1 | 0 |
-------------
-------------
| 0 | 0 | 0 |
-------------
| 1 | 1 | 0 |
-------------
| 0 | -1 | 0 |
-------------
Agent plays move 1
-------------
| 0 | -1 | 0 |
-------------
| 1 | 1 | 0 |
-------------
| 0 | -1 | 0 |
-------------


KeyboardInterrupt: Interrupted by user

In [32]:
play()

[0, 0, 0]
[0, 0, 0]
[0, 0, 0]
Draw.


## Experimentation with exploration constant

Try different exploration constants and maximum iterations to see how they affect the performance of the algorithm.
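
To get a feel for what the exploration constant does before running full experiments, the UCT score can be evaluated on its own. The sketch below uses the same formula as `select_child` above; the function name `uct_score` and the win/visit statistics are made up for illustration and are not measured results.

```python
import math

def uct_score(wins, visits, parent_visits, c):
    """UCT score: average reward plus an exploration bonus scaled by c."""
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Hypothetical (wins, visits) statistics for two children of a node visited 100 times
well_explored = (60, 90)   # high average reward, visited often
rarely_tried = (4, 10)     # lower average reward, visited rarely

for c in (0.1, 0.8, 1.4):
    a = uct_score(*well_explored, parent_visits=100, c=c)
    b = uct_score(*rarely_tried, parent_visits=100, c=c)
    choice = "well_explored" if a > b else "rarely_tried"
    print(f"C={c}: {a:.3f} vs {b:.3f} -> select {choice}")
```

With these numbers, a small constant such as 0.1 keeps selecting the child with the higher average reward, while 0.8 (the value used in `select_child`) and 1.4 favor the rarely visited child. Varying `C` and the iteration count in `get_move` together would show how this trade-off affects actual play.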

# Evaluation

Evaluate your algorithm and comment on its pros and cons. For example, is it fast? Is it sample efficient? Is the learned policy competitive? Does it lose? Would you, as a human, beat it? Would it scale well to larger grids such as 4x4 or 5x5?
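
One way to answer questions such as "does it lose?" is to play many games against a baseline and tally the outcomes. The harness below is a minimal, self-contained sketch: it assumes an agent is any function `agent(board, player)` that returns a cell index, and the names `random_agent`, `play_one_game` and `evaluate` are hypothetical. The `MCSTAgent` above has a different interface (it takes a `TicTacToe` object), so it would need a small wrapper before it can be plugged in.

```python
import random
from collections import Counter

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 1 or -1 if that player has three in a row, 0 for a full board, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0 if all(v != 0 for v in board) else None

def random_agent(board, player):
    """Baseline opponent: pick a uniformly random empty cell."""
    return random.choice([i for i, v in enumerate(board) if v == 0])

def play_one_game(agent_first, agent_second):
    """Play one game; agent_first plays as 1, agent_second as -1. Return 1, -1 or 0."""
    board, player = [0] * 9, 1
    agents = {1: agent_first, -1: agent_second}
    while winner(board) is None:
        board[agents[player](board, player)] = player
        player = -player
    return winner(board)

def evaluate(agent_first, agent_second, n_games=1000):
    """Tally outcomes over n_games from the first player's point of view."""
    tally = Counter(play_one_game(agent_first, agent_second) for _ in range(n_games))
    return {"first_wins": tally[1], "second_wins": tally[-1], "draws": tally[0]}

print(evaluate(random_agent, random_agent, n_games=200))
```

Timing `get_move` for different iteration counts alongside such a tally would give a rough picture of both speed and playing strength.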

# Next gen TTT AI

We could try using a neural network to estimate the value of each state instead of simulating games to the end. This could speed up the search and improve the performance of the algorithm.
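
The idea can be sketched as a function that maps a board to a value estimate, which would then replace the random rollout in the simulation step. The code below is only an illustration under strong assumptions: the network is a tiny one-hidden-layer model written in plain Python, its weights are random and untrained (so its outputs carry no meaning yet), and the names `init_value_net` and `estimate_value` are hypothetical.

```python
import math
import random

def init_value_net(n_inputs=9, n_hidden=16, seed=0):
    """Create random (untrained) weights for a one-hidden-layer network without biases."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w1, w2

def estimate_value(board, player, net):
    """Estimate the outcome in [-1, 1] for `player` without playing the game out."""
    w1, w2 = net
    # Encode the board from the point of view of `player`: own marks +1, opponent marks -1
    x = [cell * player for cell in board]
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return math.tanh(sum(w * h for w, h in zip(w2, hidden)))

net = init_value_net()
board = [1, 0, 0, 0, -1, 0, 0, 0, 0]
print(estimate_value(board, player=1, net=net))
```

In the search loop, a value `v` from such a network could be rescaled to `(v + 1) / 2` and backpropagated in place of the rollout result. The weights would first have to be trained, for example on outcomes of self-play games.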

In [None]:
# Assume a square board
BOARD_SIZE = 3

class State:
    def __init__(self, p1, p2):
        self.board = np.zeros((BOARD_SIZE, BOARD_SIZE))
        self.p1 = p1
        self.p2 = p2
        self.isEnd = False
        self.boardHash = None
        # init p1 plays first
        self.playerSymbol = 1





# Reinforcement learning
