# Group 5 - Module 6: Game Playing Systems

***
### Group Members:
* **Nils Dunlop, 20010127-2359, Applied Data Science, e-mail: gusdunlni@student.gu.se (16 hours)**
* **Francisco Erazo, 19930613-9214, Applied Data Science, e-mail: guserafr@student.gu.se (16 hours)**

#### **We hereby declare that we have both actively participated in solving every exercise. All solutions are entirely our own work, without having taken part of other solutions." (This is independent and additional to any declaration that you may encounter in the electronic submission system.)**

# Assignment 6
***

## Problem 1: Reading and Reflection
***
AlphaGo is a computer program designed to play the board game Go. It was created by DeepMind Technologies, now a part of Google. Go is known for its complex strategies and the vast number of possible moves, presenting a significant challenge for traditional AI methods. AlphaGo uses advanced deep neural networks and Monte Carlo Tree Search (MCTS), enabling it to learn from a large amount of data, including recorded games between human experts and games it played against itself. Through this learning process, AlphaGo developed judgment and intuition similar to human players, allowing it to accurately predict moves and game outcomes.

The architecture of AlphaGo includes policy networks that suggest probable next moves, and value networks that predict the game's winner from any position, marking a step forward in using AI to tackle complex problems. Its training involved supervised learning from games played by human experts and reinforcement learning from self-play. This approach enabled the system to continuously refine its strategies and adjust to new challenges. The integration of MCTS with neural networks enabled efficient exploration and evaluation of possible moves, balancing between relying on known effective strategies and exploring new ones.

AlphaGo's success was demonstrated by its 99.8% win rate against other Go programs and its historic victory over European Go champion Fan Hui 5-0, marking the first time a computer program defeated a professional human player in Go. Given Go's complexity compared to chess, this achievement was seen as a significant milestone in AI.

The strategies employed by AlphaGo, and its implications, extend beyond just games. They showcase the potential of deep learning and reinforcement learning to address complex issues in various fields.



## Problem 2: Implementation
***

In [30]:
# Import libraries
from copy import deepcopy
import numpy as np

class TicTacToe:
    def __init__(self):
        self.board = [' ' for _ in range(9)]
        self.current_winner = None
        
    def print_board(self):
        for row in [self.board[i*3:(i+1)*3] for i in range(3)]:
            print('| ' + ' | '.join(row) + ' |')

    def available_moves(self):
        return [i for i, spot in enumerate(self.board) if spot == ' ']

    def empty_squares(self):
        return ' ' in self.board

    def make_move(self, square, letter):
        if self.board[square] == ' ':
            self.board[square] = letter
            if self.winner(square, letter):
                self.current_winner = letter
            return True
        return False

    def winner(self, square, letter):
        # Check row
        row_ind = square // 3
        row = self.board[row_ind*3:(row_ind + 1)*3]
        if all([s == letter for s in row]):
            return True
        # Check column
        col_ind = square % 3
        column = [self.board[col_ind+i*3] for i in range(3)]
        if all([s == letter for s in column]):
            return True
        # Check diagonals
        if square % 2 == 0:
            diagonal1 = [self.board[i] for i in [0, 4, 8]]
            if all([s == letter for s in diagonal1]):
                return True
            diagonal2 = [self.board[i] for i in [2, 4, 6]]
            if all([s == letter for s in diagonal2]):
                return True
        return False

In [31]:
class MonteCarloTreeSearchNode:
    def __init__(self, game, parent=None, move=None):
        self.game = deepcopy(game)
        self.parent = parent
        self.move = move
        self.children = []
        self.wins = 0
        self.visits = 0
        self.untried_actions = self.game.available_moves()
        self.player_just_moved = 'O' if self.game.current_winner == 'X' else 'X'
        
    def UCB1(self, total_visits, cp=1.21):
        if self.visits == 0:
            return float('inf')
        return self.wins / self.visits + cp * (np.sqrt(np.log(total_visits) / self.visits))
    
    def select_child(self):
        total_visits = sum(child.visits for child in self.children)
        return max(self.children, key=lambda c: c.UCB1(total_visits))

    def expand(self):
        move = self.untried_actions.pop()
        new_game = deepcopy(self.game)
        new_game.make_move(move, self.player_just_moved)
        child_node = MonteCarloTreeSearchNode(new_game, parent=self, move=move)
        self.children.append(child_node)
        return child_node

    def simulate(self):
        current_simulation_game = deepcopy(self.game)
        while current_simulation_game.empty_squares():
            possible_moves = current_simulation_game.available_moves()
            if not possible_moves:  # No moves left, game is a draw
                break
            move = np.random.choice(possible_moves)
            player_to_move = 'O' if self.player_just_moved == 'X' else 'X'
            current_simulation_game.make_move(move, player_to_move)
            if current_simulation_game.current_winner:
                break
            self.player_just_moved = player_to_move

        if current_simulation_game.current_winner == self.player_just_moved:
            return 1
        elif current_simulation_game.current_winner and current_simulation_game.current_winner != self.player_just_moved:
            return -1
        else:
            return 0  # Draw

    def backpropagate(self, result):
        self.visits += 1
        self.wins += result if self.player_just_moved == 'O' else -result
        if self.parent:
            self.parent.backpropagate(-result)

    def best_move(self, simulations_number=1000):
        for _ in range(simulations_number):
            node = self
            while node.untried_actions == [] and node.children != []:
                node = node.select_child()
            if node.untried_actions != []:
                node = node.expand()
            result = node.simulate()
            node.backpropagate(result)

        return max(self.children, key=lambda c: c.visits).move

In [32]:
# Initialize the game
game = TicTacToe()

# Set the starting player
current_player = 'X'

while game.empty_squares() and not game.current_winner:
    # Create the root node for MCTS with the current game state
    root = MonteCarloTreeSearchNode(game, move=None, parent=None)

    # Find the best move using MCTS
    best_move = root.best_move(simulations_number=10000)

    # Make the move
    game.make_move(best_move, current_player)

    # Print the board state
    game.print_board()
    print(f"Player {current_player} makes a move to square {best_move}")

    # Check for a winner
    if game.current_winner:
        print(f"Player {current_player} wins!")
        break
    elif not game.empty_squares():
        print("The game is a draw!")
        break

    # Switch player
    current_player = 'O' if current_player == 'X' else 'X'

|   |   |   |
|   |   |   |
|   | X |   |
Player X makes a move to square 7
|   |   |   |
|   |   |   |
| O | X |   |
Player O makes a move to square 6
|   |   |   |
| X |   |   |
| O | X |   |
Player X makes a move to square 3
|   |   | O |
| X |   |   |
| O | X |   |
Player O makes a move to square 2
|   |   | O |
| X |   | X |
| O | X |   |
Player X makes a move to square 5
|   | O | O |
| X |   | X |
| O | X |   |
Player O makes a move to square 1
|   | O | O |
| X |   | X |
| O | X | X |
Player X makes a move to square 8
|   | O | O |
| X | O | X |
| O | X | X |
Player O makes a move to square 4
Player O wins!


## References
***

- Choudhary, A. (2018). Reinforcement Learning Guide: Solving the Multi-Armed Bandit Problem from Scratch in Python. [online] Analytics Vidhya. Available at: https://www.analyticsvidhya.com/blog/2018/09/reinforcement-multi-armed-bandit-scratch-python/ [Accessed 23 Feb. 2024].

## Self Check
***
- Have you answered all questions to the best of your ability?
Yes, we have.
- Is all the required information on the front page, is the file name correct etc.?
Indeed, all the required information on the front page has been included.
- Anything else you can easily check? (details, terminology, arguments, clearly stated answers etc.?)
We have checked, and everything looks good.