### Designing a Neural Network for Tic-Tac-Toe, using numpy only

##### Neural Network Architecture:

Input Layer: 9 nodes, corresponding to the 9 squares on the Tic-Tac-Toe board.
Hidden Layer: 100 nodes, utilizing the sigmoid activation function.
Output Layer: 9 nodes, employing the softmax function for a probability distribution over possible moves.
Initialization of Weights and Biases:
Training Algorithm:


Data Generation:

Generate training data by employing a self-play strategy or other suitable methods.

Initialization:

Initialize the weights and biases of the neural network randomly.
Gameplay:

Play the game of Tic-Tac-Toe using the neural network to make the next move.

Training Loop:

Repeat the following steps until a stopping criterion is met:

If the neural network wins or ties the game, stop training.
If the game continues, calculate the error of the neural network's output.

Backpropagation:

Update the weights and biases of the neural network using the backpropagation algorithm with an appropriate learning rate.

Stopping Criterion:

Define a specific stopping criterion, such as achieving a certain win rate or reaching a predefined performance threshold.

Training Details:

Define evaluation metrics, such as win rate and tie rate, to track the neural network's performance during training.

Additional Considerations:

Implement a mechanism to handle draws during training, specifying how the network will adapt.

Include sections on validation and testing to ensure the neural network's generalization to unseen data.


In [1]:
import numpy as np
from numba import jit, cuda

class AI():
    """
    The neural network consists of three layers: an input layer, a hidden layer, and an output layer.
    The input layer has 9 nodes, which correspond to the 9 cells of the 3x3 tic-tac-toe board.
    The hidden layer has 100 nodes, and the output layer has 9 nodes, which also correspond to the 9 cells of the board.
    The output of the network is a probability distribution over the 9 cells, indicating the likelihood of each cell being the best move.

    The forward method performs the forward propagation, where the input (the board state) is passed through the network to generate the output (the move probabilities).
    The backward method performs the backward propagation, where the weights and biases are updated based on the error between the predicted and actual output.
    The actual output in this case is determined by the reward received after making a move.
    The reward is typically 1 for a win, -1 for a loss, and 0 for a draw or an ongoing game.
    """

    def __init__(self):
        # Initialize the game board as a 3x3 matrix filled with zeros
        self.board = np.zeros((3, 3))
        # Set the starting player
        self.player = 1
        # Initialize the weights for the input layer (9 nodes) to the hidden layer (100 nodes)
        self.weights1 = np.random.randn(9, 100)
        # Initialize the weights for the hidden layer (100 nodes) to the output layer (9 nodes)
        self.weights2 = np.random.randn(100, 9)
        # Initialize the bias for the hidden layer (100 nodes)
        self.bias1 = np.random.randn(100)
        # Initialize the bias for the output layer (9 nodes)
        self.bias2 = np.random.randn(9)
        # Set the learning rate for the gradient descent
        self.learning_rate = 0.01
        # Set the initial epsilon value for the epsilon-greedy strategy
        self.epsilon = 1.0

    def sigmoid(self, x):
        # Define the sigmoid activation function
        return 1 / (1 + np.exp(-x))

    def softmax(self, x):
        # Define the softmax activation function
        return np.exp(x) / np.sum(np.exp(x))

    def forward(self):
        # Perform the forward propagation
        # Flatten the board to a 1D array and pass it through the input layer to the hidden layer
        self.layer1 = self.sigmoid(np.dot(self.board.flatten(), self.weights1) + self.bias1)
        # Pass the output of the hidden layer to the output layer
        self.output = self.softmax(np.dot(self.layer1, self.weights2) + self.bias2)

    def backward(self, reward):
        # Perform the backward propagation
        # Update the weights and biases based on the error between the predicted and actual output
        self.weights1 -= self.learning_rate * np.outer(self.board.flatten(), np.dot(self.weights2, (self.output - self.board.flatten()))) * reward
        self.weights2 -= self.learning_rate * np.outer(self.layer1, (self.output - self.board.flatten())) * reward
        self.bias1 -= self.learning_rate * np.dot(self.weights2, (self.output - self.board.flatten())) * reward
        self.bias2 -= self.learning_rate * (self.output - self.board.flatten()) * reward
            
    def make_move(self):
        self.forward()
        move = np.argmax(self.output)
        i = move // 3
        j = move % 3
        if self.board[i, j] == 0:
            self.board[i, j] = self.player
            self.player *= -1
        else:
            self.board[np.random.randint(3), np.random.randint(3)] = self.player
            self.player *= -1
            
    def ismoveValid(self, i, j):
        # check if user passed the move i.e. -1
        if i == -1 or j == -1:
            return -1
        # check values of i and j provided by user
        if i<0 or i>2 or j<0 or j>2:
            return 1
        # check if the position is already occupied
        if self.board[i, j] != 0:
            return 1
        return 0
         
    # run computation on GPU
    @jit(target_backend ="cuda")
    def train(self, iterations):
        # Train the AI for a specified number of iterations
        for i in range(iterations):
            # Reset the board and player for each game
            self.board = np.zeros((3, 3))
            self.player = 1
            # Play the game until someone wins
            while check_win(self.board) == 0:
                self.make_move()
            # Calculate the reward based on the game outcome
            reward = get_reward(self.board)
            # Update the weights and biases based on the reward
            self.backward(reward)
            # Print the game outcome every 1000 iterations
            if i % 1000 == 0:
                print("Iteration " + str(i) + ": " + str(check_win(self.board)))
            # Decay the epsilon value
            self.epsilon *= 0.99

            
    def play_game(self):
        self.board = np.zeros((3, 3))
        self.player = 1
        while check_win(self.board) == 0:
            print(self.board)
            if self.player == 1:
                i = int(input("Enter row for player " + str(self.player) + ": "))
                j = int(input("Enter column for player " + str(self.player) + ": "))
                if make_move(self.board, self.player, i, j):
                    self.player *= -1
            else:
                self.make_move()
        print("Player " + str(self.player * -1) + " wins!")
        
    def play_against_human(self):
        self.board = np.zeros((3, 3))
        self.player = 1
        while check_win(self.board) == 0:
            print(self.board)
            if self.player == 1:
                i = int(input("Enter row for player " + str(self.player) + ": "))
                j = int(input("Enter column for player " + str(self.player) + ": "))
                
                # invalid i, j
                if self.ismoveValid(i, j) == 1:
                    print("Invalid move! Try again")
                    continue
                # the player passed the move i.e. -1
                elif self.ismoveValid(i, j) == -1:
                    self.player *= -1
                    continue
                # combining both conditions in one line
                if make_move(self.board, self.player, i, j):
                    self.player *= -1
            else:
                # perform the forward propagation
                self.forward()
                # get the move probabilities from the output layer
                move_probs = self.output
                # sort the moves based on the probabilities
                sorted_moves = np.argsort(move_probs)[::-1]
                for move in sorted_moves:
                    # convert the move to board coordinates
                    i = move // 3
                    j = move % 3
                    # if the move is valid, make the move and break the loop
                    if make_move(self.board, self.player, i, j):
                        self.player *= -1
                        break   
        print("Player " + str(self.player * -1) + " wins!")
    
    
# tic-tac-toe from scratch
# Initialize the board
def init_board():
    return np.zeros((3, 3))

# Check if a player has won
def check_win(board):
    for i in range(3):
        if np.all(board[i, :] == 1) or np.all(board[:, i] == 1):
            return 1
        elif np.all(board[i, :] == -1) or np.all(board[:, i] == -1):
            return -1
    if np.all(np.diag(board) == 1) or np.all(np.diag(np.fliplr(board)) == 1):
        return 1
    elif np.all(np.diag(board) == -1) or np.all(np.diag(np.fliplr(board)) == -1):
        return -1
    return 0

# Make a move
def make_move(board, player, i, j):
    if board[i, j] == 0:
        board[i, j] = player
        return True
    return False

# Calculate reward based on game outcome
def get_reward(board):
    result = check_win(board)
    if result == 1:
        return 1.0  # AI wins
    elif result == -1:
        return -1.0  # AI loses
    else:
        return 0.0  # Draw

# Play against the AI
print("Play against the AI!")
print("AI is player 1 and you are player -1")
print("If you want to pass move, enter -1 for any row or column")
ai = AI()
ai.train(100000)
ai.play_against_human()


  @jit(target_backend ="cuda")


Play against the AI!
AI is player 1 and you are player -1
If you want to pass move, enter -1 for any row or column


Compilation is falling back to object mode WITH looplifting enabled because Function "train" failed type inference due to: [1mUntyped global name 'check_win':[0m [1m[1mCannot determine Numba type of <class 'function'>[0m
[1m
File "..\..\..\..\AppData\Local\Temp\ipykernel_6264\2806268925.py", line 91:[0m
[1m<source missing, REPL/exec in use?>[0m
[0m[0m
  @jit(target_backend ="cuda")
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "train" failed type inference due to: [1m[1mCannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>[0m
[1m
File "..\..\..\..\AppData\Local\Temp\ipykernel_6264\2806268925.py", line 86:[0m
[1m<source missing, REPL/exec in use?>[0m
[0m[0m
  @jit(target_backend ="cuda")
[1m
File "..\..\..\..\AppData\Local\Temp\ipykernel_6264\2806268925.py", line 83:[0m
[1m<source missing, REPL/exec in use?>[0m
[0m
Fall-back from the nopython compilation path to the object mode compilation path has be

Iteration 0: -1
Iteration 1000: 1
Iteration 2000: 1


  return np.exp(x) / np.sum(np.exp(x))
  return np.exp(x) / np.sum(np.exp(x))


Iteration 3000: 1
Iteration 4000: -1
Iteration 5000: 1
Iteration 6000: -1
Iteration 7000: -1
Iteration 8000: -1
Iteration 9000: -1
Iteration 10000: 1
Iteration 11000: -1
Iteration 12000: 1
Iteration 13000: 1
Iteration 14000: 1
Iteration 15000: -1
Iteration 16000: 1
Iteration 17000: -1
Iteration 18000: 1
Iteration 19000: -1
Iteration 20000: -1
Iteration 21000: -1
Iteration 22000: -1
Iteration 23000: 1
Iteration 24000: 1
Iteration 25000: -1
Iteration 26000: -1
Iteration 27000: 1
Iteration 28000: -1
Iteration 29000: -1
Iteration 30000: -1
Iteration 31000: 1
Iteration 32000: -1
Iteration 33000: 1
Iteration 34000: 1
Iteration 35000: 1
Iteration 36000: 1
Iteration 37000: -1
Iteration 38000: 1
Iteration 39000: 1
Iteration 40000: 1
Iteration 41000: 1
Iteration 42000: 1
Iteration 43000: -1
Iteration 44000: 1
Iteration 45000: -1
Iteration 46000: 1
Iteration 47000: 1
Iteration 48000: -1
Iteration 49000: -1
Iteration 50000: -1
Iteration 51000: 1
Iteration 52000: -1
Iteration 53000: -1
Iteration 54