# COGS 118B - Final Project

# AI Chess Game

## Group members

- Ana Maria Baboescu
- Fatima Enriquez
- Eric Lin

# Abstract 
Our goal with this project is to create an artificial intelligence (AI) algorithm that can properly play a game of chess. The data that we used for this project was solely exploration of potential future states within the game itself. We used pyChess in order to create this game, and created and updated values based on potential future states according to a Monte Carlo simulation. The performance of the model is measured by the cumulative rewards that were achieved during a game of chess against an opponent who uses completely random moves. High rewards imply good performance, while low rewards imply poorer performance. 

# Background

There has been ample research regarding the use of AI to solve problems. Due to the increase in popularity of online games such as chess, sudoku, and solitaire, more individuals are inclined to play them virtually. Having this virtual access was key, especially during the pandemic <a name="cite_2"></a>[<sup>[2]</sup>](#cite_2). However, the concern for many is how to detect cheating. The European Online Championship details that it is paramount to keep the integrity of the game to the point where more than eight participants, in 2020, were deemed disqualified for utilizing aid <a name="cite_3"></a>[<sup>[3]</sup>](#cite_3). In this manner, there is a desire for softwares to be able to detect cheating in an online game <a name="cite_1"></a>[<sup>[1]</sup>](#cite_1). The use of AI in online chess matches makes it easier for the person cheating to win the game. While it is fascinating to imagine how a program can win such a strategic game, ethically, it is paramount that as a society we use AI as a learning tool.

Nonetheless, the desire to solve problems and expand the realm of artificial intelligence has been around for decades. In one of the most infamous games of chess played by Garry Kasparov and Deep Blue, the machine won by one point <a name="cite_1"></a>[<sup>[1]</sup>](#cite_1). Deep Blue is a software created by International Business Machines Corporation (IBM), which consisted of 32 processors and produced “high-speed computations in parallel”. Deep Blue was thus able to evaluate about 200 million chess positions per second and held a “processing speed of 11.38 billion floating-point operations per second” <a name="cite_4"></a>[<sup>[4]</sup>](#cite_4).This breakthrough to be able to have a machine detect and think like a human is evolutionary to the point where the legendary match, where Deep Blue won was due to the computational build of the machine. This is because the machine has such a high processing speed and evaluation power <a name="cite_1"></a>[<sup>[1]</sup>](#cite_1).

Aside from one of the most historical events in the history of AI, about half of a century later, the artificial intelligence system Hydra came along. In essence Hydra was created and played against chess Champion Vladimir Kramnik in 2005 <a name="cite_8"></a>[<sup>[8]</sup>](#cite_8) . Their system sparked the continuation of developing tools to help others build their algorithms. Essentially, the Stockfish engine is available for those who desire to analyze and compute pieces and positions of the chess game  <a name="cite_10"></a>[<sup>[10]</sup>](#cite_10). With the development of Stockfish to the community of chess players, this chess engine won ample amounts of matches. Such matches include the World Computer Chess and Top Chess Engine Championships <a name="cite_8"></a>[<sup>[8]</sup>](#cite_8) . The manner in which this chess engine functions is by implementing the minimax, pruning, and extensions implementations. Essentially, what occurs is that the system branches out to analyze the different child notes of possible moves and subsequently decides what is the most optimal move <a name="cite_10"></a>[<sup>[10]</sup>](#cite_10). Yet, even with this major development, the Stockfish algorithm was defeated in 2017 by AI Alpha Zero. Creators DeepMind won almost 30% of the games against Stockfish with a draw of roughly 70%. This in itself is a major accomplishment as this AI system could then over take the human chess champions in an efficient manner. The major difference between Hydra and AlphaZero was the implementation of their algorithms. As Mazurek recalled in the Medium article, AlphaZero utilized reinforcement learning within its structure. In turn, this allowed AlphaZero to significantly outperform Hyra <a name="cite_8"></a>[<sup>[8]</sup>](#cite_8). As detailed in the textbook, there is a consequent reward allocated after winning or losing the game. In this manner, the agent will learn to try and reach the positive reward at the end of its task. In the case of the chess game, it would be to reach a checkmate. As detailed in the text, there is an “unknown Markov decision process” at play where the agent selects states that maximize their likelihood of getting their desired outcome. Subsequently, the agent in the game learns a “Q-function” where they calculate how much their set of state and action costs (Russell and Norvig, 2009, pg 831). All in all since the development and advancements of technology today, artificial intelligence is significantly improving in all aspects such as computational efficiency and being able to beat the previous best AI system. 

In a similar manner, there are other games that have been tackled that have been played by AI systems. One of these games is one called Backgammon, which ties in the concepts of “reinforcement learning and neural networks” in order to gain a greater understanding of what to select next. The manner in which the system improves its performance is by repeatedly playing against itself. Another game that has been utilized as an inspiration for bringing AI into the field is GO. Within the game of Go, there is a board that is 19 units by 19 units. This means that there are 19 columns of cells and 19 columns or rows. Within the game, the AI system does not prune the game, instead, it utilizes the Monte Carlo algorithm to solve its problems (Russell and Norvig, 2009, pg 185-187). As one can note, there are many other games that could involve AI; however, coming in line with the purpose of this project the aforementioned games are in line with the complexity with the game of chess. In this manner, what our group attempted to solve was to implement an algorithm that would choose the optimal move at each state. This in itself is a complex task to accomplish because of the diverse set of rules in a game such as Chess and Go. There are many different loopholes in which the agent needs to learn how to play. 


# Problem Statement

Clearly describe the problem that you are solving. Avoid ambiguous words. The problem described should be well defined and should have at least one ML-relevant potential solution. Additionally, describe the problem thoroughly such that it is clear that the problem is quantifiable (the problem can be expressed in mathematical or logical terms), measurable (the problem can be measured by some metric and clearly observed), and replicable (the problem can be reproduced and occurs more than once).


One of the widely known games, Chess, is a game typically associated with strategy and skill with multiple different styles of playing/executing. Thus, the success of the player is determined by the manner in which both players move. This can frequently be quite difficult for beginners, because without AI, one would have to seek different skill-leveled individuals to build one's mastery of the game. However, with AI, each time the player plays the system, it would be different for the player due to the reinforcement learning of past games it has played including its experience with the current player.

# Data

For our data we did not implement any datasets due to the lack of not having the need for the. This is due to the fact that we implemented the Monte Carlo algorithm for our AI agent to follow. For our allocated data, we mainly created our own by analyzing how our Monte Carlo algorithm was learning. The manner in which the AI agent learned using the aforementioned algorithm was from the game simulation and observing the data through graphing the overall rewards per move that was gained throughout the chess game. 

**In regards to needing a dataset, we have consulted a TA to see if we needed one, we were informed that a dataset was not required if we did not need one. Thus, due to the aspects and details of our project, we formulated tangible visual representations of what the AI agent went through.*



# Proposed Solution

For our project, we propose to create an algorithm using Monte Carlo and reinforcement learning. While chess in itself is not a stochastic game, rather a deterministic game, due to the lack of random-ness and unpredictability, we can still apply the Monte Carlo algorithm to our game (Russell and Norvig, 2009, pg 177). Essentially, we decided to utilize the Monte Carlo algorithm over others such as the Minimax because by utilizing the latter, we were able to compute visualizations. In this manner, we would see how the agent learned to play and thus provide data on how it learned. In a human chess game, both players typically attempt to play optimally, in order to win. One would win by trying to take as many pieces from the opponent as possible; thus clearing the board, and attempting to checkmate. This makes the minimax algorithm, one of the ideal algorithms to use for reaching the aforementioned goal.However, in the case that we did utilize the minimax algorithm, this could essentially compute the optimal move and not provide tangible data for us to collect. This is because the minimax algorithm determines the optimal move at a specified state by “performing a complete depth-first” search of the possible moves(Russell and Norvig, 2009,pg 165).
Subsequently, according to Russel and Norvig, reinforcement learning is defined as an active process where an agent in a specified environment learns through allocated consequences (2009, pg 695). In this manner, reinforcement learning is used by the AI to help utilize what the algorithm learned from previous games in order to improve its performance in the next game and possibly recognize patterns. This is paramount because in the context of being able to train a model to navigate through an environment and win, this can get quite complex very quickly. By being able to recognize patterns of players, the AI could have more experience as well as adapt to different styles of playing.

We would be utilizing numpy to calculate mathematical calculations. Also, the library, python-chess library provides all the move generations, validations, and board/pieces of the game itself. Within the algorithm itself, we adapted the backbone from Ishaan Gupta work on the *Monte Carlo Tree Search Application on Chess*. The Node class allowed us to create a node object that will navigate itself through the game. Subsequently,the ucb1 function takes in the current node. Here, we utilized the Upper Confidence Bound algorithm which essentially selects the move with the highest upper bound, where this would provide either the move with the highest value or reward <a name="cite_14"></a>[<sup>[14]</sup>](#cite_14).  The following function, evaluate_board, evaluates each of the game pieces and returns a score. There is a specified dictionary which pairs each of the game pieces (king, queen, pawn, etc) to assigned numerical values. We also implemented the rollout function that details the resulting score at the end of each game. It does this by evaluating the different possible moves, derived from the expand function, until the end of the game <a name="cite_5"></a>[<sup>[5]</sup>](#cite_5).  In turn, the referenced expand function which runs through, recursively through the game. Thus, once the “current leaf node” is reached then this is when the call to the child nodes ends <a name="cite_5"></a>[<sup>[5]</sup>](#cite_5).  The rollback function updates the nodes in the game tree with the specified results of the aforementioned function. Starting from the end of the nodes, this function iterates and updates the rewards to each of the nodes. One of the main functions, mcts_pred in sum, predicts the best move to execute in the game. After iterating through each of the allocated moves in the game of class, the function utilizes the Upper Confidence Bound algorithm to choose the best one. In order to implement some randomness, essentially capturing the essence of Monte Carlo, we used a random legal moves function to randomize the possible legal moves. With the rest of the program, we initialized all of the possible moves and chose the allocated move  for both of the colors (black and white).

With all of these functions in mind to make this program possible, we essentially solved what we detailed in the problem statement which was to utilize reinforcement learning as a means of training an AI agent to efficiently play chess. The model which our AI agent is being compared to is essentially whether it wins or loses the game. Subsequently, we want to ensure that the AI agent following the Monte Carlo algorithm is increasing its rewards.


# Evaluation Metrics

The evaluation metric that we utilized to visualize the performance of our model is to create a graph that highlights the change in reward over the processed moves. As aforementioned in the proposed solution method, we implemented the Monte Carlo algorithm to be able to visualize how the AI agent learned to play. By doing so, we were able to visualize and quantify the performance of the AI agent. The manner in which we quantified the end performance was by computing the result at the end of the game, thus the agent either won, lost, or tied in the game. The agent, who utilized the Monte Carlo Algorithm, played against another agent which generated only random valid moves. 

The manner in which the evaluation metrics were derived was by taking the reward of the AI agent at each of the states. When the AI agent follows a possible future win in the chessboard, it gets rewarded 10 points. However, when the AI agent follows a possible future loss, then the agent gets penalized 10 points. However, one of the patterns that we noted was how the AI agent kept getting into stalemates with the other agent. This was partly due to the fact that the AI agent was navigating against each of the moves that the random generating agent was making at the current moment.



# Results

In [None]:
import chess
import chess.pgn
import chess.engine
import chess.svg
import random
from math import log, sqrt, e, inf
#from gymnasium.wrappers import RecordVideo

# for the gif
from matplotlib.animation import FuncAnimation
import matplotlib.pyplot as plt 
import os

import numpy as np

depth = 0


### Subsection 1 - Implementing the Node Class & Upper Confidence Bound

Before implementing the Monte Carlo algorithm, we first created a Node class since we were specifically focused on implementing a Monte Carlo Tree Search (MCTS). The reason for us implementing a Monte Carlo Tree Search is due to its main advantage of being able to operate effectively without prior knowledge over certain domains (exempting the general rules and termination states of the game) through discovering its own moves and learning through random plays. This leads to it providing the best probabilistic move given the current state of the game. Because of this it, is widely used in game theory like chess.


On top of creating the Node class for this algorithm, we have also implement an Upper Confidence Bound (UCB) which, according to a medium article by Ishaan Gupta on “Monte Carlo Tree Search Application on Chess”, helps in deciding which node to evaluate based on maximizing the probability of winning from the given state <a name="cite_5"></a>[<sup>[5]</sup>](#cite_5). UCB follows the general equation:

$$A_t=argmax_a(Qt(a) + c\sqrt{\frac{ln(t)}{N_t(a)}})$$

which consists of two factors, exploitation and exploration.

The exploitation factor, that is represented by $Qt(a)$, measures how successful the action (a) is whenever it is used. To put it simply, the higher the success rate of the exploitation the higher the UCB value is. The exploration factor, $c\sqrt{\frac{ln(t)}{N_t(a)}}$ explores the possible actions to take. The UCB factor is calculated in the ucb1 function.


In [None]:
class Node():
    def __init__(self):
        self.state = chess.Board() # current position of board
        self.children = set() # set of all possible states from legal action from current node
        self.parent = None # parent node of current node
        self.N = 0 # number of times parent node has been visited
        self.n = 0 # number of times current node has been visited
        self.v = 0 # exploitation factor of current node
        self.ucb = 0 # Upper confidence bound
    def __lt__(self, other):
        return self.ucb < other.ucb

def ucb1(curr_node):
    ans = curr_node.v + 2 * (sqrt(log(curr_node.N + e + (10 ** -6))/(curr_node.n + (10 ** -10))))
    return ans

### Subsection 2

For the MCTS algorithm we implemented several helper functions that would be used when implementing the main algorithm. These functions are:
- **Evaluate_board:** This function evaluates the pieces on the board and calculator values to help determine which pieces are more ideal to take.
- **Rollout:** This function generates random moves from the current node until termination  and returns either the reward or punishment with the current node.
- **Expand:** This function keeps calling the child node for a certain turn and will return the maximum priority till the current leaf node has been reached.
- **Rollback:** After the final node and reward is read, this function traverses back to the root and in turn updates the UCB value for each node of the path.


In [None]:
def evaluate_board(board):
    piece_values = {
        chess.PAWN: 1,
        chess.KNIGHT: 3,
        chess.BISHOP: 3,
        chess.ROOK: 5,
        chess.QUEEN: 10,
        chess.KING: 0
    }
    value = 0
    for piece_type in piece_values:
        value += len(board.pieces(piece_type, chess.WHITE)) * piece_values[piece_type]
        value -= len(board.pieces(piece_type, chess.BLACK)) * piece_values[piece_type]
    return value

In [None]:
def rollout(curr_node, reward):
    board = curr_node.state 
    depth = 0
    captured_piece = None
    while not board.is_game_over():
        legal_moves = list(board.legal_moves)
        move_weights = []
        for move in legal_moves:
            board.push(move)
            move_weights.append(evaluate_board(board))
            board.pop()
        total_weight = sum(move_weights)
        if total_weight == 0:
            move = random.choice(legal_moves)
        else:
            probabilities = [weight / total_weight for weight in move_weights]
            move = random.choices(legal_moves, probabilities)[0]
        if board.is_capture(move):
            if board.is_en_passant(move):
                captured_piece = chess.PAWN
            else:
                captured_piece = board.piece_at(move.to_square).piece_type
        if captured_piece is not None and depth == 0:
            if captured_piece == chess.PAWN:
                reward += 1
            elif captured_piece == chess.KNIGHT:
                reward += 3
            elif captured_piece == chess.BISHOP:
                reward += 3
            elif captured_piece == chess.ROOK:
                reward += 5
            elif captured_piece == chess.QUEEN:
                reward += 9
        board.push(move)
        depth += 1

    if board.is_game_over():
        result = board.result()
        if result == '1-0':
            return reward + 10
        elif result == '0-1':
            return reward - 10
        else:
            return reward
    return rollout(curr_node, reward)
    #return evaluate_board(board) / 100


In [None]:
def expand(curr_node, white):
    if len(curr_node.children) == 0:
        return curr_node
    max_ucb = -inf
    if white:
        sel_child = None
        for i in curr_node.children:
            tmp = ucb1(i)
            if tmp > max_ucb:
                max_ucb = tmp
                sel_child = i
        return expand(sel_child, 0)
    else:
        sel_child = None
        min_ucb = inf
        for i in curr_node.children:
            tmp = ucb1(i)
            if tmp < min_ucb:
                min_ucb = tmp
                sel_child = i
        return expand(sel_child, 1)

In [None]:
rewards = [] # list that stores rewards

def rollback(curr_node, reward):
    curr_node.n += 1
    curr_node.v += reward
    while curr_node.parent is not None:
        curr_node.N += 1
        curr_node = curr_node.parent
    rewards.append(reward) # Append the reward to the list
    return curr_node


### Subsection 3 - Implementing the Monte Carlo & Random Move Generator

In this section we implement the Monte Carlo Tree Search algorithm by calling all the helper functions together and calculate the rewards. We also implemented a random legal move generator function that randomly generates possible legal moves for an opponent computer to play.



In [None]:
def mcts_pred(curr_node, over, white, iterations=5000): #updated iterations from 10 to 100
    if over:
        return -1
    all_moves = [curr_node.state.san(i) for i in list(curr_node.state.legal_moves)]
    map_state_move = dict()

    for i in all_moves:
        tmp_state = chess.Board(curr_node.state.fen())
        tmp_state.push_san(i)
        child = Node()
        child.state = tmp_state
        child.parent = curr_node
        curr_node.children.add(child)
        map_state_move[child] = i

    while iterations > 0:
        if white:
            max_ucb = -inf
            sel_child = None
            for i in curr_node.children:
                tmp = ucb1(i)
                if tmp > max_ucb:
                    max_ucb = tmp
                    sel_child = i

            ex_child = expand(sel_child, 0)
            reward = rollout(ex_child, 0)
            curr_node = rollback(ex_child, reward)
            iterations -= 1
        else:
            min_ucb = inf
            sel_child = None
            for i in curr_node.children:
                tmp = ucb1(i)
                if tmp < min_ucb:
                    min_ucb = tmp
                    sel_child = i

            ex_child = expand(sel_child, 1)
            reward = rollout(ex_child)
            curr_node = rollback(ex_child, reward)
            iterations -= 1
        
        if white:
            max_ucb = -inf
            selected_move = ''
            for i in curr_node.children:
                tmp = ucb1(i)
                if tmp > max_ucb:
                    max_ucb = tmp
                    selected_move = map_state_move[i]
            return selected_move
        else:
            min_ucb = inf
            selected_move = ''
            for i in curr_node.children:
                tmp = ucb1(i)
                if tmp < min_ucb:
                    min_ucb = tmp
                    selected_move = map_state_move[i]
            return selected_move


In [None]:
def getRandomLegalMove(curr_node):
    legal_moves = list(curr_node.state.legal_moves)
    rand = random.randrange(len(legal_moves))
    return board.san(legal_moves[rand])

### Subsection 4 - Main Function

The main function calls upon the board and implements the game. It helps keep track of the status of the game and the different turns by switching between players. At the end it provides the results of the game along with the layout of the final board.

In [None]:

# Main Function
board = chess.Board()

white = 1
moves = 0
pgn = []
game = chess.pgn.Game()
evalutations = []
sm = 0
cnt = 0

# store each board state as an image
board_states = []
#root = Node()
while not board.is_game_over():
    if white == 1:
        

        root = Node()
        root.state = board
        result = mcts_pred(root, board.is_game_over(), white, iterations=500) # added in iterations = 100

        board.push_san(result)

        pgn.append(result)
        white ^= 1 # allows to switch between 2 different states black and white

        # Save the current board state
        board_states.append(board.copy())
        moves += 1
    else:
        
        root = Node()
        root.state = board
        result = getRandomLegalMove(root)
        board.push_san(result)
        pgn.append(result)
        white ^= 1 
        board_states.append(board.copy())
        moves += 1

board_states.append(board.copy())

print(board)
print(' '.join(pgn))
print()
print(board.result())
game.headers['Result'] = board.result()


### Subsection 5 - Gif and Graphs

After running all the Monte Carlo algorithms, we graphed the rewards for each of the move numbers to observe how the algorithm is learning throughout the whole game. As described in the previous subsections, we assigned different values to the different pieces to place higher importance on certain pieces versus others allowing the algorithm to prioritize the pieces with higher scores. This has also led to the overall rewards on the graph with each iteration of moves growing throughout the game, as can be seen from the graph below. Along with the graph we also provided a gif of the chess game being played for you to observe. In this game, **white** is using the Monte Carlo method while **black** is using a random move generator.

In [None]:
# Load unicode 
pieces_unicode = {
    "P": "♙", "R": "♖", "N": "♘", "B": "♗", "Q": "♕", "K": "♔",
    "p": "♟", "r": "♜", "n": "♞", "b": "♝", "q": "♛", "k": "♚" 
}

# Create animation using FuncAnimation (from L6)
fig, ax = plt.subplots()

def plot_board(board, ax):
    ax.clear()
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow([[1, 0] * 4, [0, 1] * 4] * 4, cmap='gray', alpha=0.3) # gray and white
    for square in chess.SQUARES:
        piece = board.piece_at(square)
        if piece:
            piece_unicode = pieces_unicode[piece.symbol()]
            ax.text(square % 8, 7 - square // 8, piece_unicode, fontsize=36, ha='center', va='center')

def update(frame):
    plot_board(board_states[frame], ax)

animation = FuncAnimation(fig, update, frames=len(board_states), repeat=False)
animation.save('chess_game_2.gif', writer='pillow', fps=3)
plt.close(fig)

cumulative_rewards = []
sum = 0
for val in rewards:
    sum += val
    cumulative_rewards.append(sum)


plt.figure(figsize=(10, 6))
plt.plot(cumulative_rewards, marker='o')
plt.title('Updating Rewards Over the Game')
plt.xlabel('Move Number')
plt.ylabel('Reward')
plt.grid(True)
plt.show()


In [17]:
#os.listdir()

![winning chess game reward png](winning_reward.png)

![chess_game gif](winning_game.gif)

# Discussion
### Interpreting the result

In the graph above, which led to a win for **white**, the Monte Carlo algorithm, the reward steadily increases over the course of the game. As the algorithm makes moves that are the most likely to lead **white** to a victory, the reward increases. Meanwhile, in graphs of games that lead **white** to a loss, the reward tends to steadily go downwards. The reward is correlated to actions that the algorithm has sampled and detected a checkmate, as well as capturing the opponent’s pieces which results in a material advantage. 

However, this isn’t always the case. While the reward graphs for wins generally tend to trend upwards and the reward graphs for losses tend to trend downwards, on rare occasions, a positive reward graph can result in a loss and a negative reward graph can result in a win. 

![negative reward win png](negative_reward_win.png)

The above graph in which **white** won the game shows one such example; despite winning the game, the reward for the graph is in the negatives. The fact that there was a checkmate hidden beyond all the negative rewards implies that the algorithm did not see this checkmate until likely around move 178. This shows that, due to the randomness of Monte Carlo, in addition to the randomness of the random opponent, the algorithm still has many blind spots that it cannot see until it is right next to it.

But by far, the most frequent outcome of the Monte Carlo vs Random Algorithm is a tie. The graphs for ties often look very similar to the graphs of wins or losses, with the graphs tending to have a positive or negative trend. Despite this, they appear to get stuck for a while until the game results in a stalemate. 

![draw reward png](draw_reward.png)

This graph for a stalemate looks rather similar to the winning graph provided earlier. This is likely due to the fact that the algorithm detects a checkmate in the extremely near future, but doesn’t know how to adapt to the unpredictable actions of the random action opponent. As such, despite having an advantage and being really close to a checkmate, it dances around the checkmate until one side gets stuck in a position in which they cannot make a move, causing a tie.

### Limitations

One limitation with the work is simply the limitations of computers and the nature of the game of chess. It is estimated that there are approximately $10^{100}$ possible positions for a chess board to be in, so it is impossible to update a Q-table in this regard. We got around this by using a Node() class and updating values inside each node during the simulation stages. While this solved the problem, there are still issues such as different steps to get to the same state ending up as two different nodes and having two different values. 

Another limitation is that we had the chess AI play against a random algorithm in order to train it. Random algorithms do not act like humans and thus do not do certain things that would be beneficial, such as capturing a piece when given the opportunity. As such, the model may not be best equipped for playing against humans. 
   

### Ethics & Privacy

One potential concern regarding ethics that may come with this project would be the ability to use the AI to cheat in real online chess matches against unsuspecting players. For example, a cheater may be able to hook up the AI directly to the chess program or they may be able to use it to simulate future moves, gaining an unfair advantage. I think a way to address this may be to warn players that many multiplayer chess sites have anti-cheating detection, discouraging the use of this algorithm in cheating.

Another concern is that this algorithm is fairly resource-intensive. Scaling this algorithm to be significantly more efficient and effective enough to beat professional chess players would likely require very large amounts of processing power and time in order to trai, which would be bad for the environment.


### Conclusion

All in all, by utilizing the Monte Carlo algorithm we were successful in being able to train the AI agent to successfully navigate/learn the game of chess against an opponent computing random moves. Essentially, our results support our problem statement because, as aforementioned, despite the lack of the AI agent losing the game, there were still rewards and learning progress. Thus, this in itself would fit in the context of other work being created in the gaming industry. As detailed in the background section, there are ample Chess Championship matches occurring all around the world; hence, having an AI agent to practice against would be key to build one's skill-level. This is because one could train the AI model to exceed the individual's expertise. 

# Footnotes
<a name="cite_1"></a>1.[^](#cite_1): Duca Iliescu, D. M. (10 Dec 2020) The Impact of Artifical Intelligence on the Chess World. *National Library of Medicine*. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7759436/<br>
<a name="cite_2"></a>2.[^](#cite_2): Waldstein, D. (8 May 2020) Chess Thrives Online Despite Pandemic. *The New York Times*. https://www.nytimes.com/2020/05/08/sports/coronavirus-chess-online-tournament.html <br>
<a name="cite_3"></a>3.[^](#cite_3): Schormann, C. (21 May 2020) Can online chess overcome cheating? *ChessTech* https://www.chesstech.org/2020/can-online-chess-overcome-cheating/ <br>
<a name="cite_4"></a>4.[^](#cite_4): Deep Blue. IBM. (n.d.). *IBM* https://www.ibm.com/history/deep-blue <br>
<a name="cite_5"></a>5.[^](#cite_5): Gupta, I. (2020a, November 13). Monte Carlo Tree Search application on chess. *Medium*. https://medium.com/@ishaan.gupta0401/monte-carlo-tree-search-application-on-chess-5573fc0efb75  <br>
<a name="cite_6"></a>6.[^](#cite_6):  F, J. (2024, April 6). Please help creating an chess board image based on description. *OpenAI Developer Forum*. https://community.openai.com/t/please-help-creating-an-chess-board-image-based-on-description/645399/7 <br>
<a name="cite_7"></a>7.[^](#cite_7): Roy, R. (2023, May 23). ML: Monte Carlo Tree Search (MCTS). *GeeksforGeeks*. https://www.geeksforgeeks.org/ml-monte-carlo-tree-search-mcts/ <br>
<a name="cite_8"></a>8.[^](#cite_8): Mazurek, D. (2022, December 19). AI Journey — from chess to software development!. *Medium*.https://medium.com/@damian.s.mazurek/ai-path-from-chess-to-software-development-b99425ec12d1 <br>
<a name="cite_9"></a>9.[^](#cite_9): Byrne, R. (2005, July 10). It’s man vs. machine again, and man comes out limping. *The New York Times*. https://www.nytimes.com/2005/07/10/crosswords/chess/its-man-vs-machine-again-and-man-comes-out-limping.html <br>
<a name="cite_10"></a>10.[^](#cite_10): Stockfish. (n.d.). https://disservin.github.io/stockfish-docs/stockfish-wiki/Home.html <br>
<a name="cite_11"></a>11.[^](#cite_11): What is depth?. *Stockfish*. (2024, January 21). https://disservin.github.io/stockfish-docs/stockfish-wiki/Stockfish-FAQ.html#minimax <br>
<a name="cite_12"></a>12.[^](#cite_12): Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach. *Pearson*. <br>
<a name="cite_13"></a>13.[^](#cite_13): Ish2K. (2020, November 13). Chess-bot-ai-algorithms/git_chess/monte_carlo_implementation.py at main · Ish2K/Chess-Bot-ai-algorithms. GitHub. https://github.com/Ish2K/Chess-Bot-AI-Algorithms/blob/main/Git_chess/monte_carlo_implementation.py <br>
<a name="cite_14"></a>14.[^](#cite_14): GeeksforGeeks. (2020, February 19). Upper Confidence Bound Algorithm in Reinforcement Learning.* GeeksforGeeks*. https://www.geeksforgeeks.org/upper-confidence-bound-algorithm-in-reinforcement-learning/


# Contributions
**Fatima:** Background, Proposed Solution, Research, Evaluation Metrics

**Ana:** Results, Data, Proposed Solution, Research, Visualizations

**Eric:** Research, Discussion, Ethics and Privacy, Visualizations
