## Using Q-learning to train an AI for tic tac toe

We implement a basic reinforcement learning algorithm, specifically Q-learning, to teach an AI to play tic tac toe. I am not the first to do this, and I would like to credit Heiko Hotz (https://github.com/heiko-hotz/tictactoe-q?tab=readme-ov-file), whose work gave me some inspiration and started my journey of learning about Q-learning. The interactive game that one can play against the AI was also partly inspired by the Bro Code YouTube channel (https://www.youtube.com/watch?v=V9MbQ2Xl4CE).

The included Python files contain a training function, which trains the AI using parameter values we can set manually: the number of games to train on, the learning rate, the discount factor, and the explore rate, together with its minimum value and decay rate.
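
To make the role of these parameters concrete, here is a minimal sketch of the standard tabular Q-learning update and a simple explore-rate schedule. It is not the code from the training file: the helper names `q_update` and `decay_explore_rate`, the dict-of-lists table layout, and the use of `next_state=None` for a finished game are all assumptions made for this illustration.

```python
def q_update(q_table, state, action, reward, next_state,
             learning_rate=0.2, discount_factor=0.9, n_actions=9):
    """Apply one standard tabular Q-learning update in place (hypothetical helper)."""
    # Unseen states start with all-zero action values (an assumption of this sketch).
    q_values = q_table.setdefault(state, [0.0] * n_actions)
    if next_state is None:
        # The game is over, so there is no future value to bootstrap from.
        best_next = 0.0
    else:
        best_next = max(q_table.get(next_state, [0.0] * n_actions))
    # Move the old estimate a fraction learning_rate towards the target.
    target = reward + discount_factor * best_next
    q_values[action] += learning_rate * (target - q_values[action])
    return q_values[action]


def decay_explore_rate(explore_rate, explore_rate_min=0.01, explore_rate_decay=0.999):
    """Multiplicative decay with a lower bound (one plausible schedule)."""
    return max(explore_rate_min, explore_rate * explore_rate_decay)
```

Under this assumed schedule, an explore rate starting at 1.0 with a decay of 0.999 reaches the floor of 0.01 after roughly 4,600 decay steps; how often the real training function applies the decay is not shown here.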

The training file also contains a test function, which lets two AIs play against one another using their respective Q-tables (a player whose table is None moves randomly).
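
The difference between a Q-table player and a random player can be sketched as follows. This is only an illustration under stated assumptions, not the test function itself: `choose_move` is a hypothetical helper, empty squares are assumed to be encoded as 0, and each table entry is assumed to be a flat list of nine values indexed by `3 * row + col`.

```python
import random


def choose_move(board, q_table=None, explore_rate=0.0):
    """Return a (row, col) move for the given board (hypothetical helper)."""
    # Only empty squares (assumed to be encoded as 0) are legal moves.
    empty = [(r, c) for r in range(3) for c in range(3) if board[r][c] == 0]
    state = str(board)
    # Move randomly when there is no table, the state is unknown, or we explore.
    if q_table is None or state not in q_table or random.random() < explore_rate:
        return random.choice(empty)
    # Otherwise pick the legal move with the highest Q-value.
    q_values = q_table[state]  # assumed layout: index = 3 * row + col
    return max(empty, key=lambda rc: q_values[3 * rc[0] + rc[1]])
```

With `explore_rate=0.0` a player that has a table always plays its best known move, which is presumably what one wants during testing, while a positive explore rate mixes in random moves during training.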

Finally, the interactive play file contains the code for launching an interactive game against an AI opponent. 

We represent our board as a 3x3 grid by using a 2D list, so the coordinates of the nine squares are (0, 0), (0, 1), (0, 2), (1, 0), ..., (2, 2), from left to right and top to bottom.
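
As a small illustration of this representation, the sketch below builds a board, places two marks, and lists the coordinates in the order described. The encoding (0 for an empty square, 1 and 2 for the two players) and the use of `str(board)` as a Q-table key are assumptions, chosen to match how the Q-table is handled later in this notebook.

```python
# 0 marks an empty square, 1 and 2 mark the two players (assumed encoding).
board = [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]

board[0][2] = 1  # player 1 takes the top-right square (row 0, column 2)
board[1][1] = 2  # player 2 takes the centre square

# All nine coordinates, left to right and top to bottom.
coords = [(row, col) for row in range(3) for col in range(3)]

# A board can be turned into a hashable key by taking its string form.
state_key = str(board)
print(coords)
print(state_key)
```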

In [2]:
from training import *
import csv

### Training

In [None]:
Q_table = train(number_games=int(1e7), learning_rate=0.2, discount_factor=0.9, explore_rate=1.0,
explore_rate_min=0.01, explore_rate_decay=0.999, player1_table=None, player2_table=None)

In [None]:
# Save the Q-table to a CSV file: one row per board, with the board in the
# first column and its Q-values in the second
with open('q_table.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for board, values in Q_table.items():
        writer.writerow([board, values])

print("Q-table saved to q_table.csv")

Q-table saved to q_table.csv


### Testing

In [7]:
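import ast

# Load the saved Q-table
Q_table = {}
with open('q_table.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        board = row[0]
        # literal_eval parses the stored values without executing arbitrary code
        # (this assumes they were written as plain Python literals)
        values = ast.literal_eval(row[1])
        Q_table[board] = values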

In [8]:
# The trained AI plays a random opponent
wins, draws, losses = test(number_games=int(1e5), player1_table=Q_table, player2_table=None)

In [9]:
print(f'Wins against random opponent: {wins}')
print(f'Draws against random opponent: {draws}')
print(f'Losses against random opponent: {losses}')

Wins against random opponent: 95552
Draws against random opponent: 4347
Losses against random opponent: 101


As a sanity check, let's see what happens in three further matchups: two random players, the trained AI as player 2 against a random player 1, and two trained AIs against each other. Note that the wins and losses always refer to player 1, and that we have to swap the 1's and 2's in the board keys of the Q-table when we want to use it for player 2. We do the swap first.

In [10]:
import ast

# Relabel the players in every board key: 1 becomes 2 and 2 becomes 1
conv = {0: 0, 1: 2, 2: 1}

Q_table2 = {}
for board_str, values in Q_table.items():
    board = ast.literal_eval(board_str)
    swapped = [[conv[el] for el in row] for row in board]
    # The Q-values stay the same, only the key changes
    Q_table2[str(swapped)] = values
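
To see what the relabelling does to a single board, the sketch below applies the same `conv` mapping to one example position. The helper name `swap_players` and the example board are made up for this illustration.

```python
conv = {0: 0, 1: 2, 2: 1}


def swap_players(board):
    """Return a copy of the board with players 1 and 2 exchanged (hypothetical helper)."""
    return [[conv[el] for el in row] for row in board]


board = [[1, 2, 0],
         [0, 1, 0],
         [0, 0, 2]]
swapped = swap_players(board)
print(swapped)

# Swapping twice gives back the original board, so no two boards can collide
# on the same swapped key.
restored = swap_players(swapped)
print(restored == board)
```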

In [11]:
print('Two random players:')

win, draw, loss = test(number_games=int(1e5), player1_table=None, player2_table=None)

print('Wins:', win)
print('Draws:', draw)
print('Losses:', loss)

print('---------')
print('Trained AI as player 2 versus random opponent:')

win, draw, loss = test(number_games=int(1e5), player1_table=None, player2_table=Q_table2)

print('Wins:', win)
print('Draws:', draw)
print('Losses:', loss)

print('---------')
print('Two trained AIs facing each other:')

win, draw, loss = test(number_games=int(1e5), player1_table=Q_table, player2_table=Q_table2)

print('Wins:', win)
print('Draws:', draw)
print('Losses:', loss)

Two random players:
Wins: 43403
Draws: 12895
Losses: 43702
---------
Trained AI as player 2 versus random opponent:
Wins: 112
Draws: 4364
Losses: 95524
---------
Two trained AIs facing each other:
Wins: 0
Draws: 100000
Losses: 0
