In [2]:
import numpy as np
import pandas as pd
import chess

Chess is a two-player strategy board game played on an 8x8 checkerboard with 64 squares. Played by millions of people worldwide, chess is believed to derive from the Indian game "chaturanga", which dates to sometime before the 7th century. Chess also has a storied and intricate history with machine learning, largely because of its deterministic nature and relatively simple ruleset. Together, these two factors make chess an interesting playground for a host of AI and machine learning algorithms. 

A Brief History of Chess AI:
1951: Alan Turing publishes the first program on paper theoretically capable of playing chess.
1989: Chess world champion Garry Kasparov defeats IBM’s Deep Thought in a chess match.
1996: Kasparov defeats IBM’s Deep Blue in another match.
1997: IBM’s Deep Blue becomes the first chess AI to defeat a reigning world champion, Kasparov, in a match.
2017: AlphaZero, a neural net-based digital automaton, beats Stockfish 28–0, with 72 draws, in a 100-game match.
2019: Leela Chess Zero (LCZero v0.21.1-nT40.T8.610) defeats Stockfish 19050918 in a 100-game match 53.5 to 46.5 for the Top Chess Engine Championship season 15 title.
Present: Modern chess AI engines deploy deep learning to learn from very large numbers of games. They are regularly estimated to have Elo ratings (the rating system FIDE uses for human players) above 3,400, far beyond the best human players. 

Clearly, a lot of work has gone into machines designed to play chess at or above the level of the best humans in the world. For this project, we'll look at something relatively simple in comparison. If you present a chess player with a board at some state of play, they can fairly quickly determine which side (black or white) is "winning". In some more complex cases they may calculate a few moves ahead to decide, but usually they have some intuition that lets them determine almost instantly which side is ahead. In this project I want to see if I can train a computer to do the same thing. 

Let's start by getting some data. Kaggle hosts datasets of all kinds, and indeed it has one that matches the needs of this project. Thanks to Mitchell J from Kaggle for this dataset: 
https://www.kaggle.com/datasets/datasnaek/chess
This particular dataset collects the records of just over 20,000 chess games played on Lichess (lichess.org). When we look at the data we see a lot:

In [3]:
gamesdata = pd.read_csv("archive/games.csv")
gamesdata.axes

[RangeIndex(start=0, stop=20058, step=1),
 Index(['id', 'rated', 'created_at', 'last_move_at', 'turns', 'victory_status',
        'winner', 'increment_code', 'white_id', 'white_rating', 'black_id',
        'black_rating', 'moves', 'opening_eco', 'opening_name', 'opening_ply'],
       dtype='object')]

We don't really care about most of these columns, just victory_status, winner, turns, and moves. Let's trim our data down to these four.

In [4]:
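# Take an explicit copy so that adding columns later does not trigger pandas' SettingWithCopyWarning.
gamesdata_reduced = gamesdata[["victory_status", "winner", "turns", "moves"]].copy()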
gamesdata_reduced


Unnamed: 0,victory_status,winner,turns,moves
0,outoftime,white,13,d4 d5 c4 c6 cxd5 e6 dxe6 fxe6 Nf3 Bb4+ Nc3 Ba5...
1,resign,black,16,d4 Nc6 e4 e5 f4 f6 dxe5 fxe5 fxe5 Nxe5 Qd4 Nc6...
2,mate,white,61,e4 e5 d3 d6 Be3 c6 Be2 b5 Nd2 a5 a4 c5 axb5 Nc...
3,mate,white,61,d4 d5 Nf3 Bf5 Nc3 Nf6 Bf4 Ng4 e3 Nc6 Be2 Qd7 O...
4,mate,white,95,e4 e5 Nf3 d6 d4 Nc6 d5 Nb4 a3 Na6 Nc3 Be7 b4 N...
...,...,...,...,...
20053,resign,white,24,d4 f5 e3 e6 Nf3 Nf6 Nc3 b6 Be2 Bb7 O-O Be7 Ne5...
20054,mate,black,82,d4 d6 Bf4 e5 Bg3 Nf6 e3 exd4 exd4 d5 c3 Bd6 Bd...
20055,mate,white,35,d4 d5 Bf4 Nc6 e3 Nf6 c3 e6 Nf3 Be7 Bd3 O-O Nbd...
20056,resign,white,109,e4 d6 d4 Nf6 e5 dxe5 dxe5 Qxd1+ Kxd1 Nd5 c4 Nb...


Okay, now we have some interesting data to work with. As you can see, we have a little more than 20,000 examples. However, we have a problem! We want to judge the board state at some point partway through the game, say about halfway, but the dataset doesn't contain board states. It does, however, list every move made in each game, and we can replay those moves to reconstruct the board at any point. To do this, we need to take a quick detour to explain Standard Algebraic Notation (SAN). 

If we look at the first game, we can see the moves as follows:

In [5]:
gamesdata_reduced["moves"][0]

'd4 d5 c4 c6 cxd5 e6 dxe6 fxe6 Nf3 Bb4+ Nc3 Ba5 Bf4'

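To those not familiar with SAN, this looks like gibberish, but it's actually a simple and compact way to represent moves on a chessboard. To begin with, we label the columns (files) of the board a through h from left to right and the rows (ranks) 1 through 8 from bottom to top, as seen from White's side, so that every square has a coordinate such as d4. 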

Now the moves are simple to read. A leading capital letter names the piece being moved (K is king, Q is queen, R is rook, B is bishop, and N is knight); no capital letter means a pawn is being moved. The coordinate is the square the piece moves to, an x means the piece is capturing another piece, and a trailing + means the move gives check. So Nf3 moves a knight to f3, cxd5 is a pawn from the c-file capturing on d5, and Bb4+ is a bishop moving to b4 with check. For more information on the history and intricacies of SAN, you can refer to the following article from chess.com: 
https://www.chess.com/terms/chess-notation
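
As a rough illustration of how these tokens break down, the sketch below parses simple SAN moves with a regular expression. The describe_san helper and its pattern are my own illustrative assumptions, not part of the chess library; the sketch deliberately skips castling (O-O), promotions, and anything beyond a single-character origin hint.

```python
import re

# Hypothetical helper for illustration only; the chess library does the real parsing.
PIECE_NAMES = {"K": "king", "Q": "queen", "R": "rook", "B": "bishop", "N": "knight"}

# piece letter (optional), origin hint (optional), capture marker (optional),
# destination square, check/mate suffix (optional)
SAN_PATTERN = re.compile(r"^([KQRBN])?([a-h1-8])?(x)?([a-h][1-8])([+#])?$")

def describe_san(token):
    """Break a simple SAN token into its parts; returns None if unsupported."""
    match = SAN_PATTERN.match(token)
    if match is None:
        return None  # e.g. castling or promotion, which this sketch skips
    piece, hint, capture, square, suffix = match.groups()
    return {
        "piece": PIECE_NAMES.get(piece, "pawn"),
        "from_hint": hint,
        "capture": capture is not None,
        "to": square,
        "check": suffix == "+",
        "mate": suffix == "#",
    }

for token in "d4 cxd5 Nf3 Bb4+".split():
    print(token, "->", describe_san(token))
```

In practice there is no need to parse SAN by hand, since the chess library handles all of these cases (and checks that each move is legal).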
To turn these moves into a board state we can use the chess library for Python. It initializes a board in the standard starting position and applies each move fed into it. We can then convert the result into a simple 8 by 8 matrix in which capital letters are white pieces, lowercase letters are black pieces, and "0" marks an empty square, in the following way: 

In [6]:
def getboardstate(moves, end):
    """Replay the first `end` moves and return the board as an 8x8 list of lists."""
    board = chess.Board()
    for move in moves.split(" ")[:end]:
        board.push_san(move)
    # The first FEN field lists the ranks from 8 down to 1, separated by "/".
    ranks = board.fen().split(" ")[0].split("/")
    boardstate = []
    for rank in ranks:
        row = []
        for symbol in rank:
            if symbol.isdigit():
                # A digit n stands for n consecutive empty squares.
                row.extend(["0"] * int(symbol))
            else:
                row.append(symbol)
        boardstate.append(row)
    return boardstate

print(getboardstate(gamesdata_reduced["moves"][0], gamesdata_reduced["turns"][0]))

[['r', 'n', 'b', 'q', 'k', '0', 'n', 'r'], ['p', 'p', '0', '0', '0', '0', 'p', 'p'], ['0', '0', 'p', '0', 'p', '0', '0', '0'], ['b', '0', '0', '0', '0', '0', '0', '0'], ['0', '0', '0', 'P', '0', 'B', '0', '0'], ['0', '0', 'N', '0', '0', 'N', '0', '0'], ['P', 'P', '0', '0', 'P', 'P', 'P', 'P'], ['R', '0', '0', 'Q', 'K', 'B', '0', 'R']]


For more information about the chess library (well worth a look, it's great!), you can visit its GitHub page: 
https://github.com/niklasf/python-chess/blob/master/docs/index.rst


Okay, so now we need to define the "midway" point that we hand-waved earlier. Chess.com claims that an average chess game lasts about 40 moves, where a move conventionally means one turn by each side. In our data, turns counts every individual move by either player (game 0 has 13 turns and 13 entries in its move list), and the average game length is about 60 turns. Regardless, let's start by defining the midpoint as half of the total turns played. Now we can add the board state column:

In [7]:
print(np.average(gamesdata_reduced["turns"]))

60.46599860404826
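
One quick way to sanity-check what turns measures, and where the midpoint lands, is to count the entries in a move string. The sketch below uses the first game's moves, copied from the output earlier in this notebook; the variable names are just for illustration.

```python
# Move list of game 0, copied from the notebook output above.
moves = "d4 d5 c4 c6 cxd5 e6 dxe6 fxe6 Nf3 Bb4+ Nc3 Ba5 Bf4"

plies = moves.split(" ")      # one entry per individual move by either side
turns = len(plies)            # matches the dataset's turns value for game 0
midpoint = turns // 2         # integer division rounds down for odd-length games

print(turns, midpoint)        # 13 6
print(plies[:midpoint])       # the moves replayed to reach the midpoint board
```

Because of the integer division, a game with an odd number of turns has its midpoint rounded down, so game 0 is evaluated after six individual moves.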


In [8]:
gamesdata_reduced["board_state"] = gamesdata_reduced.apply(lambda row: getboardstate(row["moves"], row["turns"]//2), axis = 1)
gamesdata_reduced


Unnamed: 0,victory_status,winner,turns,moves,board_state
0,outoftime,white,13,d4 d5 c4 c6 cxd5 e6 dxe6 fxe6 Nf3 Bb4+ Nc3 Ba5...,"[[r, n, b, q, k, b, n, r], [p, p, 0, 0, 0, p, ..."
1,resign,black,16,d4 Nc6 e4 e5 f4 f6 dxe5 fxe5 fxe5 Nxe5 Qd4 Nc6...,"[[r, 0, b, q, k, b, n, r], [p, p, p, p, 0, 0, ..."
2,mate,white,61,e4 e5 d3 d6 Be3 c6 Be2 b5 Nd2 a5 a4 c5 axb5 Nc...,"[[0, 0, 0, 0, k, b, n, r], [0, 0, 0, 0, 0, p, ..."
3,mate,white,61,d4 d5 Nf3 Bf5 Nc3 Nf6 Bf4 Ng4 e3 Nc6 Be2 Qd7 O...,"[[0, 0, 0, r, 0, b, 0, r], [N, p, B, k, p, p, ..."
4,mate,white,95,e4 e5 Nf3 d6 d4 Nc6 d5 Nb4 a3 Na6 Nc3 Be7 b4 N...,"[[0, 0, 0, 0, r, 0, k, 0], [p, p, 0, 0, 0, p, ..."
...,...,...,...,...,...
20053,resign,white,24,d4 f5 e3 e6 Nf3 Nf6 Nc3 b6 Be2 Bb7 O-O Be7 Ne5...,"[[r, n, 0, q, k, 0, 0, r], [p, b, p, p, b, 0, ..."
20054,mate,black,82,d4 d6 Bf4 e5 Bg3 Nf6 e3 exd4 exd4 d5 c3 Bd6 Bd...,"[[r, n, 0, 0, 0, 0, k, 0], [p, 0, p, 0, 0, p, ..."
20055,mate,white,35,d4 d5 Bf4 Nc6 e3 Nf6 c3 e6 Nf3 Be7 Bd3 O-O Nbd...,"[[r, 0, b, q, 0, r, k, 0], [p, 0, p, 0, b, p, ..."
20056,resign,white,109,e4 d6 d4 Nf6 e5 dxe5 dxe5 Qxd1+ Kxd1 Nd5 c4 Nb...,"[[0, 0, 0, r, 0, 0, 0, 0], [P, 0, k, 0, b, 0, ..."


Okay, now we have the data we need, but it isn't yet something we can use as input for our algorithm. I think a good place to start is to take a play out of the computer vision playbook. Essentially, we can unravel the 8x8 matrix into a single 64x1 column vector, where each entry holds the piece (if any) on one square. Instead of treating each piece as a single value, let's convert it to a one-hot vector of size 12 (there are 6 unique pieces per side, so 12 in total), leaving empty squares as all zeros. Then we can unravel this 64x12 matrix again, now into a 768x1 column vector. 

In [9]:
letter_to_value = {"r":0, "n":1,"b":2,"q":3,"k":4,"p":5,"R":6,"N":7,"B":8,"Q":9,"K":10,"P":11}

In [10]:

def convert_to_column_vector(boardstate):
    """One-hot encode an 8x8 board into a flat vector of length 64 * 12 = 768."""
    col_vec_64 = np.array(boardstate).flatten()
    mat_64_12 = np.zeros((64, 12))
    for i, symbol in enumerate(col_vec_64):
        # Empty squares ("0") are not in the lookup table and stay all-zero.
        if symbol in letter_to_value:
            mat_64_12[i][letter_to_value[symbol]] = 1
    return mat_64_12.flatten()

print(convert_to_column_vector(gamesdata_reduced["board_state"][0]))
print(convert_to_column_vector(gamesdata_reduced["board_state"][0]).shape)

[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
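
A handy sanity check on an encoding like this is to decode it back into a board. The sketch below is self-contained: it repeats the lookup table from above, and the encode and decode helpers are hypothetical names of mine that mirror, rather than replace, convert_to_column_vector. It uses a hand-written starting position instead of the dataset.

```python
import numpy as np

letter_to_value = {"r": 0, "n": 1, "b": 2, "q": 3, "k": 4, "p": 5,
                   "R": 6, "N": 7, "B": 8, "Q": 9, "K": 10, "P": 11}
value_to_letter = {v: k for k, v in letter_to_value.items()}

def encode(boardstate):
    """One-hot encode an 8x8 board of symbols into a vector of length 768."""
    one_hot = np.zeros((64, 12))
    for i, symbol in enumerate(np.array(boardstate).flatten()):
        if symbol in letter_to_value:
            one_hot[i, letter_to_value[symbol]] = 1
    return one_hot.flatten()

def decode(vector):
    """Invert encode(): rebuild the 8x8 board of symbols from the vector."""
    one_hot = np.asarray(vector).reshape(64, 12)
    squares = []
    for row in one_hot:
        # An all-zero row means the square is empty.
        squares.append(value_to_letter[int(row.argmax())] if row.any() else "0")
    return [squares[r * 8:(r + 1) * 8] for r in range(8)]

# Starting position, written out by hand in the same format as getboardstate.
start = [list("rnbqkbnr"), list("pppppppp")] + [["0"] * 8 for _ in range(4)] \
        + [list("PPPPPPPP"), list("RNBQKBNR")]

vec = encode(start)
print(vec.shape, int(vec.sum()))   # (768,) 32
assert decode(vec) == start
```

The vector has one nonzero entry per piece on the board, so its sum doubles as a quick piece count.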

In [11]:
gamesdata_reduced["single_col"] = gamesdata_reduced.apply(lambda row: convert_to_column_vector(row["board_state"]), axis = 1)
gamesdata_reduced["single_col"][0]


array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       1., 0., 0., 0., 0.

We have our input; the final step of preparation is to define the output. The outcome of a chess game is relatively simple: black win, white win, or draw. Let's check the numbers on each of these:

In [12]:
print(gamesdata_reduced["winner"].value_counts())

white    10001
black     9107
draw       950
Name: winner, dtype: int64


So we can see clearly that draws are in the extreme minority (950 of 20,058 games, under 5%), with white wins about 10% more common than black wins. To begin with, let's look only at the games with a winner (not a draw). We will come back to the draws and make sure to include them in our final model. 

In [13]:
# Copy the filtered rows so that new columns can be added without a SettingWithCopyWarning.
gamesdata_binary = gamesdata_reduced.loc[gamesdata_reduced["winner"] != "draw"].copy()
gamesdata_binary["winner"].value_counts()

white    10001
black     9107
Name: winner, dtype: int64

For these, let's set white wins as 1 and black wins as 0.

In [14]:
# Only "white" and "black" remain after dropping draws, so white -> 1 and black -> 0.
gamesdata_binary["binary_winner"] = (gamesdata_binary["winner"] == "white").astype(int)
gamesdata_binary["binary_winner"].value_counts()


1    10001
0     9107
Name: binary_winner, dtype: int64
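
Since the two classes are not perfectly balanced, it's worth noting the accuracy a trivial model would get by always predicting the more common outcome; any classifier we train should beat this. The counts below are taken from the output above, and the variable names are illustrative.

```python
# Class counts from the value_counts output above.
white_wins = 10001
black_wins = 9107

total = white_wins + black_wins
# Accuracy of a "model" that always predicts the majority class (white).
baseline_accuracy = max(white_wins, black_wins) / total

print(total)                          # 19108
print(round(baseline_accuracy, 3))    # 0.523
```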

Finally! We now have a classic binary classification problem. Let's start with the basics and build a perceptron, using sklearn's Perceptron class. For an overview of what a perceptron is, you can check out the following website: 
https://www.simplilearn.com/tutorials/deep-learning-tutorial/perceptron
If you don't want to do that, here's a quick overview. A perceptron is a simple machine learning algorithm that looks for a boundary dividing two categories of data. Importantly, it can only draw a linear boundary, so it classifies perfectly only when the data is linearly separable. Our data has far too many dimensions to graph directly, so let's apply Principal Component Analysis (PCA) to reduce it to two dimensions that we can graph, to get a rough sense of how separable it is. 

In [15]:
from sklearn.decomposition import IncrementalPCA

# Stack the per-game vectors into one matrix with one row per game.
X = np.vstack(gamesdata_binary["single_col"])
# fit_transform fits from scratch in mini-batches of 100 rows, so no separate
# partial_fit call is needed beforehand.
transformer = IncrementalPCA(n_components=2, batch_size=100)
X_transformed = transformer.fit_transform(X)
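
IncrementalPCA hides the mechanics, so here is a minimal sketch of what a two-component PCA projection computes, written in plain NumPy. Note the assumptions: it runs on random stand-in data (not the chess games), and it uses a full SVD rather than the incremental algorithm sklearn uses, so it illustrates the idea and is not a drop-in replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: 200 random binary vectors of length 768 (NOT the real games).
X_demo = rng.integers(0, 2, size=(200, 768)).astype(float)

# PCA works on mean-centered data.
X_centered = X_demo - X_demo.mean(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project every sample onto the first two principal directions.
X_2d = X_centered @ Vt[:2].T

# Share of the total variance captured by each of the two components.
explained = (S[:2] ** 2) / (S ** 2).sum()

print(X_2d.shape)      # (200, 2)
print(explained)
```

If the two components explain only a small share of the variance, a lack of visible separation in the 2D picture says little about separability in the full-dimensional space, so the projection should be read as a rough hint at best.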


In [26]:
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hold out 20% of the games for testing. No random_state is set here,
# so the exact split (and the scores below) vary from run to run.
data = np.vstack(gamesdata_binary["single_col"])
train_data, test_data, train_labels, test_labels = train_test_split(
    data, gamesdata_binary["binary_winner"], test_size=0.2)

p = Perceptron(random_state=42, verbose=0)
p.fit(train_data, train_labels)
predictions_train = p.predict(train_data)
predictions_test = p.predict(test_data)
# accuracy_score expects the true labels first, then the predictions.
train_score = accuracy_score(train_labels, predictions_train)
print("score on train data: ", train_score)
test_score = accuracy_score(test_labels, predictions_test)
print("score on test data: ", test_score)

score on train data:  0.5829517205285882
score on test data:  0.5638409209837781
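
For intuition about what the Perceptron class is doing internally, here is a minimal sketch of the classic perceptron learning rule in NumPy, run on a tiny hand-made, linearly separable dataset (logical AND) rather than the chess data. The train_perceptron function and its parameters are illustrative assumptions of mine; sklearn's implementation differs in details such as shuffling and stopping criteria.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge the weights only on misclassified samples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            prediction = 1 if xi @ w + b > 0 else 0
            update = lr * (target - prediction)   # 0 when the prediction is right
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        if errors == 0:   # a full pass with no mistakes: the data is separated
            break
    return w, b

# Toy data: the logical AND function, which is linearly separable.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_toy = np.array([0, 0, 0, 1])

w, b = train_perceptron(X_toy, y_toy)
predictions = (X_toy @ w + b > 0).astype(int)
print(predictions)   # [0 0 0 1]
```

On data that is not linearly separable, this loop never reaches a mistake-free pass and simply stops after the given number of epochs, which may be part of why the scores above are only modestly better than chance.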
