# AIChess
## Training a Neural Network Model to evaluate positions in a game of Chess

### Step 1: Download a Chess database from the Internet

First we install the required libraries and test if TensorFlow can detect our GPU. This will be very important later when training the model.

In [None]:
!pip3 install zstandard pandas numpy matplotlib seaborn stockfish tensorflow

In [3]:
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


Then we download the database from the Internet. In this case, we are using the Lichess puzzle database, because it includes many complex and unique positions.

In [None]:
import requests
import zstandard as zstd
import os

URL = "https://database.lichess.org/lichess_db_puzzle.csv.zst"
data_dir = "data"

def download_file(url):
    # Make sure the target directory exists before writing into it
    os.makedirs(data_dir, exist_ok=True)
    local_filename = os.path.join(data_dir, url.split('/')[-1])
    # stream=True avoids loading the whole archive into memory at once
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename

def decompress_file(filename):
    # Stream-decompress to the same path without the '.zst' suffix
    dctx = zstd.ZstdDecompressor()
    with open(filename, 'rb') as fin, open(filename[:-4], 'wb') as fout:
        dctx.copy_stream(fin, fout)

filename = download_file(URL)
decompress_file(filename)
os.remove(filename)


### Step 2: Analyze the database

With the database downloaded, it's time to load it with pandas and inspect the data we have.

In [1]:
import os
import pandas as pd
import numpy as np

data_dir = "data"
df = pd.read_csv(os.path.join(data_dir, 'lichess_db_puzzle.csv'), 
                  header=None, 
                  delimiter=',', 
                  names=('PuzzleId', 'FEN','Moves','Rating','RatingDeviation','Popularity','NbPlays','Themes','GameUrl','OpeningFamily','OpeningVariation'),
                  usecols=('PuzzleId', 'FEN')
                 )
df.set_index('PuzzleId', inplace=True)
df.head()

| PuzzleId | FEN |
|---|---|
| 00008 | r6k/pp2r2p/4Rp1Q/3p4/8/1N1P2R1/PqP2bPP/7K b - ... |
| 0000D | 5rk1/1p3ppp/pq3b2/8/8/1P1Q1N2/P4PPP/3R2K1 w - ... |
| 0009B | r2qr1k1/b1p2ppp/pp4n1/P1P1p3/4P1n1/B2P2Pb/3NBP... |
| 000Vc | 8/8/4k1p1/2KpP2p/5PP1/8/8/8 w - - 0 53 |
| 000Zo | 4r3/1k6/pp3r2/1b2P2p/3R1p2/P1R2P2/1P4PP/6K1 w ... |


We can see that our database stores each board position in Forsyth-Edwards Notation (FEN). More about it can be found here: https://www.chess.com/terms/fen-chess#how-does-fen-work
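As a quick illustration, a FEN string consists of six space-separated fields. The sketch below splits one of the complete positions from the table above into those fields; the helper name `parse_fen` and the dictionary keys are our own choices for this example and are not part of any library.

```python
def parse_fen(fen):
    """Split a FEN string into its six fields (hypothetical helper)."""
    placement, side, castling, en_passant, halfmove, fullmove = fen.split(' ')
    return {
        'placement': placement,          # pieces, rank 8 first, ranks separated by '/'
        'side_to_move': side,            # 'w' or 'b'
        'castling': castling,            # e.g. 'KQkq', or '-' if no side can castle
        'en_passant': en_passant,        # en passant target square, or '-'
        'halfmove_clock': int(halfmove),
        'fullmove_number': int(fullmove),
    }

fields = parse_fen('8/8/4k1p1/2KpP2p/5PP1/8/8/8 w - - 0 53')
print(fields['side_to_move'], fields['fullmove_number'])
```

Only the first two fields (piece placement and side to move) are used in the rest of this notebook.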

In [5]:
df.size

3080529

Our database has around 3 million positions. It would take a very long time to calculate the evaluations for all of them, so we will only use a sample of 330,000 puzzles.

You should note that this will still take a long time (close to 30 hours on the machine used for this test).

### Step 3: Calculate evaluation

Our database only contains the positions, so we need to calculate an evaluation for each of them by analyzing it with the Stockfish chess engine.

Make sure to download Stockfish from https://stockfishchess.org/download and change the path variable below accordingly.

In [None]:
from stockfish import Stockfish
import multiprocessing as mp

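path = 'bin/stockfish_15_x64_avx2.exe'
sample_size = 5000
n_imports = 100

n_cores = mp.cpu_count()
parameters = {
    # Leave a few cores free, but never request fewer than one thread
    "Threads": max(1, n_cores - 6),
    "Ponder": "false",
    "Hash": 1024 * 4  # hash table size in MB
}


stockfish = Stockfish(path=path, parameters=parameters, depth=15)

def get_eval(fen):
    stockfish.set_fen_position(fen)
    # get_evaluation() returns {'type': 'cp' or 'mate', 'value': int}:
    # centipawns for 'cp', moves until mate for 'mate'. Only the value is kept.
    return stockfish.get_evaluation()['value']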

def create_evaluations(sample_size):
    if os.path.exists(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl')):
        df_sample_old = pd.read_pickle(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl'))
        df.drop(df_sample_old.index, inplace=True, errors='ignore')

    df_sample = df.sample(sample_size, random_state=23)
    df_sample['Eval'] = df_sample['FEN'].apply(get_eval)

    if os.path.exists(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl')):
        df_sample= pd.concat([df_sample_old, df_sample])

    df_sample.to_pickle(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl'))
    
for i in range(n_imports):
    create_evaluations(sample_size)
    print(f'Finished creating evaluations for {(i+1)*sample_size} positions.')
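
Note that `get_evaluation()` in the `stockfish` package returns a dictionary with a `type` (`'cp'` or `'mate'`) and a `value`: centipawns (hundredths of a pawn) for `'cp'`, and the number of moves until mate for `'mate'`. The cell above keeps only `value`. A minimal sketch of one way to turn such a dictionary into a score in pawns is shown below. The helper name `eval_to_pawns` and the `mate_score` of 100 pawns are assumptions made for illustration; this is not what was used to produce the results in this notebook.

```python
def eval_to_pawns(evaluation, mate_score=100.0):
    """Convert {'type': 'cp' | 'mate', 'value': int} to a score in pawns.

    mate_score is an arbitrary (assumed) score assigned to a forced mate.
    """
    if evaluation['type'] == 'mate':
        # The sign of the value says which side has the forced mate.
        # A value of 0 (already mated) is not treated specially in this sketch.
        return mate_score if evaluation['value'] > 0 else -mate_score
    # 'cp' values are centipawns: 100 centipawns = 1 pawn
    return evaluation['value'] / 100

print(eval_to_pawns({'type': 'cp', 'value': 555}))
print(eval_to_pawns({'type': 'mate', 'value': -3}))
```

With a conversion like this, the clipping range of 10 used in Step 4 would correspond to 10 pawns rather than 10 centipawns.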

### Step 4: Preprocess the data
Before training a model, we first need to do some preprocessing. In particular, we have to convert the `FEN` field to a fixed-size encoding, so every position can be fed to our neural network in the same shape.

In the new representation, we use 64 integers (an 8x8 array) to describe the pieces on the board: `1` is a pawn, `2` a knight, `3` a bishop, `4` a rook, `5` a queen and `6` a king, and a blank square is represented by a `0`. The sign gives the side: in the `piece_to_int` mapping below, black pieces (lowercase FEN letters) are positive and white pieces (uppercase letters) are negative. To simplify the learning process, every position is encoded from the point of view of the player to move. To do this, we vertically flip the board if the next player is Black, and multiply all piece values by `-1`. As a result, the pieces of the player to move always have negative values and the opponent's pieces always have positive values. To keep the evaluations consistent, we also multiply the `Eval` column by `-1` if the player to move is Black.

It's important to note here that our representation is incomplete. It does not include the castling rights of each player, the en passant target square, or the halfmove clock. This information is available in the `FEN` representation, but we drop it to simplify our model.
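
To make the flip-and-negate step concrete, here is a small dependency-free sketch of the same idea using plain lists. It follows the sign convention of the `piece_to_int` mapping in the cell below (lowercase FEN letters positive, uppercase negative); `encode` is a hypothetical helper that mirrors `fen_to_board`, not a replacement for it.

```python
PIECES = {'p': 1, 'n': 2, 'b': 3, 'r': 4, 'q': 5, 'k': 6}

def encode(fen):
    """Encode a FEN as an 8x8 list of ints, seen from the side to move."""
    placement, side = fen.split(' ')[:2]
    rows = []
    for rank in placement.split('/'):
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend([0] * int(ch))      # run of empty squares
            else:
                value = PIECES[ch.lower()]
                row.append(value if ch.islower() else -value)
        rows.append(row)
    if side == 'b':
        # Flip the board vertically and swap the colours
        rows = [[-v for v in row] for row in reversed(rows)]
    return rows

white = encode('8/8/4k1p1/2KpP2p/5PP1/8/8/8 w - - 0 53')
black = encode('8/8/4k1p1/2KpP2p/5PP1/8/8/8 b - - 0 53')
print(white[2])   # rank 6 with White to move
print(black[5])   # the same rank after flipping for Black to move
```

The same rank ends up in a mirrored row with opposite signs, so the network always sees the position from the mover's side of the board.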

In [46]:
df_sample = pd.read_pickle(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl'))

piece_to_int = {
    'p': 1,
    'n': 2,
    'b': 3,
    'r': 4,
    'q': 5,
    'k': 6,
    'P': -1,
    'N': -2,
    'B': -3,
    'R': -4,
    'Q': -5,
    'K': -6
}

def fen_to_board(fen):
    fen = fen.split(' ')
    board = np.zeros((8,8), dtype=np.int8)
    position, next_player = (fen[0], fen[1])
    i = 0
    for p in position:
        if p.isdigit():
            i += int(p)
        elif p.isalpha():
            if next_player == 'b':
                board[7-i//8, i%8] = piece_to_int[p]
            else:
                board[i//8, i%8] = piece_to_int[p]                
            i += 1    
    if next_player == 'b':
        board = -board
    return board


df_sample['Board'] = df_sample['FEN'].apply(fen_to_board)

# Flip the sign of the evaluation wherever Black is the player to move
black_to_move = df_sample['FEN'].str.split(' ').str[1] == 'b'
df_sample.loc[black_to_move, 'Eval'] = -df_sample.loc[black_to_move, 'Eval']

df_sample.to_pickle(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl'))
df_sample.head()

| PuzzleId | FEN | Eval | Board |
|---|---|---|---|
| 1gVtG | 4Q1k1/2p4p/3bb3/4pr2/8/1P1Pq3/r5PP/2R2R1K b - ... | 1.0 | [[0, 0, 4, 0, 0, 4, 0, 6], [-4, 0, 0, 0, 0, 0,... |
| hIFlt | 2r1B1k1/1p3ppp/1b3P2/p7/P3P1P1/B1q4P/1K1N4/3Q3... | 1.0 | [[0, 0, 4, 0, -3, 0, 6, 0], [0, 1, 0, 0, 0, 1,... |
| 5UoM7 | 1kn5/1p6/2p2p2/5P2/1Q2pqP1/P2R1B2/1Pr5/1K6 w -... | 1.0 | [[0, 6, 2, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, ... |
| j1gxA | r3kb1r/pp3pp1/2n1pnp1/1N6/2BP1P2/7P/PP4P1/R1B2... | 1.0 | [[4, 0, 3, 0, 0, 4, 6, 0], [1, 1, 0, 0, 0, 0, ... |
| m9Vnd | 2kn3r/1ppb2p1/p3p3/1PPpP1q1/3N4/2PB2np/P6P/R2Q... | -1.0 | [[0, 0, 6, 2, 0, 0, 0, 4], [0, 1, 1, 3, 0, 0, ... |


It's also useful to cap the evaluation at a maximum and a minimum value. In chess, a pawn is conventionally worth 1 point, a bishop or a knight 3 points, a rook 5 points and a queen 9 points.

An advantage of 10 points, for example, gives the player an almost guaranteed win unless they make a huge blunder, so for this project there's not much need to store values greater than 10, and the same logic applies to a disadvantage of 10 points. In fact, keeping large values such as -1007 or 555 would likely only make learning harder for our model, without providing any actual improvement in the results.

In [4]:
df_sample = pd.read_pickle(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl'))

# Clip every evaluation to the range [-10, 10]
df_sample['Eval'] = df_sample['Eval'].clip(lower=-10, upper=10)

df_sample.to_pickle(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl'))
df_sample.head()

| PuzzleId | FEN | Eval | Board |
|---|---|---|---|
| 1gVtG | 4Q1k1/2p4p/3bb3/4pr2/8/1P1Pq3/r5PP/2R2R1K b - ... | -10 | ....Q.k...p....p...bb.......pr...........P.Pq.... |
| hIFlt | 2r1B1k1/1p3ppp/1b3P2/p7/P3P1P1/B1q4P/1K1N4/3Q3... | 10 | ..r.B.k..p...ppp.b...P..p.......P...P.P.B.q...... |
| 5UoM7 | 1kn5/1p6/2p2p2/5P2/1Q2pqP1/P2R1B2/1Pr5/1K6 w -... | 10 | .kn......p........p..p.......P...Q..pqP.P..R.B... |
| j1gxA | r3kb1r/pp3pp1/2n1pnp1/1N6/2BP1P2/7P/PP4P1/R1B2... | -10 | r...kb.rpp...pp...n.pnp..N........BP.P........... |
| m9Vnd | 2kn3r/1ppb2p1/p3p3/1PPpP1q1/3N4/2PB2np/P6P/R2Q... | -10 | ..kn...r.ppb..p.p...p....PPpP.q....N......PB..... |


Lastly, we divide the `Eval` column by 10, so our values stay between -1 and 1. This matches the `tanh` activation we use for the output of the neural network later, which can only produce values between -1 and 1. If we later want to recover the real evaluation of a position, we can just multiply the output by 10.
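
Taken together, clipping and scaling map a raw evaluation `x` to `max(-10, min(10, x)) / 10`, and a prediction is mapped back by multiplying by 10. A tiny sketch of that arithmetic follows; the function names are ours and only serve this example.

```python
def scale_eval(x, limit=10):
    """Clip a raw evaluation to [-limit, limit] and scale it to [-1, 1]."""
    return max(-limit, min(limit, x)) / limit

def unscale_eval(y, limit=10):
    """Map a value in [-1, 1] back to the clipped evaluation range."""
    return y * limit

for raw in (-1007, -3, 0, 7, 555):
    print(raw, '->', scale_eval(raw))
```

Note that the mapping is not invertible for clipped values: `-1007` and `-10` both become `-1.0`.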

In [48]:
df_sample['Eval'] = df_sample['Eval'] / 10

df_sample.to_pickle(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl'))
df_sample.head()

| PuzzleId | FEN | Eval | Board |
|---|---|---|---|
| 1gVtG | 4Q1k1/2p4p/3bb3/4pr2/8/1P1Pq3/r5PP/2R2R1K b - ... | 1.0 | [[0, 0, 4, 0, 0, 4, 0, 6], [-4, 0, 0, 0, 0, 0,... |
| hIFlt | 2r1B1k1/1p3ppp/1b3P2/p7/P3P1P1/B1q4P/1K1N4/3Q3... | 1.0 | [[0, 0, 4, 0, -3, 0, 6, 0], [0, 1, 0, 0, 0, 1,... |
| 5UoM7 | 1kn5/1p6/2p2p2/5P2/1Q2pqP1/P2R1B2/1Pr5/1K6 w -... | 1.0 | [[0, 6, 2, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, ... |
| j1gxA | r3kb1r/pp3pp1/2n1pnp1/1N6/2BP1P2/7P/PP4P1/R1B2... | 1.0 | [[4, 0, 3, 0, 0, 4, 6, 0], [1, 1, 0, 0, 0, 0, ... |
| m9Vnd | 2kn3r/1ppb2p1/p3p3/1PPpP1q1/3N4/2PB2np/P6P/R2Q... | -1.0 | [[0, 0, 6, 2, 0, 0, 0, 4], [0, 1, 1, 3, 0, 0, ... |


### Step 5: Divide our dataset into multiple partitions
Now that our dataset is created and formatted properly, we can divide it into train and test partitions. We set aside 80% of the dataset for training and validation and keep the rest for testing. We also stack the `Board` column into a single integer array of shape `(n, 8, 8)`, so we can feed it to our model.

In [1]:
import pandas as pd
import numpy as np
import os

data_dir = "data"
df_sample = pd.read_pickle(os.path.join(data_dir, 'lichess_db_puzzle_sample.pkl'))

df_train = df_sample.sample(frac=0.8, random_state=23)
df_test = df_sample.drop(df_train.index)

X_train = np.stack(df_train['Board'].values)
y_train = df_train['Eval'].values

X_test = np.stack(df_test['Board'].values)
y_test = df_test['Eval'].values

### Step 6: Create the model
We can now create the model to be used for evaluating the positions. We will be using a deep neural network with 10 hidden layers of 2048 neurons each, and a Dropout layer with a rate of 20% after every hidden Dense layer. For the output, we use a `tanh` activation function to keep our results between -1 and 1.

In [2]:
import tensorflow as tf

layers = 10
neurons = 2048
dropout = 0.2
optimizer = tf.keras.optimizers.Adam(learning_rate = 1e-6)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(8,8)))

for i in range(layers):
    model.add(tf.keras.layers.Dense(neurons, activation='relu'))
    model.add(tf.keras.layers.Dropout(dropout))

model.add(tf.keras.layers.Dense(1, activation='tanh'))

model.compile(optimizer=optimizer,
                loss='mean_squared_error',
                metrics=['mean_squared_error'])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 64)                0         
                                                                 
 dense (Dense)               (None, 2048)              133120    
                                                                 
 dropout (Dropout)           (None, 2048)              0         
                                                                 
 dense_1 (Dense)             (None, 2048)              4196352   
                                                                 
 dropout_1 (Dropout)         (None, 2048)              0         
                                                                 
 dense_2 (Dense)             (None, 2048)              4196352   
                                                                 
 dropout_2 (Dropout)         (None, 2048)              0
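
The parameter counts in the summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below reproduces the per-layer numbers listed above and, assuming the part of the summary that is cut off follows the architecture defined in the code (10 hidden layers and one output unit), computes the total.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

inputs, neurons, hidden_layers = 64, 2048, 10

first = dense_params(inputs, neurons)      # flattened 8x8 board -> first hidden layer
hidden = dense_params(neurons, neurons)    # each of the following hidden layers
output = dense_params(neurons, 1)          # single tanh output unit

total = first + (hidden_layers - 1) * hidden + output
print(first, hidden, total)
```

Under that assumption the network has roughly 38 million trainable parameters, far more than the number of positions in the training sample.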

### Step 7: Train the model
We are ready to train the model and verify the results. We will be using 20% of the training data for validation. A model checkpoint is saved every `checkpoint_epochs` epochs (60 in the cell below). In total, the model was trained for around 10,000 epochs.

In [5]:
from math import ceil

batch_size = 256
checkpoint_epochs = 60
epochs = 2000

checkpoint_path = "models/cpp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1,
                                                 save_freq=int( ceil(X_train.shape[0]*0.8/batch_size) * checkpoint_epochs))

if os.path.exists(os.path.join('models', 'model.h5')):
    model = tf.keras.models.load_model(os.path.join('models', 'model.h5'))
    
latest = tf.train.latest_checkpoint(checkpoint_dir)
if latest:
    model.load_weights(latest)

model.fit(X_train, y_train, epochs=epochs, validation_split=0.2, shuffle=True, callbacks=[cp_callback],batch_size=batch_size)

model.save(os.path.join('models', 'model.h5'))
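
The `save_freq` argument of `ModelCheckpoint` is expressed in batches, not epochs, which is why it is computed from the number of batches per epoch. The following sketch redoes that arithmetic for a hypothetical training set of 264,000 rows (80% of 330,000; the actual size depends on how many evaluations were generated). Keras takes the validation split from the end of the arrays before shuffling, so only 80% of `X_train` is used for fitting.

```python
from math import ceil

n_train_rows = 264_000      # hypothetical size of X_train
validation_split = 0.2
batch_size = 256
checkpoint_epochs = 60

# Rows held out for validation, and rows actually used for fitting
val_rows = int(n_train_rows * validation_split)
fit_rows = n_train_rows - val_rows

steps_per_epoch = ceil(fit_rows / batch_size)      # batches per epoch
save_freq = steps_per_epoch * checkpoint_epochs    # batches between checkpoints

print(fit_rows, steps_per_epoch, save_freq)
```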



### Step 8: Evaluate the model

Now that the model is trained, we can measure how well it evaluates the positions in our test partition. Note that during training the model reached an MSE of 0.01 on the training data, but an MSE of around 0.89 on the validation data, so this doesn't look promising.

In [6]:
model = tf.keras.models.load_model(os.path.join('models', 'model.h5'))

loss, mse = model.evaluate(X_test, y_test)

print(f"Restored model, mean squared error: {mse:.4f}")

# Root mean squared error, scaled back from [-1, 1] to points
error = 10 * (mse ** 0.5)

print(f'Evaluation is wrong by {error:.2f} points')

Restored model, mean squared error: 0.8988
Evaluation is wrong by 9.48 points


As expected, our model didn't generalize well: it gives an acceptable result on the training data, but a very bad result on new data. In other words, the model is overfitting.
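
To put these numbers in perspective, it helps to compare them against a trivial baseline. The sketch below uses made-up labels (not the notebook's data) to show how the error in points is derived from the MSE, and what a model that always predicts 0 would score if most targets sit at -1 or 1, as the `Eval` values shown in Step 4 suggest.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical scaled evaluations, mostly saturated at -1 or 1
y_true = [1.0, 1.0, -1.0, 1.0, -1.0, 0.5, -1.0, 1.0]

baseline = mse(y_true, [0.0] * len(y_true))   # always predict an equal position
rmse_points = 10 * baseline ** 0.5            # same conversion as in the cell above

print(f'baseline MSE: {baseline:.4f}, error: {rmse_points:.2f} points')
```

If the real test labels are distributed similarly, a test MSE close to this baseline would mean the model does little better than a constant prediction on unseen positions.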

### Step 9: Conclusions

Sadly, it wasn't possible to create a model that could accurately evaluate positions in a game of chess. The best model showed drastic overfitting and was basically useless for evaluating new positions.

We can conclude what should have been obvious from the start: evaluating chess positions is a hard and complex problem.

To obtain a good result, at least some of the steps below would be required:

- Obtain more training data: While our original dataset contained around 3 million positions, we only used 330,000 for training the model. Using more data would reduce the likelihood of overfitting.
- Use a more complex network structure: It's possible that the network structure used simply isn't capable of achieving a good result for this particular problem. If so, increasing the number of layers and/or the number of neurons might lead to a better result.
- Change the data representation: There are probably better ways of representing a chess position in order to calculate its evaluation. A better representation could simplify the model and allow for better results.
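
As one example of an alternative representation, a common choice is to one-hot encode the board into 12 binary planes (one per piece type and colour) instead of a single plane of signed integers, and to append the castling rights that the current encoding drops. The sketch below only illustrates that idea using plain lists. The function name and the feature layout are our own assumptions, and this encoding was not evaluated in this project.

```python
PLANES = 'PNBRQKpnbrqk'   # one plane per piece letter used in FEN

def fen_to_features(fen):
    """Return 12*64 one-hot piece features plus 4 castling flags."""
    # The side to move is ignored here for brevity; the flipping step
    # from Step 4 could be applied before encoding.
    placement, _side, castling = fen.split(' ')[:3]
    planes = [[0] * 64 for _ in PLANES]
    square = 0
    for ch in placement:
        if ch == '/':
            continue                      # rank separator
        if ch.isdigit():
            square += int(ch)             # skip empty squares
        else:
            planes[PLANES.index(ch)][square] = 1
            square += 1
    features = [v for plane in planes for v in plane]
    # Castling rights in the order K, Q, k, q
    features += [int(flag in castling) for flag in 'KQkq']
    return features

start = 'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1'
features = fen_to_features(start)
print(len(features), sum(features))
```

For the standard starting position this yields 772 features, of which 36 are set: one per piece plus the four castling flags.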