# The brain

## Creating the database
This is where I train the neural network to handle evaluation predictions. The idea behind this engine is to learn to predict how Stockfish would evaluate a position. Lichess kindly provides a database of 21M positions with Stockfish's evaluations in JSON Lines format (one JSON object per line). We'll create a SQLite database as storage and interact with it using pandas.
### Compiling the data
From a given position, Stockfish looks at several lines into the future and determines the current position's eval based on that information.
To keep it manageable, I will average the evals of the lines that Stockfish looked into and use that result as the final evaluation.

In [1]:
# imports

import pandas as pd
import sqlite3
import chess
import json

In [None]:
# Connect to SQLite database
with sqlite3.connect('../assets/data/evaluations_avg.sqlite') as conn:
    cursor = conn.cursor()

    # Create table to store evaluations (we don't need depth, knodes or lines)
    cursor.execute('''CREATE TABLE IF NOT EXISTS evaluations_avg
                  (fen TEXT PRIMARY KEY, average_cp REAL)''')

    # Read JSONL file and insert data into the SQLite database
    with open('../assets/data/lichess_db_eval.jsonl', 'r') as jsonl_file:
        for line in jsonl_file:
                data = json.loads(line.strip())  # Load JSON object from each line
                fen = data["fen"]
                
                # looking at all the evals and averaging their cp
                total_cp = 0
                num_eval = 0
                for eval_data in data["evals"]:
                    for pv in eval_data["pvs"]:
                        try: # some evals don't have "cp" because mate is present.
                            cp = pv["cp"]
                        except KeyError: # only catch the missing key, not every possible error
                            cp = pv["mate"] # treating mate just like cp; if mate is present, it just means the position is extremely winning for one side
                        cp = max(-15, min(cp, 15))  # Clamp the value of "cp" between -15 and 15
                        total_cp += cp
                        num_eval += 1
                        
                if num_eval > 0: #guard clause to avoid division by zero errors
                    average_cp = total_cp / num_eval
                else:
                    average_cp = 0

                # Insert average cp for the position into the database
                cursor.execute('''INSERT INTO evaluations_avg (fen, average_cp)
                                VALUES (?, ?)''', (fen, average_cp))

    # Commit changes; sqlite3's `with` block commits on success but does not close the connection
    conn.commit()

conn.close()


In [None]:
# make a new connection to avoid rerunning the cell above
conn_ = sqlite3.connect('../assets/data/evaluations_avg.sqlite')
df_sample = pd.read_sql(""" SELECT *
                    FROM evaluations_avg
                    LIMIT 10000
                 """, conn_)

df_sample.head(20)


### Remarks
I was hoping that we would get evaluations closer to the truth. The first position in the df is the starting position
and it's evaluated at +3.11, when in reality it should evaluate to about 0.5. Inspecting more of the data, the averaged values are far off the actual eval I get when I paste
the FEN into an online engine.
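
One possible contributor to the mismatch, offered as an assumption rather than something verified in this notebook: Lichess documents the `cp` field in centipawns, while the clamp above treats it as pawns, so nearly every non-zero score saturates at ±15. The `mate` field counts moves to mate, so a mate in 3 is currently stored as 3. A minimal sketch of an alternative conversion follows. The names `pv_to_pawns` and `average_eval` are hypothetical, and the sample record is made up for illustration.

```python
CLAMP = 15.0  # evaluation range in pawns, same bounds as the clamp used above


def pv_to_pawns(pv: dict) -> float:
    """Convert one principal variation to a score in pawns, clamped to +/-CLAMP."""
    if "cp" in pv:
        score = pv["cp"] / 100.0  # assumption: cp is expressed in centipawns
    else:
        # "mate" is a move count; only its sign says which side is winning
        score = CLAMP if pv["mate"] > 0 else -CLAMP
    return max(-CLAMP, min(score, CLAMP))


def average_eval(record: dict) -> float:
    """Average the converted scores over every pv of every eval in a record."""
    scores = [pv_to_pawns(pv) for ev in record["evals"] for pv in ev["pvs"]]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up record with the same keys the loading loop reads.
sample = {"evals": [{"pvs": [{"cp": 20}, {"cp": 30}]}, {"pvs": [{"mate": -3}]}]}
print(average_eval(sample))
```

With this conversion a `cp` of 20 becomes 0.2 pawns instead of being clamped to 15. Whether averaging across lines and depths is a good target at all is a separate question that this sketch does not settle.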

My hope is that with this large dataset to train on, we can make sensible decisions regardless of the fact that a portion of the data is off.
While +3 is wrong for the starting position, it is true that it's generally better for white. I'm curious to see if that is enough to make a sensible decision on which move to make.

We'll call this V1. If it doesn't work, there is always a database of games available on lichess that contains the exact evaluation from Stockfish.

## Translating FEN notation to neuron-speak
There is a precise way to achieve this in chess computing, and that's with the use of bitboards. These are 64-bit numbers in which each bit marks the presence or absence of a given piece on one square. As there are 12 different pieces (6 for each color), we need at least 12 bitboards to represent a complete board state. One approach would be to feed an (8, 8, 12) array to the neural network, but a dense network will flatten it into a 1D array anyway to read it and lose some spatial information. Another approach would be to leverage what other clever people have figured out: translating the bitboards directly into a single integer. Since the spatial information is going to get lost anyway, might as well try this one out.
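
To make the 12-bitboard idea concrete, here is a minimal pure-Python sketch that builds one 64-bit integer per piece from the placement field of a FEN. The function name `fen_to_bitboards` is hypothetical, and the bit numbering (bit 0 = a1, bit 63 = h8) is an assumption chosen to match the convention python-chess uses.

```python
PIECES = "PNBRQKpnbrqk"  # 6 white piece types, then 6 black


def fen_to_bitboards(fen: str) -> dict:
    """Return {piece symbol: 64-bit int}, with bit 0 = a1 and bit 63 = h8."""
    boards = {piece: 0 for piece in PIECES}
    rows = fen.split(" ")[0].split("/")  # FEN lists rank 8 first
    for rank, row in zip(range(7, -1, -1), rows):
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # a digit is a run of empty squares
            else:
                boards[ch] |= 1 << (rank * 8 + file)
                file += 1
    return boards


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
boards = fen_to_bitboards(START_FEN)

# OR-ing the 12 boards together gives the occupancy mask.
occupied = 0
for mask in boards.values():
    occupied |= mask

print(hex(boards["P"]))  # white pawns
print(hex(occupied))     # every occupied square, piece identity discarded
```

The last step shows what is lost when everything is collapsed into one number: the combined mask says where pieces are, not which pieces they are.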

In [None]:
# the chess module exposes `occupied`, an int bitmask with one bit set per occupied square

board = chess.Board(chess.STARTING_FEN)
board.occupied

# I am not convinced that this is the correct integer representation: `occupied` only marks which
# squares hold a piece, not which piece or color. As long as it's consistent, the hope is that the
# model can still learn from it

In [None]:
# loading the entire db into memory. Luckily it takes only 320 MB

df_full = pd.read_sql(""" SELECT *
                    FROM evaluations_avg
                        """, conn_)
df_full.head(20)

In [None]:
# adding a column with the int representation of the FEN position

def fen_to_bitboard_int(fen: str) -> int:
    """ return an int representation of the fen, returns None if there's an error"""
    try:
        return chess.Board(fen).occupied
    except ValueError:
        return None


df_full["bitboard_int"] = df_full["fen"].apply(fen_to_bitboard_int)
df_full.dropna(inplace=True) # drop some errors
df_full.drop(columns="fen", inplace=True) # we don't need the fen anymore

# Save the DataFrame as a new SQLite database
temp_conn = sqlite3.connect('../assets/data/train_data_v1.sqlite')
df_full.to_sql('train_data_v1', temp_conn, if_exists='replace', index=False)
temp_conn.close()

## Training the Neural Network
I'm doing this with TensorFlow through the Keras interface. I want the hidden layers to go up to 64 neurons, 1 for every square. The model only looks at 1 feature (the position integer) and has 1 output.

In [26]:
# load the training df from sqlite
temp_conn = sqlite3.connect('../assets/data/train_data_v1.sqlite')
df_train = pd.read_sql(""" SELECT *
                    FROM train_data_v1
                        """, temp_conn)
temp_conn.close()
X = df_train["bitboard_int"]
y = df_train["average_cp"]
print(X)
print(y)

KeyboardInterrupt: 

In [2]:
from tensorflow import keras
from keras import layers
from sklearn.preprocessing import MinMaxScaler
import numpy as np
import tensorflow as tf

In [None]:

# scaling between 0 and 1
scaler = MinMaxScaler()
X = np.array(X)
X = X.reshape(-1, 1)
X = scaler.fit_transform(X)

model1 = keras.Sequential(
    [
        # Input layer -> 1 integer
        keras.Input(shape=(1,)),
        # Hidden layers
        layers.Dense(32, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(64), #inserting 1 batch norm
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        # output layer -> 1 raw integer (prediction)
        layers.Dense(1, activation="linear"),
    ]
)

model1.compile(loss="mean_absolute_error", optimizer="adam", metrics=["root_mean_squared_error"])

In [None]:
# first training session, batch size of 100 and 1 epoch seems like a good start
model1.fit(X, y, batch_size=50, epochs=1, validation_split=0.1)

# did 2 more epochs, it's not learning anymore, I'll run 200 batch size and see if it improves
# after 3 epochs of 200 batch size, it's barely improving. increasing to 500 batch size.
# 3 more epochs, it's actually getting worse. Let's try a different architecture and start over

# doing 1, 32, 64, 64, 64, 32, 1 and running 3 epochs with a starting batch size of 10. EDIT: it takes 10 min/epoch, ran only 1
# it's already better with 1 epoch. I'll increase the batch size to 50 so I don't die of old age.
# added a batchnormalization to try to improve accuracy
# scaled the data between 0 and 1, immediate increase!

In [None]:

def get_prediction(fen: str) -> float:
    """Takes a FEN and returns the evaluation predicted by the model"""
    x = fen_to_bitboard_int(fen)
    # Reshape to (1, 1): one sample with one feature
    x_array = np.array([x])
    x_reshaped = x_array.reshape(1, -1)
    x_transformed = scaler.transform(x_reshaped)
    return model1.predict(x_transformed)[0][0]

print(get_prediction("rnbqr1k1/ppp2ppp/8/4N3/1P1bQ3/P7/5PPP/R1B1KB1R w KQ - 0 13"))



### Changing the way I translate the FEN
To no one's surprise, transforming the position into a single integer makes it lose a lot of its meaning, and the resulting integer is so enormous that it needs to get scaled back down, which blurs it even further.
On a range of -15 to 15, the model was getting at best an average error of 13.3. Horrible.
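
A small sketch of how much detail the scaling step can erase, assuming the network ends up seeing its input as a 32-bit float (the Keras default). The helper `to_float32` is hypothetical and only uses the standard library to round a value the way a float32 cast would.

```python
import struct


def to_float32(n: int) -> float:
    """Round an integer to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", float(n)))[0]


start = 0xFFFF00000000FFFF                # occupancy mask of the starting position
after_e4 = start ^ (1 << 12) ^ (1 << 28)  # same mask with the e2 pawn moved to e4

print(start != after_e4)                          # the integers differ
print(to_float32(start) == to_float32(after_e4))  # the float32 values do not
```

A float32 keeps roughly 24 significant bits, so changes in the low 40 bits of a 64-bit mask (the first five ranks) can vanish entirely. That comes on top of the mask not recording piece type or color in the first place.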

Instead, I will compile all 12 bitboards into a single array to represent the piece positions and add a few numbers for game state details. Hopefully this will help the model paint a clearer picture of what's going on.

I am 80% sure that it will be necessary to retrain the model using the games database instead, because it contains the accurate Stockfish evaluation rather than this averaging that I came up with. I don't want to do this just yet though, because that db is HUGE and I'm still looking for the best recipe to train this NN.

In [3]:
def fen2bitboard(fen: str, to_bits: bool=False) -> np.ndarray:
    """
    Returns the position as a 1D np array of 773 zeros and ones built from the fen
    (or as 97 packed bytes when to_bits is True)
    """
    # each square is assigned 12 bits to represent each piece, that's what the mapping is for
    mapping = {
                "p": 0,
                "n": 1,
                "b": 2,
                "r": 3,
                "q": 4,
                "k": 5,
                "P": 6,
                "N": 7,
                "B": 8,
                "R": 9,
                "Q": 10,
                "K": 11
                }
    
    # initialize the array with zeros
    bitboard = np.zeros(773, dtype=int)
    currIndex = 0
    
    # keep only the useful fields; slicing handles both 6-field and 4-field FENs
    position, turn, castling = fen.split(" ")[:3]
    
    for ch in position:
        if ch == "/": # "/" represent rows, simply ignore that
            continue
        elif ch.isdigit(): # a digit means an empty space, skip ahead that many indexes
            currIndex += int(ch) * 12 # multiply by 12 because there are 12 bits used for each square
        else:
            bitboard[currIndex + mapping[ch]] = 1 # set the correct bit to 1
            currIndex += 12 # get to next bit
    
    # add details about the game state
    bitboard[768] = 1 if turn == "w" else 0
    bitboard[769] = 1 if "K" in castling else 0
    bitboard[770] = 1 if "Q" in castling else 0
    bitboard[771] = 1 if "k" in castling else 0
    bitboard[772] = 1 if "q" in castling else 0
    
    if to_bits:
        return np.packbits(bitboard)
    return bitboard

fen2bitboard(chess.STARTING_FEN, True)

array([ 16,   4,   0,  32,   0, 128,   4,   2,   0,  64,   1,   0, 128,
         8,   0, 128,   8,   0, 128,   8,   0, 128,   8,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   2,   0,  32,   2,   0,  32,
         2,   0,  32,   2,   0,  32,   0,  64,  16,   0, 128,   2,   0,
        16,   8,   1,   0,   4, 248], dtype=uint8)
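
For reading the encoding back, here is a small sketch that maps an index of the 773-long array to what it means. It assumes the layout used by `fen2bitboard` above: squares in FEN order starting at a8, 12 slots per square in the order of the `mapping` dict, then 5 game-state flags. The name `describe_index` is hypothetical.

```python
PIECE_ORDER = "pnbrqkPNBRQK"  # same order as the mapping dict in fen2bitboard
STATE_FLAGS = {
    768: "white to move",
    769: "white can castle kingside",
    770: "white can castle queenside",
    771: "black can castle kingside",
    772: "black can castle queenside",
}


def describe_index(i: int) -> str:
    """Explain what a 1 at position i of the 773-long array stands for."""
    if i in STATE_FLAGS:
        return STATE_FLAGS[i]
    if not 0 <= i < 768:
        raise IndexError(f"index {i} is outside the 773-long encoding")
    square, piece = divmod(i, 12)   # 12 slots per square
    file_letter = "abcdefgh"[square % 8]
    rank_number = 8 - square // 8   # square 0 is a8, square 63 is h1
    return f"{PIECE_ORDER[piece]} on {file_letter}{rank_number}"


print(describe_index(3))
print(describe_index(13))
print(describe_index(768))
```

Indexes 3 and 13 are the first two set bits for any position with a black rook on a8 and a black knight on b8, which is the case for the starting position.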

In [5]:
# Now that we have a working bitboard maker, lets rewrite the training data
conn_ = sqlite3.connect('../assets/data/evaluations_avg.sqlite')
df_full = pd.read_sql(""" SELECT *
                    FROM evaluations_avg
                        """, conn_)
conn_.close()
df_full["bitboard"] = df_full["fen"].apply(fen2bitboard)
df_full.drop(columns="fen", inplace=True)
df_full.head(20)

   average_cp                                           bitboard
0    3.111111  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
1   15.000000  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
2    9.545455  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
3   -0.222222  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
4   15.000000  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
5   14.913043  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
6    1.666667  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
7   -1.400000  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
8    9.571429  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
9    0.000000  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...


In [10]:
df_full

NameError: name 'df_full' is not defined

In [None]:

temp_conn = sqlite3.connect('../assets/data/train_data_v2.sqlite')
df_full.to_sql('train_data_v2', temp_conn, if_exists='replace', index=False)
temp_conn.close()

In [4]:
# now we have a problem, this new database doesn't fit into memory.
# trying a generator
import tensorflow as tf

# open a connection
conn = sqlite3.connect('../assets/data/evaluations_avg.sqlite')

def data_generator():
    cursor = conn.cursor()
    cursor.execute("SELECT fen, average_cp FROM evaluations_avg")
    batch_size = 1000  # rows fetched per round trip, not the training batch size
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        # yield one (features, label) pair at a time so each element matches output_signature
        for fen, average_cp in rows:
            yield fen2bitboard(fen), [average_cp]
        
# # close when done
# conn.close()
# cursor.close()


# create a tensor dataset
dataset = tf.data.Dataset.from_generator(generator = data_generator,
                                        output_signature= (
                                        tf.TensorSpec(shape=(773,), dtype=tf.int16), # bitboard
                                        tf.TensorSpec(shape=(1,), dtype=tf.float32)  # average_cp
                                    )
                                    )


dataset.element_spec
# it doesn't work with SQL query, I'm running into graph errors and asks me to use only 1 thread for the sql queries

(TensorSpec(shape=(773,), dtype=tf.int16, name=None),
 TensorSpec(shape=(1,), dtype=tf.float32, name=None))

In [9]:
# let's adjust our model for this

model2 = keras.Sequential(
    [
        # Input layer -> 773 values (768 piece bits + 5 game state bits)
        keras.Input(shape=(773,)),
        # Hidden layers
        layers.Dense(350, activation="relu"),
        layers.Dense(180, activation="relu"),
        layers.Dense(90), #inserting 1 batch norm
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        # output layer -> 1 raw float (the predicted evaluation)
        layers.Dense(1, activation="linear"),
    ]
)

model2.compile(loss="mean_absolute_error", optimizer="adam", metrics=["root_mean_squared_error"])
model2.summary()


# making train and test data
test_dataset = dataset.take(10000) # takes the first 10000 examples for validation
train_dataset = dataset.skip(10000) # takes the remaining examples for training

In [12]:

model2.fit(train_dataset.batch(100), epochs=1, validation_data=test_dataset.batch(100))

209893/209893 ━━━━━━━━━━━━━━━━━━━━ 523s 2ms/step - loss: 4.8852 - root_mean_squared_error: 8.5216 - val_loss: 6.6086 - val_root_mean_squared_error: 10.0279


<keras.src.callbacks.history.History at 0x23000ea0590>

In [13]:
model2.save("first_model_keep_training.keras")

In [None]:
# found out tensorflow has a built-in method for sql db...

dataset = tf.data.experimental.SqlDataset("sqlite", "../assets/data/train_data_v2.sqlite", 
                                          "SELECT average_cp, bitboard FROM train_data_v2", 
                                          (tf.float64, tf.string))





dataset

In [None]:
# let her rip
X = df_full["bitboard"].apply(lambda x: tf.convert_to_tensor(x))
y = df_full["average_cp"]

In [None]:

model2.fit(X, y, batch_size=50, epochs=3, validation_split=0.1)

In [4]:
# Open a connection
conn = sqlite3.connect('../assets/data/evaluations_avg.sqlite')

# Fetch the number of rows in the table
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM evaluations_avg")
num_rows = cursor.fetchone()[0]
cursor.close()

# Initialize empty arrays
X_data = np.empty((num_rows, 773), dtype=np.int16)
y_data = np.empty((num_rows,), dtype=np.float32)

# Fetch data from the SQLite database in batches
cursor = conn.cursor()
cursor.execute("SELECT * FROM evaluations_avg")
batch_size = 1000
start_idx = 0
while True:
    rows = cursor.fetchmany(batch_size)
    if not rows:
        break
    for row in rows:
        X_data[start_idx] = fen2bitboard(row[0])
        y_data[start_idx] = row[1]
        start_idx += 1
cursor.close()
conn.close()


In [5]:

# Create TensorFlow dataset
dataset = tf.data.Dataset.from_tensor_slices((X_data, y_data))
dataset.element_spec

(TensorSpec(shape=(773,), dtype=tf.int16, name=None),
 TensorSpec(shape=(), dtype=tf.float32, name=None))

In [8]:
# completely crashes kernel... even if it separates the dataset into shard. idk why
dataset.save("../assets/data/train_dataset_v1")

In [6]:
model2 = keras.models.load_model("first_model_keep_training.keras")

# making train and test data
test_dataset = dataset.take(10000) # takes the first 10000 examples for validation
train_dataset = dataset.skip(10000) # takes the remaining examples for training


In [8]:
# save after every epoch to make sure we don't lose progress
for _ in range(10):
    model2.fit(train_dataset.batch(20), epochs=1, validation_data=test_dataset.batch(100))
    model2.save("first_model_keep_training.keras")

1049464/1049464 ━━━━━━━━━━━━━━━━━━━━ 1395s 1ms/step - loss: 4.5089 - root_mean_squared_error: 8.0830 - val_loss: 5.6494 - val_root_mean_squared_error: 8.9893
1049464/1049464 ━━━━━━━━━━━━━━━━━━━━ 1400s 1ms/step - loss: 4.4555 - root_mean_squared_error: 8.0211 - val_loss: 5.8084 - val_root_mean_squared_error: 9.0748
1049464/1049464 ━━━━━━━━━━━━━━━━━━━━ 1422s 1ms/step - loss: 4.4323 - root_mean_squared_error: 7.9969 - val_loss: 5.6499 - val_root_mean_squared_error: 8.9295
1049464/1049464 ━━━━━━━━━━━━━━━━━━━━ 1429s 1ms/step - loss: 4.4154 - root_mean_squared_error: 7.9812 - val_loss: 5.6932 - val_root_mean_squared_error: 9.0842
1049464/1049464 ━━━━━━━━━━━━━━━━━━━━ 1449s 1ms/step - loss: 4.3997 - root_mean_squared_error: 7.9620 - val_loss: 5.7224 - val_root_mean_squared_error: 9.1301
1049464/1049464 ━━━━━━━━━━━

In [6]:
#test_dataset.save("../assets/data/test_dataset")
model2 = keras.models.load_model("first_model_keep_training.keras")
model2.predict(fen2bitboard("r1b1k1nr/ppp2ppp/8/2b1P3/4p3/2P5/PP3PPP/RNBK1B1R w kq - 1 8"))

ValueError: Exception encountered when calling Sequential.call().

Invalid input shape for input Tensor("sequential_1_1/Cast:0", shape=(32,), dtype=float32). Expected shape (None, 773), but input has incompatible shape (32,)

Arguments received by Sequential.call():
  • inputs=tf.Tensor(shape=(32,), dtype=int32)
  • training=False
  • mask=None

In [29]:
# model needs to be loaded beforehand
def get_pred(fen: str) -> np.ndarray:
    """Returns the model's prediction for a fen as an array of shape (1, 1)"""
    pos = fen2bitboard(fen)
    pos = np.reshape(pos, (1, 773)) # reshaping because model expects a dimension for the batch size
    return model2.predict(pos)

get_pred("r1b1k1nr/pppp2pp/4pp2/8/7q/2NB4/PPP3PP/R1BQ1RK1 w - - 0 1")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step


array([[14.912656]], dtype=float32)

In [21]:
# DON'T FORGET TO CLOSE CONN WHEN DONE TRAINING

cursor.close()
conn.close()

NameError: name 'cursor' is not defined