# Overview
Here, we predict whether a human is going to make a mistake in a given chess position. More precisely, we estimate both the expected centi-pawn loss and the expected blunder (CP loss > 200) probability given a player's Elo rating. We base this analysis on a dataset of 23,000 games that have been fully evaluated by a computer (1.9 Million evaluations).

This model can be used in a variety of ways:
  1. If we hold the Elo constant, these predictions are, intuitively, a good measure a position's sharpness, since players will be more likely to make mistakes in sharp positions. 
  2. We can understand how players of certain strength play in certain positions. For instance, some positions are handled almost perfectly by elite players, but beginners will likely blunder in them.
  
At a high level, our model supplements the computer evaluation of a position. A widely held belief among chess players is that a computer evaluation is less important when a position is sharp. To quantify this belief, we need a definition of sharpness, and our predictive score is tailored to this purpose.

The predictions (CP loss or blunder probability) are obtained from a supervised model that simply predicts the same outcome in a large sample of games.

**We have turned this model into an interactive [web app](https://chessinsights.org/analysis/)! So you can just go there and see what you think of the results**. You can also check out [the database](https://chessinsights.org) and even add your own games for evaluation!

## Comparison to previous work

Existing research (Guid and Bratko, 2006, G-B going forward) has defined sharpness using the change in evaluation as an engine's depth increases. This measure has the advantage that it is very easy to obtain when running computer evaluations. We find that our measure is only weakly correlated with the G-B measure.

## Next steps

These predictions can be used to create "human evaluations" of a given chess position. Specifically, one can ask the following: Given a certain chess position, what is the best move I should play to win the game? The best move (highest win probability) might not have the highest computer evaluation, and a crucial factor is that humans make mistakes when the position is sharp.

## Notes on the model

Currently, our model tries to minimize the required prior knowledge about chess. The input is simply a position's FEN, and this is split into a large vector and fed into a two-stage dense neural network.

# Validation overview: Candidates 2018
Here, we present an overview of the validation and analyses with the data. You can also think of this as "fun statistics we can create with our model".

We use games from the Candidates 2018 to evalute our results.  We can use this tournament to investigate whether our results are reasonable. First, we present how sharply each participant played.

We find that Levon Aronian played the sharpest chess, and Wesley So played least sharply. The one surprising result is that one would expect Vladimir Kramnik to show up more highly in the list, because his play was widely deemed to be extremely dynamic.

In [1]:
player_stats

NameError: name 'player_stats' is not defined

## Game Statistics
We now investigate how sharply each game was played. Overall, these results feel sensible. A likely reason is that, by averaging predictions over a number of moves, this statistic eliminates noise in the prediction.

The comparison between the "sharpest" and "least sharp" games is stark: The sharpest games consistently involve huge complexity and attacking chess. The least sharp games usually involve quick piece exchanges and simplification into a drawn endgame.

## Sharpest Games

In [2]:
game_stats.head(10)

NameError: name 'game_stats' is not defined

## Least sharp games

In [3]:
game_stats.tail(10)

NameError: name 'game_stats' is not defined

## Example positions
This is the most intuitive validation: We simply show positions that are deemed either sharp or non-sharp, and chess players can get a feeling for whether these predictions are sensible.

In [4]:
df = df_cand.query('10 <= move_number <= 30').sort_values('pred_sharpness', ascending=False)

NameError: name 'df_cand' is not defined

### Sharp

In [5]:
df_sharp = df.head(100)
df_sharp.pred_sharpness.mean()

NameError: name 'df' is not defined

In [6]:
fen_to_svg(df_sharp.fen.iloc[0])

NameError: name 'fen_to_svg' is not defined

In [7]:
fen_to_svg(df_sharp.fen.iloc[1])

NameError: name 'fen_to_svg' is not defined

In [8]:
fen_to_svg(df_sharp.fen.iloc[2])

NameError: name 'fen_to_svg' is not defined

### Not sharp

In [9]:
df_nonsharp = df.tail(100)
df_nonsharp.pred_sharpness.mean()

NameError: name 'df' is not defined

In [10]:
fen_to_svg(df_nonsharp.fen.iloc[0])

NameError: name 'fen_to_svg' is not defined

In [11]:
fen_to_svg(df_nonsharp.fen.iloc[1])

NameError: name 'fen_to_svg' is not defined

In [12]:
fen_to_svg(df_nonsharp.fen.iloc[2])

NameError: name 'fen_to_svg' is not defined

# Analysis

In [1]:
# Ignoring warnings here.
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
import numpy as np
import datetime
import psycopg2
import pandas.io.sql as sqlio
from sklearn.ensemble import RandomForestRegressor

import statsmodels
import statsmodels.formula.api as smf

import chess
import chess.svg
from IPython.display import SVG

from tensorflow import keras
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow.keras import regularizers
import chess
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import LeakyReLU

In [3]:
import queries
import functions as f
import imp
imp.reload(queries); imp.reload(f);

In [5]:
pd.set_option('precision', 2)
pd.set_option('display.width', 200)
pd.set_option('max_colwidth',90)

use_dev = False
connstring = f.CONNSTRING_DEV if use_dev else f.CONNSTRING_PROD

conn = psycopg2.connect(connstring)
params = {
  'move_number_start': 20
, 'move_number_end': 40
}

# Loading the raw data

In [6]:
df_games = sqlio.read_sql_query(queries.q_games, conn).rename(columns={'id': 'game_id'})
df_db = sqlio.read_sql_query(queries.q_db, conn).rename(columns={'id': 'database_id'})
df_players = sqlio.read_sql_query(queries.q_players, conn).rename(columns={'id': 'player_id'})
df_tournaments = sqlio.read_sql_query(queries.q_tournaments, conn).rename(columns={'id': 'tournament'})
df_attributes = sqlio.read_sql_query("SELECT * from game_attribute", conn)

In [7]:
df_eval_raw = sqlio.read_sql_query("SELECT * from move_eval", conn)
len(df_eval_raw)

2218574

In [138]:
df = df_games.merge(df_attributes
                    .query('attribute=="BlackElo"')[['game_id', 'value']]
                   .rename(columns={'value': 'elo_black'}))
df = df.merge(df_attributes
                    .query('attribute=="WhiteElo"')[['game_id', 'value']]
                    .rename(columns={'value': 'elo_white'}))
df = df.merge(df_players[['player_id', 'last_name']].rename(
    columns={'player_id': 'player_white_id', 'last_name': 'last_name_white'}), on='player_white_id')
df = df.merge(df_players[['player_id', 'last_name']].rename(
    columns={'player_id': 'player_black_id', 'last_name': 'last_name_black'}), on='player_black_id')
df = df.merge(df_tournaments[['tournament', 'name']].rename(
    columns={'name': 'tournament_name'}), on='tournament')
df = df.merge(df_db[['database_id', 'name']].rename(
    columns={'name': 'database_name'}), on='database_id')
df = df[df.elo_white!=""]
df = df[df.elo_black!=""]
for var in ['elo_white', 'elo_black']:
    df = df[pd.to_numeric(df[var], errors='coerce').notnull()].copy()
    df[var] = df[var].astype(int)
    

In [139]:
cols_add = ['tournament', 'tournament_name', 'database_name', 'game_id', 
            'elo_white', 'elo_black', 
            'last_name_white', 'last_name_black',
            'player_white_id', 'player_black_id']
df_eval = df_eval_raw.merge(df[cols_add])

In [140]:
df_eval.database_name.value_counts()

kingbase_random                       1198196
Rejkjavik Open 2018                     74934
Candidates 2011-2018                    19663
World Championships 1886-2014           19280
Wijk An Zee (Tata Steel) 2012-2018      16912
Supertournaments 2017                   14358
Name: database_name, dtype: int64

In [141]:
summary = {
    'Number of evaluations': len(df_eval),
    'Number of games': df_eval.game_id.nunique(),
    'Number of players': df_eval.player_white_id.nunique()
}
summary

{'Number of evaluations': 1343343,
 'Number of games': 16238,
 'Number of players': 7797}

In [142]:
tournament_cand = df_tournaments.query('name=="FIDE Candidates 2018"').tournament.iloc[0]
tournament_cand

132

In [143]:
tournaments_eval = [tournament_cand]

# Cleaning the data

In [144]:
MAX_CP_LOSS = 200

cploss_raw = (-1 + 2 * df_eval.is_white) * (df_eval.eval_best - df_eval['eval'])

df_eval['cploss'] = f.top_and_bottom(cploss_raw, 0, MAX_CP_LOSS)
df_eval['is_blunder'] = 1 * cploss_raw >= 200
df_eval['eval_lagged'] = df_eval.groupby('game_id').eval_best.shift(1)
df_eval['avg_cp_loss'] = df_eval.groupby('game_id').cploss.transform(np.mean)
df_eval = df_eval.query('is_white')

df_eval['fen_simple'] = [s[1] for s in df_eval.fen.str.split(' ')]

df_eval['is_excluded'] = df_eval.tournament.isin(tournaments_eval)

In [145]:
df_eval[['is_blunder', 'cploss']].mean()

is_blunder     0.02
cploss        20.97
dtype: float64

In [146]:
def fen_clean(fen):
    cleaned = fen
    for number in range(1, 9):
        cleaned = cleaned.replace(str(number), 'E' * number)
    cleaned = cleaned.replace('/', '')
    assert len(cleaned) == 64
    return cleaned

In [147]:
all_pieces = 'pPnNbBrRqQkK'
def fen_to_pieces(fen):
    pieces = fen_clean(fen)
    return [piece == p for piece in pieces for p in all_pieces]

In [148]:
castling_values = 'kKqQ'
def castling_rights(rights):
    return [v in rights for v in castling_values]

In [149]:
def fen_covars(fen):
    s = fen.split(' ')
    s_pos = s[1]
    s_castle = s[3]
    covars_pos = fen_to_pieces(s_pos)
    covars_castle = castling_rights(s_castle)
    is_white = s[2] == 'w'
    return covars_pos + covars_castle #+ [is_white]

In [150]:
ELO_SCALE = 3000

In [151]:
def fen_features(row):
    #return [row.elo_white]
    covars_fen = fen_covars(row.fen)
    return covars_fen + [row.elo_white / ELO_SCALE]

In [152]:
ix = df_eval.cploss.notnull()
df_reg = df_eval[ix] #

df_reg = df_reg.sample(len(df_reg)) #.sample(len(df_reg))
df_reg = df_reg[df_reg.cploss.notnull()].copy()
df_reg['val'] = (np.random.random(len(df_reg)) > 0.8) | (df_reg.is_excluded)

df_train = df_reg.query('not val').copy()
df_val = df_reg.query('val').copy()

X_train = np.array([fen_features(row[1]) for row in df_train.iterrows()]).astype(float)
X_val = np.array([fen_features(row[1]) for row in df_val.iterrows()]).astype(float)

In [153]:
outcomes = ['cploss', 'is_blunder']
stds = df_train[outcomes].std()
Y_train = np.array(df_train[outcomes] / stds)
Y_val = np.array(df_val[outcomes] / stds)
n_covars = X_train.shape[1]

In [154]:
stds

cploss        35.84
is_blunder     0.12
dtype: float64

In [155]:
X_train.shape, Y_train.shape, Y_val.shape

((538287, 773), (538287, 2), (136318, 2))

In [156]:
import itertools

In [157]:
def dict_product(dicts):
    return (dict(zip(dicts, x)) for x in itertools.product(*dicts.values()))

In [158]:
def get_model(x_train, y_train, x_val, y_val, params):
    model = keras.Sequential([
        layers.Dense(params['num_activations'], activation='relu', input_shape=[n_covars]), 
        keras.layers.Dropout(params['dropout']),
        layers.Dense(params['num_activations_2'], activation=params['activation']),
        keras.layers.Dense(Y_train.shape[1])
    ])
    optimizer = tf.keras.optimizers.Adamax(lr=params['lr'], beta_1=0.9, beta_2=0.999)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
    model.build()
    out = model.fit(x_train, y_train, epochs=params['epochs'], 
              batch_size=params['batch_size'], 
              validation_data=(x_val, y_val))
    return out, model

In [181]:
p = {
    'activation': ['relu'],
    'num_activations': [20, 50, 100, 200],
    'num_activations_2': [5, 20],
    'dropout': [0.2, 0.5],
    'batch_size': [100, 1000],
    'epochs': [10, 50],
    'lr': [0.1, 0.001]
}
ps = list(dict_product(p))

In [182]:
NUM_MODELS = 10

ps = list(np.random.choice(ps, size=NUM_MODELS, replace=False))
len(ps)

10

In [None]:
for p in ps:
    out, model = get_model(X_train, Y_train, X_val, Y_val, p)
    p['val_loss'] = out.history['val_loss'][0] * 100
    p['loss'] = out.history['loss'][0] * 100

In [184]:
num_estimated = len([p.get('val_loss', None) for p in ps if p.get('val_loss', None) is not None])
num_estimated

5

In [185]:
df = pd.DataFrame(ps)
df = df[df.val_loss.notnull()].sort_values('val_loss')
df.head(20)

Unnamed: 0,activation,num_activations,num_activations_2,dropout,batch_size,epochs,lr,val_loss,loss
4,relu,100,20,0.2,100,50,0.1,93.41,109.31
3,relu,20,20,0.2,100,50,0.001,93.68,98.25
1,relu,100,20,0.5,1000,10,0.001,93.92,100.18
2,relu,50,5,0.5,100,50,0.1,96.95,100.24
0,relu,200,5,0.5,100,10,0.1,97.11,108.18


In [159]:
optimizer = tf.keras.optimizers.Adamax(lr=0.03, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0)
model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
model.build()

In [160]:
hist = model.fit(X_train, Y_train, epochs=50, batch_size=1000, validation_data=(X_val, Y_val), verbose=1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [161]:
model.save('sharpness.h5')

# Validation

## What is the predictive power of the score?
Here, we look at the correlation between the score and the actual loss. We find that it stands at around 0.28. By holding the player's ELO constant, we obtain a measure of a position's sharpness, we find that this is still correlated decently with the outcome variables (0.24).

There is no clear sense of how good this performance is. In future work, we plan to investigate how this prediction compares to human estimates of sharpness.

In [173]:
ELO_ELITE = 2800
df_val['pred'] = model.predict(X_val)[:, 0] * stds['cploss']
X_val_2 = X_val.copy()
X_val_2[:, -1] = ELO_ELITE / ELO_SCALE
df_val['pred_sharpness'] = model.predict(X_val_2)[:, 0] * stds['cploss']
df_val.query('is_white')[['cploss', 'elo_white', 'pred', 'pred_sharpness']].corr() * 100

Unnamed: 0,cploss,elo_white,pred,pred_sharpness
cploss,100.0,-7.99,25.36,24.03
elo_white,-7.99,100.0,-10.78,-2.61
pred,25.36,-10.78,100.0,96.58
pred_sharpness,24.03,-2.61,96.58,100.0


In [167]:
df_val[['cploss', 'pred_sharpness']].mean()

cploss            20.77
pred_sharpness    18.16
dtype: float64

# Evaluating the results: 2018 Candidates

In [168]:
df_cand = df_val[df_val.tournament == tournament_cand]

## Sharpness by player

In [169]:
player_stats = (
    df_cand
    .query('is_white')
    .groupby('last_name_white')['pred_sharpness', 'cploss']
    .mean()
    .rename(columns={'pred_sharpness': 'Predicted Sharpness', 'cploss': 'CP Loss'}))

player_stats

Unnamed: 0_level_0,Predicted Sharpness,CP Loss
last_name_white,Unnamed: 1_level_1,Unnamed: 2_level_1
Aronian,22.15,26.84
Caruana,19.16,15.01
Ding,20.14,16.05
Grischuk,16.94,16.9
Karjakin,20.18,19.17
Kramnik,20.5,15.23
Mamedyarov,15.79,14.52
So,15.44,11.77


## Game statistics

In [170]:
group_vars = ['game_id', 'last_name_white', 'last_name_black']
game_stats = (
    df_cand.groupby(group_vars)[['pred_sharpness', 'cploss']]
    .mean()
    .sort_values('pred_sharpness', ascending=False)
    .reset_index()
    .rename(columns={'pred_sharpness': 'Predicted Sharpness', 'cploss': 'CP Loss'})
)

### Sharpest games

In [171]:
game_stats.head(10)

Unnamed: 0,game_id,last_name_white,last_name_black,Predicted Sharpness,CP Loss
0,3449,Caruana,Mamedyarov,29.51,16.02
1,3476,Karjakin,Kramnik,27.97,24.66
2,3466,Aronian,Caruana,27.9,35.51
3,3482,Aronian,Karjakin,27.03,24.25
4,3456,Kramnik,Caruana,26.77,25.62
5,3443,Aronian,Ding,26.49,38.32
6,3468,Kramnik,Ding,25.99,10.92
7,3483,Ding,Grischuk,25.79,23.96
8,3442,Karjakin,Mamedyarov,23.74,24.54
9,3492,Caruana,Aronian,21.86,29.21


### Least sharp games

In [172]:
game_stats.tail(10)

Unnamed: 0,game_id,last_name_white,last_name_black,Predicted Sharpness,CP Loss
46,3460,Ding,Mamedyarov,14.07,11.42
47,3479,Ding,So,14.04,8.0
48,3491,So,Karjakin,14.03,7.0
49,3484,So,Mamedyarov,13.95,12.51
50,3458,Caruana,Karjakin,13.93,11.03
51,3465,Karjakin,So,13.92,11.0
52,3471,Mamedyarov,Karjakin,13.87,11.93
53,3473,So,Grischuk,13.75,6.56
54,3475,Aronian,Mamedyarov,13.38,9.76
55,3480,Grischuk,Karjakin,13.25,7.11


In [178]:
(df_val.groupby(group_vars)[['pred_sharpness', 'cploss']].mean()
    .sort_values('pred_sharpness', ascending=False)
    .reset_index())

Unnamed: 0,game_id,last_name_white,last_name_black,pred_sharpness,cploss
0,21510,Hornsgaard,Lindfeldt,91.91,133.33
1,18807,Nogueiras Santiago,Calderin Gonzalez,91.16,120.00
2,20089,Zakhartsov,Loginov,78.87,27.67
3,16254,Ristic,Feletar,78.48,73.20
4,26760,Petrov,Radovanovic,73.37,0.00
...,...,...,...,...,...
16138,27715,Lederman,Van Riemsdijk,12.67,25.00
16139,18346,Ibar,Suba,12.67,5.50
16140,23340,Narciso Dublan,Motylev,12.67,46.00
16141,17499,Blechzin,Titan,12.66,9.00
