# **Project Description**
As an avid chess player, I follow several chess content creators for educational and entertainment purposes. One such creator is Levy Rozman, AKA GothamChess, the most popular chess content creator on YouTube (as of January 2025). Levy's content spans a wide variety of topics, including recaps of games played by world-class players, chess opening tutorials, and comedic showcases of extremely low-quality games riddled with hilarious mistakes.

One day, I was watching a video covering one such low-quality game, titled "Bye." (for those who are interested, please see the video at this link: https://www.youtube.com/watch?v=cIPpHQTrcsU). As he often does, Levy was commenting on the difference between the Stockfish evaluation of a certain position (Stockfish being the world's strongest engine) and the way the game was likely to go. More specifically, he was talking about a position where Stockfish was evaluating the position to be minus 5.1 (meaning Black was winning, with an advantage equivalent to being 5 pawns up). He then said "... I wish we had a stockfish filter and we could tell Stockfish, 'hey, these players are 500', and then Stockfish would be smart enough to be like 'oh, then it's like plus 1'. Wouldn't that be genius?". Immediately, I agreed; **it would be genius**. This is what caused me to pursue this project.

Here, I attempt to create a model similar to the one Levy spoke of in this video: one that adjusts its evaluation based on the strength of the players, assuming the players are around the same rating. The hope is that this can become an effective tool for improvement in chess. Depending on the accuracy of this model, I envision it distinguishing between perfect play and, for example, the moves I'm likely to make as an intermediate-level player, thereby informing me of the mistakes I'm likely to make at my level.

### **Note:**
I primarily play on chess.com and, given the choice, would prefer to use chess.com game data for model training due to game ratings being slightly less inflated than on Lichess. However, chess.com's API unfortunately has very strict rate limiting, with mass downloading being discouraged and often resulting in bans. Additionally, it is only possible to download games per player.

In contrast, Lichess, the second most popular online chess platform, actively encourages data analysis in the following ways:


*   Providing complete monthly database dumps ([here](https://database.lichess.org))
*   Offering a more permissive API with clear documentation and rate limits
*   Supporting direct bulk downloads through the API and web interface

Thus, I'll be using Lichess games as training data for this project.




## Mounting to Google Drive

Before continuing, we connect Colab to Google Drive. If anyone other than myself wishes to run this code, you will need to follow these steps:
1.   Upload a copy of this project and all dependencies (with the same directory structure as my GitHub [Guess_The_Eval](https://github.com/daichijoseph/Guess_The_Eval) repository) to your Google Drive
2.   Change the `PROJECT_PATH` variable based on the directory you place the code in
3.   Run the cell below, choose and authenticate your Google account, and accept the requested permissions (no viruses, trust)

In [3]:
from google.colab import drive
drive.mount('/content/drive')
PROJECT_PATH = '/content/drive/MyDrive/GuessTheEval'
%cd $PROJECT_PATH

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/GuessTheEval


From here on, we'll be assuming that the most recent version of Stockfish at this time (Stockfish 17) is the standard for "perfect play".

The raw output of the model we create will likely be a set of probabilities representing the chances of White winning, a draw, and Black winning. We will then need to convert those probabilities to the aforementioned standard evaluation units (White's pawn advantage). Thus, a sensible place to start is to find the function (and its inverse) that converts pawn advantage to the probability of White winning. Consistent with intuition, this function is known to be a logistic curve with the following equation,
<br>
<br>
$$W = \frac{1}{1+e^{-\frac{P-a}{b}}}$$
<br>
where $W$ is the probability of White winning, $P$ is White's advantage in centipawns (i.e. $P=100$ corresponds to an evaluation of +1), and $a$ and $b$ are parameters that influence the shape of the curve. Normally, these parameters must be found empirically through a process involving games Stockfish plays against itself. Thankfully, Lichess has done that part for us, and uses a version of the conversion formula with the following coefficients:
<br>
<br>

$$
W = \frac{1}{100}\left(50 + 50\left(\frac{2}{1+e^{-0.00368208*cp}} - 1\right)\right)
$$

<br>
where $W$ is the probability of White winning and $cp$ is the evaluation in centipawns from White's point of view (the same as $P$ in the general formula). Since we're going to be using Lichess data, I believe it is reasonable to use this formula.
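
As a quick sanity check of the Lichess formula above, here is a minimal Python sketch. The names `LICHESS_K` and `cp_to_win_prob` are my own and only for illustration; they are not part of any Lichess library.

```python
import math

# Coefficient taken from the Lichess formula above.
LICHESS_K = 0.00368208


def cp_to_win_prob(cp: float) -> float:
    """Convert a centipawn evaluation (White's point of view) to W in [0, 1]."""
    # Same expression as the formula: (50 + 50 * (2 / (1 + e^(-k*cp)) - 1)) / 100
    return 0.5 + 0.5 * (2.0 / (1.0 + math.exp(-LICHESS_K * cp)) - 1.0)


# Print W for a few evaluations, including the -5.1 from the video.
for cp in (-510, -100, 0, 100, 510):
    print(f"{cp:+5d} cp -> W = {cp_to_win_prob(cp):.3f}")
```

An evaluation of 0 maps to exactly 0.5, and the curve is symmetric, so `cp_to_win_prob(cp) + cp_to_win_prob(-cp)` should come out to 1 up to floating-point error.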

Solving for cp, we have the following:
<br>
<br>
$$
\begin{align}
\frac{100W - 50}{50} &= \frac{2}{1+e^{-0.00368208*cp}} - 1 \\
\frac{100W}{50} &= \frac{2}{1+e^{-0.00368208*cp}} \\
\frac{50}{100W} &= \frac{1+e^{-0.00368208*cp}}{2} \\
e^{-0.00368208*cp} &= \frac{1-W}{W} \\
cp &= \frac{-\ln\left(\frac{1-W}{W}\right)}{0.00368208}
\end{align}
$$
<br>
Thus, we get our inverse function for converting the probability of White winning to a centipawn evaluation:
<br>
<br>

$$ cp = \frac{-\ln\left(\frac{1-W}{W}\right)}{0.00368208} $$
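
The inverse can be sketched in Python along the same lines. The names `win_prob_to_cp` and `expected_score` are hypothetical, and two choices here are my own assumptions rather than anything Lichess prescribes: clamping $W$ away from 0 and 1 with a small `eps` so the logarithm stays finite, and collapsing the model's three output probabilities into a single $W$ by counting a draw as half a win.

```python
import math

# Coefficient taken from the Lichess formula above.
LICHESS_K = 0.00368208


def win_prob_to_cp(w: float, eps: float = 1e-6) -> float:
    """Convert W (White's winning chances) back to a centipawn evaluation."""
    # Assumption: clamp W to [eps, 1 - eps] to avoid log(0) and division by zero.
    w = min(max(w, eps), 1.0 - eps)
    return -math.log((1.0 - w) / w) / LICHESS_K


def expected_score(p_win: float, p_draw: float, p_loss: float) -> float:
    """Assumption: collapse win/draw/loss probabilities into one W (draw = half a win)."""
    total = p_win + p_draw + p_loss  # renormalize in case the inputs do not sum to 1
    return (p_win + 0.5 * p_draw) / total


# Example: a hypothetical model output that favors Black.
w = expected_score(0.2, 0.2, 0.6)
print(f"W = {w:.2f} -> {win_prob_to_cp(w):+.0f} cp")
```

Whether the draw probability should be folded in this way is something to revisit once the model's actual output format is settled.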

<br>





**NOTE TO SELF**

When it comes down to it, we're not going to be analyzing games but rather positions. Hence, one interesting (perhaps original) idea is to **train the model on moves by Black and moves by White *separately***.

At the very least, the model will have a feature labeling whether it's Black or White to move (likely derived from the ply), and this may be enough. However, while mentality is often the same whether a player is playing White or Black (especially for weaker players and/or when one person has a significant advantage), as a chess player, I know anecdotally that the mindset of someone playing White tends towards aggression and making something happen, whereas that of someone playing Black tends to be more defensive.
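
A rough sketch of how positions could be split by side to move using the ply is below. The record layout (`game_id`, `ply`) and the helper `side_to_move` are hypothetical, and the sketch assumes every game starts from the standard initial position with ply 0 being White's first move.

```python
from collections import defaultdict


def side_to_move(ply: int) -> str:
    """Assumes ply 0 is the starting position, so even plies are White to move."""
    return "white" if ply % 2 == 0 else "black"


# Hypothetical position records; real ones would also carry board features and ratings.
positions = [{"game_id": "demo", "ply": ply} for ply in range(6)]

# Partition the positions so each side could be fed to its own model.
by_side = defaultdict(list)
for pos in positions:
    by_side[side_to_move(pos["ply"])].append(pos)

print({side: [p["ply"] for p in group] for side, group in by_side.items()})
```

The same `side_to_move` value could instead be kept as a single input feature if one shared model turns out to be enough.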

In [6]:
import LichessData

client = LichessData.create_client()
games = LichessData.get_games(Client=client)
print(games)

AttributeError: module 'berserk' has no attribute 'client'