ChessCompress uses novel chess compression techniques, intended to improve on the state of the art, to store chess games efficiently.
Note: This code is slow, but could be sped up by implementing the same techniques in a faster language.
Finally, a rough writeup of the techniques used lives here: https://compress.max.fan/.
To calculate the theoretical entropy rate, I averaged the Shannon information of each move in the dataset under each prediction function, assuming that the prediction function is perfect. The entropy of each move m is -log2 p(m), where p(m) is the probability that the prediction function assigns to m.
Note that we assume that the function p is a probability mass function. The data used consists of the first 10000 Lichess games played in the year 2018.
Averaging the per-move information this way gives the best compression rate (in bits/move) that we could achieve with each prediction function.
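
As a rough illustration of the calculation described above, here is a minimal sketch in Python. It is not the code in stats.py: the `Predictor` interface, the function names, and the toy predictor are all assumptions made up for this example. Moves are plain strings, and the predictor returns a probability for each candidate next move.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface (an assumption for this sketch): given the moves
# played so far, return a probability for every candidate next move,
# i.e. a probability mass function over moves.
Predictor = Callable[[Sequence[str]], Dict[str, float]]


def shannon_information(probability: float) -> float:
    """Bits needed to encode an event that was assigned this probability."""
    return -math.log2(probability)


def average_bits_per_move(games: List[List[str]], predict: Predictor) -> float:
    """Average the Shannon information of every move in every game."""
    total_bits = 0.0
    total_moves = 0
    for game in games:
        for i, move in enumerate(game):
            pmf = predict(game[:i])  # prediction from the moves before this one
            total_bits += shannon_information(pmf[move])
            total_moves += 1
    return total_bits / total_moves


def uniform_over_four(history: Sequence[str]) -> Dict[str, float]:
    """Toy predictor: four candidate moves, each equally likely."""
    return {"e4": 0.25, "d4": 0.25, "c4": 0.25, "Nf3": 0.25}


toy_games = [["e4", "d4"], ["Nf3", "c4", "e4"]]
print(average_bits_per_move(toy_games, uniform_over_four))  # prints 2.0
```

With four equally likely moves, every move costs exactly 2 bits, so the average is 2 bits/move. A predictor that puts more probability on the moves actually played would bring the average down.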
To reproduce these stats, run:
python3 stats.py
There are tests written for most of the important functions/classes.
- better markup of README and more formal language
- better explanation of calculations
https://lichess.org/blog/Wqa7GiAAAOIpBLoY/developer-update-275-improved-game-compression (previous state of the art)