Learning Chess Structure Without Rules: Discrete Diffusion on FEN Token Sequences

Ashish Behal · Department of Computer Science and Engineering · University at Buffalo


Overview

Can a generative model learn what a valid chess position looks like purely from examples — without ever being told the rules?

This project applies D3PM (Discrete Denoising Diffusion Probabilistic Models) to chess board positions encoded as token sequences. The model is trained on 500,000 puzzle positions from the Lichess database and learns to generate valid chess positions through a learned denoising process, with no explicit knowledge of chess rules.

Of 1,000 generated positions, 69.5% are valid, compared with 16.3% for random piece placement, a more than fourfold improvement. The generated positions also closely match the training distribution on pawn-structure metrics (KL divergence of 0.092 for white pawn count and 0.010 for passed pawn count), suggesting that the model has learned chess structure implicitly from data.


Key Results

| Source | Valid positions | Valid % |
|---|---|---|
| D3PM (ours) | 695 / 1000 | 69.5% |
| Random baseline | 1000 / 6148 attempts | 16.3% |
| Training data | 1000 / 1000 | 100.0% |

KL Divergence from Training Distribution (lower is better):

| Metric | D3PM | Random baseline |
|---|---|---|
| White pawn count | 0.092 | 17.640 |
| Passed pawn count | 0.010 | 0.088 |
| Material balance | 0.572 | 0.577 |
| Total material | 3.217 | 5.582 |

Architecture

  • Model: DDiT-Llama Transformer — 30.8M parameters, 512-dimensional embeddings, 6 layers
  • Diffusion framework: D3PM with uniform noise corruption over 1,000 timesteps
  • Input representation: 72-token integer sequences encoding full FEN board state
  • Training data: 500,000 chess puzzle positions from the Lichess database
  • Optimizer: AdamW with linear warmup, learning rate 2×10⁻⁴, batch size 256
  • Training hardware: NVIDIA B200 GPU (~25 minutes for 10 epochs)
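
As a rough illustration of the uniform noise corruption listed above: with a uniform transition matrix, the t-step marginal has a closed form in which each token survives untouched with probability alpha_bar(t), the product of (1 - beta_s) over the first t steps, and is otherwise resampled uniformly over the vocabulary. The sketch below assumes a linear beta schedule and a 13-symbol vocabulary. Both are illustrative choices, and the function names are hypothetical; none of this is taken from this repository or from the d3pm code.

```python
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Probability that a token is still uncorrupted after t steps.

    Assumes a linear beta schedule (hypothetical; the real schedule may differ).
    """
    keep = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        keep *= 1.0 - beta
    return keep

def corrupt(tokens, t, vocab_size=13, rng=random):
    """Sample x_t given x_0 under uniform noise.

    Each token is kept with probability alpha_bar(t); otherwise it is
    replaced by a uniform draw from the vocabulary (which may, by chance,
    reproduce the original value).
    """
    keep = alpha_bar(t)
    return [tok if rng.random() < keep else rng.randrange(vocab_size)
            for tok in tokens]
```

At t = 0 the sequence is returned unchanged, and by the last timestep of this assumed schedule almost every token has been resampled, which is the regime the denoiser learns to invert.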

FEN Tokenization

Each chess position is encoded as a 72-token integer sequence:

  • Tokens 0–63: Board squares (one per square), each taking one of 13 values (empty or one of 12 piece types)
  • Tokens 64–71: Metadata — side to move, castling rights (4 binary tokens), en passant target, halfmove clock, fullmove counter

This tokenization follows the approach of Ruoss et al. (2024), who showed that clean structured token representations outperform raw PGN notation for neural chess models.
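
The 72-token layout can be sketched as follows. This is a minimal illustration, not the contents of tokenizer.py: the piece-to-integer ordering, the FEN square order (a8 to h1), the en passant encoding (file index, 0 for none), and the clipping of the two move counters are all assumptions made here so that each metadata field fits in one token.

```python
PIECES = ".PNBRQKpnbrqk"  # index 0 = empty, 1-12 = piece types (assumed ordering)

def fen_to_tokens(fen):
    """Encode a FEN string as 72 integers: 64 squares + 8 metadata tokens."""
    placement, side, castling, ep, halfmove, fullmove = fen.split()
    squares = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                squares.extend([0] * int(ch))  # a digit is a run of empty squares
            else:
                squares.append(PIECES.index(ch))
    if len(squares) != 64:
        raise ValueError(f"expected 64 squares, got {len(squares)}")
    meta = [
        0 if side == "w" else 1,                          # side to move
        *(int(flag in castling) for flag in "KQkq"),      # 4 binary castling tokens
        0 if ep == "-" else 1 + "abcdefgh".index(ep[0]),  # en passant file (assumed)
        min(int(halfmove), 99),                           # clipping is an assumption
        min(int(fullmove), 255),                          # clipping is an assumption
    ]
    return squares + meta
```

For the standard starting position, `fen_to_tokens("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")` yields 64 square tokens followed by the metadata `[0, 1, 1, 1, 1, 0, 0, 1]`.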


Evaluation Framework

Positions are evaluated at two levels:

Level 1: Syntactic Validity

Generated token sequences are converted back to FEN strings and validated using python-chess. This checks hard legality constraints: exactly one king per side, no pawns on the back rank, valid piece counts, and the non-moving side not being in check.
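
To make the constraints concrete, here is a dependency-free sketch of three of them: king count, back-rank pawns, and piece counts. It is a simplified stand-in, not the repository's evaluate.py, which relies on python-chess (whose `Board.is_valid()` also covers the check condition omitted here). The function names are hypothetical, and the code assumes the FEN placement field already has eight ranks.

```python
def basic_validity_errors(fen):
    """Return a list of violated constraints (an empty list passes these checks)."""
    ranks = fen.split()[0].split("/")
    board = "".join(ranks)
    errors = []
    for king, side in (("K", "white"), ("k", "black")):
        if board.count(king) != 1:
            errors.append(f"{side} must have exactly one king")
    # ranks[0] is the eighth rank and ranks[-1] the first rank in FEN order.
    if any(ch in "Pp" for ch in ranks[0] + ranks[-1]):
        errors.append("pawn on a back rank")
    for pawn, side, is_own in (("P", "white", str.isupper),
                               ("p", "black", str.islower)):
        if board.count(pawn) > 8:
            errors.append(f"{side} has more than 8 pawns")
        if sum(1 for ch in board if is_own(ch)) > 16:
            errors.append(f"{side} has more than 16 pieces")
    return errors

def validity_rate(fens):
    """Fraction of positions that pass every check above."""
    return sum(not basic_validity_errors(f) for f in fens) / len(fens)
```

Returning the list of violations rather than a single boolean makes it easy to see which constraint generated positions break most often.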

Level 2: Distributional Realism

KL divergence is computed between generated positions and the training distribution across four structural metrics: white pawn count, material balance, passed pawn count, and total material. A random baseline (positions generated by randomly placing pieces subject to hard legality constraints) provides a null model representing a system that learned nothing beyond the rules.
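
One way to compute such a score for a discrete metric is sketched below. The direction of the divergence (training sample as the reference) and the additive smoothing constant are assumptions; the repository's evaluate.py may make different choices, which would change the absolute values. The helper names are hypothetical.

```python
import math
from collections import Counter

def kl_divergence(reference, generated, eps=1e-6):
    """KL(reference || generated) over empirical histograms of a discrete metric."""
    support = sorted(set(reference) | set(generated))
    p_counts, q_counts = Counter(reference), Counter(generated)
    # Additive smoothing keeps the result finite when a value is absent
    # from one of the two samples.
    p = [(p_counts[v] + eps) / (len(reference) + eps * len(support)) for v in support]
    q = [(q_counts[v] + eps) / (len(generated) + eps * len(support)) for v in support]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def white_pawn_count(fen):
    """Number of white pawns in the piece-placement field of a FEN string."""
    return fen.split()[0].count("P")
```

Given two lists of FEN strings, the white pawn score would be `kl_divergence([white_pawn_count(f) for f in train_fens], [white_pawn_count(f) for f in generated_fens])`. Two samples whose histograms barely overlap produce large values, because the smoothed probability in the denominator is tiny.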


Repository Structure

ChessDiffusion/
├── prepare_data.py     # Download and extract FEN strings from Lichess puzzle CSV
├── tokenizer.py        # FEN ↔ token sequence conversion
├── train.py            # D3PM training loop
├── evaluate.py         # Level 1 validity + Level 2 distributional evaluation
├── decompress.py       # Decompress the Lichess .zst file
├── requirements.txt    # Python dependencies
└── README.md

Note: Model checkpoints (checkpoints/) and tokenized data (tokens.npy) are not included in this repository due to file size. See the setup guide below to reproduce from scratch.


Setup & Reproduction Guide

1. Clone the repository

git clone https://github.com/Ashishkabaab/ChessDiffusion.git
cd ChessDiffusion

2. Clone the d3pm dependency

git clone https://github.com/cloneofsimo/d3pm.git

This provides d3pm_runner.py and dit.py, which are required to run the model. They are not included in this repository because d3pm is a separate project.

3. Install dependencies

pip install -r requirements.txt

Requires Python 3.8+ and a CUDA-capable GPU for training. The model was trained on an NVIDIA B200 with CUDA 12.8 and PyTorch 2.12.

4. Download the Lichess puzzle database

wget https://database.lichess.org/lichess_db_puzzle.csv.zst

This file is approximately 1.5GB compressed.

5. Decompress the dataset

python decompress.py

6. Prepare training data

python prepare_data.py

This extracts FEN strings from the puzzle CSV and tokenizes them into 72-token sequences, saving the result to tokens.npy.
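
The extraction step might look roughly like this. It is a sketch only: it assumes the decompressed CSV has a header row with a column named FEN, and `read_fens` is a hypothetical helper rather than a function from prepare_data.py.

```python
import csv
import io

def read_fens(csv_file, limit=500_000):
    """Yield up to `limit` FEN strings from an open puzzle CSV with a header row."""
    reader = csv.DictReader(csv_file)
    for i, row in enumerate(reader):
        if i >= limit:
            break
        yield row["FEN"]  # column name is an assumption about the CSV header

# Small in-memory example standing in for the real file.
sample = io.StringIO(
    "PuzzleId,FEN,Moves\n"
    "a1,8/8/8/8/8/8/8/K1k5 w - - 0 1,a1a2\n"
    "b2,8/8/8/8/8/8/8/k1K5 b - - 0 1,a1a2\n"
)
fens = list(read_fens(sample))
```

Reading lazily with a generator avoids loading the whole CSV into memory before the first 500,000 rows are selected.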

7. Train and save the model

python train.py

Training runs for 10 epochs over 500,000 FEN sequences. Checkpoints are saved to checkpoints/. On an NVIDIA B200, training takes approximately 25 minutes.
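
For a sense of scale, the step count implied by these settings and a linear warmup schedule can be sketched as below. The warmup length of 1,000 steps and the constant rate after warmup are assumptions for illustration, and the step count assumes the final partial batch of each epoch is kept; train.py may differ on all three.

```python
import math

def steps_per_epoch(num_examples=500_000, batch_size=256):
    """Optimizer steps per epoch, counting the final partial batch."""
    return math.ceil(num_examples / batch_size)

def learning_rate(step, peak_lr=2e-4, warmup_steps=1000):
    """Linear warmup from 0 to peak_lr, then constant.

    warmup_steps is an assumed value, not one documented in this README.
    """
    return peak_lr * min(1.0, step / warmup_steps)

total_steps = 10 * steps_per_epoch()
```

Under these assumptions a 10-epoch run is roughly 19,500 optimizer steps, so a 1,000-step warmup would cover about 5% of training.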

8. Evaluate

python evaluate.py

Generates 1,000 positions from the trained model and reports:

  • Level 1 validity rate
  • KL divergence across all four structural metrics
  • Game phase distribution (opening / middlegame / endgame)

Limitations

The model generates too many opening-like positions relative to the puzzle training distribution (67.2% opening vs 22.6% in training data). This likely reflects difficulty learning long-range dependencies: total material on the board depends on all 64 square tokens jointly, which is harder to learn than local pawn patterns.


Future Work

  • Train longer with a larger dataset
  • Design a structured transition matrix that encodes chess-specific domain knowledge into the D3PM corruption process
  • Explore conditional generation (e.g. generate positions matching a specific game phase)

References


License

MIT License. See LICENSE for details.
