# Probabilistic Machine Learning - Project Report

* **Course:** Probabilistic Machine Learning (SoSe 2025)
* **Lecturer:** Alvaro Diaz-Ruelas
* **Student(s) Name(s):** Luca Thale-Bombien
* **GitHub Username(s):** Kavlahkaff
* **Date:** June 8, 2025
* **PROJECT-ID:** 13-2TLXXXX

---

## 1. Introduction

### Predicting Player Elo from Opening Moves

**Project Overview:**
The aim of this project is to use the Lichess Standard Rated Games dataset to predict a player’s Elo rating based solely on the first *n* moves of a game. By focusing on opening patterns, we seek to identify which early-game features correlate most strongly with player strength. We hypothesize that stronger players adhere more closely to established opening theory, producing measurable patterns in their early moves.

**Data Source:**

* **Dataset:** Lichess Standard Rated Games (Hugging Face)
* **Format:** Parquet files organized by year/month (≈20–30 GB per month)
* **Key Fields:** `UTCDate`, `UTCTime`, `White`, `Black`, `WhiteElo`, `BlackElo`, `movetext`, `ECO`, `Termination`, `TimeControl`

**Pipeline Overview:**

1. **Data Extraction:** Stream only relevant Parquet shards; filter incomplete or abnormal games.
2. **Feature Engineering:** Parse `movetext` for move sequences, evaluations, centipawn deltas, board‐state summaries (e.g., piece development, pawn structure), and accuracy metrics.
3. **Modeling:** Train classification models (Random Forest, SVM) to predict Elo decile (binned into ten categories).
4. **Evaluation:** Assess via accuracy, precision/recall/F₁, and interpret feature importances.

---

## 2. Data Loading and Exploration

We streamed 10 Parquet shards with the Hugging Face `datasets` library in streaming mode and kept only games that met all of the following criteria:

* Presence of `[%eval ...]` engine annotations in `movetext`
* At least 4 full moves
* Time control exactly `600+0` (10 minutes, no increment)

After filtering, 232 177 games remained.
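
The filtering criteria above can be expressed as a single predicate over one game record. This is a minimal sketch, not the project's code: it assumes each record is a dict with the dataset's `movetext` and `TimeControl` fields, and it assumes the number of full moves equals the highest move number in `movetext` once `{...}` comments are removed. The helper names `count_full_moves` and `passes_filter` are hypothetical.

```python
import re

# Move numbers such as "12." or "12..." (matched after comments are removed,
# so evaluation values like "0.17" inside comments are not picked up).
MOVE_NUMBER = re.compile(r"(\d+)\.")
# Brace comments such as "{ [%eval 0.17] [%clk 0:10:00] }".
COMMENT = re.compile(r"\{[^}]*\}")


def count_full_moves(movetext: str) -> int:
    """Return the highest move number in a PGN movetext (0 if there is none)."""
    without_comments = COMMENT.sub(" ", movetext)
    numbers = [int(n) for n in MOVE_NUMBER.findall(without_comments)]
    return max(numbers, default=0)


def passes_filter(example: dict, min_full_moves: int = 4,
                  time_control: str = "600+0") -> bool:
    """Apply the three filtering criteria to one game record."""
    movetext = example.get("movetext") or ""
    return (
        "[%eval" in movetext                      # engine annotations present
        and count_full_moves(movetext) >= min_full_moves
        and example.get("TimeControl") == time_control
    )
```

In streaming mode, a predicate like this could be passed to `IterableDataset.filter`, for example `load_dataset("Lichess/standard-chess-games", split="train", streaming=True).filter(passes_filter)`; restricting the stream to specific shards (e.g. via `data_files`) is not shown.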

### Elo Distribution

| Color | Mean Elo | Median Elo | Min Elo | Max Elo | Std Dev |
| :---: | :------: | :--------: | :-----: | :-----: | :-----: |
| White |  1536.36 |    1533    |   400   |   3119  |  390.93 |
| Black |  1536.52 |    1537    |   400   |   3095  |  390.99 |

### Game Length

| Statistic | Full Moves |
| :-------: | :--------: |
|   Count   |   232 177  |
|    Mean   |    33.28   |
|  Std Dev  |    14.89   |
|    Min    |      4     |
|    25%    |     23     |
|   Median  |     30     |
|    75%    |     41     |
|    Max    |     227    |

Histograms of the White and Black ratings show roughly normal distributions centered near 1536, with almost identical spread (standard deviation ≈ 391 for both colors).

---

## 3. Data Preprocessing

1. **Move Extraction:** Parsed SAN moves and engine evaluations from `movetext`, capturing per-ply evaluation and centipawn deltas.
2. **Feature Computation:** Derived the following per-game features (the suffixes `_white` and `_black` denote the player):
   * **Game length:** `total_full_moves`
   * **Castling timing:** `moves_before_castle_white`, `moves_before_castle_black`
   * **Legal-move difference:** `legal_move_diff_at_25`, `legal_move_diff_at_50`, `legal_move_diff_at_75`, `legal_move_diff_at_100`
   * **Development and center:** `development_before_white`, `development_before_black`, `center_control_after_5_white`, `center_control_after_5_black`, `unique_pieces_after_10_white`, `unique_pieces_after_10_black`
   * **Errors:** `blunder_counts_white`, `blunder_counts_black`, `first_blunder_move_white`, `first_blunder_move_black`, `mistake_counts_white`, `mistake_counts_black`, `first_mistake_move_white`, `first_mistake_move_black`
   * **Winning chances:** `first_win_opportunity_white`, `first_win_opportunity_black`, `first_opp_win_opportunity_white`, `first_opp_win_opportunity_black`
   * **Piece placement:** `knight_edge_first_white`, `knight_edge_first_black`, `knight_edge_count_white`, `knight_edge_count_black`, `rook_7th_first_white`, `rook_7th_first_black`
   * **Pawn structure:** `pawn_counts_at_{color}_{k}_{type}` for `color` ∈ {`white`, `black`}, `k` ∈ {25, 50, 75, 100}, and `type` ∈ {`iso`, `dbl`, `tri`} (isolated, doubled, and tripled pawns), giving 24 features such as `pawn_counts_at_white_25_iso`
3. **Accuracy Metrics:** Converted evaluation changes to centipawns and computed each player's average centipawn loss (ACPL) as the mean positive evaluation drop attributable to that player's moves.
4. **Feature Flattening:** Flattened nested JSON features (e.g., move counts before castling, piece-count diversification) into columns using `pd.json_normalize` and tuple unpacking.
5. **Additional Features:** Added ECO codes and game termination methods as categorical features, and created Elo decile labels by applying `pd.qcut` (ten quantile bins) to `WhiteElo`.
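
The evaluation-parsing and ACPL steps can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes Lichess-style `[%eval x]` annotations given in pawns from White's point of view, one annotation per ply, and a starting evaluation of 0 cp. The mate value (±1000 cp), the clipping bound (±1000 cp), and the function names `parse_evals_cp` and `average_centipawn_loss` are hypothetical choices.

```python
import re

# Lichess-style annotation, e.g. "[%eval 0.17]" or "[%eval #-3]".
EVAL_PATTERN = re.compile(r"\[%eval\s+([^\]\s]+)\]")
MATE_CP = 1000  # assumption: mate scores mapped to +/-1000 centipawns
CLIP_CP = 1000  # assumption: ordinary evaluations clipped to +/-1000 cp


def parse_evals_cp(movetext: str) -> list[int]:
    """Per-ply engine evaluations in centipawns, from White's point of view."""
    evals = []
    for token in EVAL_PATTERN.findall(movetext):
        if token.startswith("#"):
            # "#3": White mates in 3; "#-3": Black mates in 3.
            evals.append(-MATE_CP if token.startswith("#-") else MATE_CP)
        else:
            centipawns = round(float(token) * 100)  # annotations are in pawns
            evals.append(max(-CLIP_CP, min(CLIP_CP, centipawns)))
    return evals


def _mean(values: list[int]) -> float:
    return sum(values) / len(values) if values else 0.0


def average_centipawn_loss(evals_cp: list[int],
                           start_cp: int = 0) -> tuple[float, float]:
    """Return (White ACPL, Black ACPL), assuming one evaluation per ply.

    A move's loss is how far it lowers the evaluation from the mover's
    point of view, floored at zero; start_cp is an assumed pre-game value.
    """
    losses = ([], [])  # index 0: White's moves, index 1: Black's moves
    previous = start_cp
    for ply, current in enumerate(evals_cp):
        mover = ply % 2
        drop = previous - current if mover == 0 else current - previous
        losses[mover].append(max(0, drop))
        previous = current
    return _mean(losses[0]), _mean(losses[1])
```

Games whose final move carries no annotation, or whose annotations are missing mid-game, would need extra alignment logic before the per-ply deltas are meaningful.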

---

## 4. Probabilistic Modeling Approach

We frame Elo decile prediction as a ten-class classification problem and compare two models:

* **Random Forest Classifier:** Captures non-linear dependencies and interactions among engineered features; robust to outliers and mixed data types.
* **Linear SVM:** Provides a linear decision boundary baseline for comparison; efficient on high-dimensional data.

Preprocessing pipeline:

* Numeric features standardized via `StandardScaler`.
* Categorical features (ECO, Termination) one-hot encoded via `OneHotEncoder`.

Hyperparameters for the Random Forest (via grid search):

* `n_estimators=1200`, `max_depth=70`, `min_samples_split=3`, `min_samples_leaf=4`, `max_features='auto'` (equivalent to `'sqrt'` for classifiers; the `'auto'` option was removed in scikit-learn 1.3).
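
The preprocessing and Random Forest described above could be wired together with scikit-learn's `Pipeline` and `ColumnTransformer`. This is a sketch, not the project's code: it assumes the engineered features sit in a pandas DataFrame with `ECO` and `Termination` as string columns. The helper names `build_model` and `decile_labels`, the use of `handle_unknown="ignore"`, and `random_state` are assumptions not stated in the report, and `max_features="sqrt"` stands in for `'auto'`, which newer scikit-learn releases reject.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL_COLS = ["ECO", "Termination"]


def decile_labels(white_elo: pd.Series) -> pd.Series:
    """Ten equal-frequency Elo bins, labelled 0 (lowest) to 9 (highest)."""
    return pd.qcut(white_elo, q=10, labels=False)


def build_model(numeric_cols, n_estimators=1200, random_state=0) -> Pipeline:
    """Standardize numeric columns, one-hot encode categoricals, fit a forest."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), list(numeric_cols)),
        # ECO codes unseen during training are encoded as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])  # columns not listed (e.g. WhiteElo) are dropped by default
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=70,
        min_samples_split=3,
        min_samples_leaf=4,
        max_features="sqrt",  # what 'auto' meant for classifiers
        random_state=random_state,
    )
    return Pipeline([("preprocess", preprocess), ("forest", forest)])
```

Scaling does not change a tree ensemble's splits, but keeping `StandardScaler` in the shared preprocessing lets the same pipeline front end feed the linear SVM baseline, where scaling does matter.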

---

## 5. Model Training and Evaluation

We split the data into training (80%) and test (20%) sets, stratified by Elo decile.
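
One reading of a split "on Elo labels" is a stratified split, which keeps each decile's share identical in the training and test sets. The sketch below, built around the hypothetical helper `split_and_evaluate`, performs such a split and computes the metrics reported in this section; it illustrates the procedure and is not the project's evaluation code.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score
from sklearn.model_selection import train_test_split


def split_and_evaluate(model, X, y, test_size=0.2, random_state=0):
    """Fit on a stratified training split and score on the held-out split.

    Returns (accuracy, macro-averaged F1, per-class text report).
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Macro averaging weights every decile equally, regardless of support.
    macro_f1 = f1_score(y_test, y_pred, average="macro", zero_division=0)
    report = classification_report(y_test, y_pred, zero_division=0)
    return accuracy, macro_f1, report
```

Run with a majority-class `sklearn.dummy.DummyClassifier` on ten equally sized deciles, this returns exactly 10% accuracy, a useful floor when reading the results below.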

**Random Forest Results:**

* **Test Accuracy:** 20.7%
* **Macro‑averaged F₁‑score:** 0.183

| Decile  | Precision | Recall | F₁‑Score | Support |
|:-------:| :-------: | :----: | :------: | :-----: |
|    0    |   0.319   |  0.587 |   0.414  |   4664  |
|   ...   |    ...    |   ...  |    ...   |   ...   |
|    9    |   0.313   |  0.558 |   0.401  |   4622  |

The model performs best on the extreme deciles (0 and 9). This may reflect more distinctive opening patterns among very low- and very high-rated players; these deciles also have only one neighboring decile with which they can be confused.
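
The gap between the extreme and middle deciles can be quantified from the reported figures. Each F₁ is the harmonic mean of precision $P$ and recall $R$, and the macro average is the unweighted mean over the ten deciles. The worked values below use the rounded numbers from the table, so they agree with it only to about ±0.001:

$$
\begin{aligned}
F_1 &= \frac{2PR}{P+R}, \\
F_1^{(0)} &= \frac{2(0.319)(0.587)}{0.319+0.587} \approx 0.413, \qquad
F_1^{(9)} = \frac{2(0.313)(0.558)}{0.313+0.558} \approx 0.401, \\
\frac{1}{8}\sum_{k=1}^{8} F_1^{(k)} &= \frac{10(0.183) - 0.414 - 0.401}{8} \approx 0.127.
\end{aligned}
$$

The eight middle deciles therefore average an F₁ of roughly 0.127, about a third of the extremes.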

**Feature Importances (Top 10):**

1. `Termination` (0.0403)
2. `legal_move_diff_at_100` (0.0332)
3. `legal_move_diff_at_75` (0.0326)
4. `legal_move_diff_at_50` (0.0323)
5. `total_full_moves` (0.0317)
6. `legal_move_diff_at_25` (0.0316)
7. `moves_before_castle_black` (0.0299)
8. `moves_before_castle_white` (0.0298)
9. `first_mistake_move_white` (0.0290)
10. `first_mistake_move_black` (0.0289)
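
The list reports a single importance for `Termination` even though it is one-hot encoded. One plausible way to obtain such a number, assumed here rather than taken from the report, is to sum the impurity-based importances of all one-hot columns derived from the same source column. The sketch assumes feature names in the `transformer__column` form returned by `ColumnTransformer.get_feature_names_out()`; `aggregate_importances` is a hypothetical helper.

```python
def aggregate_importances(feature_names, importances,
                          categorical=("ECO", "Termination")):
    """Sum per-column importances back onto their source feature.

    Assumes names such as 'num__total_full_moves' or
    'cat__Termination_Time forfeit' (ColumnTransformer style).
    Returns (feature, importance) pairs sorted from most to least important.
    """
    totals = {}
    for name, value in zip(feature_names, importances):
        column = name.split("__", 1)[-1]  # drop the transformer prefix, if any
        for source in categorical:
            if column.startswith(source + "_"):
                column = source  # fold one-hot columns into their source
                break
        totals[column] = totals.get(column, 0.0) + float(value)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

In a fitted scikit-learn pipeline, the inputs would typically come from the preprocessing step's `get_feature_names_out()` and the forest's `feature_importances_`. Impurity-based importances tend to favor continuous and high-cardinality features, so `sklearn.inspection.permutation_importance` is a common cross-check.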

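These importances suggest that game length, the legal-move differences at the 25/50/75/100 checkpoints, castling timing, and the timing of the first mistake carry the most signal about Elo. The importances are fairly flat (0.029 to 0.040), so no single feature dominates. Note that `Termination` and `total_full_moves` describe the whole game rather than its opening.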

---

## 6. Results

* **Predictive Power of Openings:** Early-game patterns, especially mistake timing, castling timing, and the legal-move differences measured at plies 25–100, help differentiate player strength.
* **Game Termination:** How a game ended (the `Termination` field, e.g., a normal finish versus a time forfeit) emerged as the single most informative feature overall.
* **Model Limitations:** Overall accuracy remains modest (20.7%, roughly twice the 10% expected from guessing among ten equally sized deciles), indicating that opening moves alone capture only part of the variability in Elo.

---



## 7. Discussion

* **Interpretation:** The results are consistent with stronger players keeping more stable engine evaluations through the opening and making fewer, later mistakes. Game length and termination type also vary with skill.

* **Limitations:**

  * Analysing only 230k games from the first days of one month may underrepresent rarer high- or low-rated patterns.
  * Exclusion of mid- and endgame features overlooks information beyond the opening.
  * Binning Elo into deciles simplifies continuous ratings but may lose granularity.

* **Extensions:**

  * Incorporate sequence models (e.g., HMMs or LSTMs) on SAN moves.
  * Expand to regression framing for predicting continuous Elo.
  * Integrate additional metadata (player country, time-of-day patterns).

---

## 8. Conclusion

This project suggests that opening moves contain measurable signals of player strength, as evidenced by our classification models. While these features alone yield modest predictive accuracy, they highlight aspects that distinguish player skill levels: mistake timing, legal-move differences, castling timing, and termination type. Further work incorporating richer game phases and advanced sequence modeling is warranted.

---

## 9. References

* Lichess Standard Rated Games Dataset, Hugging Face: [https://huggingface.co/datasets/Lichess/standard-chess-games](https://huggingface.co/datasets/Lichess/standard-chess-games)
* Tijhuis, T., Blom, P. M., & Spronck, P. (2023). Predicting chess player rating based on a single game. In *2023 IEEE Conference on Games (CoG)* (pp. 1–8). IEEE.
