# Introduction

Chess has fascinated humanity for centuries, not only as a game but as a powerful medium for cognitive development. In recent years, the intersection of chess with artificial intelligence and computer vision has opened new possibilities for automated chess systems. This blog explores one such innovative approach from the paper "[Development of an autonomous chess robot system using computer vision and deep learning](https://www.sciencedirect.com/science/article/pii/S2590123025001793)" by Truong Duc Phuc and Bui Cao Son.

The paper describes building an autonomous chess robot that sees and understands the chessboard much like a human player. This involves two main tasks:
1. Detecting the chessboard
2. Recognizing the chess pieces

## Chessboard detection:

![Pipeline from the Paper](image1.png)

### Canny Edge detection: (Finding the Borders of Squares)
- It takes the chessboard image as input and converts it into grayscale
- A Gaussian blur is applied to smooth the image and reduce noise
- Then the Canny algorithm is used to detect the edges of the squares
 
### Line Detection: (Finding the Grid Lines)
- The edges detected by `Canny Edge Detection` are processed using the `Hough Line Transform` algorithm to identify straight lines
- It identifies both vertical and horizontal grid lines that make up the board
 
### Finding Intersection Points: (Corners of Squares)
- It looks for where the lines intersect to find the corners of each square
- A clustering algorithm groups nearby lines to avoid duplicates (e.g., multiple lines detected for the same edge). See Fig 7
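
The paper does not say which clustering algorithm it uses, so the sketch below substitutes a simple one: near-parallel lines whose `rho` values lie within `max_gap` pixels of each other are merged by averaging. Intersections are then found by solving the two line equations `x*cos(theta) + y*sin(theta) = rho`. The function names and the gap value are assumptions for illustration.

```python
import numpy as np

def cluster_rhos(rhos, max_gap=10.0):
    """Merge duplicate parallel lines by averaging rho values closer than max_gap."""
    groups = []
    for rho in sorted(rhos):
        if groups and rho - groups[-1][-1] <= max_gap:
            groups[-1].append(rho)
        else:
            groups.append([rho])
    return [float(np.mean(g)) for g in groups]

def intersect(line_a, line_b):
    """Intersection of two lines given in Hough normal form (rho, theta)."""
    (r1, t1), (r2, t2) = line_a, line_b
    A = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    x, y = np.linalg.solve(A, np.array([r1, r2]))
    return float(x), float(y)

# Several detections of the same edge collapse into one line each
vertical_rhos = cluster_rhos([99.0, 100.0, 101.0, 200.0, 202.0])
horizontal_rhos = cluster_rhos([50.0, 51.0, 150.0])

# Every vertical/horizontal pair yields one corner point
corners = [intersect((rv, 0.0), (rh, np.pi / 2))
           for rh in horizontal_rhos for rv in vertical_rhos]
```

Clustering on `rho` alone only works when the lines in a group share roughly the same angle. A tilted board would need the angle included in the distance measure.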
 
### Flattening the Chessboard: (Correcting Perspective)
- If the camera isn’t perfectly overhead, the chessboard might look tilted or distorted
- The Homography Transform fixes this by warping the image to make it look like a perfect top down view

![Clustering of lines](image2.png)

# Training and using the Deep Learning model

## Training the deep learning model
The paper states that the authors generated their own dataset, manually labeled the images, and applied data augmentation techniques such as:
- Horizontal and vertical flipping
- Rotation at various angles
- Brightness and contrast variation
- Synthetic perspective distortion
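
To make two of these augmentations concrete, here is a small NumPy sketch of horizontal flipping and brightness/contrast variation. It assumes YOLO-style normalized labels `(class, x_center, y_center, width, height)`. The paper does not describe its augmentation code, so the function names and parameter values are illustrative.

```python
import numpy as np

def flip_horizontal(image, boxes):
    """Mirror the image left-right and mirror the box centers to match."""
    flipped = image[:, ::-1].copy()
    new_boxes = [(c, 1.0 - xc, yc, w, h) for c, xc, yc, w, h in boxes]
    return flipped, new_boxes

def adjust_brightness_contrast(image, alpha=1.0, beta=0.0):
    """Scale contrast by alpha, shift brightness by beta, clip to [0, 255]."""
    out = image.astype(np.float32) * alpha + beta
    return np.clip(out, 0, 255).astype(np.uint8)

# Tiny 2x4 RGB image and one labeled box as stand-in data
image = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)
boxes = [(0, 0.25, 0.5, 0.2, 0.4)]

flipped, flipped_boxes = flip_horizontal(image, boxes)
brighter = adjust_brightness_contrast(image, alpha=1.2, beta=30)
```

Geometric augmentations such as flips must update the labels too. Photometric ones such as brightness changes leave the boxes untouched.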

The final dataset included 1,000 labeled images, split into:
- 60% training
- 20% validation
- 20% testing
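
A 60/20/20 split like this one can be reproduced with a seeded shuffle. The file names and the seed below are placeholders. The paper does not say how its split was drawn.

```python
import random

def split_dataset(items, train=0.6, val=0.2, seed=0):
    """Shuffle deterministically and cut into train/validation/test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Hypothetical file names standing in for the 1,000 labeled images
names = [f"img_{i:04d}.jpg" for i in range(1000)]
train_set, val_set, test_set = split_dataset(names)
```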

## Model Architecture
They used the `YOLOv8x` object detection model for this task. YOLO (You Only Look Once) is a state-of-the-art one-stage detector that identifies objects and their bounding boxes in a single forward pass.

## Training Details
The model was trained using the `Ultralytics YOLOv8` training pipeline with the following parameters:
- Model: YOLOv8x pretrained on COCO, fine-tuned on the authors' chess dataset
- Epochs: 50
- Batch Size: 16
- Image Size: 416×416
- Optimizer: SGD
- Loss Function: Combined classification + objectness + bounding box loss

During training, they observed that the model’s loss decreased steadily over the first 20 epochs and evened out around epoch 50. Precision and recall metrics also stabilized in the later stages.

## Evaluation and Results
The model’s performance was assessed using standard object detection metrics:
- Intersection over Union (IoU) to measure bounding box accuracy
- Precision and recall per class
- Confusion matrix to analyze class-wise errors
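
For reference, IoU, precision, and recall can be computed as in the sketch below. Boxes are assumed to be in `(x1, y1, x2, y2)` pixel format, and the counts in the example are made up.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Two 10x10 boxes overlapping by half: intersection 50, union 150
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))

# Made-up counts for one class: 8 correct, 2 spurious, 2 missed
precision, recall = precision_recall(tp=8, fp=2, fn=2)
```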

Most chess pieces were identified with high accuracy, with IoU scores between 0.78 and 0.9, which is considered reliable for practical applications. As shown in the confusion matrix (Fig. 11), the model had minor misclassifications, mainly between:
- Black bishop vs black king
- Some white pieces confused with background due to low contrast

![Confusion Matrix](image3.png)

These could be further improved by expanding the dataset and refining the loss function for class imbalance.

# Development of Intuition (Custom Dataset + Training)

While the original paper uses a deep learning model for chess piece recognition, it does not provide access to the codebase or the dataset used for training. To gain a deeper understanding and to evaluate the approach independently, we designed and trained our own model from scratch.

## Preparing the dataset
We downloaded the dataset from [Kaggle](https://www.kaggle.com/datasets/imtkaggleteam/chess-pieces-detection-image-dataset). This dataset has ~1300 images.

We performed the following actions on the dataset:
- Removed the first, unused class name from the `data.yaml` file
- Removed unusable images
- Converted `green and white` chessboard images to `black and white` images
- Created new images with `_bnw` appended to the original filename in the same folder

## Training our own deep learning model
We selected `YOLOv8m`, a medium size variant of the `YOLOv8 object detection` family.

To provide realistic visual conditions, we applied perspective augmentation with a low setting (`perspective=0.001`). It helped the model learn how chess pieces appear under slight 3D distortions.

Training was conducted on `2× NVIDIA® T4 GPUs` on Kaggle. We fine-tuned the model over 50 epochs, employing:
- Cosine learning rate scheduler (`cos_lr`) for stable convergence
- Dropout rate of 0.4, which provided regularization and helped reduce overfitting
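
A training call with these settings might look like the sketch below, using the Ultralytics Python API, where the cosine schedule is enabled with `cos_lr=True`. The dataset path `data.yaml` and the `device=[0, 1]` setting for the two GPUs are assumptions. `ultralytics` is imported inside the function so the settings can be inspected without the package installed.

```python
# Settings described above. `device` is an assumed value for the 2x T4 setup.
OUR_TRAIN_ARGS = dict(
    epochs=50,
    cos_lr=True,        # cosine learning rate schedule
    dropout=0.4,        # dropout rate used for regularization
    perspective=0.001,  # mild perspective augmentation
    device=[0, 1],      # assumption: train on both GPUs
)

def train_yolov8m(data_yaml="data.yaml"):
    """Fine-tune the pretrained YOLOv8m checkpoint with the settings above."""
    from ultralytics import YOLO  # requires `pip install ultralytics`
    model = YOLO("yolov8m.pt")
    return model.train(data=data_yaml, **OUR_TRAIN_ARGS)
```

Calling `train_yolov8m()` starts training and writes the checkpoints, including `best.pt`, to the Ultralytics run directory.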

After that, we evaluated it on a manually prepared test set. Our model achieved significantly higher accuracy and IoU scores compared to the results reported in the paper. The improvement is attributed to a better curated dataset, advanced augmentation strategies, and careful tuning of training hyperparameters.

Our custom implementation validated the effectiveness of the deep learning approach. It also deepened our understanding of the practical challenges and trade-offs involved in designing a real-time object detection system for physical applications.

## Integrating Chessboard Positioning via Custom Post-Processing Logic
To move beyond raw detection and enable real chess gameplay, we extended our system with a custom post-processing pipeline that maps the detected chess pieces to exact board positions.

While the original paper relied on Canny edge detection and line intersections for detecting the chessboard grid, we instead utilized the trained YOLOv8 model (best.pt) to directly predict bounding boxes around each piece in the image. This allowed us to bypass traditional edge-based board detection and operate purely from learned object locations.

Once bounding boxes are detected:
- We extract the (x, y) coordinates of each piece's center by calculating the midpoint of its bounding box.
- To correct vertical bias (e.g., tall pieces like queens or rooks being detected slightly into the square above), we apply a +25% height offset. This ensures the label stays within the correct square.
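
The two steps above amount to a few lines of arithmetic. The sketch assumes boxes in `(x1, y1, x2, y2)` pixel format and reads the +25% offset as a downward shift in image coordinates, where `y` grows toward the bottom of the image. `piece_anchor` is a hypothetical name.

```python
def piece_anchor(box, offset_ratio=0.25):
    """Return the box center shifted down by offset_ratio of the box height."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2
    # Assumption: y grows downward, so adding moves the point toward the piece's base
    cy = (y1 + y2) / 2 + offset_ratio * (y2 - y1)
    return cx, cy

# A 40x100 pixel box: center (120, 250), shifted down by 25 pixels
anchor = piece_anchor((100, 200, 140, 300))
```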

## Warping the Chessboard to Estimate Piece Positions
For mapping detected pieces to actual squares (e.g., "e4", "g6"), we applied a homography-based image warp.
1. Chessboard Corner Detection: We used OpenCV’s advanced `findChessboardCornersSB()` function to accurately identify all 81 (9×9) grid intersections. This method is robust even under perspective distortion or imperfect lighting.
2. Perspective Correction and Warp: With the chessboard corners identified, we applied a perspective transform to warp the image into an 800×800 square grid. This effectively removes any distortion, allowing us to treat the board as a regular 8×8 matrix.
3. Transforming Detected Piece Coordinates: The YOLO-predicted (x, y) coordinates (from the original image) were converted into the warped grid using OpenCV’s `cv2.perspectiveTransform`. This step maps each piece's location from the original image space into the flattened board.
4. Final Position Assignment: Based on the warped coordinates, we calculated the exact board cell (row, column) that each piece occupies. This positional labeling allowed us to map detections to standard chess notation (e.g., "e2": "white pawn"), making the system compatible with chess engines or UI overlays.
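
The final assignment step can be sketched as follows, assuming the 800×800 warped board is oriented with White at the bottom and the a-file on the left. The orientation, the function name, and the sample detections are assumptions for illustration.

```python
def square_name(x, y, board_size=800):
    """Map warped-board pixel coordinates to algebraic notation, e.g. 'e2'."""
    if not (0 <= x <= board_size and 0 <= y <= board_size):
        return None  # point fell outside the board
    cell = board_size / 8
    col = min(int(x // cell), 7)  # 0 = a-file ... 7 = h-file
    row = min(int(y // cell), 7)  # 0 = top row (rank 8) ... 7 = bottom row (rank 1)
    return "abcdefgh"[col] + str(8 - row)

# Made-up warped coordinates for two detections
detections = [((450, 650), "white pawn"), ((50, 50), "black rook")]
positions = {square_name(x, y): label for (x, y), label in detections}
```

If the camera views the board from Black's side, the file and rank order would need to be reversed.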

![Our confusion matrix](image4.png)

# Creation of Dataset

The script below implements the dataset preparation described in the "Preparing the dataset" section. It processes all three folders (`train`, `valid`, and `test`): each image is renamed, and a black-and-white copy with `_bnw` appended to the filename is saved in the same folder together with a copy of its label file.

```python
import os
import shutil
import uuid

import cv2
import numpy as np
import pandas as pd
from PIL import Image  # needed for Image.fromarray below


def change_to_black_and_white(image_path, output_path):
    """Desaturate the green squares of a board image and save the result."""
    img_cv = cv2.imread(image_path)

    # Mask green pixels in HSV space (OpenCV hue range is 0-179)
    hsv = cv2.cvtColor(img_cv, cv2.COLOR_BGR2HSV)
    lower_green = np.array([40, 40, 40])
    upper_green = np.array([100, 255, 255])
    mask_green = cv2.inRange(hsv, lower_green, upper_green)

    # In HLS, zero the hue (channel 0) and saturation (channel 2) of masked pixels
    hls_modified = cv2.cvtColor(img_cv, cv2.COLOR_BGR2HLS)
    hls_modified[mask_green > 0, 2] = 0
    hls_modified[mask_green > 0, 0] = 0

    result_bgr = cv2.cvtColor(hls_modified, cv2.COLOR_HLS2BGR)
    result_rgb = cv2.cvtColor(result_bgr, cv2.COLOR_BGR2RGB)
    Image.fromarray(result_rgb).save(output_path)


folders = ['train', 'valid', 'test']

for folder in folders:
    image_folder = f'Chess_pieces/{folder}/images'
    txt_folder = f'Chess_pieces/{folder}/labels'
    ann = pd.read_csv(f'Chess_pieces/{folder}/_annotations.csv')
    # Snapshot the listing first, because files are renamed inside the loop
    files_name = os.listdir(image_folder)

    for file_name in files_name:
        original_image_path = os.path.join(image_folder, file_name)
        original_txt_path = os.path.join(txt_folder, file_name.replace('.jpg', '.txt'))
        # Skip images that have no label file
        if not os.path.exists(original_txt_path):
            continue

        # Give every image a fresh unique name
        new_name = str(uuid.uuid4())
        new_image_path = os.path.join(image_folder, new_name + '.jpg')
        new_txt_path = os.path.join(txt_folder, new_name + '.txt')

        # Write the black-and-white copy next to the original
        new_name_bnw = os.path.join(image_folder, new_name + '_bnw.jpg')
        change_to_black_and_white(original_image_path, new_name_bnw)
        new_name_bnw_txt = os.path.join(txt_folder, new_name + '_bnw.txt')

        # Rename the original files and reuse the labels for the copy
        os.rename(original_image_path, new_image_path)
        os.rename(original_txt_path, new_txt_path)
        shutil.copyfile(new_txt_path, new_name_bnw_txt)

        # Keep the CSV annotations in sync with the new file names
        ann.loc[ann['filename'] == file_name, 'filename'] = new_name + '.jpg'
        bnw_row = ann.loc[ann['filename'] == new_name + '.jpg'].copy()
        bnw_row['filename'] = new_name + '_bnw.jpg'
        ann = pd.concat([ann, bnw_row], ignore_index=True)

    ann.to_csv(f'Chess_pieces/{folder}/_annotations.csv', index=False)
    print(f'Folder {folder} done')
```