# Testing The Pipeline 🚀

### 📂 Download Testing Dataset
Clone the testing dataset from GitHub. It is used to evaluate the pipeline's performance.

In [None]:
!git clone https://github.com/samryan18/chess-dataset

### 📁 Organizing the Dataset
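The first command moves the labeled images (`labeled_originals`) out of the cloned repository into `/workspace/tests`; the second deletes the rest of the clone. Later cells read from `./tests`, so the notebook is assumed to run from `/workspace`.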

In [None]:
!mv chess-dataset/labeled_originals /workspace/tests

In [None]:
!rm -r chess-dataset

### 📦 Installing Dependencies
This step installs the Python libraries the pipeline needs: tools for image processing, numerical operations, plotting, string comparison, and chess-specific functionality.

In [None]:
!pip install opencv-python-headless numpy pillow shapely ultralytics python-chess cairosvg matplotlib python-dotenv svglib reportlab python-Levenshtein

### 📚 Importing Required Modules and Classes
Import the third-party modules and the pipeline's own classes from `core.shared` that are used in the rest of the notebook.

In [17]:
from tqdm import tqdm
import cv2
import sys
from Levenshtein import ratio
from glob import glob
import os
import numpy as np

In [2]:
from core.shared import CornerDetector, PerspectiveTransformer, GridCalculator, ChessPieceMapper, FENConverter, ImageGenerator

### 🔍 Comparing Processed Image with the Original
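This function runs the full pipeline on one image (corner detection, perspective transform, grid calculation, piece detection, FEN generation) and compares the predicted FEN with the ground-truth FEN encoded in the image's file name. The returned score is the Levenshtein `ratio` of the two strings, a value between 0 and 1.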

In [3]:
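def compare_image_with_original(image_path):
    """Return the similarity (0 to 1) between the predicted and ground-truth FEN."""
    original_image = cv2.imread(image_path)
    # Find the board corners, warp the board to a top-down view, and compute the grid
    corners = CornerDetector.detect_corners(original_image)
    transformed_image = PerspectiveTransformer.four_point_transform(image_path, corners)
    ptsT, ptsL = GridCalculator.plot_grid_on_transformed_image(transformed_image)
    # Detect the pieces and map them onto the grid to build the predicted FEN
    detections, boxes = ChessPieceMapper.chess_pieces_detector(transformed_image)
    predicted_fen = FENConverter.generate_fen(ptsT, ptsL, detections, boxes)

    # The ground-truth FEN is the file name (without extension) with "-" replaced by "/"
    real_fen = "/".join(image_path.split("/")[-1].split(".")[0].split("-"))

    # Levenshtein ratio: 1.0 means the two strings are identical
    similarity = ratio(real_fen, predicted_fen)

    return similarity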
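
To make the scoring step concrete, here is a minimal sketch of how a ground-truth FEN can be recovered from a file name and compared with a prediction. The file name and the predicted string are hypothetical examples, not files from the dataset. `indel_ratio` is a pure-Python stand-in written for this illustration; it assumes that `Levenshtein.ratio` is the normalized indel similarity, `2 * LCS(a, b) / (len(a) + len(b))`, which should be checked against the installed package version.

```python
import os


def fen_from_filename(image_path):
    """Turn a dash-separated file name into the piece-placement field of a FEN."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return stem.replace("-", "/")


def indel_ratio(a, b):
    """Stand-in for Levenshtein.ratio (assumed definition): 2 * LCS / total length."""
    if not a and not b:
        return 1.0
    # Row-by-row dynamic programme for the longest common subsequence length
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return 2 * prev[-1] / (len(a) + len(b))


# Hypothetical file name encoding the starting position
real_fen = fen_from_filename("./tests/rnbqkbnr-pppppppp-8-8-8-8-PPPPPPPP-RNBQKBNR.JPG")
# Hypothetical prediction that misses the rook on the last square
predicted_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBN1"

score = indel_ratio(real_fen, predicted_fen)
print(real_fen)
print(round(score, 4))
```

Under this definition a single wrong square still scores close to 1, so the similarity is a lenient, character-level measure rather than an exact-match test.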

### 🖼️ Loading Test Images
This step collects every `.JPG` file in the **./tests** directory for evaluation.

In [4]:
test_images = glob("./tests/*.JPG")
print("Number of test images:", len(test_images))

Number of test images: 500
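
On case-sensitive file systems the pattern `*.JPG` does not match files ending in `.jpg`. If the test folder might mix cases, a case-insensitive listing is one option. The sketch below is an illustration only: `list_test_images` and `IMAGE_SUFFIXES` are hypothetical names, and the demonstration runs on a temporary directory instead of the real `./tests` folder.

```python
from pathlib import Path
import tempfile

IMAGE_SUFFIXES = {".jpg", ".jpeg"}  # assumed set of extensions; extend as needed


def list_test_images(directory):
    """Return sorted image paths, matching the file suffix case-insensitively."""
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# Demonstration on a throwaway directory with mixed-case extensions
with tempfile.TemporaryDirectory() as tmp:
    for name in ("board_a.JPG", "board_b.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [Path(p).name for p in list_test_images(tmp)]

print(found)
```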


### ⚙️ Evaluating Test Images
In this step, we run the pipeline on each test image and store its similarity score in a list for further analysis. Images that raise an exception are skipped and later reported as failed.

In [9]:
results = []
for image in tqdm(test_images):
    try:
        results.append(compare_image_with_original(image))
    except Exception:
        # Skip images the pipeline cannot process; they are counted as failed later
        pass

100%|██████████| 500/500 [24:34<00:00,  2.95s/it]
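
Because the loop discards the exception, the notebook cannot tell which images failed or why. One possible variation, sketched here under assumptions, keeps the path and error message of every failure. `evaluate` and `fake_score` are hypothetical helpers for illustration; in the notebook, `compare_image_with_original` would be passed as `score_fn`.

```python
def evaluate(paths, score_fn):
    """Score every path and keep failures alongside the successful results."""
    results, failures = [], []
    for path in paths:
        try:
            results.append(score_fn(path))
        except Exception as exc:
            # Record which image failed and why, instead of silently skipping it
            failures.append((path, repr(exc)))
    return results, failures


def fake_score(path):
    """Stand-in scorer that fails on one input, used only for this demo."""
    if "broken" in path:
        raise ValueError("no corners found")
    return 0.9


results_demo, failures_demo = evaluate(["a.JPG", "broken.JPG", "c.JPG"], fake_score)
print(len(results_demo), "scored,", len(failures_demo), "failed")
for path, error in failures_demo:
    print(path, "->", error)
```

Grouping the recorded errors by message would show whether the failures share a cause, for example corner detection failing on a particular kind of image.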


### ⚙️ Evaluation Metrics

In this step, we summarize the performance of the chessboard detection and FEN conversion system with the metrics below. Together they show how often the pipeline runs to completion and how close its predictions are to the ground truth.

#### **1. Total Test Images**
This metric represents the total number of images in the testing dataset. It gives an overview of the dataset size used for evaluation and forms the baseline for all other metrics.

#### **2. Processed Images**
The number of test images successfully processed by the system. A higher number indicates robustness and fewer errors during execution.

#### **3. Failed Images**
The count of images that could not be processed due to errors or exceptions. This metric is crucial for understanding system reliability and stability.

#### **4. Accuracy (Mean Similarity)**
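The mean Levenshtein similarity between the predicted and ground-truth FEN strings, expressed as a percentage. It is computed over the processed images only, so failed images do not lower it, and it measures string similarity rather than the share of boards predicted exactly. This is the primary indicator of the system's overall accuracy.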

#### **5. Minimum Similarity**
The lowest similarity score achieved in the evaluation. This metric identifies outliers or instances where the system struggled the most.

#### **6. Maximum Similarity**
The highest similarity score achieved in the evaluation. It highlights the system's best performance and serves as a benchmark for improvement.

#### **7. Median Similarity**
The middle value of similarity scores, providing a robust measure of central tendency, especially useful in datasets with outliers.

#### **8. Standard Deviation**
This metric measures the spread or variability of the similarity scores. A lower standard deviation indicates consistent performance across the dataset, while a higher value suggests variability.

By analyzing these metrics, we gain a comprehensive understanding of the system's performance and identify areas that may require optimization.

In [19]:
if results:
    # Summary statistics over the successfully processed images only
    min_similarity = np.min(results)
    max_similarity = np.max(results)
    mean_similarity = np.mean(results)
    median_similarity = np.median(results)
    std_deviation = np.std(results)
    processed_images = len(results)
    failed_images = len(test_images) - processed_images

    # Report the mean similarity as a percentage
    accuracy = mean_similarity * 100

    print(f"{'Metric':<20}{'Value':<15}")
    print("-" * 35)
    print(f"{'Total Test Images':<20}{len(test_images):<15}")
    print(f"{'Processed Images':<20}{processed_images:<15}")
    print(f"{'Failed Images':<20}{failed_images:<15}")
    print(f"{'Accuracy (Mean Sim.)':<20}{accuracy:>14.2f}%")
    print(f"{'Minimum Similarity':<20}{min_similarity:>14.2f}")
    print(f"{'Maximum Similarity':<20}{max_similarity:>14.2f}")
    print(f"{'Median Similarity':<20}{median_similarity:>14.2f}")
    print(f"{'Standard Deviation':<20}{std_deviation:>14.2f}")
else:
    print("\nNo results to analyze. Please check your data or debugging logs.")


Metric              Value          
-----------------------------------
Total Test Images   500            
Processed Images    448            
Failed Images       52             
Accuracy (Mean Sim.)         81.93%
Minimum Similarity            0.40
Maximum Similarity            1.00
Median Similarity             0.83
Standard Deviation            0.09
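
The reported mean covers only the 448 processed images. As a rough worked example, the figures above can be combined into a success rate and an end-to-end score. The end-to-end figure assumes that each failed image counts as a similarity of 0, which is a convention chosen for this sketch and not something the notebook computes. The mean is taken from the rounded value printed above, so the result is approximate.

```python
# Figures copied from the output table above
total_images = 500
processed_images = 448
mean_similarity = 0.8193  # rounded mean over processed images

# Share of images the pipeline processed without raising an exception
success_rate = processed_images / total_images

# Assumption: every failed image contributes a similarity of 0
end_to_end = mean_similarity * processed_images / total_images

print(f"Success rate: {success_rate:.1%}")
print(f"End-to-end mean similarity: {end_to_end:.2%}")
```

Under that assumption the score drops from about 82% to about 73%, which shows how much the 52 failed images matter when the pipeline is judged as a whole.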
