# Correction from Label Studio and Model Evaluation

This notebook SHOULD NOT be launched until any corrections have been made in Label Studio.
The functions below extract the data to be analysed after training, in order to assess the robustness of the model.

**Warning 1**
You need a Label Studio account (https://labelstud.io/guide/install.html), and you must have followed the workflow of the previous notebooks or at least have created similar files.

**Warning 2**
The export format for corrections from Label Studio MUST be csv ONLY.
Label Studio's YOLO export format does not keep the names of the images as they were imported. In this workflow, the images are referenced by URL: this script retrieves the image name from that URL and generates .txt files with the same name as the image, so that the new data can be used for a new training session.
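
As a rough illustration of the name retrieval described above, the sketch below derives a label file name from an image URL. The URL and the helper name `label_filename_from_url` are hypothetical; the notebook's own code does something equivalent with `os.path.basename` on the `image` field of each task.

```python
import os
from urllib.parse import urlparse, unquote


def label_filename_from_url(image_url):
    """Return the YOLO label file name matching an image URL (hypothetical helper)."""
    # Keep only the path component, so that query strings are ignored
    path = unquote(urlparse(image_url).path)
    # Drop the directory part, then strip the final extension only
    stem, _extension = os.path.splitext(os.path.basename(path))
    return stem + '.txt'


# Made-up URL, for illustration only
print(label_filename_from_url('https://example.org/images/page_012.v2.jpg'))
```

Only the last extension is removed, so dots inside the image name are preserved in the label file name.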

**Warning 3**
It is essential to complete the "Labeling Interface" by specifying strictly the same class names (case, special characters, etc.) as those declared in the .json file.

**Warning 4**
If you run into a problem in Label Studio, you can rename the labeling results key from "predictions" to "annotations": the Label Studio documentation states that predictions are read-only, so bounding box coordinates can only be changed for annotations (https://labelstud.io/guide/predictions.html#Predictions-are-read-only).
In practice, however, I was able to edit the predicted bounding boxes and export the modified data without any problem.

**Notice concerning use** 
Any use, even partial, of the content of this notebook must be accompanied by an appropriate citation.

&copy; 2023 Marion Charpier

## Environment

In [60]:
import os
import glob
import json
import ast

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics

import sys
sys.path.append(os.path.join('..', 'modules'))

from class_names_functions import get_labels, get_class_name, get_class_code
from transform_coordinates_functions import from_ls_to_yolo
from folders_path import get_results_folder
from manipulate_files import open_json_file, save_json_file, get_files, exclude_training_images, load_data_from_files

## Get the Label Studio correction in new .txt files

### Generate new txt files with correct bounding boxes 

In [61]:
def get_corrected_label(img_dataset_folder, yolo_model_folder):
    
    corrections_folder = img_dataset_folder.replace('image_inputs/eval_images', 'annotations/prediction_corrections')

    # Recompose the path to the model results folder
    results_folder = os.path.join(
            os.path.dirname(os.path.dirname(yolo_model_folder)), 
            'predict',
            img_dataset_folder.split('/')[-3] + '_' + os.path.basename(yolo_model_folder))

    # If it doesn't exist, create the "correctedLabels" folder for corrected labels
    os.makedirs(os.path.join(results_folder, 'correctedLabels'), exist_ok = True)

    # Retrieve labels from the model folder
    labels = get_labels(os.path.join(yolo_model_folder, 'labels.txt'))

    # Retrieve corrected JSON files as a list and open them
    corrected_files = [file for file in os.listdir(corrections_folder) if not file.startswith('.')]
    print(corrected_files)
    
    for corrected_file in corrected_files:
        corrections = open_json_file(os.path.join(corrections_folder, corrected_file))

        for result_item in corrections['result']:
            result_item.pop('score', None)
        save_json_file(os.path.join(corrections_folder, corrected_file), corrections)
        
        # Retrieve image name from corrected annotations file
        name = corrections['task']['data']['image']
        img_name = '.'.join(os.path.basename(name).split('.')[:-1])
        result = corrections['result']

        # Report any annotation box that was deleted during correction (no "id" key)
        for item in result:
            if "id" not in item:
                print("Prediction box erased.")

        # Write the .txt file once, provided at least one item still carries an "id"
        if any("id" in item for item in result):
            with open(os.path.join(results_folder, 'correctedLabels', img_name + '.txt'), 'w') as yolo_correction:
                for result_item in result:

                    # Retrieve annotation box coordinates
                    value = result_item['value']
                    x, y, w, h = from_ls_to_yolo(value['x'], value['y'], value['width'], value['height'])

                    # Retrieve the annotation label and associate it with its number in the "labels.txt" file
                    class_name = value['rectanglelabels'][0]
                    class_id = get_class_code(class_name, labels)

                    yolo_correction.write(f"{class_id} {x} {y} {w} {h}\n")
                        
        # print(f"Corrections made to the {img_name} image file have been saved.")
                        
    print(f"Corrected annotations have been successfully converted and saved to the results folder {os.path.join(results_folder, 'correctedLabels')}.")


# Generate the corrected files in YOLO format
get_corrected_label(img_dataset_folder, yolo_model_folder)

['1890', '1891', '1892', '1886', '1888', '1889', '1887']
Corrected annotations have been successfully converted and saved to the results folder /Users/marioncharpier/Documents/TORNE-H/GitHub/TiamaT/output/runs/predict/dragon_project_20250418_1n_i640_e10_b-1_w8/correctedLabels.


In [62]:
def get_corrected_label_files(img_dataset_folder, yolo_model_folder):
    """
    This function processes corrected annotation files in JSON format and converts them into YOLO-compatible `.txt` files.
    It ensures that any corrections made to bounding boxes and labels are reflected in the YOLO format, 
    and saves the corrected annotations in a `correctedLabels` folder.

    :param img_dataset_folder: 
        - Type: str
        - Description: The path to the folder containing the dataset images (unannotated images). 
                       This folder is used to locate the corresponding corrections folder.
    :param yolo_model_folder: 
        - Type: str
        - Description: The path to the folder containing the trained YOLO model. This folder is used to 
                       access the `labels.txt` file for retrieving class names and IDs.
    
    :return: 
        - Type: None
        - Description: This function does not return a value. It creates `.txt` files with the corrected 
                       annotations in YOLO format and saves them in the `correctedLabels` subdirectory of 
                       the results folder.

    This function automates the conversion of corrected annotations into a format suitable for YOLO training, 
    ensuring that all changes made during annotation review are correctly applied.
    """

    corrections_folder = img_dataset_folder.replace('image_inputs/eval_images', 'annotations/prediction_corrections')

    # Recompose the path to the model results folder
    results_folder = os.path.join(
            os.path.dirname(os.path.dirname(yolo_model_folder)), 
            'predict',
            img_dataset_folder.split('/')[-3] + '_' + os.path.basename(yolo_model_folder))

    # If it doesn't exist, create the "correctedLabels" folder for corrected labels
    os.makedirs(os.path.join(results_folder, 'correctedLabels'), exist_ok = True)

    # Retrieve labels from the model folder
    labels = get_labels(os.path.join(yolo_model_folder, 'labels.txt'))

    # Retrieve corrected JSON files as a list and open them
    corrected_files = [file for file in os.listdir(corrections_folder) if not file.startswith('.')]
    
    for corrected_file in corrected_files:
        corrections = open_json_file(os.path.join(corrections_folder, corrected_file))

        for result_item in corrections['result']:
            result_item.pop('score', None)
        save_json_file(os.path.join(corrections_folder, corrected_file), corrections)
        
        # Retrieve image name from corrected annotations file
        name = corrections['task']['data']['image']
        img_name = '.'.join(os.path.basename(name).split('.')[:-1])
        result = corrections['result']

        # Report any annotation box that was deleted during correction (no "id" key)
        for item in result:
            if "id" not in item:
                print("Prediction box erased.")

        # Write the .txt file once, provided at least one item still carries an "id"
        if any("id" in item for item in result):
            with open(os.path.join(results_folder, 'correctedLabels', img_name + '.txt'), 'w') as yolo_correction:
                for result_item in result:

                    # Retrieve annotation box coordinates
                    value = result_item['value']
                    x, y, w, h = from_ls_to_yolo(value['x'], value['y'], value['width'], value['height'])

                    # Retrieve the annotation label and associate it with its number in the "labels.txt" file
                    class_name = value['rectanglelabels'][0]
                    class_id = get_class_code(class_name, labels)

                    yolo_correction.write(f"{class_id} {x} {y} {w} {h}\n")
                        
        # print(f"Corrections made to the {img_name} image file have been saved.")
                        
    print(f"Corrected annotations have been successfully converted and saved to the results folder {os.path.join(results_folder, 'correctedLabels')}.")
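
The coordinate conversion itself is delegated to `from_ls_to_yolo`, whose implementation is not shown in this notebook. As a minimal sketch of what such a conversion typically involves, and assuming that Label Studio rectangle values are percentages of the image size with `x`, `y` at the top-left corner, the hypothetical function below returns YOLO-style normalised centre coordinates. The real helper may differ.

```python
def ls_percent_to_yolo(x, y, width, height):
    """Convert a Label Studio rectangle (percent, top-left origin) to YOLO
    (x_center, y_center, width, height), all normalised to [0, 1].

    Hypothetical stand-in: the project's `from_ls_to_yolo` may differ.
    """
    x_center = (x + width / 2) / 100
    y_center = (y + height / 2) / 100
    return x_center, y_center, width / 100, height / 100


# A box starting at (10 %, 20 %) that is 30 % wide and 40 % high
print(ls_percent_to_yolo(10, 20, 30, 40))
```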

## Get results in csv

### Get a list of images used for training the model

In [63]:
def get_img_from_training(yolo_model_folder, img_dataset_folder):
    """
    This function returns a list of images that may have been used to train a YOLO model. 
    It compares the images in the dataset folder with those listed in the `training_dataset.txt` file 
    found in the `dataset_statistics` subdirectory of the YOLO model folder.

    **Warning**: This function assumes that the training and prediction data have similar naming conventions 
    and that the `training_dataset.txt` file accurately reflects the images used during training.

    :param yolo_model_folder: 
        - Type: str
        - Description: The absolute path to the folder containing the YOLO model and its associated training data. 
                       The function uses the `training_dataset.txt` file in the `dataset_statistics` subdirectory 
                       of this folder to retrieve the list of training images.
    :param img_dataset_folder: 
        - Type: str
        - Description: The path to the folder containing the dataset images. This folder is searched to find images 
                       that match those listed in the `training_dataset.txt` file.
    
    :return: 
        - Type: list
        - Description: A list of image filenames that were used to train the model. If no matching images are found, 
                       an empty list is returned.

    This function is useful for verifying whether specific images in a dataset were part of the training process, 
    helping ensure consistency between training and inference datasets.
    """    
    train_data_list = os.path.join(yolo_model_folder, 'dataset_statistics/training_dataset.txt')
    
    train_data = []
    
    with open(train_data_list, 'r') as train_data_file:
        for line in train_data_file:
            img_path = line.strip()
            train_data.append(img_path)
      
    image_files = [file for file in os.listdir(img_dataset_folder) if file.endswith(('.jpg', '.png', '.tiff'))]
    
    matching_images = [image_name for image_name in train_data if os.path.basename(image_name) in image_files]
   
    if matching_images:
        print("The following images were used to train the model:")
        print(matching_images)
        
    else:
        print(f"No images in the {img_dataset_folder} folder were used to train the model {os.path.basename(yolo_model_folder)}.")
    
    return matching_images

### Calculate IoU

In [None]:
def calculate_iou(box1, box2):
    """
    This function calculates the Intersection over Union (IoU) between two bounding boxes. IoU is a measure 
    of the overlap between two bounding boxes and is commonly used to evaluate the accuracy of object detection models.

    The function is adapted from the 'bb_intersection_over_union' function on PyImageSearch, which uses 
    bounding box coordinates in (x_min, y_min, x_max, y_max) format. The adaptation accounts for the 
    fact that YOLOv8 provides bounding box coordinates in relative format (x_center, y_center, width, height).

    :param box1: 
        - Type: list or tuple
        - Description: The first bounding box defined as a list or tuple of values [class_id, x_center, y_center, width, height]. 
                       The coordinates are relative to the image dimensions.
    :param box2: 
        - Type: list or tuple
        - Description: The second bounding box defined as a list or tuple of values [class_id, x_center, y_center, width, height]. 
                       The coordinates are relative to the image dimensions.
    
    :return: 
        - Type: float
        - Description: The IoU value, which ranges from 0 to 1. A value of 0 indicates no overlap, 
                       while a value of 1 indicates perfect overlap between the two bounding boxes.

    This function is useful for evaluating object detection models and determining how well the predicted bounding boxes 
    match the ground truth annotations.
    """
    
    # Convert coordinates from (x_center, y_center, w, h) to (x_min, y_min, x_max, y_max)
    box1_x_min = box1[1] - box1[3] / 2
    box1_y_min = box1[2] - box1[4] / 2
    box1_x_max = box1[1] + box1[3] / 2
    box1_y_max = box1[2] + box1[4] / 2
    
    box2_x_min = box2[1] - box2[3] / 2
    box2_y_min = box2[2] - box2[4] / 2
    box2_x_max = box2[1] + box2[3] / 2
    box2_y_max = box2[2] + box2[4] / 2
    
    # Calculate coordinates (x,y) of the overlap
    x_min = max(box1_x_min, box2_x_min)
    y_min = max(box1_y_min, box2_y_min)
    x_max = min(box1_x_max, box2_x_max)
    y_max = min(box1_y_max, box2_y_max)
    
    # Calculate the area of the overlap
    # (no "+ 1" pixel offset: the coordinates are relative values in [0, 1], not pixel indices)
    intersection_area = max(0, x_max - x_min) * max(0, y_max - y_min)

    # Calculate the area of the two bounding boxes
    box1_area = (box1_x_max - box1_x_min) * (box1_y_max - box1_y_min)
    box2_area = (box2_x_max - box2_x_min) * (box2_y_max - box2_y_min)
    
    # Calculate the Intersection over Union (IoU), guarding against degenerate (zero-area) boxes
    union_area = box1_area + box2_area - intersection_area
    iou = intersection_area / union_area if union_area > 0 else 0.0
    
    return iou
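
To make the computation concrete, here is a small self-contained sketch of IoU on boxes given as relative `(class_id, x_center, y_center, width, height)` values. The helper names `to_corners` and `iou_relative` and the sample boxes are illustrative assumptions. The `+ 1` term found in pixel-based implementations is left out of this sketch, since with coordinates in [0, 1] it would dwarf the box sizes.

```python
def to_corners(box):
    """(class_id, x_center, y_center, w, h) -> (x_min, y_min, x_max, y_max)."""
    _, xc, yc, w, h = box[:5]
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def iou_relative(box1, box2):
    """IoU of two boxes expressed in relative YOLO coordinates."""
    ax0, ay0, ax1, ay1 = to_corners(box1)
    bx0, by0, bx1, by1 = to_corners(box2)
    # Width and height of the overlap, clamped at 0 when the boxes are disjoint
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0


same = iou_relative([0, 0.5, 0.5, 0.2, 0.2], [0, 0.5, 0.5, 0.2, 0.2])
shifted = iou_relative([0, 0.5, 0.5, 0.2, 0.2], [0, 0.6, 0.5, 0.2, 0.2])
apart = iou_relative([0, 0.2, 0.2, 0.1, 0.1], [0, 0.8, 0.8, 0.1, 0.1])
print(same, shifted, apart)
```

Identical boxes give an IoU of 1, a box shifted by half its width gives about 1/3, and disjoint boxes give 0.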

### Get the best IoU matches

In [65]:
def get_best_iou_matches(predictions, corrected_predictions):
    """
    This function finds the best matching corrected bounding box for each predicted bounding box based on 
    the Intersection over Union (IoU) value. For each prediction, it calculates the IoU with all corrected 
    bounding boxes and selects the one with the highest IoU as the best match.
    
    :param predictions: 
        - Type: list of str
        - Description: A list of predicted bounding boxes in YOLO format (class_id, x_center, y_center, width, height). 
                       Each bounding box is represented as a string of space-separated values.
    :param corrected_predictions: 
        - Type: list of str
        - Description: A list of corrected bounding boxes in YOLO format (class_id, x_center, y_center, width, height). 
                       Each bounding box is represented as a string of space-separated values.
    
    :return: 
        - Type: list of tuples
        - Description: A list of tuples, where each tuple contains:
            - The predicted bounding box (str)
            - The best matching corrected bounding box (str) based on the highest IoU
            - The IoU value (float) for the best match
    
    This function is useful for evaluating the performance of a model by comparing its predictions with manually corrected 
    ground truth annotations, identifying the best matches based on spatial overlap.
    """

    # Create an empty list for the best matches
    best_matches = []

    for prediction in predictions:
        prediction_box = prediction.split(" ")
        prediction_box = [float(coord) for coord in prediction_box]
        best_iou = 0
        best_correction = None

        for correction in corrected_predictions:
            correction_box = correction.split(" ")
            correction_box = [float(coord) for coord in correction_box]

            iou = calculate_iou(prediction_box, correction_box)
            
            if iou > best_iou:
                best_iou = iou
                best_correction = correction
        
        best_matches.append((prediction, best_correction, best_iou))
    
    return best_matches

### Load data from files

### Save results in a csv

In [66]:
def save_results_to_csv(rows, output_file):
    """
    This function saves a list of generated and corrected annotations into a CSV file. If no annotations are provided, 
    it logs a message indicating that no corrections were made and exits the function. Otherwise, it creates a 
    DataFrame from the provided data, sorts it by the 'Filename' column, and writes it to the specified CSV file.
    
    :param rows: 
        - Type: list of dict
        - Description: A list of dictionaries containing the generated and corrected annotations. Each dictionary should 
                       represent a single annotation entry with keys as column names.
    :param output_file: 
        - Type: str
        - Description: The path where the CSV file will be created. This file will store the sorted annotations for easy review.
    
    :return: 
        - Type: None
        - Description: This function does not return a value. It either creates the CSV file or prints a message if no 
                       annotations are provided.
    
    This function is useful for storing annotation results in a structured format, facilitating further analysis or 
    review of corrected and generated annotations.
    """

    if not rows:
        print('No correction made')
        return
    df = pd.DataFrame(rows)
    df_sorted = df.sort_values('Filename')
    df_sorted.to_csv(output_file, sep=';',index=False)
    print(f"The {output_file} file has been created.")

### Generate the csv with the results

In [None]:
def get_csv_results(img_dataset_folder, yolo_model_folder, all_results):
    """
    This function generates correction results in CSV format by evaluating each annotation as True Positive (TP), 
    False Positive (FP), or False Negative (FN). It compares the predictions made by a YOLO model against 
    manually corrected annotations and stores the results in a CSV file for further analysis.
    
    :param img_dataset_folder: 
        - Type: str
        - Description: The path to the folder containing the dataset images. This folder is used to locate the 
                       corresponding results and corrections.
    :param yolo_model_folder: 
        - Type: str
        - Description: The path to the folder containing the YOLO model and its associated data. This folder 
                       is used to locate labels and other files related to the model.
    :param all_results: 
        - Type: bool
        - Description: A flag indicating whether to include all results or only those not used during training. 
                       If `True`, it includes all results for evaluation. If `False`, it excludes training images 
                       to focus only on non-training data.
    
    :return: 
        - Type: None
        - Description: This function does not return a value. It generates a CSV file containing the evaluation 
                       results and saves it in the appropriate results folder.
    """

    results_folder = get_results_folder(yolo_model_folder, img_dataset_folder)
    labels = get_labels(os.path.join(yolo_model_folder, 'labels.txt'))

    prediction_folder = os.path.join(results_folder, 'labels')
    predictions_files = get_files(prediction_folder, 'txt')

    correction_folder = os.path.join(results_folder, 'correctedLabels')
    corrected_files = get_files(correction_folder, 'txt')

    # The output file is the same whether or not the training images are excluded
    output_file = os.path.join(results_folder, 'results/results_for_evaluation.csv')

    if not all_results:
        img_use_for_training = get_img_from_training(yolo_model_folder, img_dataset_folder)
        predictions_files = exclude_training_images(predictions_files, img_use_for_training)
        corrected_files = exclude_training_images(corrected_files, img_use_for_training)

    rows = []

    pred_map = {os.path.basename(path): path for path in predictions_files}
    corr_map = {os.path.basename(path): path for path in corrected_files}
    
    
    # Browse through all the predictions
    for basename, pred_path in pred_map.items():
        # Retrieve the correction file if it exists
        corr_path = corr_map.get(basename)
     
        if corr_path:
            # Sort both lists by (x_center, y_center) before matching
            predictions = sorted(load_data_from_files([pred_path]), key=lambda x: (float(x.split()[1]), float(x.split()[2])))
            corrections = sorted(load_data_from_files([corr_path]), key=lambda x: (float(x.split()[1]), float(x.split()[2])))
            best_matches = get_best_iou_matches(predictions, corrections)

            for prediction, best_correction, best_iou in best_matches:
                pred_box = list(map(float, prediction.split()))
                cls_pred = int(pred_box[0])
                # best_correction is None when no corrected box overlaps the prediction
                cls_corr = int(best_correction.split()[0]) if best_correction is not None else None

                if best_iou >= 0.5 and cls_pred == cls_corr:
                    tp_fp_fn = 'TP'
                elif best_iou >= 0.75 and cls_pred != cls_corr:
                    tp_fp_fn = 'FP_class'
                else:
                    tp_fp_fn = 'FP'

                # For a plain FP, the correction-related columns are left empty
                is_fp = tp_fp_fn == 'FP'
                rows.append({
                    'Filename': basename,
                    'Predicted_coordinates': ', '.join(map(str, pred_box)),
                    'Predicted_class': get_class_name(str(cls_pred), labels),
                    'TP/FP/FN': tp_fp_fn,
                    'Corrected_class': '' if is_fp else get_class_name(str(cls_corr), labels),
                    'Corrected_coordinates': '' if is_fp else best_correction,
                    'IoU': '' if is_fp else best_iou,
                    'Confidence_score': pred_box[5] if len(pred_box) > 5 else None
                })
    
            # Corrections that no prediction selected as its best match are false negatives
            matched_corrs = {c for _, c, _ in best_matches}
            for corr in corrections:
                if corr not in matched_corrs:
                    box_corr = list(map(float, corr.split()))
                    cls_corr = int(box_corr[0])
                    rows.append({
                        'Filename': basename,
                        'Predicted_coordinates': '',
                        'Predicted_class': '',
                        'TP/FP/FN': 'FN',
                        'Corrected_class': get_class_name(str(cls_corr), labels),
                        'Corrected_coordinates': ', '.join(map(str, box_corr)),
                        'IoU': 0,
                        'Confidence_score': 0
                    })

        else:
            # No correction file at all → all predictions can be considered FP
            predictions = load_data_from_files([pred_path])
            for pred in predictions:
                box = list(map(float, pred.split()))
                cls = int(box[0])
                rows.append({
                    'Filename': basename,
                    'Predicted_coordinates': ', '.join(map(str, box)),
                    'Predicted_class': get_class_name(str(cls), labels),
                    'TP/FP/FN': 'FP',
                    'Corrected_class': '',
                    'Corrected_coordinates': '',
                    'IoU': '',
                    'Confidence_score': box[5] if len(box) > 5 else None
                })
    # Process *orphan* corrections (without associated predictions)
    for basename, corr_path in corr_map.items():
        if basename not in pred_map:
            corrections = load_data_from_files([corr_path])
            for corr in corrections:
                box = list(map(float, corr.split()))
                cls = int(box[0])
                rows.append({
                    'Filename': basename,
                    'Predicted_coordinates': '',
                    'Predicted_class': '',
                    'TP/FP/FN': 'FN',
                    'Corrected_class': get_class_name(str(cls), labels),
                    'Corrected_coordinates': ', '.join(map(str, box)),
                    'IoU': 0,
                    'Confidence_score': 0
                })
    
    save_results_to_csv(rows, output_file)
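
The decision rule applied to each prediction can be summarised in isolation. The sketch below restates the thresholds used in `get_csv_results` (IoU of at least 0.5 with a matching class for a TP, IoU of at least 0.75 with a different class for `FP_class`, anything else a plain FP). The function name `classify_prediction` and its keyword arguments are hypothetical.

```python
def classify_prediction(best_iou, cls_pred, cls_corr, tp_iou=0.5, class_error_iou=0.75):
    """Label one prediction as 'TP', 'FP_class' or 'FP' (illustrative helper).

    cls_corr is None when no corrected box overlaps the prediction.
    """
    if cls_corr is not None and best_iou >= tp_iou and cls_pred == cls_corr:
        return 'TP'
    if cls_corr is not None and best_iou >= class_error_iou and cls_pred != cls_corr:
        return 'FP_class'
    return 'FP'


print(classify_prediction(0.82, 1, 1))     # good overlap, same class
print(classify_prediction(0.82, 1, 3))     # good overlap, wrong class
print(classify_prediction(0.60, 1, 3))     # wrong class, overlap below 0.75
print(classify_prediction(0.0, 1, None))   # no overlapping correction
```

A well-localised box with the wrong class therefore needs a higher overlap (0.75) to be reported as a class confusion rather than as a plain false positive.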

## Get metrics

In [68]:
def reformated_decimal(tp, fp, fn, recall_class, precision_class, f1_score_class):
    """
    This function formats the values of recall, precision, and F1 score to a consistent number of decimal places 
    based on the length of the input values for True Positives (TP), False Positives (FP), and False Negatives (FN).
    
    :param tp: 
        - Type: int
        - Description: The number of True Positives (TP) for a given class.
    :param fp: 
        - Type: int
        - Description: The number of False Positives (FP) for a given class.
    :param fn: 
        - Type: int
        - Description: The number of False Negatives (FN) for a given class.
    :param recall_class: 
        - Type: float
        - Description: The recall value for the class, calculated as `TP / (TP + FN)`.
    :param precision_class: 
        - Type: float
        - Description: The precision value for the class, calculated as `TP / (TP + FP)`.
    :param f1_score_class: 
        - Type: float
        - Description: The F1 score value for the class, calculated as `2 * (precision * recall) / (precision + recall)`.
    
    :return: 
        - Type: tuple of str
        - Description: Returns a tuple containing the formatted values for recall, precision, and F1 score, 
                       with a consistent number of decimal places based on the length of the input values.
    """

    # Establish the number of decimal places to use for formatting:
    # one more than the number of digits of the longest of the three counts
    max_decimal = max(len(str(tp)), len(str(fp)), len(str(fn))) + 1

    # Ensure that the results are displayed consistently
    recall_formated = "{:.{}f}".format(recall_class, max_decimal)
    precision_formated = "{:.{}f}".format(precision_class, max_decimal)
    f1_score_formated = "{:.{}f}".format(f1_score_class, max_decimal)
    
    return recall_formated, precision_formated, f1_score_formated
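
The formatting relies on a nested replacement field, where the precision is itself passed as an argument to `str.format`. A short illustration with made-up counts:

```python
tp, fp, fn = 128, 7, 15

# Longest count has 3 digits, so 3 + 1 = 4 decimal places are used
decimals = max(len(str(tp)), len(str(fp)), len(str(fn))) + 1

recall = tp / (tp + fn)
# The inner {} is filled with `decimals`, giving the format spec ".4f"
formatted = "{:.{}f}".format(recall, decimals)
print(formatted)  # 0.8951
```

The same result can be obtained with an f-string, `f"{recall:.{decimals}f}"`.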

In [69]:
def get_txt_results(img_dataset_folder):
    
    """
    This function generates a text file and a PNG file summarizing the evaluation results of model predictions. 
    It processes a CSV file containing the previously generated statistics, calculates metrics such as recall, precision, 
    and F1 score for each class, and saves the results in a `.txt` file. A visual summary of the results is also saved 
    as a `.png` file.
    
    :param img_dataset_folder: 
        - Type: str
        - Description: The path to the folder containing the dataset images. This folder is used to locate the 
                       corresponding results and statistics CSV file.
    
    :return: 
        - Type: None
        - Description: This function does not return a value. It generates a `.txt` file with a summary of evaluation 
                       results and a `.png` file with a table displaying the calculated metrics.
    """
    
    # NOTE: `yolo_model_folder` is not a parameter here; it is read from the notebook's global scope
    results_folder = get_results_folder(yolo_model_folder, img_dataset_folder)
    csv_with_results = os.path.join(results_folder, 'results/results_for_evaluation.csv')
 
    df = pd.read_csv(csv_with_results, sep=';')
    output_file = csv_with_results.replace('.csv', '.txt')
    output_png = csv_with_results.replace('.csv', '.png')
    
    # Collect all unique classes present in the DataFrame
    all_classes = np.unique(np.concatenate([df['Predicted_class'].dropna().unique(), df['Corrected_class'].dropna().unique()]))
    print(f'Classes : {all_classes}')
 
    table_data = []
 
    # Initialize TP, FP and FN counters for all classes
    class_TP = {classe: 0 for classe in all_classes}
    class_FP = {classe: 0 for classe in all_classes}
    class_FN = {classe: 0 for classe in all_classes}
 
    # Browse DataFrame rows
    for _, row in df.iterrows():
        pred_class = row['Predicted_class']
        corr_class = row['Corrected_class']
        # Check TP/FP/FN status for line
        if row['TP/FP/FN'] == 'TP':
            class_TP[pred_class] += 1
        elif row['TP/FP/FN'] == 'FP':
            class_FP[pred_class] += 1
        elif row['TP/FP/FN'] == 'FN':
            class_FN[corr_class] += 1
        elif row['TP/FP/FN'] == 'FP_class':
            class_FN[corr_class] += 1
            class_FP[pred_class] += 1
            
 
    # Calculate global totals
    total_TP = sum(class_TP.values())
    total_FP = sum(class_FP.values())
    total_FN = sum(class_FN.values())
    total_support = total_TP + total_FN
 
    # Recall computation
    recall = total_TP / (total_TP + total_FN) if (total_TP + total_FN) != 0 else 0
 
    # Precision computation
    precision = total_TP / (total_TP + total_FP) if(total_TP + total_FP) != 0 else 0
 
    # Calculation of the overall F1 score
    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) != 0 else 0
 
    # Open the file in write mode
    with open(output_file, 'w') as file:
        # Write the overall results
        file.write("Overall results :\n")
        file.write("Number of TP: {}\n".format(total_TP))
        file.write("Number of FP : {}\n".format(total_FP))
        file.write("Number of FN: {}\n".format(total_FN))
        file.write("Recall : {}\n".format(recall))
        file.write("Precision : {}\n".format(precision))
        file.write("Overall F1 score : {}\n".format(f1_score))
        file.write(f"Support : {total_support}\n")
        file.write("\n")
 
        # Write results by class
        file.write("Results per class :\n")
        for classe in all_classes:
            tp = class_TP[classe]
            fp = class_FP[classe]
            fn = class_FN[classe]
            support = tp + fn
 
            recall_class = tp / (tp + fn) if (tp + fn) != 0 else 0
            precision_class = tp / (tp + fp) if (tp + fp) != 0 else 0
            f1_score_class = 2 * (precision_class * recall_class) / (precision_class + recall_class) if (precision_class + recall_class) != 0 else 0
            
            recall_formated, precision_formated, f1_score_formated = reformated_decimal(tp, fp, fn, recall_class, precision_class, f1_score_class)
            table_data.append([classe, tp, fp, fn, precision_formated, recall_formated, f1_score_formated, support])
            
            
            file.write("Class {}\n".format(classe))
            file.write("Number of TP: {}\n".format(tp))
            file.write("Number of FP : {}\n".format(fp))
            file.write("Number of FN: {}\n".format(fn))
            file.write("Recall : {}\n".format(recall_class))
            file.write("Precision : {}\n".format(precision_class))
            file.write("F1 score : {}\n".format(f1_score_class))
            # Per-class support (TP + FN for this class), not the global total
            file.write(f"Support : {support}\n")
            file.write("\n")
 
    print(f"The {output_file} file has been created.")

    recall_formated, precision_formated, f1_score_formated = reformated_decimal(total_TP, total_FP, total_FN, recall, precision, f1_score)
    table_data.append(['Overall', total_TP, total_FP, total_FN, precision_formated, recall_formated, f1_score_formated, total_support])
    
    # Generate a PNG file with a table
    fig, ax = plt.subplots()
    ax.axis('off')
    ax.axis('tight')
    
    table = ax.table(cellText=table_data, colLabels=['Classes', 'Nb TP', 'Nb FP', 'Nb FN', 'Precision', 'Recall', 'F1 score', 'Support'],
                     loc='center', cellLoc='center')
    
    table.auto_set_font_size(False)
    table.set_fontsize(10)
    table.auto_set_column_width([0, 1, 2, 3, 4, 5, 6, 7])
    
    plt.savefig(output_png, bbox_inches='tight')
    plt.show()
 
    print(f"The {output_png} file has been created.")
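
The metric formulas used above can be checked in isolation. The cell below is a minimal sketch, not part of the original workflow: the helper names `safe_div` and `prf` and the class counts are hypothetical. It shows that the "Overall" row pools the TP, FP and FN counts of all classes before computing the scores (a micro-average), and that a zero denominator yields 0 instead of raising an error.

```python
def safe_div(num, den):
    """Return num / den, or 0 when the denominator is zero."""
    return num / den if den != 0 else 0


def prf(tp, fp, fn):
    """Precision, recall and F1 score from raw counts (hypothetical helper)."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


# Hypothetical per-class counts, for illustration only
class_TP = {"text": 8, "image": 3}
class_FP = {"text": 2, "image": 1}
class_FN = {"text": 0, "image": 3}

# Per-class scores
per_class = {
    name: prf(class_TP[name], class_FP[name], class_FN[name])
    for name in class_TP
}

# Overall scores: counts are summed first, then the formulas are applied
overall = prf(
    sum(class_TP.values()),
    sum(class_FP.values()),
    sum(class_FN.values()),
)

for name, scores in per_class.items():
    print(name, scores)
print("Overall", overall)
```

Because the counts are pooled, classes with many instances weigh more in the overall scores than rare classes do.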

## Print the confusion matrix

In [70]:
def create_confusion_matrix(img_dataset_folder, yolo_model_folder):
    """
    This function generates a confusion matrix to evaluate the performance and robustness of a trained YOLO model. 
    The matrix is created by comparing the predicted classes with the corrected (ground truth) classes, and the 
    result is visualized as an image.
    
    :param img_dataset_folder: 
        - Type: str
        - Description: The path to the folder containing the dataset images. This folder is used to locate 
                       the corresponding results and statistics CSV file.
    :param yolo_model_folder: 
        - Type: str
        - Description: The path to the folder containing the YOLO model and its associated data. This folder is 
                       used to access the labels file and the results CSV file.
    
    :return: 
        - Type: None
        - Description: This function does not return a value. It generates a confusion matrix image and saves it 
                       in the results folder for easy visualization.
    
    The confusion matrix is displayed using a heatmap format, and the resulting image is saved in the results folder.
    """


    results_folder = get_results_folder(yolo_model_folder, img_dataset_folder)
    csv_with_results = os.path.join(results_folder, 'results', 'results_for_evaluation.csv')

    # Load labels from the labels file
    labels = get_labels(os.path.join(yolo_model_folder, 'labels.txt'))
    display_labels=list(labels.values())
    
    # Add the 'Background' class used to fill the NaN results
    display_labels.append('Background')
    
    # Open the csv with results
    results = pd.read_csv(csv_with_results, sep=';')

    # Replace the NaN results with 'Background' so that FP and FN appear in the matrix
    predictions = results['Predicted_class'].fillna('Background')
    corrections = results['Corrected_class'].fillna('Background')

    # Create the confusion matrix
    confusion_matrix = metrics.confusion_matrix(y_pred=predictions, y_true=corrections, labels=display_labels)
    # print(confusion_matrix)

    # To create a more interpretable visual display we need to convert the table into a confusion matrix display
    cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix=confusion_matrix, display_labels=display_labels)

    fig, ax = plt.subplots(figsize=(10, 8))
    
    # Customizing and visualizing the display with rotation of x-axis labels
    cm_display.plot(ax=ax, xticks_rotation=90, cmap='autumn', values_format='d')

    plt.title('Confusion matrix')

    plt.tight_layout()
    plt.savefig(os.path.join(results_folder, 'results','confusionMatrice.png'))
    plt.show()
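
To make the role of the 'Background' class concrete, here is a dependency-free sketch of what the matrix counts. The class names and rows are hypothetical, and it assumes that a missing corrected class means the prediction was removed during correction (FP) while a missing predicted class means a box was added during correction (FN). The layout follows the convention of `metrics.confusion_matrix`: rows are true classes, columns are predicted classes, in the order given by `labels`.

```python
from collections import Counter

# Hypothetical class names, plus the extra 'Background' class
labels = ["text", "image", "Background"]

# Hypothetical rows: (corrected class, predicted class); None stands for NaN
rows = [
    ("text", "text"),    # correct detection
    ("text", "image"),   # detected, but with the wrong class
    (None, "image"),     # prediction without a corrected box: FP
    ("image", None),     # corrected box without a prediction: FN
    ("image", "image"),  # correct detection
]


def fill(value):
    """Mimic fillna('Background') for a single value."""
    return "Background" if value is None else value


# Count every (true class, predicted class) pair
pairs = Counter((fill(true), fill(pred)) for true, pred in rows)

# Rows are true classes, columns are predicted classes
matrix = [[pairs[(t, p)] for p in labels] for t in labels]

for name, row in zip(labels, matrix):
    print(f"{name:>10}", row)
```

In this layout the 'Background' row collects the false positives and the 'Background' column collects the false negatives.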

## Processing

In [None]:
img_dataset_folder = 'ABSPATHTOTHEFOLDER' # to be changed, absolute path to a folder with images only, without annotations.
yolo_model_folder = 'ABSPATHTOTHEMODELFOLDER' # to be changed, absolute path to the folder with the training data
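
Since both variables must hold absolute paths to existing folders, an optional check before running the next cells can catch a placeholder left unchanged. This is only a sketch: `check_folder` is a hypothetical helper, not a function of this notebook or its modules.

```python
import os


def check_folder(path):
    """Hypothetical helper: fail early if path is not an absolute, existing folder."""
    if not os.path.isabs(path):
        raise ValueError(f"Not an absolute path: {path}")
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Folder not found: {path}")
    return path
```

It could be called as `check_folder(img_dataset_folder)` and `check_folder(yolo_model_folder)` once the two placeholders have been replaced.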

In [None]:
# Generate the corrected files in YOLO format
get_corrected_label_files(img_dataset_folder, yolo_model_folder)

In [None]:
# Generate a CSV with the corrected data
get_csv_results(img_dataset_folder, yolo_model_folder, all_results=False)

In [None]:
# Generate the file with metrics
get_txt_results(img_dataset_folder)

In [None]:
# Generate the confusion matrix
create_confusion_matrix(img_dataset_folder, yolo_model_folder)