# Evaluation of different open-source codes

This notebook contains the code used to evaluate three open-source focal cortical dysplasia (FCD) detection tools: MS-DSA-NET, deepFCD, and the MELD classifier. (TODO include references)

## Evaluations

There are many ways to evaluate such tools; here they are evaluated at three levels: patient, region and pixel.

### Patient-level

This is a binary evaluation: whether each subject is correctly categorised (a patient predicted as a patient, a control predicted as a control).

### Region-level

This is also a binary evaluation: whether the prediction falls in the right area of the brain, that is, the correct lobe and hemisphere (for example, frontal lobe and right hemisphere).

### Pixel-level

Sensitivity, specificity, accuracy, precision and the Dice coefficient are used to evaluate the prediction pixel by pixel.
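
For reference, these metrics are usually defined from the per-pixel counts of true positives ($TP$), true negatives ($TN$), false positives ($FP$) and false negatives ($FN$). The definitions below are the standard ones; the evaluation scripts may handle edge cases (such as empty masks) differently.

$$
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}, \qquad
\text{precision} = \frac{TP}{TP + FP}
$$

$$
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Dice} = \frac{2\,TP}{2\,TP + FP + FN}
$$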

## Data used

This evaluation is made on the benchmark available on [OpenNeuro](https://openneuro.org/datasets/ds004199/versions/1.0.6). It contains 170 MRI scans, each containing both T1-weighted and FLAIR images, for 85 patients and 85 controls.

For each patient, complementary information is provided (sex, age at scan, 1-year outcome, hemisphere, lobe...), as shown in the code cell below.


In [None]:
import pandas as pd
df = pd.read_csv('../../Benchmarks/OpenNeuro/subjects/participants.tsv', sep='\t', index_col="participant_id")
df.loc[:, ["group", "sex", "age_scan", "hemisphere", "lobe", "1year_outcome"]]

The distribution within the benchmark is as follows.

`age_scan` values are grouped into 5-year intervals:

| age_scan    | interval    |
| ----------- | ----------- |
| 1           | 0 - 5 yo    |
| 2           | 6 - 10 yo   |
| 3           | 11 - 15 yo  |
| ...         | ...         |
| 11          | 51 - 55 yo  |
| 12          | 56 - 60 yo  |
| 13          | 61 - 65 yo  |

In [None]:
print(df.loc[:, ["group"]].value_counts())

print(df.loc[:, ["sex"]].value_counts())

print(df.loc[:, ["age_scan"]].value_counts().sort_index())

`hemisphere` and `lobe` abbreviations are:

| abbreviation  | meaning           |
| ------------- | ----------------- |
| L             | left              |
| R             | right             |
| FL            | frontal lobe      |
| TL            | temporal lobe     |
| PL            | parietal lobe     |
| OL            | occipital lobe    |
| IL            | insular lobe      |

In [None]:
print(df.loc[:, ["hemisphere"]].value_counts())

print(df.loc[:, ["lobe"]].value_counts())

# MS-DSA-NET

In [None]:
import glob
import os

input_dir = "/home/guenael/git/Stage/research_epilepsy/Code/MS-DSA-NET/inputs/fsl/"
output_dir = "/home/guenael/git/Stage/research_epilepsy/Code/MS-DSA-NET/outputs/"
date = "2025-06-26"

input_files = sorted(glob.glob(os.path.join(
    input_dir, "*/fl_roi_reg.nii.gz"), recursive=True))
output_files = sorted(glob.glob(os.path.join(
    output_dir + date, "*/t1_reg_seg.nii.gz"), recursive=True))

nb_ground_truth = len(input_files)
nb_predictions = len(output_files)

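def subject_id(path):
    # The numeric subject ID is the last '_'-separated token of the parent directory name.
    return int(path.split('/')[-2].split('_')[-1])

def fill_in_blanks(ground_truth_files, prediction_files):
    """Align both sorted lists in place by subject ID, putting "" where a subject is missing."""
    gt_filled = 0
    pred_filled = 0
    i = 0
    # A while loop is used because both lists can grow while blanks are inserted.
    while i < max(len(ground_truth_files), len(prediction_files)):
        if i >= len(ground_truth_files):
            ground_truth_files.append("")
            gt_filled += 1
        elif i >= len(prediction_files):
            prediction_files.append("")
            pred_filled += 1
        else:
            gt_id = subject_id(ground_truth_files[i])
            pred_id = subject_id(prediction_files[i])
            if gt_id > pred_id:
                # The ground truth for pred_id is missing.
                ground_truth_files.insert(i, "")
                gt_filled += 1
            elif pred_id > gt_id:
                # The prediction for gt_id is missing.
                prediction_files.insert(i, "")
                pred_filled += 1
        i += 1
    print(gt_filled, "blank gt files added and", pred_filled, "blank pred files added.")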

fill_in_blanks(input_files, output_files)
print("Before filling:")
print(nb_ground_truth, "input files and ", nb_predictions, "output files.")
print("After filling:")
print(len(input_files), "input files and ", len(output_files), "output files.")

## Patient-level

Each predicted mask is converted to a binary matrix, and its sum is compared to 0 (the sum of an empty mask). If the sum:
- is equal to 0, the subject is predicted as a control
- is greater than 0, the subject is predicted as a patient
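
The rule above can be sketched with NumPy as follows. This is only an illustration, not the implementation in `scripts/patient_level.py` (which is not shown here): `classify_mask` is a hypothetical helper, and binarising with `> 0` is an assumption about how the mask is thresholded.

```python
import numpy as np

def classify_mask(mask):
    """Return "patient" if the binarised mask has any non-zero voxel, else "control"."""
    # Assumption: any strictly positive value counts as a predicted lesion voxel.
    binary = np.asarray(mask) > 0
    return "patient" if binary.sum() > 0 else "control"

# Toy masks standing in for predicted segmentations.
empty_mask = np.zeros((4, 4, 4))
lesion_mask = empty_mask.copy()
lesion_mask[1, 2, 3] = 0.8

print(classify_mask(empty_mask))
print(classify_mask(lesion_mask))
```

With a real prediction, the array would come from the NIfTI file (for example through nibabel) instead of being built by hand.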

In [None]:
from scripts.patient_level import get_patient_results

patient_detected, patient_forgotten, control_detected, control_forgotten = get_patient_results(df, output_files)

In [None]:
print("Patient: ", patient_detected, '/', patient_detected+len(patient_forgotten),"| Control: ", control_detected, '/', control_detected+len(control_forgotten), "| ", control_detected + patient_detected, "/", control_detected + patient_detected+len(patient_forgotten)+len(control_forgotten))
print(patient_forgotten)
print(control_forgotten)


## Region-level

The region of each predicted mask is determined with nibabel and nilearn: the prediction is compared with the Harvard-Oxford atlas to obtain its lobe and hemisphere.

These are then compared with the lobe and hemisphere recorded in the benchmark.
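
The core of such an atlas lookup can be sketched as an overlap count between the mask and a label map. The example below is a toy illustration under stated assumptions: `dominant_label` is a hypothetical helper, the small integer array stands in for the atlas (assumed to be already resampled to the grid of the prediction), and the label names are made up for the example. The actual logic in `scripts/region_level.py` may differ.

```python
import numpy as np

def dominant_label(mask, atlas, names):
    """Return the name of the atlas label that overlaps the mask the most.

    Returns None when the mask does not overlap any labelled voxel.
    """
    labels = atlas[np.asarray(mask) > 0]  # atlas labels under the predicted voxels
    labels = labels[labels > 0]           # label 0 is treated as background
    if labels.size == 0:
        return None
    counts = np.bincount(labels, minlength=len(names))
    return names[int(counts.argmax())]

# Toy atlas: the first half of the volume is label 1, the second half label 2.
names = ["background", "FL", "TL"]
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2] = 1
atlas[2:] = 2

# Toy prediction: two voxels in label 1 and one voxel in label 2.
mask = np.zeros((4, 4, 4))
mask[0:2, 0, 0] = 1
mask[2, 0, 0] = 1

print(dominant_label(mask, atlas, names))
```

The same lookup could be applied to a hemisphere label map to obtain `L` or `R`.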

In [None]:
from scripts.region_level import get_region_results, get_region_result_for_one_mask

control_lobe_situated, control_lobe_missed, control_hemisphere_situated, control_hemisphere_missed, patient_lobe_situated, patient_lobe_missed, patient_hemisphere_situated, patient_hemisphere_missed = get_region_results(df, output_files)

In [None]:
print("Patient lobe: ", patient_lobe_situated, '/', patient_lobe_situated+len(patient_lobe_missed))
print("Patient hemisphere: ", patient_hemisphere_situated, '/', patient_hemisphere_situated+len(patient_hemisphere_missed))
print(patient_lobe_missed)
print(patient_hemisphere_missed)

print("Control lobe: ", control_lobe_situated, '/', control_lobe_situated+len(control_lobe_missed))
print("Control hemisphere: ", control_hemisphere_situated, '/', control_hemisphere_situated+len(control_hemisphere_missed))
print(control_lobe_missed)
print(control_hemisphere_missed)

## Pixel-level

Each predicted mask is compared to the ground truth. Sensitivity, specificity, Dice coefficient, accuracy and precision are then returned.

This test is run only on the patients' ground truths and predictions.
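
As an illustration of the comparison for a single pair of masks, here is a minimal sketch. It is not the code in `scripts/pixel_level.py`: `pixel_metrics` is a hypothetical helper, both masks are assumed to share the same shape, and returning NaN for an undefined ratio is a design choice made for this example.

```python
import numpy as np

def pixel_metrics(ground_truth, prediction):
    """Compute per-pixel metrics between two masks of identical shape."""
    gt = np.asarray(ground_truth) > 0
    pred = np.asarray(prediction) > 0

    tp = int(np.sum(gt & pred))    # lesion pixels correctly predicted
    tn = int(np.sum(~gt & ~pred))  # background pixels correctly left empty
    fp = int(np.sum(~gt & pred))   # background pixels predicted as lesion
    fn = int(np.sum(gt & ~pred))   # lesion pixels that were missed

    def ratio(numerator, denominator):
        # Undefined ratios (zero denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }

# Toy masks: 1 true positive, 1 false negative, 1 false positive, 5 true negatives.
gt = np.array([[1, 1, 0, 0], [0, 0, 0, 0]])
pred = np.array([[1, 0, 1, 0], [0, 0, 0, 0]])
metrics = pixel_metrics(gt, pred)
print(metrics)
```

Because lesions occupy a small fraction of the volume, specificity and accuracy are typically dominated by the background, which is why the Dice coefficient is reported alongside them.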

In [None]:
from scripts.pixel_level import get_pixel_results

mean_sensitivity, mean_specificity, mean_dice, mean_precision, mean_accuracy = get_pixel_results(df, input_files, output_files)

In [None]:
print("Out of", nb_ground_truth,"patients:")
print("Average sensitivity\t", mean_sensitivity)
print("Average specificity\t", mean_specificity)
print("Average dice\t\t", mean_dice)
print("Average precision\t", mean_precision)
print("Average accuracy\t", mean_accuracy)