## Nonparametric tests between different models

Nonparametric tests are statistical methods that do not assume a specific distribution (such as normality) for the data. These tests are especially useful when the data violates the assumptions required for parametric tests, or when dealing with ordinal data, ranks, or small sample sizes. Unlike parametric tests that rely on parameters like mean and variance, nonparametric tests often use medians or ranks to make inferences.

Some common nonparametric tests include:

- **Mann–Whitney U Test:** Compares differences between two independent groups when the dependent variable is ordinal or continuous but not normally distributed.

- **Wilcoxon Signed-Rank Test:** Tests differences between two related samples or matched pairs.

- **Kruskal–Wallis H Test:** An extension of the Mann–Whitney test for comparing more than two independent groups.

- **Spearman’s Rank Correlation:** Measures the strength and direction of association between two ranked variables.

- **Friedman Test:** Used for comparing more than two related groups.
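As a rough illustration of the tests listed above that are not used later in this notebook, the sketch below calls the corresponding `scipy.stats` functions. The three groups are made-up numbers chosen only for demonstration; they are not results from this project.

```python
# Illustrative only: the values below are hypothetical, not project data.
from scipy.stats import mannwhitneyu, kruskal, spearmanr

group_a = [12.1, 14.3, 11.8, 15.2, 13.7]
group_b = [16.4, 17.9, 15.8, 18.3, 16.1]
group_c = [20.5, 19.7, 21.2, 22.8, 20.1]

# Mann-Whitney U: two independent groups
u_stat, u_p = mannwhitneyu(group_a, group_b, alternative="two-sided")

# Kruskal-Wallis H: more than two independent groups
h_stat, h_p = kruskal(group_a, group_b, group_c)

# Spearman's rank correlation: monotonic association between two variables
rho, rho_p = spearmanr(group_a, group_b)

print(f"Mann-Whitney U = {u_stat}, p = {u_p:.4f}")
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {h_p:.4f}")
print(f"Spearman rho = {rho:.3f}, p = {rho_p:.4f}")
```

All three functions work on ranks internally, so only the ordering of the values matters, not their exact magnitudes.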

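For this research project, we use the Friedman test and the Wilcoxon signed-rank test. Both are designed for related samples, which fits our setup because every model is evaluated on the same five folds.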

In [1]:
####################################################################
#  Importing the required libraries for nonparametric tests
####################################################################

from scipy.stats import friedmanchisquare
from scipy.stats import wilcoxon

In [2]:
#####################################################################
# Collecting the MAE and RMSE metrics of models used in this paper
#####################################################################

mae_capri = [68.518, 68.962, 71.602, 69.133, 68.028]
rmse_capri = [108.121, 107.748, 108.867, 108.929, 106.493]

mae_cnn = [97.321, 96.105, 101.242, 120.492, 119.472]
rmse_cnn = [142.713, 141.306, 143.305, 154.623, 161.688]

mae_resnet = [115.5818, 130.4156, 89.3858, 116.2107, 125.8894]
rmse_resnet = [180.9482, 210.4882, 129.1232, 197.1785, 206.2745]

mae_squeezenet = [100.1274, 96.9600, 102.9326, 91.4517, 92.7629]
rmse_squeezenet = [145.5695, 144.9154, 148.7141, 140.0822, 141.6885]

In [3]:
#########################################################
# Evaluating the models using the Friedman test
#########################################################

friedman_stat, friedman_p = friedmanchisquare(mae_capri, mae_cnn, mae_resnet, mae_squeezenet)
print(f"Friedman test statistic for MAE: {friedman_stat}, p-value: {friedman_p}")
friedman_stat1, friedman_p1 = friedmanchisquare(rmse_capri, rmse_cnn, rmse_resnet, rmse_squeezenet)
print(f"Friedman test statistic for RMSE: {friedman_stat1}, p-value: {friedman_p1}")


Friedman test statistic for MAE: 9.719999999999999, p-value: 0.021102512414100234
Friedman test statistic for RMSE: 10.679999999999993, p-value: 0.0135882729582177


Statistical analysis using the Friedman test indicated significant differences in model performance across the five folds (MAE: χ² = 9.72, p = 0.0211; RMSE: χ² = 10.68, p = 0.0136).
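To make the Friedman statistic less of a black box, the following sketch recomputes the MAE result by hand: models are ranked within each fold, the ranks are summed per model, and the sums are plugged into the standard Friedman formula. It assumes no tied values within a fold (true for this data), so no tie correction is applied; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import rankdata, chi2

# MAE per fold (rows = folds; columns = CAPRI-CT, CNN, ResNet, SqueezeNet),
# copied from the lists defined earlier in this notebook.
mae = np.array([
    [68.518,  97.321, 115.5818, 100.1274],
    [68.962,  96.105, 130.4156,  96.9600],
    [71.602, 101.242,  89.3858, 102.9326],
    [69.133, 120.492, 116.2107,  91.4517],
    [68.028, 119.472, 125.8894,  92.7629],
])

n_folds, n_models = mae.shape

# Rank the models within each fold (1 = lowest MAE in that fold)
ranks = np.apply_along_axis(rankdata, 1, mae)
rank_sums = ranks.sum(axis=0)

# Friedman chi-square statistic (no tie correction needed here)
chi_sq = (12.0 / (n_folds * n_models * (n_models + 1))) * np.sum(rank_sums ** 2) \
         - 3.0 * n_folds * (n_models + 1)

# p-value from the chi-square distribution with (k - 1) degrees of freedom
p_value = chi2.sf(chi_sq, df=n_models - 1)

print("Mean rank per model:", rank_sums / n_folds)
print(f"Friedman chi-square = {chi_sq:.2f}, p = {p_value:.4f}")
```

The mean ranks also show where the difference comes from: CAPRI-CT is ranked first in every fold, while the three baselines trade places among ranks 2 to 4.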

In [4]:
########################################################################
# Helper that runs a one-sided Wilcoxon signed-rank test between two models
########################################################################

def run_wilcoxon_test(baseline, capri, metric_name):
    """
    Perform a Wilcoxon signed-rank test to compare CAPRI-CT results against a baseline.

    Args:
        baseline (array-like): Metric values from the baseline method.
        capri (array-like): Metric values from the CAPRI-CT method.
        metric_name (str): Name of the metric being compared (for printing purposes).

    Prints:
        The Wilcoxon test statistic, p-value, and whether CAPRI-CT is significantly better
        (alternative hypothesis: CAPRI-CT < Baseline).
    """
    
    stat, p = wilcoxon(capri, baseline, alternative='less')
    print(f"Wilcoxon test on {metric_name}: CAPRI-CT < Baseline => statistic = {stat}, p-value = {p:.4f}")
    if p < 0.05:
        print(f"Result: CAPRI-CT is significantly better in {metric_name} (p < 0.05)\n")
    else:
        print(f"Result: No significant evidence that CAPRI-CT is better in {metric_name} (p ≥ 0.05)\n")

In [5]:
######################################################################
# Evaluating CAPRI-CT vs. CNN
######################################################################

run_wilcoxon_test(mae_cnn, mae_capri, "Capri vs CNN")
run_wilcoxon_test(rmse_cnn, rmse_capri, "Capri vs CNN")

Wilcoxon test on Capri vs CNN: CAPRI-CT < Baseline => statistic = 0.0, p-value = 0.0312
Result: CAPRI-CT is significantly better in Capri vs CNN (p < 0.05)

Wilcoxon test on Capri vs CNN: CAPRI-CT < Baseline => statistic = 0.0, p-value = 0.0312
Result: CAPRI-CT is significantly better in Capri vs CNN (p < 0.05)



In [6]:
######################################################################
# Evaluating CAPRI-CT vs. ResNet
######################################################################

run_wilcoxon_test(mae_resnet, mae_capri, "Capri vs Resnet")
run_wilcoxon_test(rmse_resnet, rmse_capri, "Capri vs Resnet")

Wilcoxon test on Capri vs Resnet: CAPRI-CT < Baseline => statistic = 0.0, p-value = 0.0312
Result: CAPRI-CT is significantly better in Capri vs Resnet (p < 0.05)

Wilcoxon test on Capri vs Resnet: CAPRI-CT < Baseline => statistic = 0.0, p-value = 0.0312
Result: CAPRI-CT is significantly better in Capri vs Resnet (p < 0.05)



In [7]:
######################################################################
# Evaluating CAPRI-CT vs. SqueezeNet
######################################################################

run_wilcoxon_test(mae_squeezenet, mae_capri, "Capri vs SqueezeNet")
run_wilcoxon_test(rmse_squeezenet, rmse_capri, "Capri vs SqueezeNet")

Wilcoxon test on Capri vs SqueezeNet: CAPRI-CT < Baseline => statistic = 0.0, p-value = 0.0312
Result: CAPRI-CT is significantly better in Capri vs SqueezeNet (p < 0.05)

Wilcoxon test on Capri vs SqueezeNet: CAPRI-CT < Baseline => statistic = 0.0, p-value = 0.0312
Result: CAPRI-CT is significantly better in Capri vs SqueezeNet (p < 0.05)



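One-sided Wilcoxon signed-rank tests showed that CAPRI-CT achieved significantly lower MAE and RMSE than each baseline model (CNN, ResNet, and SqueezeNet), with W = 0 and p = 0.0312 in every comparison. Because CAPRI-CT had the lower error in all five folds, this is the smallest p-value attainable with five paired observations; the p-values are not adjusted for multiple comparisons.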
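Every comparison above reports the same statistic (0.0) and the same p-value (0.0312). The sketch below shows why, by computing the exact one-sided p-value from scratch for CAPRI-CT vs. CNN on MAE. It is a hand-rolled illustration, not SciPy's implementation, and it assumes there are no zero differences and no tied absolute differences (which holds for this data).

```python
from itertools import product

# MAE per fold, copied from the lists defined earlier in this notebook
mae_capri = [68.518, 68.962, 71.602, 69.133, 68.028]
mae_cnn = [97.321, 96.105, 101.242, 120.492, 119.472]

# Paired differences (CAPRI-CT minus baseline); negative means CAPRI-CT is better
diffs = [c - b for c, b in zip(mae_capri, mae_cnn)]

# Rank the absolute differences from smallest (1) to largest (n)
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = [0] * len(diffs)
for rank, idx in enumerate(order, start=1):
    ranks[idx] = rank

# W+ is the sum of ranks belonging to positive differences
w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

# Under the null hypothesis every sign pattern is equally likely, so the exact
# one-sided p-value is the share of the 2^n patterns with W+ at most the observed value.
patterns = list(product([0, 1], repeat=len(ranks)))
as_extreme = sum(
    1 for signs in patterns
    if sum(r for r, s in zip(ranks, signs) if s) <= w_plus
)
p_exact = as_extreme / len(patterns)

print(f"W+ = {w_plus}, exact one-sided p = {p_exact}")
```

Since all five differences are negative, only one of the 32 sign patterns is as extreme as the observed one, giving p = 1/32 = 0.03125. The same argument applies to every other comparison in this notebook, because CAPRI-CT has the lower error in every fold.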