___Wilcoxon Test___

A Wilcoxon signed-rank test comparing Asym_Comp&ABR_Generator.ipynb (Model A) against Naive_Asym_Generator.ipynb (Model B). The test shows that Model B scores significantly higher than Model A on every metric for classifying cancer.

In [2]:
from scipy.stats import wilcoxon
import pandas as pd

# Metrics from Model A
model_a = {
    "Accuracy": [0.4892, 0.5252, 0.5036, 0.5540, 0.5870, 0.5290, 0.4855, 0.4855],
    "Precision": [0.0000, 0.0000, 0.0000, 0.4333, 0.6667, 0.0000, 0.0000, 0.3158],
    "Recall": [0.0000, 0.0000, 0.0000, 0.2241, 0.0678, 0.0000, 0.0000, 0.0938],
    "F1 Score": [0.0000, 0.0000, 0.0000, 0.2955, 0.1231, 0.0000, 0.0000, 0.1446]
}

# Metrics from Model B
model_b = {
    "Accuracy": [0.5540, 0.5396, 0.5971, 0.5755, 0.6043, 0.6304, 0.5942, 0.6159],
    "Precision": [0.5000, 0.5882, 0.5714, 0.5217, 0.5714, 0.6000, 0.5362, 0.5714],
    "Recall": [0.6452, 0.5263, 0.7059, 0.5806, 0.6154, 0.6462, 0.6066, 0.6349],
    "F1 Score": [0.5634, 0.5556, 0.6316, 0.5496, 0.5926, 0.6222, 0.5692, 0.6015]
}

# Run Wilcoxon test for each metric
results = {}
for metric in model_a:
    stat, pval = wilcoxon(model_a[metric], model_b[metric], alternative='less')
    results[metric] = {"Wilcoxon Statistic": stat, "p-value": pval}

# Convert to DataFrame for display
results_df = pd.DataFrame(results).T

results_df

| Metric | Wilcoxon Statistic | p-value |
|---|---|---|
| Accuracy | 0.0 | 0.003906 |
| Precision | 2.0 | 0.011719 |
| Recall | 0.0 | 0.003906 |
| F1 Score | 0.0 | 0.003906 |


__1.1 Multiple Lesions and the Impact on Technical Implementation of Asymmetry:__


Many asymmetry algorithms in dermoscopic image analysis rely on the lesion being centered within the image so that symmetry can be assessed along a fixed axis or axes [1]. This approach was initially implemented in Asym_Comp&ABR_Generator.ipynb. For every image, the number of lesions (connected components) was computed. Each lesion was then recentered in its own binary image, and axis-based reflections were performed to assess asymmetry. The individual asymmetry scores of the lesions in an image were then averaged to give a final asymmetry score for the image. This aligns with standard practice in the dermoscopy literature [2]. See Appendix A for an example.
Additionally, we introduced a relative bias weight score, inspired by the observation that a cluster of components localized within a small spatial range may indicate irregular growth. To evaluate these features, a model was trained on them independently within our model framework, using the random seed 413316891.
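
The per-lesion procedure described above can be sketched as follows. This is a minimal illustration and not the code from Asym_Comp&ABR_Generator.ipynb: the function names `component_asymmetry` and `image_asymmetry`, the choice of the two centroid axes, and the normalisation by twice the lesion area are all assumptions made for the example.

```python
import numpy as np
from scipy import ndimage


def component_asymmetry(component: np.ndarray) -> float:
    """Asymmetry of one binary component, in [0, 1] (hypothetical helper)."""
    rows, cols = np.nonzero(component)
    cy, cx = int(round(rows.mean())), int(round(cols.mean()))

    # Recenter: build a square canvas whose middle pixel is the centroid.
    half = max(cy - rows.min(), rows.max() - cy, cx - cols.min(), cols.max() - cx)
    canvas = np.zeros((2 * half + 1, 2 * half + 1), dtype=bool)
    canvas[rows - cy + half, cols - cx + half] = True

    area = canvas.sum()
    scores = []
    for flipped in (np.flipud(canvas), np.fliplr(canvas)):
        # Pixels covered by the lesion or its mirror image, but not both.
        mismatch = np.logical_xor(canvas, flipped).sum()
        scores.append(mismatch / (2 * area))  # assumed normalisation
    return float(np.mean(scores))


def image_asymmetry(mask: np.ndarray) -> float:
    """Mean asymmetry over all connected components of a binary mask."""
    labels, n_components = ndimage.label(mask)
    if n_components == 0:
        return 0.0
    per_lesion = [component_asymmetry(labels == i) for i in range(1, n_components + 1)]
    return float(np.mean(per_lesion))
```

Because each component is recentered before reflection, the score depends only on the shape of each lesion and not on where it sits in the frame.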

![image](https://github.com/BossThePro/2025-FYP-groupKangaroo/blob/main/data/asymmetry/Wilcoxon%20Images/Screenshot%202025-05-27%20143835.png)
Figure 1.1 Fold 5 from seed 413316891, evaluating the features developed in Asym_Comp&ABR_Generator.ipynb

Given how poorly the asymmetry features from this method performed in the model, we switched to a far more naive approach. Rather than focusing on the centroids of individual lesions, asymmetry is now computed on the skin region as a whole, i.e. the entire image. The question shifted from "how asymmetrical is each lesion, on average?" to "is the skin in this region asymmetrical?". This is an extremely naive model that is not supported by the literature. With the same seed, 413316891, the model performed substantially better on the same data split.
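
One plausible reading of the whole-image approach is sketched below. It is an assumption, not the code from Naive_Asym_Generator.ipynb: the function name `whole_image_asymmetry`, the use of the binary mask as input, and the normalisation are all hypothetical. The only point it illustrates is that nothing is recentered, so the reflection axes are those of the image frame.

```python
import numpy as np


def whole_image_asymmetry(mask: np.ndarray) -> float:
    """Asymmetry of the full mask about the image's own central axes, in [0, 1]."""
    mask = mask.astype(bool)
    area = mask.sum()
    if area == 0:
        return 0.0
    scores = []
    for flip in (np.flipud, np.fliplr):
        # No per-lesion recentering: the whole frame is mirrored at once.
        mismatch = np.logical_xor(mask, flip(mask)).sum()
        scores.append(mismatch / (2 * area))  # assumed normalisation
    return float(np.mean(scores))
```

Under this reading, a perfectly symmetric lesion that sits off-centre still receives a high score, so the feature responds to how lesions are laid out across the region as well as to their shape.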

![image](https://github.com/BossThePro/2025-FYP-groupKangaroo/blob/main/data/asymmetry/Wilcoxon%20Images/Screenshot%202025-05-27%20143744.png)
Figure 1.2 Fold 5 from seed 413316891, evaluating the features developed in Naive_Asym_Generator.ipynb

To test whether the improvement in performance is significant, a one-sided Wilcoxon signed-rank test was run with the following hypotheses:
H_0: Model A (Asym_Comp&ABR_Generator.ipynb) = Model B (Naive_Asym_Generator.ipynb)
H_A: Model A (Asym_Comp&ABR_Generator.ipynb) < Model B (Naive_Asym_Generator.ipynb)
Across the 8 folds, the difference was statistically significant for all four metrics at alpha = 0.05, and for Accuracy, Recall and F1 Score also at alpha = 0.01. Thus the features from Model B perform significantly better at diagnosing cancer.

![image](https://github.com/BossThePro/2025-FYP-groupKangaroo/blob/main/data/asymmetry/Wilcoxon%20Images/Screenshot%202025-05-27%20153403.png)
Figure 1.3 Showing the results of the Wilcoxon Test
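
The reported p-values can be reproduced by hand, which helps when reading them. With 8 untied, non-zero paired differences there are 2^8 = 256 equally likely sign patterns under H_0, and the one-sided p-value is the share of patterns whose positive-rank sum W+ is no larger than the observed statistic. The sketch below enumerates those patterns directly; the function name `exact_one_sided_p` is hypothetical, and the enumeration assumes no ties and no zero differences, which holds for the fold values above.

```python
from itertools import product


def exact_one_sided_p(n: int, w_plus: int) -> float:
    """P(W+ <= w_plus) under H_0 for n untied, non-zero paired differences."""
    ranks = range(1, n + 1)
    hits = 0
    # Each rank independently belongs to a positive or a negative difference.
    for signs in product((0, 1), repeat=n):
        if sum(rank for rank, positive in zip(ranks, signs) if positive) <= w_plus:
            hits += 1
    return hits / 2 ** n


print(exact_one_sided_p(8, 0))  # 0.00390625 (Accuracy, Recall, F1 Score)
print(exact_one_sided_p(8, 2))  # 0.01171875 (Precision)
```

A statistic of 0 means Model B scored higher than Model A on every fold, and 1/256 is the smallest p-value this test can produce with 8 folds. For Precision, Model A was higher on a single fold, and that difference had the second-smallest magnitude, giving 3/256.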

Although Model B performs better, it should be viewed with scepticism because of multicollinearity: ASI and the asymmetry score have a Pearson's r of 0.9. This is essentially because the two are computed in nearly the same way, a methodological error that should be avoided in future.
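
A check of this kind can be run before training, so that near-duplicate features are caught early. The sketch below computes Pearson's r from its definition and flags feature pairs at or above a threshold. The names `pearson_r` and `collinear_pairs`, the feature keys, and the 0.9 default threshold are assumptions for illustration, and any values passed in would be the project's own feature columns.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def collinear_pairs(features: dict, threshold: float = 0.9) -> list:
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(features[a], features[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged
```

When a pair is flagged, keeping only one of the two features, or combining them into a single feature, avoids feeding the model the same information twice.
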
With multiple lesions, the definition of what counts as irregular becomes ambiguous. The sensitivity of asymmetry alone is not high enough for diagnosis: some melanomas are symmetrical, and some benign lesions are asymmetrical [3]. This is reflected in our data set, with Figure 1.4 being a clear example.


![image](https://github.com/BossThePro/2025-FYP-groupKangaroo/blob/main/data/asymmetry/Wilcoxon%20Images/PAT_87_133_391.png)![image](https://github.com/BossThePro/2025-FYP-groupKangaroo/blob/main/data/asymmetry/Wilcoxon%20Images/PAT_87_133_391_mask.png)
Figure 1.4 PAT_87_133_391.png showing a benign lesion 

The computed asymmetry of Figure 1.4 is high in both feature extractions: an overall mean asymmetry score of 0.6380 in Asym_Comp&ABR_Generator.ipynb, and a mean asymmetry score of 0.8039 with an ASI of 90.47% in Naive_Asym_Generator.ipynb. This reinforces that asymmetry alone is a poor feature without other features to moderate it.
Future work should explore how the clustering of lesions, as well as their scale, can be used as features. These could add contextual information that improves the sensitivity of asymmetry.


References:
1. Clawson, K. M., Morrow, P. J., Scotney, B. W., McKenna, D. J., & Dolan, O. M. (2007). Determination of optimal axes for skin lesion asymmetry quantification. In Proceedings of the 2007 IEEE International Conference on Image Processing (Vol. 2, pp. II-453–II-456). https://www.researchgate.net/publication/4288907_Determination_of_Optimal_Axes_for_Skin_Lesion_Asymmetry_Quantification

2. Ali, A.-R., Li, J., & O’Shea, S. J. (2020). Towards the automatic detection of skin lesion shape asymmetry, color variegation and diameter in dermoscopic images. PLOS ONE, 15(6), e0234352. https://doi.org/10.1371/journal.pone.0234352

3. Abbasi, N. R., Shaw, H. M., Rigel, D. S., Friedman, R. J., McCarthy, W. H., Osman, I., Kopf, A. W., & Polsky, D. (2004). Early diagnosis of cutaneous melanoma: Revisiting the ABCD criteria. JAMA, 292(22), 2771–2776. https://doi.org/10.1001/jama.292.22.2771
