(measurements:results)=
# Results

<hr>

## Two-sample KS Test

As a first statistical indication that the empirical distributions of the collected metrics differ across execution configurations, a two-sample Kolmogorov-Smirnov test was applied to both whole-brain and region-wise values. For each metric, the test compares the values obtained under a given configuration with those obtained under the default configuration, under the null hypothesis that both samples originate from the same probability distribution. The Bonferroni method was used to correct for multiple comparisons. {numref}`whole-brain-ks` summarizes the whole-brain metrics that differ significantly (corrected p-value smaller than 0.05).

```{code-cell} ipython3
import pandas as pd
from myst_nb import glue

# Whole-brain KS results, one row per (configuration, metric) pair.
whole_brain_ks = pd.read_csv("assets/whole-brain KS.csv").set_index(["Configuration", "Metric"])

# Format the numeric columns and center the table on the page.
whole_brain_ks_table = whole_brain_ks.style.format(
    formatter="{:3.3g}", subset=["Statistic", "p-value", "p-value (corrected)"]
).set_properties(
    **{"text-align": "center"}, subset=["Statistic", "p-value", "p-value (corrected)"]
).set_table_styles(
    [{"selector": "th", "props": [("text-align", "center")]}]
).set_table_attributes(
    'style="margin-left: auto; margin-right: auto; font-size: 12px;"'
)
# Draw a separator line above every row.
for i in whole_brain_ks.index:
    whole_brain_ks_table.set_table_styles({
        i: [{"selector": "", "props": "border-top: 1px solid black;"}]
    }, overwrite=False, axis=1)

glue("whole-brain-ks", whole_brain_ks_table, display=False)

# Significance level. It is not applied in this cell, so the counts below
# rely on the CSV listing only the significant comparisons.
alpha = 0.05

# Region-wise KS results: count the listed regions per (configuration, metric) pair.
region_wise_ks = pd.read_csv("assets/region-wise KS.csv")
region_wise_counts = region_wise_ks.groupby(["Configuration", "Metric"]).size()
region_wise_counts.name = "# Regions"
region_wise_counts_table = region_wise_counts.to_frame().style.set_table_attributes(
    'style="margin-left: auto; margin-right: auto; font-size: 12px;"'
)
# Draw a separator line above every row.
for i in region_wise_counts.index:
    region_wise_counts_table.set_table_styles({
        i: [{"selector": "", "props": "border-top: 1px solid black;"}]
    }, overwrite=False, axis=1)

glue("region-wise-counts", region_wise_counts_table, display=False)
```

```{glue:figure} whole-brain-ks
---
name: whole-brain-ks
align: center
figclass: small-table
---

Two-sample {{KS}} test results for significantly different ($\alpha=0.05$) whole-brain anatomical statistics distributions for various execution configurations, in comparison with the default configuration.
```
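The comparison described at the start of this section can be illustrated with a minimal sketch. It assumes that each metric is available as one array of per-participant values per execution configuration, and that the Bonferroni correction multiplies each p-value by the number of comparisons. The function name `ks_against_default`, the synthetic data, and the choice of `n_comparisons=8` are illustrative only and are not taken from the analysis code.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_against_default(default_values, other_values, n_comparisons, alpha=0.05):
    """Two-sample KS test against the default configuration.

    The p-value is Bonferroni-corrected by multiplying it by the number of
    comparisons (an assumption of this sketch) and capping the result at 1.
    """
    result = ks_2samp(default_values, other_values)
    p_corrected = min(float(result.pvalue) * n_comparisons, 1.0)
    return {
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "p_corrected": p_corrected,
        "significant": p_corrected < alpha,
    }


# Synthetic stand-ins for one metric under two execution configurations.
rng = np.random.default_rng(0)
default_sample = rng.normal(loc=2.5, scale=0.1, size=200)
shifted_sample = rng.normal(loc=2.7, scale=0.1, size=200)

result_shifted = ks_against_default(default_sample, shifted_sample, n_comparisons=8)
result_same = ks_against_default(default_sample, default_sample, n_comparisons=8)
```

Comparing a sample with itself gives a statistic of zero and is never flagged as significant, whereas the shifted sample is flagged even after correction.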

Applying the same test region-wise yields a total of 1184 comparisons per execution configuration (2 hemispheres × 74 atlas regions × 8 anatomical statistics). {numref}`region-wise-counts` summarizes the number of significantly different regions per execution configuration and metric.

```{glue:figure} region-wise-counts
---
name: region-wise-counts
align: center
figclass: small-table
---

Counts of region-wise anatomical statistics distributions found to be significantly different ($\alpha=0.05$) by the two-sample {{KS}} test, for various execution configurations in comparison with the default configuration. The maximum number of regions is 148 (74 regions per hemisphere).
```
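The region-wise bookkeeping can be sketched as follows. This is a hedged illustration only: it assumes a long-format table with one row per comparison, that the `p-value` column holds uncorrected values, and that the Bonferroni correction is applied over all 1184 comparisons of a configuration. The `Hemisphere` and `Region` columns, the function name `count_significant_regions`, and the toy values are hypothetical.

```python
import pandas as pd

N_HEMISPHERES = 2
N_REGIONS = 74      # atlas regions per hemisphere
N_STATISTICS = 8    # anatomical statistics per region
ALPHA = 0.05

# Number of region-wise comparisons per execution configuration.
n_comparisons = N_HEMISPHERES * N_REGIONS * N_STATISTICS


def count_significant_regions(results, n_comparisons, alpha=ALPHA):
    """Count significant comparisons per (configuration, metric) pair."""
    # Bonferroni correction: scale each p-value and cap it at 1.
    corrected = (results["p-value"] * n_comparisons).clip(upper=1.0)
    significant = results[corrected < alpha]
    counts = significant.groupby(["Configuration", "Metric"]).size()
    counts.name = "# Regions"
    return counts


# Toy results table with made-up p-values.
toy_results = pd.DataFrame({
    "Configuration": ["T2", "T2", "T2", "FLAIR"],
    "Metric": ["Average Thickness"] * 4,
    "Hemisphere": ["Left", "Left", "Right", "Left"],
    "Region": ["G_front_sup", "S_central", "G_front_sup", "G_front_sup"],
    "p-value": [1e-8, 0.01, 1e-6, 1e-7],
})
toy_counts = count_significant_regions(toy_results, n_comparisons)
```

In this toy table the comparison with an uncorrected p-value of 0.01 does not survive the correction, so the *T2* configuration is left with two significant regions and the *FLAIR* configuration with one.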

These results demonstrate the very large effect the *use_{{FLAIR}}* and *use_{{T2w}}* flags may have on elementary statistics commonly extracted with *FreeSurfer*, and the negligible effect of the *{{MPRAGE}}* flag. To better understand the differences between these execution configurations, changes in each significantly different anatomical statistic will be evaluated separately.

## Average Cortical Thickness

Average cortical thickness was calculated across all participants for each Destrieux atlas region and *FreeSurfer* execution configuration. {numref}`mean-thickness-pair-plot` shows the distributions of, and differences between, the mean average cortical thickness values per region. Both {{T2w}} and {{FLAIR}} are considered to improve pial surface estimation, which explains the large effect each has on the estimated cortical thickness. The figure shows, however, that the two act in opposite directions: adding a {{T2w}} reference generally increases the estimated average thickness across almost all regions, whereas a {{FLAIR}} reference generally decreases it. The {{FLAIR}} reduction is relatively uniform across regions, except for a large group of regions, mostly around the center of the distribution, that shows the opposite trend. Using the *{{MPRAGE}}* flag has hardly any effect, consistent with the {{KS}} test results in the previous section (see {numref}`whole-brain-ks` and {numref}`region-wise-counts`).

```{figure} ./assets/average_thickness_pairplot.png
---
name: mean-thickness-pair-plot
width: 100%
---

Mean estimated average thickness across participants per execution configuration. Green lines indicate an increase in value and red lines indicate a decrease (from left to right).
```
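The region-wise means and their differences from the default configuration can be computed along the following lines. This is a minimal sketch that assumes a long-format table with one row per participant, region, and execution configuration. The column names, the `"Default"` label, the function name `mean_thickness_differences`, and the toy values are illustrative and not taken from the analysis code.

```python
import pandas as pd


def mean_thickness_differences(thickness, baseline="Default"):
    """Mean thickness per region and configuration, relative to the baseline."""
    # Average across participants for every (region, configuration) pair.
    means = thickness.pivot_table(
        index="Region", columns="Configuration", values="Thickness", aggfunc="mean"
    )
    # Subtract the baseline column from every other configuration.
    return means.drop(columns=baseline).sub(means[baseline], axis=0)


# Toy data: 2 participants, 2 regions, 3 execution configurations.
rows = [
    ("p1", "A", "Default", 2.0), ("p2", "A", "Default", 3.0),
    ("p1", "A", "T2", 2.2), ("p2", "A", "T2", 3.2),
    ("p1", "A", "FLAIR", 1.8), ("p2", "A", "FLAIR", 2.8),
    ("p1", "B", "Default", 2.0), ("p2", "B", "Default", 2.0),
    ("p1", "B", "T2", 2.5), ("p2", "B", "T2", 2.5),
    ("p1", "B", "FLAIR", 1.5), ("p2", "B", "FLAIR", 1.5),
]
toy_thickness = pd.DataFrame(
    rows, columns=["Participant", "Region", "Configuration", "Thickness"]
)
thickness_diffs = mean_thickness_differences(toy_thickness)
```

Positive entries in `thickness_diffs` correspond to regions where a configuration increases the mean estimated thickness relative to the default, and negative entries to regions where it decreases it.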

To explore how these differences are distributed across the brain, the following two projections show the differences in mean region-wise average cortical thickness for the *FLAIR* and *T2* execution configurations.

```{figure} ./assets/FLAIR_thickness_diff.png
---
name: flair-thickness-diff
width: 100%
---
```

```{figure} ./assets/T2_thickness_diff.png
---
name: t2-thickness-diff
width: 100%
---
```

Beyond the opposite directionality of the effect already established in {numref}`mean-thickness-pair-plot`, the projection of the mean differences onto the cortical surface demonstrates an additional contrast: the *FLAIR* configuration's general reduction in average cortical thickness mostly affects gyri, whereas the increase with the *T2* configuration is mostly expressed in sulci.

## Within vs. Between-Participant Classification

### Cosine Similarity

```{figure} ./assets/cosine_similarity.png
---
name: cosine-similarity-plot
width: 100%
---
```

## Prediction of Participant Traits

### Sex

| Method | Execution Configuration | Model                      | Test Score |
|--------|-------------------------|----------------------------|:----------:|
| TPOT   | With T2                 | GradientBoostingClassifier |   0.9222   |
| TPOT   | DEFAULT                 | MLPClassifier              |   0.9167   |
| FLAML  | With FLAIR              | LogisticRegression         |   0.8519   |
| FLAML  | With T2                 | LogisticRegression         |   0.8519   |
| TPOT   | With FLAIR              | MLPClassifier              |    0.85    |
| FLAML  | DEFAULT                 | LGBMClassifier             |   0.7407   |

### Age

### BMI