# Methods

## Datasets
<!-- need to revise the demographic information -->

We selected two datasets on OpenNeuro for the current analysis:
`ds000228` {cite:p}`ds000228:1.1.0` and `ds000030` {cite:p}`ds000030:1.0.0`.
Dataset `ds000228` ($N=155$) contains fMRI scans of participants watching a silent version of the Pixar animated movie "Partly Cloudy".
The dataset includes 33 adult subjects
($Mean_{age}=24.8$, $SD_{age}=5.3$, $range_{age}: 18-39$; $n_{female}=20$)
and 122 child subjects
($Mean_{age}=6.7$, $SD_{age}=2.3$, $range_{age}: 3.5-12$; $n_{female}=64$).
For more information on the dataset, please refer to {cite:t}`richardson_development_2018`.

```python
import warnings

warnings.filterwarnings('ignore')
import pandas as pd
from myst_nb import glue

from fmriprep_denoise.visualization import tables

desc = tables.lazy_demographic('ds000228')
desc = desc.style.set_table_attributes('style="font-size: 10px"')

glue('ds000228_desc', desc)
```


Dataset `ds000030` includes multiple tasks collected from subjects with a variety of neuropsychiatric diagnoses, including ADHD, bipolar disorder, and schizophrenia, as well as healthy controls.
The current analysis focused only on the resting-state scans.
Scans with an instrumental artifact (flagged under column `ghost_NoGhost` in `participants.tsv`) were excluded from the analysis pipeline.
Of the 272 subjects, 259 were included in the benchmark.
The demographic information per condition is in {numref}`table-ds000030`.

```{table} Demographic information of ds000030
:name: table-ds000030
|                 | Full sample | Healthy control | Schizophrenia | Bipolar disorder |        ADHD |
|----------------:|------------:|----------------:|--------------:|-----------------:|------------:|
|      N (female) |   259 (108) |        120 (56) |       50 (12) |          49 (21) |     40 (19) |
| Age mean (s.d.) |  33.3 (9.3) |      31.7 (8.8) |    36.5 (8.9) |       35.3 (9.0) | 32.1 (10.4) |
|       Age range |      21--50 |          21--50 |        22--49 |           21--50 |      21--50 |
```

```python
desc = tables.lazy_demographic('ds000030')
desc = desc.style.set_table_attributes('style="font-size: 10px"')

glue('ds000030_desc', desc)
```

## fMRI data preprocessing

We preprocessed the data with fMRIPrep LTS 20.2.1 through [`fMRIPrep-slurm`](https://github.com/SIMEXP/fmriprep-slurm) with the following options:
```
--use-aroma \
--omp-nthreads 1 \
--nprocs 1 \
--random-seed 0 \
--output-spaces MNI152NLin2009cAsym MNI152NLin6Asym \
--output-layout bids \
--notrack \
--skip_bids_validation \
--write-graph \
--resource-monitor
```

For the full description generated by fMRIPrep, please see the [supplemental material](../supplementary_materials/CITATION.md).

## Time series extraction and connectome generation

We extracted time series from regions of interest (ROIs) defined by the following atlases:
Gordon atlas {cite:p}`gordon_atlas_2014`,
Schaefer 7 network atlas {cite:p}`schaefer_local-global_2017`,
Multiresolution Intrinsic Segmentation Template (MIST) {cite:p}`urchs_mist_2019`,
and Dictionary of Functional Modes (DiFuMo) {cite:p}`difumo_2020`.
All atlases were resampled to the resolution of the preprocessed functional data.

Further ROI extraction was done on DiFuMo and MIST,
because the area under a single label in these atlases can be a network of spatially disjoint regions.
We present the labels with the original number of parcels,
and denote the number of extracted ROIs in brackets.
The Gordon and Schaefer atlases both come with parcels defined as isolated ROIs and were therefore applied as is in the extraction.
The Schaefer 1000-parcel atlas was excluded because some regions would be dropped after resampling.

- Gordon atlas: 333
- Schaefer atlas: 100, 200, 300, 400, 500, 600, 800[^1]
- Multiresolution Intrinsic Segmentation Template (MIST) {cite:p}`urchs_mist_2019`: 7, 12, 20, 36, 64, 122, 197, 325, 444, "ROI" (210 parcels, 122 split by the midline)
- DiFuMo atlas {cite:p}`difumo_2020`: 64 (114), 128 (200), 256 (372), 512 (637), 1024 (1158)

The processes involved here were implemented with nilearn {cite:p}`nilearn`.
Time series were extracted using `nilearn.maskers.NiftiLabelsMasker` and `nilearn.maskers.NiftiMapsMasker`.
Connectomes were calculated using Pearson's correlation, implemented through `nilearn.connectome.ConnectivityMeasure`.
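
As a rough illustration of the last step, the sketch below computes a Pearson correlation connectome from an already extracted time series array with plain NumPy. It is not the nilearn pipeline used in this work: the synthetic `time_series` array and the helpers `pearson_connectome` and `upper_triangle_edges` are assumptions made for the example, and the output of `ConnectivityMeasure` may differ slightly depending on its covariance estimator settings.

```python
import numpy as np


def pearson_connectome(time_series):
    """Return the ROI-by-ROI Pearson correlation matrix.

    time_series: array of shape (n_timepoints, n_rois).
    """
    # np.corrcoef expects one variable per row, hence the transpose.
    return np.corrcoef(time_series.T)


def upper_triangle_edges(connectome):
    """Vectorise the unique edges (upper triangle, diagonal excluded)."""
    rows, cols = np.triu_indices_from(connectome, k=1)
    return connectome[rows, cols]


# Synthetic data standing in for one subject: 200 volumes, 5 ROIs.
rng = np.random.default_rng(0)
time_series = rng.standard_normal((200, 5))

connectome = pearson_connectome(time_series)
edges = upper_triangle_edges(connectome)
```

With 5 ROIs the vector `edges` holds the 10 unique connections, which is the form the edge-wise evaluation metrics operate on.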

## Confound regression strategies

Confound variables were retrieved using two APIs:
`nilearn.interfaces.fmriprep.load_confounds` (simplified as `load_confounds`),
the basic API that retrieves different classes of confound regressors,
and `nilearn.interfaces.fmriprep.load_confounds_strategy` (simplified as `load_confounds_strategy`),
a higher-level wrapper that implements common strategies from the denoising literature.
The current section describes the logic behind the design of the API.
For documentation of the actual functions, please see the latest version of `nilearn`.

### Basic noise components

To enable easy loading of confound variables from fMRIPrep outputs,
`load_confounds` provides an interface that groups subsets of confound variables into noise components and their parameters.
It is possible to fine-tune a subset of noise components and their parameters through this function.
The implementation only supports fMRIPrep functional derivative directories from the 1.2.x series onward. The CompCor noise component requires the 1.4.x series or above.

<!-- Explain the logic of nilearn API mirror the intro -->
Two types of regressors are always loaded, with no additional parameters for user customisation:

- `high_pass`: discrete cosine transform basis regressors to handle low-frequency signal drifts.
- `non_steady_state`: volumes collected before the fMRI scanner has reached a stable state.

`motion`, `wm_csf`, and `global_signal` share similar expansion options:

- `motion`: head motion estimates of translation and rotation (6 parameters).
- `wm_csf`: average signals extracted from masks of white matter and cerebrospinal fluid (2 parameters).
- `global_signal`: average signal extracted from the brain mask (1 parameter).

For the three noise components above, users can select from the following four options, where n is the number of parameters of the component:
- `basic`: just the original signal (n parameters)
- `power2`: original signal and quadratic term (2 * n parameters)
- `derivatives`: original signal and temporal derivative (2 * n parameters)
- `full`: original signal + temporal derivatives + quadratic terms + quadratic terms of the temporal derivatives (4 * n parameters)
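
The parameter counts above can be checked with a small NumPy sketch. The function `expand` is a hypothetical helper, not part of nilearn or fMRIPrep. Two choices are assumptions made for the example: the temporal derivative is a backward difference with the first row set to zero, and the fourth term of `full` is the squared derivative.

```python
import numpy as np


def expand(signal, option="basic"):
    """Expand an (n_timepoints, n) block of regressors.

    Assumption: the temporal derivative is a backward difference whose
    first row is zero; the last term of "full" is the squared derivative.
    """
    derivative = np.diff(signal, axis=0, prepend=signal[:1])
    if option == "basic":
        parts = [signal]
    elif option == "power2":
        parts = [signal, signal**2]
    elif option == "derivatives":
        parts = [signal, derivative]
    elif option == "full":
        parts = [signal, derivative, signal**2, derivative**2]
    else:
        raise ValueError(f"unknown option: {option}")
    return np.hstack(parts)


# Six synthetic motion estimates over 100 volumes.
motion = np.random.default_rng(0).standard_normal((100, 6))
n_columns = {
    option: expand(motion, option).shape[1]
    for option in ("basic", "power2", "derivatives", "full")
}
```

For the six motion estimates, `n_columns` gives 6, 12, 12, and 24 regressors, matching the n, 2 * n, 2 * n, and 4 * n counts listed above.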


`scrub` generates a mask to exclude volumes with excessive motion {cite:p}`power_scrubbing_2012`.
Two parameters can be used to determine the volumes to be excluded:
- `fd_threshold`: sets the head motion cut-off value based on framewise displacement {cite:p}`power_scrubbing_2012`.
- `std_dvars_threshold`: sets the cut-off value based on standardized DVARS, the root mean square of the volume-to-volume change in signal intensity {cite:p}`power_scrubbing_2012,jenkinson_2002`.
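
The thresholding logic can be sketched as follows. This is a simplified illustration only: `motion_outliers` is a hypothetical helper, the values are synthetic, and the cut-offs are example values. Any further handling that `scrub` applies is not reproduced here.

```python
import numpy as np


def motion_outliers(fd, std_dvars, fd_threshold=0.5, std_dvars_threshold=None):
    """Return a boolean array that is True where a volume exceeds a cut-off.

    Missing values (e.g. for the first volume) are treated as 0 here,
    which is an assumption of this sketch.
    """
    fd = np.nan_to_num(np.asarray(fd, dtype=float))
    outliers = fd > fd_threshold
    if std_dvars_threshold is not None:
        dvars = np.nan_to_num(np.asarray(std_dvars, dtype=float))
        outliers |= dvars > std_dvars_threshold
    return outliers


# Synthetic per-volume values for a five-volume run.
fd = [np.nan, 0.1, 0.6, 0.2, 0.3]
std_dvars = [np.nan, 1.0, 1.2, 3.5, 0.9]

outliers = motion_outliers(fd, std_dvars, fd_threshold=0.5, std_dvars_threshold=3)
kept_volumes = np.flatnonzero(~outliers)  # indices of volumes retained
```

Here the third volume exceeds the FD cut-off and the fourth exceeds the standardized DVARS cut-off, so `kept_volumes` contains indices 0, 1, and 4.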

The CompCor {cite:p}`behzadi_compcor_2007` approach has two associated parameters:
- `compcor` allows users to select components generated by the temporal approach,
    or by the anatomical approach with specific details for the mask used in noise signal extraction.
- `n_compcor` sets the number of principal components to retrieve.

For the ICA-based approach, fMRIPrep implements ICA-AROMA {cite:p}`aroma`.
Users must manually enable ICA-AROMA with the flag `--use-aroma` when running fMRIPrep.
The parameter `ica_aroma` allows two approaches:
1. Use the fMRIPrep output with the suffix `desc-smoothAROMAnonaggr_bold.nii.gz`.
2. Use the noise independent components only. This approach must be used with the output with the suffix `desc-preproc_bold.nii.gz`.

### Pre-defined strategies

`load_confounds_strategy` provides an interface to select confounds based on past literature, with limited parameters for user customisation:
`simple` {cite:p}`fox_pnas_2005` (motion parameters and tissue signal),
`scrubbing` {cite:p}`power_scrubbing_2012` (volume censoring, motion parameters, and tissue signal),
`compcor` {cite:p}`behzadi_compcor_2007` (anatomical CompCor and motion parameters),
and `ica_aroma` {cite:p}`aroma` (ICA-AROMA based denoising and tissue signal).
All strategies but `compcor` provide an option to add global signal to the confound regressors.

### Examined strategies

We evaluated common confound regression strategies that are possible with the confound regressors generated by fMRIPrep.
Connectomes generated from high-pass filtered time series served as the comparison baseline.
Confound variables were accessed using the `load_confounds_strategy` API.
The 11 strategies and a full breakdown of the parameters used under the hood are presented in {numref}`table-strategies`.

:::{dropdown} Click to see {numref}`table-strategies`

```{table} Denoising strategies
:name: table-strategies
| strategy        | image                          | `high_pass` | `motion` | `wm_csf` | `global_signal` | `scrub` | `fd_threshold` | `compcor`       | `n_compcor` | `ica_aroma` | `demean` |
|-----------------|--------------------------------|-------------|----------|----------|-----------------|---------|----------------|-----------------|-------------|-------------|----------|
| baseline        | `desc-preproc_bold`            | `True`      | N/A      | N/A      | N/A             | N/A     | N/A            | N/A             | N/A         | N/A         | `True`   |
| simple          | `desc-preproc_bold`            | `True`      | `full`   | `basic`  | N/A             | N/A     | N/A            | N/A             | N/A         | N/A         | `True`   |
| simple+gsr      | `desc-preproc_bold`            | `True`      | `full`   | `basic`  | `basic`         | N/A     | N/A            | N/A             | N/A         | N/A         | `True`   |
| scrubbing.5     | `desc-preproc_bold`            | `True`      | `full`   | `full`   | N/A             | `5`     | `0.5`          | N/A             | N/A         | N/A         | `True`   |
| scrubbing.5+gsr | `desc-preproc_bold`            | `True`      | `full`   | `full`   | `basic`         | `5`     | `0.5`          | N/A             | N/A         | N/A         | `True`   |
| scrubbing.2     | `desc-preproc_bold`            | `True`      | `full`   | `full`   | N/A             | `5`     | `0.2`          | N/A             | N/A         | N/A         | `True`   |
| scrubbing.2+gsr | `desc-preproc_bold`            | `True`      | `full`   | `full`   | `basic`         | `5`     | `0.2`          | N/A             | N/A         | N/A         | `True`   |
| compcor         | `desc-preproc_bold`            | `True`      | `full`   | N/A      | N/A             | N/A     | N/A            | `anat_combined` | `all`       | N/A         | `True`   |
| compcor6        | `desc-preproc_bold`            | `True`      | `full`   | N/A      | N/A             | N/A     | N/A            | `anat_combined` | `6`         | N/A         | `True`   |
| aroma           | `desc-smoothAROMAnonaggr_bold` | `True`      | N/A      | `basic`  | N/A             | N/A     | N/A            | N/A             | N/A         | `full`      | `True`   |
| aroma+gsr       | `desc-smoothAROMAnonaggr_bold` | `True`      | N/A      | `basic`  | `basic`         | N/A     | N/A            | N/A             | N/A         | `full`      | `True`   |
```
:::
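
The rows of {numref}`table-strategies` can be read as keyword arguments to the nilearn loaders. The sketch below shows one possible encoding. The names `STRATEGIES`, `_entry`, and `fetch_confounds` are hypothetical, the benchmark's own configuration may be organised differently, and loading the baseline through `load_confounds` with only `high_pass` is an assumption of this example.

```python
PREPROC = "desc-preproc_bold"
AROMA = "desc-smoothAROMAnonaggr_bold"


def _entry(image, **kwargs):
    """Bundle the input image suffix with loader keyword arguments."""
    return {"image": image, "kwargs": dict(demean=True, **kwargs)}


# Hypothetical encoding of the table; labels follow its "strategy" column.
STRATEGIES = {
    "baseline": {"image": PREPROC, "kwargs": None},  # high-pass filter only
    "simple": _entry(PREPROC, denoise_strategy="simple",
                     motion="full", wm_csf="basic"),
    "scrubbing.5": _entry(PREPROC, denoise_strategy="scrubbing", motion="full",
                          wm_csf="full", scrub=5, fd_threshold=0.5),
    "scrubbing.2": _entry(PREPROC, denoise_strategy="scrubbing", motion="full",
                          wm_csf="full", scrub=5, fd_threshold=0.2),
    "compcor": _entry(PREPROC, denoise_strategy="compcor", motion="full",
                      compcor="anat_combined", n_compcor="all"),
    "compcor6": _entry(PREPROC, denoise_strategy="compcor", motion="full",
                       compcor="anat_combined", n_compcor=6),
    # nilearn names the ICA-AROMA preset "ica_aroma".
    "aroma": _entry(AROMA, denoise_strategy="ica_aroma", wm_csf="basic"),
}

# Variants with global signal regression reuse the same arguments.
for label in ("simple", "scrubbing.5", "scrubbing.2", "aroma"):
    base = STRATEGIES[label]
    STRATEGIES[f"{label}+gsr"] = {
        "image": base["image"],
        "kwargs": {**base["kwargs"], "global_signal": "basic"},
    }


def fetch_confounds(img_file, label):
    """Load confounds for one fMRIPrep output image (requires nilearn)."""
    from nilearn.interfaces.fmriprep import (
        load_confounds,
        load_confounds_strategy,
    )

    kwargs = STRATEGIES[label]["kwargs"]
    if kwargs is None:
        # Assumption: the baseline only needs the high-pass regressors.
        return load_confounds(img_file, strategy=("high_pass",), demean=True)
    return load_confounds_strategy(img_file, **kwargs)
```

Calling `fetch_confounds` requires an actual fMRIPrep derivative whose file name carries the suffix stored under `image`. Parameters that the table does not list are left at the nilearn defaults in this sketch.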

## Denoising evaluation measures

We used selected metrics described in the previous literature to evaluate the denoising results
{cite:p}`ciric_benchmarking_2017,parkes_evaluation_2018`.
Motion-related metrics are centred around framewise displacement.
Framewise displacement (FD) indexes the movement of the head from one volume to the next.
The movement includes the translations along the three axes ($x$, $y$, $z$) and the respective rotations ($\alpha$, $\beta$, $\gamma$).
Rotational displacements are calculated as the displacement on the surface of a sphere of radius 50 mm {cite}`power_scrubbing_2012`.
fMRIPrep generates the FD based on the formula proposed in {cite}`power_scrubbing_2012`.
The FD at each time point $t$ is expressed as:

$$
\text{FD}_t = |\Delta d_{x,t}| + |\Delta d_{y,t}| +
|\Delta d_{z,t}| + |\Delta \alpha_t| + |\Delta \beta_t| + |\Delta \gamma_t|
$$
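
A minimal NumPy sketch of this formula is given below. The function name, the column order (three translations in mm followed by three rotations in radians), and the value of 0 assigned to the first volume are assumptions made for the example. The analysis itself uses the FD values generated by fMRIPrep.

```python
import numpy as np


def framewise_displacement(motion, radius=50.0):
    """Framewise displacement per volume.

    motion: (n_timepoints, 6) array with three translations in mm followed
    by three rotations in radians (column order assumed for this sketch).
    """
    motion = np.asarray(motion, dtype=float).copy()
    # Convert rotations to arc length on a sphere of the given radius (mm).
    motion[:, 3:] *= radius
    deltas = np.abs(np.diff(motion, axis=0))
    # The first volume has no predecessor; it is set to 0 here by assumption.
    return np.concatenate([[0.0], deltas.sum(axis=1)])


motion = np.array([
    [0.0, 0.0, 0.0, 0.00, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.01, 0.0, 0.0],
    [1.0, 0.5, 0.0, 0.01, 0.0, 0.0],
])
fd = framewise_displacement(motion)
mean_fd = fd.mean()
```

In the second volume, a 1 mm translation along $x$ and a 0.01 rad rotation (0.5 mm on the 50 mm sphere) give an FD of 1.5 mm. The third volume only moves 0.5 mm along $y$.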

The details of each measure are explained below.

### Quality control / functional connectivity (QC-FC)

QC-FC {cite:p}`power_recent_2015` quantifies the correlation between mean FD and functional connectivity.
It is calculated as the partial correlation between mean FD and the connectivity of each edge, with age and sex as covariates.
Denoising methods should aim to reduce the QC-FC value.
The reported significance values are controlled for multiple comparisons with false positive rate correction.
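
One common way to obtain such a partial correlation is to regress the covariates out of both variables and correlate the residuals. The sketch below follows that approach with synthetic data. The function `qcfc` and all variable names are hypothetical, and the significance testing and multiple comparison correction are not covered.

```python
import numpy as np


def qcfc(mean_fd, edges, covariates):
    """Partial correlation between mean FD and each edge.

    mean_fd: (n_subjects,), edges: (n_subjects, n_edges),
    covariates: (n_subjects, n_covariates), e.g. age and sex.
    """
    design = np.column_stack([np.ones(len(mean_fd)), covariates])
    # Residualise mean FD and every edge with respect to the covariates.
    fd_resid = mean_fd - design @ np.linalg.lstsq(design, mean_fd, rcond=None)[0]
    edge_resid = edges - design @ np.linalg.lstsq(design, edges, rcond=None)[0]
    # Pearson correlation of the residuals (both already have zero mean).
    numerator = fd_resid @ edge_resid
    denominator = np.linalg.norm(fd_resid) * np.linalg.norm(edge_resid, axis=0)
    return numerator / denominator


# Synthetic sample: 40 subjects, 2 edges driven exactly by FD and covariates.
rng = np.random.default_rng(0)
mean_fd = rng.random(40)
age = rng.uniform(20, 50, size=40)
sex = rng.integers(0, 2, size=40).astype(float)
edges = np.column_stack([3 * mean_fd + 5 * age, -2 * mean_fd + sex])

values = qcfc(mean_fd, edges, np.column_stack([age, sex]))
```

Because the synthetic edges are exact linear functions of mean FD once the covariates are removed, the two values are 1 and -1. Real connectomes would give values between these extremes.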

### Distance-dependent effects of motion on connectivity

To determine the residual distance-dependence of subject movement,
we first calculated the Euclidean distance between the centers of mass of each pair of parcels {cite:p}`power_scrubbing_2012`.
We then correlated the distance separating each pair of parcels with the QC-FC correlation of the edge connecting those parcels.
Closer parcels generally exhibit a greater impact of motion on connectivity.
After confound regression, we expect to see a general trend from negative correlation toward no correlation.
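
The two steps can be sketched with NumPy as follows. The helper names and the toy coordinates are hypothetical, and Pearson correlation is used here as an assumption; a rank correlation could be substituted in the same place.

```python
import numpy as np


def pairwise_distances(centroids):
    """Euclidean distance for each unique pair of parcels (upper triangle)."""
    centroids = np.asarray(centroids, dtype=float)
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    rows, cols = np.triu_indices(len(centroids), k=1)
    return dist[rows, cols]


def distance_dependence(centroids, qcfc_edges):
    """Pearson correlation between inter-parcel distance and QC-FC."""
    distances = pairwise_distances(centroids)
    return np.corrcoef(distances, qcfc_edges)[0, 1]


# Three toy parcel centers of mass (mm) and synthetic QC-FC values that
# decrease linearly with distance.
centroids = [(0, 0, 0), (3, 4, 0), (0, 0, 5)]
distances = pairwise_distances(centroids)
qcfc_edges = 0.3 - 0.02 * distances
r = distance_dependence(centroids, qcfc_edges)
```

The edge order produced by `np.triu_indices` must match the order used when vectorising the connectome, otherwise distances and QC-FC values would be paired incorrectly.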

### Network modularity

Confound regressors have the potential to remove real signal in addition to motion-related noise.
To evaluate this possibility, we computed modularity quality,
an explicit quantification of the degree to which there are structured sub-networks in a given network,
in this case the denoised connectome {cite:p}`satterthwaite_impact_2012`.
Modularity quality is quantified by graph community detection based on the Louvain method {cite:p}`rubinov2010`,
implemented in the Brain Connectivity Toolbox.
If confound regression and censoring were removing real signal in addition to motion-related noise,
we would expect modularity to decline.
To understand the extent of the correlation between modularity and motion,
we computed the partial correlation between subjects' modularity values and mean FD,
with age and sex as covariates.
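
The Louvain method searches for the partition that maximises modularity quality. For intuition, the sketch below evaluates the standard Newman modularity for a fixed partition of an undirected graph with non-negative weights. It is not the toolbox implementation: the function name and the toy graph are hypothetical, and neither the community search nor the handling of negative connectome weights is shown.

```python
import numpy as np


def modularity_quality(adjacency, labels):
    """Newman modularity Q of a given partition.

    adjacency: symmetric (n, n) array of non-negative weights.
    labels: community label of each node.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    strength = adjacency.sum(axis=1)
    total = strength.sum()  # twice the total edge weight
    # Expected weight between two nodes under the degree-preserving null model.
    expected = np.outer(strength, strength) / total
    same_community = labels[:, None] == labels[None, :]
    return ((adjacency - expected) * same_community).sum() / total


# Toy graph: two disconnected pairs of nodes.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])
q_two_modules = modularity_quality(adjacency, [0, 0, 1, 1])
q_one_module = modularity_quality(adjacency, [0, 0, 0, 0])
```

Grouping each connected pair into its own community gives Q = 0.5, whereas placing all nodes in a single community gives Q = 0. This illustrates why a loss of structured sub-networks would show up as a decline in modularity.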

[^1]: When resampling the 1000-parcel version of the Schaefer atlas to match the preprocessed data,
some subjects will miss a parcel.