- **Author:** [Dace Apšvalka](https://www.mrc-cbu.cam.ac.uk/people/dace.apsvalka/) 
- **Date:** August 2024  
- **conda environment**: I used the [fMRI workshop's conda environment](https://github.com/MRC-CBU/COGNESTIC/blob/main/mri_environment.yml) to run this notebook and any accompanying scripts.

# fMRI Data Quality Control and Pre-processing

--------------


**Table of contents**<a id='toc0_'></a>    
1. [Quality control with MRIQC](#toc1_)    
1.1. [MRIQC Participant level](#toc1_1_)    
1.2. [MRIQC Group level](#toc1_2_)    
1.3. [MRIQC output](#toc1_3_)    
2. [Preprocessing with fMRIPrep](#toc2_)    
2.1. [Preprocessing of structural MRI](#toc2_1_)    
2.2. [BOLD preprocessing](#toc2_2_)    
2.3. [fMRIPrep generated Methods section](#toc2_3_)    
2.4. [Example script to run fMRIPrep](#toc2_4_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=true
	minLevel=2
	maxLevel=3
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

-----------

## 1. <a id='toc1_'></a>[Quality control with MRIQC](#toc0_)
Before we start to do anything with our data, we should check the acquisition quality. 

**MRIQC** extracts various [IQMs (image quality metrics)](https://mriqc.readthedocs.io/en/latest/measures.html) from structural (T1w and T2w) and functional MRI (magnetic resonance imaging) data.

See [MRIQC Documentation](https://mriqc.readthedocs.io/en/latest/).

MRIQC is a [BIDS-App](https://bids-apps.neuroimaging.io/), and therefore it inherently understands the BIDS standard and follows the BIDS-Apps standard command line interface: 

`mriqc bids-root/ output-folder/ participant`.


The most effective way to run MRIQC is through containerized versions, such as Docker or Singularity/Apptainer. Containers encapsulate the software and all its dependencies, including a minimal operating system, within a single comprehensive image. This ensures that when it's time to run the software, everything operates seamlessly. The use of containers enhances the shareability and portability of the software, leading to more reproducible outputs. At the CBU, we have access to Singularity/Apptainer, which is also available on HPC systems. Importantly, Singularity/Apptainer can utilize Docker images as well.

### 1.1. <a id='toc1_1_'></a>[MRIQC Participant level](#toc0_)

**Example generic script:** [code-examples/mriqc_script.sh](code-examples/mriqc_script.sh)

The script in brief:

```bash
# ...

# ======================================================================
# MRIQC with Singularity
# ======================================================================
singularity run \
    -B "$PROJECT_PATH":/MyProject \
    /imaging/local/software/singularity_images/mriqc/mriqc-22.0.1.simg \
    /MyProject/data \
    /MyProject/data/derivatives/mriqc/ \
    --work-dir /MyProject/scratch/mriqc/"$subject" \
    participant \
    --participant-label "${subject#sub-}" \
    --float32 \
    --n_procs 16 --mem_gb 24 --ants-nthreads 16 \
    --modalities T1w bold \
    --no-sub

# EACH LINE EXPLAINED:
# attaching (bind-mounting) our project directory to the Singularity container
# the Singularity image file
# our BIDS data directory
# output directory
# --work-dir: path where intermediate results should be stored
# analysis_level (participant or group)
# --participant-label: a list of participant identifiers
# --float32: cast the input data to float32 if it’s represented in higher precision (saves space and improves performance)
# --n_procs 16 --mem_gb 24 --ants-nthreads 16: options to handle performance
# --modalities: filter input dataset by MRI type
# --no-sub: turn off submission of anonymized quality metrics to MRIQC’s metrics repository
# ======================================================================

```

**Example script for processing multiple subjects using SLURM**: [code-examples/step05_mriqc_subjects.sh](code-examples/step05_mriqc_subjects.sh)


### 1.2. <a id='toc1_2_'></a>[MRIQC Group level](#toc0_)

The 'Group' level just aggregates the subject-level reports and links them together.

**Example script:** [code-examples/step06_mriqc_group.sh](code-examples/step06_mriqc_group.sh)

```bash

PROJECT_PATH='FaceProcessing'

# ======================================================================
# MRIQC with Singularity
# ======================================================================
singularity run --cleanenv -B "$PROJECT_PATH":/"$PROJECT_PATH" \
    /imaging/local/software/singularity_images/mriqc/mriqc-22.0.1.simg \
    "$PROJECT_PATH"/data "$PROJECT_PATH"/data/derivatives/mriqc/ \
    --work-dir "$PROJECT_PATH"/work/mriqc/ \
    group \
    --float32 \
    --n_procs 16 --mem_gb 24 \
    --ants-nthreads 16 \
    --modalities T1w bold \
    --no-sub
```

### 1.3. <a id='toc1_3_'></a>[MRIQC output](#toc0_)

`MRIQC` output is a [`BIDS` **derivative**](https://bids-specification.readthedocs.io/en/stable/05-derivatives/01-introduction.html). Derivatives are outputs of common processing pipelines, capturing data and metadata sufficient for a researcher to understand and (critically) reuse those outputs in subsequent processing.

`MRIQC` generates a separate report for each individual run, as well as group reports. For a quick look at the quality of your data, a good starting point is the group BOLD report: check whether any image quality metric marks a run as an outlier relative to the rest of your sample.
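
The same outlier check can also be done programmatically. The sketch below is a minimal illustration, not part of MRIQC: it assumes the group-level table is a tab-separated file (typically `group_bold.tsv`) with a `bids_name` column and one column per IQM such as `fd_mean`, and it applies Tukey's interquartile-range rule, which is a heuristic chosen here for illustration. The run names and values are made up.

```python
import csv
import io
import statistics


def flag_outlier_runs(tsv_text, metric, k=1.5):
    """Return the bids_name of runs whose metric lies outside Tukey's fences."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    values = [float(row[metric]) for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles of the sample
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [row["bids_name"] for row in rows
            if not lower <= float(row[metric]) <= upper]


# Hypothetical group table: eight runs, one with unusually high mean FD.
fd_values = [0.10, 0.11, 0.12, 0.12, 0.13, 0.11, 0.12, 0.60]
demo_lines = ["bids_name\tfd_mean"] + [
    f"sub-{i:02d}_task-demo_run-1_bold\t{fd}"
    for i, fd in enumerate(fd_values, start=1)
]
demo_tsv = "\n".join(demo_lines)

outliers = flag_outlier_runs(demo_tsv, "fd_mean")
print(outliers)
```

For real data, pass the text of your own group table (for example `Path(...).read_text()`) and check the column names in its header first. A flagged run is a prompt to open its individual report, not an automatic exclusion.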

The individual BOLD reports include *carpet plots*. Here is an informative paper about them: [Power, J. D. (2017). A simple but useful way to assess fMRI scan qualities. Neuroimage, 154, 150-158.](https://doi.org/10.1016/j.neuroimage.2016.08.009)

## 2. <a id='toc2_'></a>[Preprocessing with fMRIPrep](#toc0_)

`fMRIPrep` ([A Robust Preprocessing Pipeline for fMRI Data](https://fmriprep.org/en/stable/)) is another [BIDS-App](https://bids-apps.neuroimaging.io/). 

fMRIPrep is an fMRI data preprocessing pipeline that is designed to provide an easily accessible, state-of-the-art interface that is robust to variations in scan acquisition protocols and that requires **minimal user input**, while providing easily interpretable and comprehensive error and output reporting. 

The fMRIPrep pipeline uses a combination of tools from well-known software packages, including FSL, ANTs, FreeSurfer and AFNI. This pipeline was designed to **provide the best software implementation for each stage of preprocessing**.

fMRIPrep performs **minimal preprocessing**: motion correction, field unwarping, normalization, bias field correction, and brain extraction. [See the workflows section of the fMRIPrep documentation for more details](https://fmriprep.org/en/latest/workflows.html).

fMRIPrep adapts its pipeline depending on what data and metadata are available and are used as the input. For example, slice timing correction will be performed only if the `SliceTiming` metadata field is found for the input dataset.

### 2.1. <a id='toc2_1_'></a>[Preprocessing of structural MRI](#toc0_)

Steps:
* brain extraction (skull-stripping; helps with normalisation),
* brain tissue segmentation (needed for normalisation), and
* spatial normalisation.

#### Lesion masking during normalisation

When processing images from patients with focal brain lesions (e.g., stroke, tumor resection), it is possible to provide a lesion mask to be used during spatial normalization to standard space. The mask will be used to minimize warping of healthy tissue into damaged areas (or vice-versa).

Lesion masks should be binary NIfTI images (damaged areas = 1, everywhere else = 0) in the same space and resolution as the T1 image, and follow the naming convention specified in [BIDS Extension Proposal 3: Common Derivatives](https://docs.google.com/document/d/1Wwc4A6Mow4ZPPszDIWfCUCRNstn7d_zzaWPcfcHmgI4/edit#heading=h.9146wuepclkt) (e.g., `sub-001_T1w_label-lesion_roi.nii.gz`). This file should be placed in the `sub-*/anat` directory of the BIDS dataset to be run through fMRIPrep.

Because lesion masks are not currently part of the BIDS specification, it is also necessary to include a `.bidsignore` file in the root of your dataset directory. This will prevent bids-validator from complaining that your dataset is not valid BIDS, which prevents fMRIPrep from running. Your `.bidsignore` file should include the following line: `*lesion_roi.nii.gz`
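
Adding that line by hand is easy, but in a scripted workflow it can be done as below. This is a minimal sketch: the helper name `add_bidsignore_pattern` is hypothetical, and the demonstration writes into a temporary directory standing in for your BIDS dataset root.

```python
import tempfile
from pathlib import Path


def add_bidsignore_pattern(bids_root, pattern="*lesion_roi.nii.gz"):
    """Append pattern to <bids_root>/.bidsignore unless it is already listed."""
    ignore_file = Path(bids_root) / ".bidsignore"
    lines = ignore_file.read_text().splitlines() if ignore_file.exists() else []
    if pattern not in lines:
        lines.append(pattern)
        ignore_file.write_text("\n".join(lines) + "\n")
    return ignore_file


# Demonstration in a throwaway directory (stand-in for the dataset root).
with tempfile.TemporaryDirectory() as demo_root:
    add_bidsignore_pattern(demo_root)
    add_bidsignore_pattern(demo_root)  # second call changes nothing
    demo_content = (Path(demo_root) / ".bidsignore").read_text()

print(demo_content)
```

Existing entries in `.bidsignore` are kept, so the helper can be run repeatedly without duplicating the pattern.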

#### Surface preprocessing
fMRIPrep uses [FreeSurfer](https://surfer.nmr.mgh.harvard.edu/) to reconstruct surfaces from T1w/T2w structural images. If enabled, several steps in the fMRIPrep pipeline are added or replaced. All surface preprocessing may be disabled with the `--fs-no-reconall` flag.

### 2.2. <a id='toc2_2_'></a>[BOLD preprocessing](#toc0_)

#### BOLD reference image estimation

The reference image is used to calculate a brain mask for the BOLD signal, estimate head-motion, and register BOLD to T1w.  

If a single-band reference (“sbref”) image associated with the BOLD series is available, then it is used directly. If not, a reference image is estimated from the BOLD series as follows: when T1-saturation effects (“dummy scans” or non-steady-state volumes) are detected, those volumes are averaged and used as the reference because of their superior tissue contrast. Otherwise, the median of a motion-corrected subset of volumes is used.


#### Head-motion estimation

Using the previously estimated reference scan, FSL `mcflirt` is used to estimate head-motion. For a more accurate estimation of head-motion, the motion parameters are calculated before any time-domain filtering (i.e., slice-timing correction).

#### Slice time correction

If the `SliceTiming` field is available within the input dataset metadata, this workflow performs slice time correction prior to other signal resampling processes. Slice time correction is performed using AFNI `3dTshift`. All slices are realigned in time to the **middle of each TR**.
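
To see what reference time this implies for your own data, you can inspect the `SliceTiming` field of a BOLD JSON sidecar. The sketch below is an assumption-laden illustration rather than fMRIPrep's code: it takes the reference as the midpoint of the slice acquisition range, which is how the fMRIPrep-generated methods text describes it ("0.5 of slice acquisition range"). The sidecar content and the function name are hypothetical.

```python
import json


def slice_time_reference(sidecar, fraction=0.5):
    """Return the reference time (s) within the slice acquisition range.

    Returns None when the sidecar has no SliceTiming field, mirroring the
    case in which slice time correction would be skipped.
    """
    timing = sidecar.get("SliceTiming")
    if not timing:
        return None
    first, last = min(timing), max(timing)
    return first + fraction * (last - first)


# Hypothetical sidecar with four slices acquired in interleaved order.
demo_sidecar = json.loads(
    '{"RepetitionTime": 2.0, "SliceTiming": [0.0, 1.0, 0.5, 1.5]}'
)
reference = slice_time_reference(demo_sidecar)
no_timing = slice_time_reference({"RepetitionTime": 2.0})
print(reference, no_timing)
```

Knowing the reference time matters later, because event onsets in a first-level model should be expressed relative to the same point in each volume's acquisition.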

Slice time correction can be disabled with the `--ignore slicetiming` command line argument.

#### Susceptibility Distortion Correction

One of the major problems that affects EPI data is the spatial distortion caused by the inhomogeneity of the field inside the scanner. This step applies susceptibility-derived distortion correction, based on fieldmap estimation. 

#### Pre-processed BOLD in native space

A new *preproc* BOLD series is generated from the slice-timing corrected (or the original) data in the original space.

#### EPI to T1w registration

The reference EPI image of each run is aligned to the reconstructed subject using the gray/white matter boundary; this alignment is calculated by FreeSurfer's `bbregister` routine. If FreeSurfer processing is disabled, FSL `flirt` is run with the BBR (boundary-based registration) cost function.

#### Resampling BOLD runs to standard spaces

The EPI images are mapped to the standard spaces given by the `--output-spaces` argument (see [Defining standard and nonstandard spaces where data will be resampled](https://fmriprep.org/en/latest/spaces.html#output-spaces)).

#### EPI sampled to FreeSurfer surfaces

If FreeSurfer processing is enabled, the motion-corrected functional series (after single shot resampling to T1w space) is sampled to the surface by averaging across the cortical ribbon.

Surfaces are generated for the “subject native” surface, as well as transformed to the `fsaverage` template space. All surface outputs are in GIFTI format.

#### Confounds estimation

Non-neuronal fluctuations in fMRI data may appear as a result of head motion, scanner noise, or physiological fluctuations (related to cardiac or respiratory effects). For a detailed review of the possible sources of noise in the BOLD signal, see [Greve et al. (2013)](https://doi.org/10.1007/s11336-012-9294-0). 

Given a motion-corrected fMRI, a brain mask, movement parameters and a segmentation, potential confounds per volume (time-point) are calculated. Confounding variables calculated in fMRIPrep are stored separately for each subject, session and run in `.tsv` files - one column for each confound variable. Such tabular files may include over 100 columns of potential confound regressors.

It is possible to minimize confounding effects of non-neuronal signals by including them as nuisance regressors in the GLM design matrix. The fMRIPrep pipeline generates a large array of possible confounds. The most well established confounding variables in neuroimaging are the six head-motion parameters (three rotations and three translations) - the common output of the head-motion correction (also known as realignment) of popular fMRI preprocessing software such as SPM or FSL. 
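
The six motion parameters are also commonly summarised as a single framewise displacement (FD) value per volume. The sketch below follows the Power formulation described in the fMRIPrep methods text (absolute sum of relative motions). It is an illustration, not fMRIPrep's implementation, and it assumes translations in mm, rotations in radians, and a 50 mm head radius for converting rotations to distances. The motion values are made up.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Power-style FD for a list of per-volume motion parameters.

    Each entry is [trans_x, trans_y, trans_z, rot_x, rot_y, rot_z], with
    translations in mm and rotations in radians (assumed units).
    The first volume has no predecessor, so its FD is None.
    """
    fd = [None]
    for previous, current in zip(motion, motion[1:]):
        translation = sum(abs(c - p) for p, c in zip(previous[:3], current[:3]))
        # Rotations become arc lengths on a sphere of the given radius.
        rotation = sum(abs(c - p) * radius_mm
                       for p, c in zip(previous[3:], current[3:]))
        fd.append(translation + rotation)
    return fd


# Hypothetical motion estimates for three volumes.
demo_motion = [
    [0.0, 0.0, 0.0, 0.00, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.00, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.01, 0.0, 0.0],
]
demo_fd = framewise_displacement(demo_motion)

# Volumes exceeding an FD threshold (0.5 mm here) can be marked as outliers.
high_motion = [i for i, value in enumerate(demo_fd)
               if value is not None and value > 0.5]
print(demo_fd, high_motion)
```

In practice you would not recompute FD yourself, since fMRIPrep already writes it to the confounds table; the sketch only shows what the number means.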

**Do not include all columns of the `*_desc-confounds_timeseries.tsv` table in your design matrix! Filter the table first to include only the confounds you want to remove from your fMRI signal.** [See the fMRIPrep confound regressor description](https://fmriprep.org/en/latest/outputs.html#confound-regressors-description). 
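
As a minimal sketch of such filtering, the code below keeps only a chosen set of columns from a confounds table. The helper name is hypothetical, the demo table is made up, and the default selection (the six head-motion parameters, under the column names `trans_x` to `rot_z`) is only one possible choice; check the header of your own file. Replacing `n/a` entries with 0 is also an assumption made here for illustration (for example, `framewise_displacement` is undefined for the first volume).

```python
import csv
import io

MOTION_COLUMNS = ["trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z"]


def select_confounds(tsv_text, columns=MOTION_COLUMNS, fill_value=0.0):
    """Return one list of floats per volume, holding only the chosen columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [name for name in columns if name not in reader.fieldnames]
    if missing:
        raise KeyError(f"columns not found in confounds table: {missing}")
    return [
        [fill_value if row[name] == "n/a" else float(row[name])
         for name in columns]
        for row in reader
    ]


# Hypothetical two-volume confounds table with a few of the many columns.
demo_tsv = "\n".join([
    "global_signal\tframewise_displacement\ttrans_x\ttrans_y\ttrans_z"
    "\trot_x\trot_y\trot_z\ta_comp_cor_00",
    "510.2\tn/a\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.01",
    "509.8\t0.12\t0.01\t0.02\t-0.01\t0.001\t0.0\t-0.001\t-0.02",
])

motion = select_confounds(demo_tsv)
motion_and_fd = select_confounds(
    demo_tsv, MOTION_COLUMNS + ["framewise_displacement"]
)
print(motion)
```

The resulting rows can be passed as nuisance regressors to whichever GLM software you use. Requesting a column that does not exist raises an error instead of silently dropping it.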


### 2.3. <a id='toc2_3_'></a>[fMRIPrep generated Methods section](#toc0_)

>Results included in this manuscript come from preprocessing
performed using *fMRIPrep* 21.0.1
(@fmriprep1; @fmriprep2; RRID:SCR_016216),
which is based on *Nipype* 1.6.1
(@nipype1; @nipype2; RRID:SCR_002502).



>Preprocessing of B<sub>0</sub> inhomogeneity mappings

>: A total of 1 fieldmaps were found available within the input
BIDS structure for this particular subject.
A *B<sub>0</sub>* nonuniformity map (or *fieldmap*) was estimated from the
phase-drift map(s) measure with two consecutive GRE (gradient-recalled echo)
acquisitions.
The corresponding phase-map(s) were phase-unwrapped with `prelude` (FSL 6.0.5.1:57b01774).

>Anatomical data preprocessing

>: A total of 1 T1-weighted (T1w) images were found within the input
BIDS dataset. The T1-weighted (T1w) image was corrected for intensity non-uniformity (INU)
with `N4BiasFieldCorrection` [@n4], distributed with ANTs 2.3.3 [@ants, RRID:SCR_004757], and used as T1w-reference throughout the workflow.
The T1w-reference was then skull-stripped with a *Nipype* implementation of
the `antsBrainExtraction.sh` workflow (from ANTs), using OASIS30ANTs
as target template.
Brain tissue segmentation of cerebrospinal fluid (CSF),
white-matter (WM) and gray-matter (GM) was performed on
the brain-extracted T1w using `fast` [FSL 6.0.5.1:57b01774, RRID:SCR_002823,
@fsl_fast].
Brain surfaces were reconstructed using `recon-all` [FreeSurfer 6.0.1,
RRID:SCR_001847, @fs_reconall], and the brain mask estimated
previously was refined with a custom variation of the method to reconcile
ANTs-derived and FreeSurfer-derived segmentations of the cortical
gray-matter of Mindboggle [RRID:SCR_002438, @mindboggle].
Volume-based spatial normalization to one standard space (MNI152NLin2009cAsym) was performed through
nonlinear registration with `antsRegistration` (ANTs 2.3.3),
using brain-extracted versions of both T1w reference and the T1w template.
The following template was selected for spatial normalization:
*ICBM 152 Nonlinear Asymmetrical template version 2009c* [@mni152nlin2009casym, RRID:SCR_008796; TemplateFlow ID: MNI152NLin2009cAsym].

>Functional data preprocessing

>: For each of the 9 BOLD runs found per subject (across all
tasks and sessions), the following preprocessing was performed.
First, a reference volume and its skull-stripped version were generated
 using a custom
methodology of *fMRIPrep*.
Head-motion parameters with respect to the BOLD reference
(transformation matrices, and six corresponding rotation and translation
parameters) are estimated before any spatiotemporal filtering using
`mcflirt` [FSL 6.0.5.1:57b01774, @mcflirt].
The estimated *fieldmap* was then aligned with rigid-registration to the target
EPI (echo-planar imaging) reference run.
The field coefficients were mapped on to the reference EPI using the transform.
BOLD runs were slice-time corrected to 0.974s (0.5 of slice acquisition range
0s-1.95s) using `3dTshift` from AFNI  [@afni, RRID:SCR_005927].
The BOLD reference was then co-registered to the T1w reference using
`bbregister` (FreeSurfer) which implements boundary-based registration [@bbr].
Co-registration was configured with six degrees of freedom.
Several confounding time-series were calculated based on the
*preprocessed BOLD*: framewise displacement (FD), DVARS and
three region-wise global signals.
FD was computed using two formulations following Power (absolute sum of
relative motions, @power_fd_dvars) and Jenkinson (relative root mean square
displacement between affines, @mcflirt).
FD and DVARS are calculated for each functional run, both using their
implementations in *Nipype* [following the definitions by @power_fd_dvars].
The three global signals are extracted within the CSF, the WM, and
the whole-brain masks.
Additionally, a set of physiological regressors were extracted to
allow for component-based noise correction [*CompCor*, @compcor].
Principal components are estimated after high-pass filtering the
*preprocessed BOLD* time-series (using a discrete cosine filter with
128s cut-off) for the two *CompCor* variants: temporal (tCompCor)
and anatomical (aCompCor).
tCompCor components are then calculated from the top 2% variable
voxels within the brain mask.
For aCompCor, three probabilistic masks (CSF, WM and combined CSF+WM)
are generated in anatomical space.
The implementation differs from that of Behzadi et al. in that instead
of eroding the masks by 2 pixels on BOLD space, the aCompCor masks are
subtracted a mask of pixels that likely contain a volume fraction of GM.
This mask is obtained by dilating a GM mask extracted from the FreeSurfer's *aseg* segmentation, and it ensures components are not extracted
from voxels containing a minimal fraction of GM.
Finally, these masks are resampled into BOLD space and binarized by
thresholding at 0.99 (as in the original implementation).
Components are also calculated separately within the WM and CSF masks.
For each CompCor decomposition, the *k* components with the largest singular
values are retained, such that the retained components' time series are
sufficient to explain 50 percent of variance across the nuisance mask (CSF,
WM, combined, or temporal). The remaining components are dropped from
consideration.
The head-motion estimates calculated in the correction step were also
placed within the corresponding confounds file.
The confound time series derived from head motion estimates and global
signals were expanded with the inclusion of temporal derivatives and
quadratic terms for each [@confounds_satterthwaite_2013].
Frames that exceeded a threshold of 0.5 mm FD or
1.5 standardised DVARS were annotated as motion outliers.
The BOLD time-series were resampled into standard space,
generating a *preprocessed BOLD run in MNI152NLin2009cAsym space*.
First, a reference volume and its skull-stripped version were generated
 using a custom
methodology of *fMRIPrep*.
All resamplings can be performed with *a single interpolation
step* by composing all the pertinent transformations (i.e. head-motion
transform matrices, susceptibility distortion correction when available,
and co-registrations to anatomical and output spaces).
Gridded (volumetric) resamplings were performed using `antsApplyTransforms` (ANTs),
configured with Lanczos interpolation to minimize the smoothing
effects of other kernels [@lanczos].
Non-gridded (surface) resamplings were performed using `mri_vol2surf`
(FreeSurfer).

>Many internal operations of *fMRIPrep* use
*Nilearn* 0.8.1 [@nilearn, RRID:SCR_001362],
mostly within the functional processing workflow.
For more details of the pipeline, see [the section corresponding
to workflows in *fMRIPrep*'s documentation](https://fmriprep.readthedocs.io/en/latest/workflows.html "FMRIPrep's documentation").

### 2.4. <a id='toc2_4_'></a>[Example script to run fMRIPrep](#toc0_)

If you want to include FreeSurfer surface reconstruction, you need to get a [FreeSurfer license file](https://surfer.nmr.mgh.harvard.edu/registration.html) (it is free!).

If you want to skip the FreeSurfer part, specify `--fs-no-reconall` (although you might still need to have the FreeSurfer license!).

For information about `--output-spaces`, see [here](https://fmriprep.org/en/22.0.1/spaces.html).

See the list of all possible options [here](https://fmriprep.org/en/stable/usage.html#execution-and-the-bids-format).


**Example of a generic script:** [code-examples/fmriprep_script.sh](code-examples/fmriprep_script.sh)

```bash
# ...
# See the possible fmriprep arguments here: https://fmriprep.org/en/stable/usage.html

singularity run \
    -B "$PROJECT_PATH":/MyProject \
    /imaging/local/software/singularity_images/fmriprep/fmriprep-21.0.1.simg \
    /MyProject/data \
    /MyProject/data/derivatives/fmriprep \
    participant \
    --fs-license-file /MyProject/code/freesurfer_license.txt \
    --work-dir /MyProject/scratch/fmriprep \
    --participant-label "$subject" \
    --output-spaces MNI152NLin2009cAsym:res-2 \
    --dummy-scans 2 \
    --fs-no-reconall \
    --nthreads 16 --omp-nthreads 8 \
    --skip-bids-validation \
    --stop-on-first-crash
```

**Example script for processing multiple subjects using SLURM:** [code-examples/step07_fmriprep.sh](code-examples/step07_fmriprep.sh)
