Merge pull request #65 from courtois-neuromod/doc_derivatives
Doc derivatives
pbellec committed Dec 17, 2022
2 parents f08edeb + ea0d3dd commit eb24829
Showing 2 changed files with 130 additions and 77 deletions.
53 changes: 41 additions & 12 deletions source/ACCESS.md
@@ -28,35 +28,64 @@ datalad install -r git@github.com:courtois-neuromod/cneuromod.git
# If errors show up relative to .heudiconv subdataset/submodule, this is OK, they are not published (will be cleaned up in the future).
cd cneuromod
```
You will most likely want to checkout a stable release tag for your analyses. For instance:

### Versioning

By default, this will install the latest stable release of the dataset, which is the recommended version to get for a new analysis.
If you need to work on a specific version (for instance, to reproduce a result), you can switch to the appropriate tag with:
```
git checkout cneuromod-2020
git checkout 2020
```

We now set the credentials for the file server as environment variables. The s3 access_key and secret_key will be provided by the data manager once the user access committee has granted you access to cneuromod.
```
# This needs to be set in your `bash` session every time you want to download data.
export AWS_ACCESS_KEY_ID=<s3_access_key> AWS_SECRET_ACCESS_KEY=<s3_secret_key>
```
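If you drive downloads from a Python script, it can help to check that both variables are present before calling `datalad`. The following is a minimal sketch using only the standard library; the helper name `missing_credentials` is hypothetical and not part of datalad or of the cneuromod tooling.

```python
import os

# The two variables exported in the shell snippet above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(environ=None):
    """Return the names of required credential variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("Both credential variables are set.")
```

This only verifies that the variables exist in the current environment; it does not confirm that the keys are valid for the file server.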
You can now get data using:

### Preprocessed data

For analysis of fMRI data, it is preferable to directly get the preprocessed data (smriprep and fmriprep for now).

```
datalad get -r <any/file/in/the/dataset.example>
datalad install git@github.com:courtois-neuromod/cneuromod.processed.git
cd cneuromod.processed
```

## Updates
You can install the sub-datasets you are interested in (instead of installing all of them) using for instance:
```
datalad get -n smriprep fmriprep/movie10
```
and then get only the files you need (for instance MNI space output):
```
datalad get smriprep/sub-*/anat/*space-MNI152NLin2009cAsym_* # get all anatomical output in MNI space
datalad get fmriprep/movie10/sub-*/ses-*/func/*space-MNI152NLin2009cAsym_* # get all functional output in MNI space
```
You can add the flag `-J n` to download files in parallel with `n` being the number of threads to use.
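The same wildcard logic can be reproduced in Python, for instance to decide which files to request from within a script. This is a sketch under stated assumptions: the helper `select_space` and the example file names are hypothetical, chosen only to mirror the `*space-MNI152NLin2009cAsym_*` pattern used above.

```python
from fnmatch import fnmatchcase


def select_space(paths, space="MNI152NLin2009cAsym"):
    """Keep only paths whose file name carries the given `space-` entity."""
    pattern = f"*space-{space}_*"
    # Match on the file name only, ignoring the directory part.
    return [p for p in paths if fnmatchcase(p.rsplit("/", 1)[-1], pattern)]


# Hypothetical file names, for illustration only.
example_paths = [
    "smriprep/sub-01/anat/sub-01_space-MNI152NLin2009cAsym_desc-preproc_T1w.nii.gz",
    "smriprep/sub-01/anat/sub-01_desc-preproc_T1w.nii.gz",
    "fmriprep/movie10/sub-01/ses-001/func/sub-01_ses-001_task-x_space-T1w_desc-preproc_bold.nii.gz",
]
mni_only = select_space(example_paths)
```

The resulting list could then be passed to `datalad get`, with `-J n` if parallel downloads are wanted.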

The source data used for preprocessing (including raw data) are referenced as sources in the preprocessed dataset following [Yoda](https://handbook.datalad.org/en/latest/basics/101-127-yoda.html), so as to track provenance.
You can also track the version of the cneuromod dataset you are using by installing it in a datalad dataset created for your project.

The dataset will be updated with new releases so you might want to get these changes (unless you are running analyses, or trying to reproduce results). The master branch will evolve with the project, and can be unstable or messy.
Thus, we recommend using specific release tags.

### Stimuli and event files

You will likely need the event files and stimuli for your analysis, which can be obtained from the sourcedata reference sub-datasets, for example:
```
git checkout 2020-alpha2 # checkout the dataset tag
git submodule update --init # checkout the subdatasets corresponding commits
datalad get -r fmriprep/movie10/sourcedata/movie10/stimuli fmriprep/movie10/sourcedata/movie10/*_events.tsv
```

There is one stable release per year, e.g. `cneuromod-2020`, which is preceded by one or multiple alpha release (e.g. `cneuromod-2020-alpha`), beta release (e.g. `cneuromod-2020-beta`) and release candidate (e.g. `cneuromod-2020-rc`). To update your dataset to the latest version, use:
or to get subject-specific event files for tasks collecting behavioral responses:
```
datalad get -r fmriprep/movie10/sourcedata/hcptrt/sub-*/ses-*/func/*_events.tsv
```

## Updates

The dataset will be updated with new releases so you might want to get these changes (unless you are currently running analyses, or trying to reproduce results). The main branches of all datasets will always track the latest stable release.

```
# update the dataset recursively
datalad update -r --merge
datalad update -r --merge --reobtain-data
```
Once your local dataset clone is updated, you might need to pull new data, as some files could have been added or changed.
Once your local dataset clone is updated, you might need to pull new data, as some files could have been added or modified. The `--reobtain-data` flag should automatically pull files that you had already downloaded in case these were modified.
154 changes: 89 additions & 65 deletions source/DERIVATIVES.md
@@ -1,98 +1,114 @@
# Derivatives

## fMRIPrep
## [sMRIPrep](https://github.com/courtois-neuromod/anat.smriprep)

The anatomical data was preprocessed using the [sMRIPrep pipeline](https://github.com/nipreps/smriprep).
It took as input the T1w and T2w images from the first two sessions of each participant, which were averaged after coregistration.

## [fMRIPrep](https://github.com/courtois-neuromod/cneuromod.fmriprep)

### Overview
The functional data was preprocessed using the [fMRIPrep pipeline](https://fmriprep.readthedocs.io/en/stable/installation.html). fMRIPrep is an fMRI data preprocessing pipeline that requires minimal user input, while providing error and output reporting. It performs basic processing steps (coregistration, normalization, unwarping, noise component extraction, segmentation, skull stripping, etc.) and provides outputs that can be easily submitted to a variety of group-level analyses, including task-based or resting-state fMRI, graph theory measures, surface or volume-based statistics, etc. The fMRIPrep pipeline uses a combination of tools from well-known software packages, including FSL, ANTs, FreeSurfer and [AFNI](https://afni.nimh.nih.gov/). For additional information regarding fMRIPrep installation, workflow and outputs, please visit the [documentation page](https://fmriprep.readthedocs.io/en/stable/installation.html).
Note that the `slicetiming` and `recon-all` options were disabled (i.e. fMRIprep was invoked with the flags `--fs-no-reconall --ignore slicetiming`).
Note that the `slicetiming` option was disabled (i.e. fMRIprep was invoked with the flag `--ignore slicetiming`).

### Outputs
The outputs of fMRIprep can be found under the folder of each dataset (e.g. `movie10`) `derivatives/fmriprep` in the Courtois NeuroMod datalad. The description of participant, session, task and event tags can be found in the [Datasets](DATASETS.html) section. Each participant folder (`sub-*`) contains:
- `anat` folder with T1 preprocessed and segmented in native and MNI space, registration parameters
The outputs of fMRIprep can be found as sub-datasets of the [cneuromod.processed](https://github.com/courtois-neuromod/cneuromod.processed) super-dataset.
fMRIPrep functional preprocessing was run using the anatomical "fast-track" (flag `--anat-derivatives`) with the sMRIPrep output described above, so as to use the same anatomical basis for all functional datasets.
The output was generated in `T1w`, `MNI152NLin2009cAsym` and `fsLR-den-91k` spaces as defined by [templateflow](https://www.templateflow.org/) to respectively enable native space and volumetric or surface-based analyses.

The description of participant, session, task and event tags can be found in the [Datasets](DATASETS.html) section. Each participant folder (`sub-*`) contains:
- `ses-*/func` containing, for each fMRI run of that session, files ending with:
- `_boldref.nii.gz` : a BOLD single volume reference.
- `_*-brain_mask.nii.gz` : the brain mask in fMRI space.
- `_*-preproc_bold.nii.gz` : the preprocessed BOLD timeseries.
- `_*-confounds_regressors.tsv` : a tabular tsv file, containing a large set of confounds to use in analysis steps (eg. GLM).
- `*_boldref.nii.gz` : a BOLD single volume reference.
- `*_desc-brain_mask.nii.gz` : the brain mask in fMRI space.
- `*_desc-preproc_bold.nii.gz` : the preprocessed BOLD timeseries.
- `*_desc-confounds_timeseries.tsv` : a tabular tsv file, containing a large set of confounds to use in analysis steps (e.g. GLM).
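
The file names listed above follow the BIDS convention of underscore-separated `key-value` entities followed by a suffix and an extension. As a minimal sketch, a name can be split into these parts as shown below; the helper `parse_entities` and the example file name are hypothetical, and real projects may prefer a dedicated BIDS library.

```python
def parse_entities(filename):
    """Split a BIDS-style file name into key-value entities, suffix and extension."""
    name = filename.rsplit("/", 1)[-1]          # drop any directory part
    stem, _, extension = name.partition(".")    # "nii.gz" stays in one piece
    *pairs, suffix = stem.split("_")            # last chunk is the suffix
    # Each remaining chunk is expected to look like "key-value".
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, ("." + extension if extension else "")


# Hypothetical file name, for illustration only.
entities, suffix, extension = parse_entities(
    "sub-01_ses-001_task-example_run-1_space-T1w_desc-preproc_bold.nii.gz"
)
```

Here `entities["desc"]` distinguishes, for example, the `preproc` BOLD series from the `brain` mask, while `suffix` and `extension` identify the file type.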

### Recommended preprocessing strategy
The confounding regressors are correlated, thus it is recommended to use a subset of these regressors. Also note that preprocessed time series have not been corrected for any confounds, but simply realigned in space, and it is therefore critical to regress some of the available confounds prior to analysis. For Python users, we recommend using [nilearn](https://nilearn.github.io) and the tool [load_confounds](https://github.com/SIMEXP/load_confounds) to load confounds from the fMRIprep outputs, with the `Params24` strategy. As the NeuroMod data consistently exhibits low levels of motion, we recommend against removing time points with excessive motion (aka scrubbing). Because of the 2 mm spatial resolution of the fMRI scan, there is substantial impact of thermal noise, and some amount of spatial smoothing is advisable. Our preliminary analyses suggest that `smoothing_fwhm=8` in nilearn nifti maskers works well.
The confounding regressors are correlated, thus it is recommended to use a subset of these regressors. Also note that preprocessed time series have not been corrected for any confounds, but simply realigned in space, and it is therefore critical to regress some of the available confounds prior to analysis. For Python users, we recommend using [nilearn](https://nilearn.github.io)'s [load_confounds](https://nilearn.github.io/dev/modules/generated/nilearn.interfaces.fmriprep.load_confounds.html) to load confounds from the fMRIprep outputs, using the `Params24` strategy. As the NeuroMod data consistently exhibits low levels of motion, we recommend against removing time points with excessive motion (aka scrubbing). Because of the 2 mm spatial resolution of the fMRI scan, there is substantial impact of thermal noise, and some amount of spatial smoothing is advisable, the extent of it being determined by your hypotheses and analysis.
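
As a rough illustration of what a 24-parameter motion model involves, the sketch below lists the six head-motion estimates together with their temporal derivatives and quadratic terms, and reads those columns from a confounds file using only the Python standard library. The column names (`trans_x`, `rot_x`, and the `_derivative1`, `_power2` suffixes) are an assumption based on fMRIPrep's usual naming and should be checked against your own `*_desc-confounds_timeseries.tsv` files. The helper names are hypothetical, and this is not how nilearn implements `load_confounds`.

```python
import csv
import io

# Assumed fMRIPrep column names for the six rigid-body motion estimates.
MOTION_AXES = ("trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z")
# Raw estimate, temporal derivative, square, and squared derivative.
EXPANSIONS = ("", "_derivative1", "_power2", "_derivative1_power2")


def motion_24_columns():
    """Return the 24 column names of the expanded motion model."""
    return [axis + suffix for axis in MOTION_AXES for suffix in EXPANSIONS]


def read_confounds(tsv_text, columns):
    """Read selected columns from confounds TSV text; `n/a` entries become NaN."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for record in reader:
        rows.append(
            [float("nan") if record[c] == "n/a" else float(record[c]) for c in columns]
        )
    return rows
```

Derivative columns typically have no value for the first frame, which is why `n/a` is mapped to NaN here; how to fill that first value before regression is left to the analysis.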

### Pipeline description
The following boilerplate text was automatically generated by fMRIPrep with the express intention that users should copy and paste this text into their manuscripts *unchanged*. It is released under the [CC0](https://creativecommons.org/publicdomain/zero/1.0/) license. All references in the text link to a `.bib` file with detailed reference list, ready to be incorporated in a `LaTeX` document.

Results included in this manuscript come from preprocessing performed using fMRIPrep 20.1.1+38.g8480eabb ([fmriprep1](./_static/CITATION.bib); [fmriprep2](./_static/CITATION.bib); RRID:SCR_016216), which is based on Nipype 1.5.0 ([nipype1](./_static/CITATION.bib); [nipype2](./_static/CITATION.bib); RRID:SCR_002502).

#### Anatomical data preprocessing

The T1-weighted (T1w) image was corrected for intensity non-uniformity (INU)
with `N4BiasFieldCorrection` [[n4](./_static/CITATION.bib)], distributed with ANTs 2.2.0 [[ants](./_static/CITATION.bib), RRID:SCR_004757], and used as T1w-reference throughout the workflow.
The T1w-reference was then skull-stripped with a *Nipype* implementation of
the `antsBrainExtraction.sh` workflow (from ANTs), using OASIS30ANTs
as target template.
Brain tissue segmentation of cerebrospinal fluid (CSF),
white-matter (WM) and gray-matter (GM) was performed on
the brain-extracted T1w using `fast` [FSL 5.0.9, RRID:SCR_002823,
[fsl_fast](./_static/CITATION.bib)].
Volume-based spatial normalization to one standard space (MNI152NLin2009cAsym) was performed through
nonlinear registration with `antsRegistration` (ANTs 2.2.0),
using brain-extracted versions of both T1w reference and the T1w template.
The following template was selected for spatial normalization:
*ICBM 152 Nonlinear Asymmetrical template version 2009c* [[mni152nlin2009casym](./_static/CITATION.bib), RRID:SCR_008796; TemplateFlow ID: MNI152NLin2009cAsym].

#### Functional data preprocessing

For each of the BOLD runs found per subject (across all

Results included in this manuscript come from preprocessing
performed using *fMRIPrep* 20.2.5
(@fmriprep1; @fmriprep2; RRID:SCR_016216),
which is based on *Nipype* 1.6.1
(@nipype1; @nipype2; RRID:SCR_002502).

Anatomical data preprocessing

: A total of 0 T1-weighted (T1w) images were found within the input
BIDS dataset.
Anatomical preprocessing was reused from previously existing derivative objects.


Functional data preprocessing

: For each of the 2 BOLD runs found per subject (across all
tasks and sessions), the following preprocessing was performed.
First, a reference volume and its skull-stripped version were generated
using a custom methodology of *fMRIPrep*.
A deformation field to correct for susceptibility distortions was estimated
based on two echo-planar imaging (EPI) references with opposing phase-encoding
directions, using `3dQwarp` [afni](./_static/CITATION.bib) (AFNI 20160207).
Based on the estimated susceptibility distortion, an
unwarped BOLD reference was calculated for a more accurate
co-registration with the anatomical reference.
by aligning and averaging
1 single-band references (SBRefs).
A B0-nonuniformity map (or *fieldmap*) was estimated based on two (or more)
echo-planar imaging (EPI) references with opposing phase-encoding
directions, with `3dQwarp` @afni (AFNI 20160207).
Based on the estimated susceptibility distortion, a corrected
EPI (echo-planar imaging) reference was calculated for a more
accurate co-registration with the anatomical reference.
The BOLD reference was then co-registered to the T1w reference using
`flirt` [FSL 5.0.9, [flirt](./_static/CITATION.bib)] with the boundary-based registration [[bbr](./_static/CITATION.bib)]
cost-function.
Co-registration was configured with nine degrees of freedom to account
for distortions remaining in the BOLD reference.
`bbregister` (FreeSurfer) which implements boundary-based registration [@bbr].
Co-registration was configured with six degrees of freedom.
Head-motion parameters with respect to the BOLD reference
(transformation matrices, and six corresponding rotation and translation
parameters) are estimated before any spatiotemporal filtering using
`mcflirt` [FSL 5.0.9, [mcflirt](./_static/CITATION.bib)].
`mcflirt` [FSL 5.0.9, @mcflirt].
First, a reference volume and its skull-stripped version were generated
using a custom
methodology of *fMRIPrep*.
The BOLD time-series were resampled onto the following surfaces
(FreeSurfer reconstruction nomenclature):
*fsaverage*.
The BOLD time-series (including slice-timing correction when applied)
were resampled onto their original, native space by applying
a single, composite transform to correct for head-motion and
susceptibility distortions.
These resampled BOLD time-series will be referred to as *preprocessed
BOLD in original space*, or just *preprocessed BOLD*.
The BOLD time-series were resampled into standard space,
generating a *preprocessed BOLD run in ['MNI152NLin2009cAsym'] space*.
generating a *preprocessed BOLD run in MNI152NLin2009cAsym space*.
First, a reference volume and its skull-stripped version were generated
using a custom methodology of *fMRIPrep*.
using a custom
methodology of *fMRIPrep*.
*Grayordinates* files [@hcppipelines] containing 91k samples were also
generated using the highest-resolution ``fsaverage`` as intermediate standardized
surface space.
Several confounding time-series were calculated based on the
*preprocessed BOLD*: framewise displacement (FD), DVARS and
three region-wise global signals.
FD was computed using two formulations following Power (absolute sum of
relative motions, @power_fd_dvars) and Jenkinson (relative root mean square
displacement between affines, @mcflirt).
FD and DVARS are calculated for each functional run, both using their
implementations in *Nipype* [following the definitions by [power_fd_dvars](./_static/CITATION.bib)].
implementations in *Nipype* [following the definitions by @power_fd_dvars].
The three global signals are extracted within the CSF, the WM, and
the whole-brain masks.
Additionally, a set of physiological regressors were extracted to
allow for component-based noise correction [*CompCor*, [compcor](./_static/CITATION.bib)].
allow for component-based noise correction [*CompCor*, @compcor].
Principal components are estimated after high-pass filtering the
*preprocessed BOLD* time-series (using a discrete cosine filter with
128s cut-off) for the two *CompCor* variants: temporal (tCompCor)
and anatomical (aCompCor).
tCompCor components are then calculated from the top 5% variable
voxels within a mask covering the subcortical regions.
This subcortical mask is obtained by heavily eroding the brain mask,
which ensures it does not include cortical GM regions.
For aCompCor, components are calculated within the intersection of
the aforementioned mask and the union of CSF and WM masks calculated
in T1w space, after their projection to the native space of each
functional run (using the inverse BOLD-to-T1w transformation). Components
are also calculated separately within the WM and CSF masks.
tCompCor components are then calculated from the top 2% variable
voxels within the brain mask.
For aCompCor, three probabilistic masks (CSF, WM and combined CSF+WM)
are generated in anatomical space.
The implementation differs from that of Behzadi et al. in that instead
of eroding the masks by 2 pixels on BOLD space, the aCompCor masks are
subtracted a mask of pixels that likely contain a volume fraction of GM.
This mask is obtained by dilating a GM mask extracted from the FreeSurfer's *aseg* segmentation, and it ensures components are not extracted
from voxels containing a minimal fraction of GM.
Finally, these masks are resampled into BOLD space and binarized by
thresholding at 0.99 (as in the original implementation).
Components are also calculated separately within the WM and CSF masks.
For each CompCor decomposition, the *k* components with the largest singular
values are retained, such that the retained components' time series are
sufficient to explain 50 percent of variance across the nuisance mask (CSF,
@@ -102,22 +118,30 @@ The head-motion estimates calculated in the correction step were also
placed within the corresponding confounds file.
The confound time series derived from head motion estimates and global
signals were expanded with the inclusion of temporal derivatives and
quadratic terms for each [[confounds_satterthwaite_2013](./_static/CITATION.bib)].
Frames that exceeded a threshold of 0.5 mm FD or 1.5 standardised DVARS
were annotated as motion outliers.
quadratic terms for each [@confounds_satterthwaite_2013].
Frames that exceeded a threshold of 0.5 mm FD or
1.5 standardised DVARS were annotated as motion outliers.
All resamplings can be performed with *a single interpolation
step* by composing all the pertinent transformations (i.e. head-motion
transform matrices, susceptibility distortion correction when available,
and co-registrations to anatomical and output spaces).
Gridded (volumetric) resamplings were performed using `antsApplyTransforms` (ANTs),
configured with Lanczos interpolation to minimize the smoothing
effects of other kernels [[lanczos](./_static/CITATION.bib)].
effects of other kernels [@lanczos].
Non-gridded (surface) resamplings were performed using `mri_vol2surf`
(FreeSurfer).


Many internal operations of *fMRIPrep* use
*Nilearn* 0.5.2 [[nilearn](./_static/CITATION.bib), RRID:SCR_001362],
*Nilearn* 0.6.2 [@nilearn, RRID:SCR_001362],
mostly within the functional processing workflow.
For more details of the pipeline, see [the section corresponding
to workflows in *fMRIPrep*'s documentation](https://fmriprep.readthedocs.io/en/latest/workflows.html "FMRIPrep's documentation").


### Copyright Waiver

The above boilerplate text was automatically generated by fMRIPrep
with the express intention that users should copy and paste this
text into their manuscripts *unchanged*.
It is released under the [CC0](https://creativecommons.org/publicdomain/zero/1.0/) license.
