Merge pull request #65 from courtois-neuromod/doc_derivatives

Doc derivatives
courtois-neuromod · Dec 17, 2022 · eb24829
2 parents f08edeb + ea0d3dd
commit eb24829
Showing 2 changed files with 130 additions and 77 deletions.
diff --git a/source/ACCESS.md b/source/ACCESS.md
@@ -28,35 +28,64 @@ datalad install -r git@github.com:courtois-neuromod/cneuromod.git
 # If errors show up relative to .heudiconv subdataset/submodule, this is OK, they are not published (will be cleaned up in the future).
 cd cneuromod
 ```
-You will most likely want to checkout a stable release tag for your analyses. For instance:
+
+### Versioning
+
+By default, this will install the latest stable release of the dataset, which is the recommended version to get for a new analysis.
+If you need to work on a specific version (for instance, to reproduce a result), you can check out the appropriate tag with:
 ```
-git checkout cneuromod-2020
+git checkout 2020
 ```
+
 We now set the file server credentials as environment variables. The S3 `access_key` and `secret_key` will be provided by the data manager once the user access committee has granted you access to cneuromod.
 ```
 # This needs to be set in your `bash` session every time you want to download data.
 export AWS_ACCESS_KEY_ID=<s3_access_key>  AWS_SECRET_ACCESS_KEY=<s3_secret_key>
 ```
-You can now get data using:
+
+### Preprocessed data
+
+For analysis of fMRI data, it is preferable to directly get the preprocessed data (smriprep and fmriprep for now).
+
 ```
-datalad get -r <any/file/in/the/dataset.example>
+datalad install git@github.com:courtois-neuromod/cneuromod.processed.git
+cd cneuromod.processed
 ```
 
-## Updates
+You can install the sub-datasets you are interested in (instead of installing all of them) using for instance:
+```
+datalad get -n smriprep fmriprep/movie10
+```
+and then get only the files you need (for instance MNI space output):
+```
+datalad get smriprep/sub-*/anat/*space-MNI152NLin2009cAsym_* # get all anatomical output in MNI space
+datalad get fmriprep/movie10/sub-*/ses-*/func/*space-MNI152NLin2009cAsym_* # get all functional output in MNI space
+```
+You can add the flag `-J n` to download files in parallel with `n` being the number of threads to use.
+
+The source data used for preprocessing (including raw data) are referenced as sources in the preprocessed dataset following [Yoda](https://handbook.datalad.org/en/latest/basics/101-127-yoda.html), so as to track provenance.
+You can also track the version of the cneuromod dataset you are using by installing it in a datalad dataset created for your project.
 
-The dataset will be updated with new releases so you might want to get these changes (unless you are running analyses, or trying to reproduce results). The master branch will evolve with the project, and can be unstable or messy.
-Thus, we recommend using specific release tags.
 
+### Stimuli and event files
+
+You will likely need the event files and stimuli for your analysis, which can be obtained from the sourcedata reference sub-datasets, for example:
 ```
-git checkout 2020-alpha2 # checkout the dataset tag
-git submodule update --init # checkout the subdatasets corresponding commits
+datalad get -r fmriprep/movie10/sourcedata/movie10/stimuli fmriprep/movie10/sourcedata/movie10/*_events.tsv
 ```
 
-There is one stable release per year, e.g. `cneuromod-2020`, which is preceded by one or multiple alpha release (e.g. `cneuromod-2020-alpha`), beta release (e.g. `cneuromod-2020-beta`) and release candidate (e.g. `cneuromod-2020-rc`). To update your dataset to the latest version, use:
+or, to get subject-specific event files for tasks collecting behavioral responses:
+```
+datalad get -r fmriprep/movie10/sourcedata/hcptrt/sub-*/ses-*/func/*_events.tsv
+```
+
+## Updates
+
+The dataset will be updated with new releases so you might want to get these changes (unless you are currently running analyses, or trying to reproduce results). The main branches of all datasets will always track the latest stable release.
 
 ```
 # update the dataset recursively
-datalad update -r --merge
+datalad update -r --merge --reobtain-data
 
 ```
-Once your local dataset clone is updated, you might need to pull new data, as some files could have been added or changed.
+Once your local dataset clone is updated, you might need to pull new data, as some files could have been added or modified. The `--reobtain-data` flag should automatically pull files that you had already downloaded in case these were modified.
diff --git a/source/DERIVATIVES.md b/source/DERIVATIVES.md
@@ -1,98 +1,114 @@
 # Derivatives
 
-## fMRIPrep
+## [sMRIPrep](https://github.com/courtois-neuromod/anat.smriprep)
+
+The anatomical data were preprocessed using the [sMRIPrep pipeline](https://github.com/nipreps/smriprep).
+It took as input the T1w and T2w images from the first two sessions of each participant, which were averaged after coregistration.
+
+## [fMRIPrep](https://github.com/courtois-neuromod/cneuromod.fmriprep)
 
 ### Overview
 The functional data were preprocessed using the [fMRIPrep pipeline](https://fmriprep.readthedocs.io/en/stable/installation.html). fMRIPrep is an fMRI data preprocessing pipeline that requires minimal user input, while providing error and output reporting. It performs basic processing steps (coregistration, normalization, unwarping, noise component extraction, segmentation, skull-stripping, etc.) and provides outputs that can be easily submitted to a variety of group-level analyses, including task-based or resting-state fMRI, graph theory measures, surface or volume-based statistics, etc. The fMRIPrep pipeline uses a combination of tools from well-known software packages, including FSL, ANTs, FreeSurfer and [AFNI](https://afni.nimh.nih.gov/). For additional information regarding fMRIPrep installation, workflow and outputs, please visit the [documentation page](https://fmriprep.readthedocs.io/en/stable/installation.html).
- Note that the `slicetiming` and `recon-all` options were disabled (i.e. fMRIprep was invoked with the flags `--fs-no-reconall --ignore slicetiming`).
+Note that the `slicetiming` option was disabled (i.e. fMRIPrep was invoked with the flag `--ignore slicetiming`).
 
 ### Outputs
-The outputs of fMRIprep can be found under the folder of each dataset (e.g. `movie10`) `derivatives/fmriprep` in the Courtois NeuroMod datalad. The description of participant, session, task and event tags can be found in the [Datasets](DATASETS.html) section. Each participant folder (`sub-*`) contains:
-- `anat` folder with T1 preprocessed and segmented in native and MNI space, registration parameters
+The outputs of fMRIPrep can be found as sub-datasets of the [cneuromod.processed](https://github.com/courtois-neuromod/cneuromod.processed) super-dataset.
+fMRIPrep functional preprocessing was run using the anatomical "fast-track" (flag `--anat-derivatives`) with the sMRIPrep output described above, so as to use the same anatomical basis for all functional datasets.
+The output was generated in the `T1w`, `MNI152NLin2009cAsym` and `fsLR-den-91k` spaces as defined by [templateflow](https://www.templateflow.org/), to enable native-space, volumetric and surface-based analyses, respectively.
+
+The description of participant, session, task and event tags can be found in the [Datasets](DATASETS.html) section. Each participant folder (`sub-*`) contains:
 - `ses-*/func` containing, for each fMRI run of that session, files ending with:
-  - `_boldref.nii.gz` : a BOLD single volume reference.
-  - `_*-brain_mask.nii.gz` : the brain mask in fMRI space.
-  - `_*-preproc_bold.nii.gz` : the preprocessed BOLD timeseries.
-  - `_*-confounds_regressors.tsv` : a tabular tsv file, containing a large set of confounds to use in analysis steps (eg. GLM). 
-  
+  - `*_boldref.nii.gz` : a BOLD single volume reference.
+  - `*_desc-brain_mask.nii.gz` : the brain mask in fMRI space.
+  - `*_desc-preproc_bold.nii.gz` : the preprocessed BOLD timeseries.
+  - `*_desc-confounds_timeseries.tsv` : a tabular TSV file containing a large set of confounds to use in analysis steps (e.g. GLM).
+
 ### Recommended preprocessing strategy
-The confounding regressors are correlated, thus it is recommended to use a subset of these regressors. Also note that preprocessed time series have not been corrected for any confounds, but simply realigned in space, and it is therefore critical to regress some of the available confounds prior to analysis. For python users, we recommend using [nilearn](https://nilearn.github.io) and the tool [load_confounds](https://github.com/SIMEXP/load_confounds) to load confounds from the fMRIprep outputs, using with the `Params24` strategy. As the NeuroMod data consistently exhibits low levels of motion, we recommend against removing time points with excessive motion (aka scrubbing). Because of the 2 mm spatial resolution of the fMRI scan, there is substantial impact of thermal noise, and some amount of spatial smoothing is advisable. Our preliminary analyses suggest thah `smoothing_fwhm=8` in nilearn nifti maskers to work well. 
-  
+The confounding regressors are correlated, so it is recommended to use only a subset of them. Also note that the preprocessed time series have not been corrected for any confounds, only realigned in space; it is therefore critical to regress out some of the available confounds prior to analysis. For Python users, we recommend using the [nilearn](https://nilearn.github.io) function [load_confounds](https://nilearn.github.io/dev/modules/generated/nilearn.interfaces.fmriprep.load_confounds.html) to load confounds from the fMRIPrep outputs with the `Params24` strategy (the 24-parameter motion model, which nilearn exposes as `motion="full"`). As the NeuroMod data consistently exhibits low levels of motion, we recommend against removing time points with excessive motion (aka scrubbing). Because of the 2 mm spatial resolution of the fMRI scans, thermal noise has a substantial impact and some amount of spatial smoothing is advisable; its extent should be determined by your hypotheses and analysis.
+
 ### Pipeline description
-The following boilerplate text was automatically generated by fMRIPrep with the express intention that users should copy and paste this text into their manuscripts *unchanged*. It is released under the [CC0](https://creativecommons.org/publicdomain/zero/1.0/) license. All references in the text link to a `.bib` file with detailed reference list, ready to be incorporated in a `LaTeX` document.
-
-Results included in this manuscript come from preprocessing performed using fMRIPrep 20.1.1+38.g8480eabb ([fmriprep1](./_static/CITATION.bib); [fmriprep2](./_static/CITATION.bib); RRID:SCR_016216), which is based on Nipype 1.5.0 ([nipype1](./_static/CITATION.bib); [nipype2](./_static/CITATION.bib); RRID:SCR_002502).
-
-#### Anatomical data preprocessing
-
-The T1-weighted (T1w) image was corrected for intensity non-uniformity (INU)
-with `N4BiasFieldCorrection` [[n4](./_static/CITATION.bib)], distributed with ANTs 2.2.0 [[ants](./_static/CITATION.bib), RRID:SCR_004757], and used as T1w-reference throughout the workflow.
-The T1w-reference was then skull-stripped with a *Nipype* implementation of
-the `antsBrainExtraction.sh` workflow (from ANTs), using OASIS30ANTs
-as target template.
-Brain tissue segmentation of cerebrospinal fluid (CSF),
-white-matter (WM) and gray-matter (GM) was performed on
-the brain-extracted T1w using `fast` [FSL 5.0.9, RRID:SCR_002823,
-[fsl_fast](./_static/CITATION.bib)].
-Volume-based spatial normalization to one standard space (MNI152NLin2009cAsym) was performed through
-nonlinear registration with `antsRegistration` (ANTs 2.2.0),
-using brain-extracted versions of both T1w reference and the T1w template.
-The following template was selected for spatial normalization:
-*ICBM 152 Nonlinear Asymmetrical template version 2009c* [[mni152nlin2009casym](./_static/CITATION.bib), RRID:SCR_008796; TemplateFlow ID: MNI152NLin2009cAsym].
-
-#### Functional data preprocessing
-
-For each of the BOLD runs found per subject (across all
+
+Results included in this manuscript come from preprocessing
+performed using *fMRIPrep* 20.2.5
+(@fmriprep1; @fmriprep2; RRID:SCR_016216),
+which is based on *Nipype* 1.6.1
+(@nipype1; @nipype2; RRID:SCR_002502).
+
+Anatomical data preprocessing
+
+: A total of 0 T1-weighted (T1w) images were found within the input
+BIDS dataset.
+Anatomical preprocessing was reused from previously existing derivative objects.
+
+
+Functional data preprocessing
+
+: For each of the 2 BOLD runs found per subject (across all
 tasks and sessions), the following preprocessing was performed.
 First, a reference volume and its skull-stripped version were generated
-using a custom methodology of *fMRIPrep*.
-A deformation field to correct for susceptibility distortions was estimated
-based on two echo-planar imaging (EPI) references with opposing phase-encoding
-directions, using `3dQwarp` [afni](./_static/CITATION.bib) (AFNI 20160207).
-Based on the estimated susceptibility distortion, an
-unwarped BOLD reference was calculated for a more accurate
-co-registration with the anatomical reference.
+by aligning and averaging
+1 single-band references (SBRefs).
+A B0-nonuniformity map (or *fieldmap*) was estimated based on two (or more)
+echo-planar imaging (EPI) references with opposing phase-encoding
+directions, with `3dQwarp` @afni (AFNI 20160207).
+Based on the estimated susceptibility distortion, a corrected
+EPI (echo-planar imaging) reference was calculated for a more
+accurate co-registration with the anatomical reference.
 The BOLD reference was then co-registered to the T1w reference using
-`flirt` [FSL 5.0.9, [flirt](./_static/CITATION.bib)] with the boundary-based registration [[bbr](./_static/CITATION.bib)]
-cost-function.
-Co-registration was configured with nine degrees of freedom to account
-for distortions remaining in the BOLD reference.
+`bbregister` (FreeSurfer) which implements boundary-based registration [@bbr].
+Co-registration was configured with six degrees of freedom.
 Head-motion parameters with respect to the BOLD reference
 (transformation matrices, and six corresponding rotation and translation
 parameters) are estimated before any spatiotemporal filtering using
-`mcflirt` [FSL 5.0.9, [mcflirt](./_static/CITATION.bib)].
+`mcflirt` [FSL 5.0.9, @mcflirt].
+First, a reference volume and its skull-stripped version were generated
+ using a custom
+methodology of *fMRIPrep*.
+The BOLD time-series were resampled onto the following surfaces
+(FreeSurfer reconstruction nomenclature):
+*fsaverage*.
 The BOLD time-series (including slice-timing correction when applied)
 were resampled onto their original, native space by applying
 a single, composite transform to correct for head-motion and
 susceptibility distortions.
 These resampled BOLD time-series will be referred to as *preprocessed
 BOLD in original space*, or just *preprocessed BOLD*.
 The BOLD time-series were resampled into standard space,
-generating a *preprocessed BOLD run in ['MNI152NLin2009cAsym'] space*.
+generating a *preprocessed BOLD run in MNI152NLin2009cAsym space*.
 First, a reference volume and its skull-stripped version were generated
-using a custom methodology of *fMRIPrep*.
+ using a custom
+methodology of *fMRIPrep*.
+*Grayordinates* files [@hcppipelines] containing 91k samples were also
+generated using the highest-resolution ``fsaverage`` as intermediate standardized
+surface space.
 Several confounding time-series were calculated based on the
 *preprocessed BOLD*: framewise displacement (FD), DVARS and
 three region-wise global signals.
+FD was computed using two formulations following Power (absolute sum of
+relative motions, @power_fd_dvars) and Jenkinson (relative root mean square
+displacement between affines, @mcflirt).
 FD and DVARS are calculated for each functional run, both using their
-implementations in *Nipype* [following the definitions by [power_fd_dvars](./_static/CITATION.bib)].
+implementations in *Nipype* [following the definitions by @power_fd_dvars].
 The three global signals are extracted within the CSF, the WM, and
 the whole-brain masks.
 Additionally, a set of physiological regressors were extracted to
-allow for component-based noise correction [*CompCor*, [compcor](./_static/CITATION.bib)].
+allow for component-based noise correction [*CompCor*, @compcor].
 Principal components are estimated after high-pass filtering the
 *preprocessed BOLD* time-series (using a discrete cosine filter with
 128s cut-off) for the two *CompCor* variants: temporal (tCompCor)
 and anatomical (aCompCor).
-tCompCor components are then calculated from the top 5% variable
-voxels within a mask covering the subcortical regions.
-This subcortical mask is obtained by heavily eroding the brain mask,
-which ensures it does not include cortical GM regions.
-For aCompCor, components are calculated within the intersection of
-the aforementioned mask and the union of CSF and WM masks calculated
-in T1w space, after their projection to the native space of each
-functional run (using the inverse BOLD-to-T1w transformation). Components
-are also calculated separately within the WM and CSF masks.
+tCompCor components are then calculated from the top 2% variable
+voxels within the brain mask.
+For aCompCor, three probabilistic masks (CSF, WM and combined CSF+WM)
+are generated in anatomical space.
+The implementation differs from that of Behzadi et al. in that instead
+of eroding the masks by 2 pixels on BOLD space, the aCompCor masks are
+subtracted a mask of pixels that likely contain a volume fraction of GM.
+This mask is obtained by dilating a GM mask extracted from the FreeSurfer's *aseg* segmentation, and it ensures components are not extracted
+from voxels containing a minimal fraction of GM.
+Finally, these masks are resampled into BOLD space and binarized by
+thresholding at 0.99 (as in the original implementation).
+Components are also calculated separately within the WM and CSF masks.
 For each CompCor decomposition, the *k* components with the largest singular
 values are retained, such that the retained components' time series are
 sufficient to explain 50 percent of variance across the nuisance mask (CSF,
@@ -102,22 +118,30 @@ The head-motion estimates calculated in the correction step were also
 placed within the corresponding confounds file.
 The confound time series derived from head motion estimates and global
 signals were expanded with the inclusion of temporal derivatives and
-quadratic terms for each [[confounds_satterthwaite_2013](./_static/CITATION.bib)].
-Frames that exceeded a threshold of 0.5 mm FD or 1.5 standardised DVARS
-were annotated as motion outliers.
+quadratic terms for each [@confounds_satterthwaite_2013].
+Frames that exceeded a threshold of 0.5 mm FD or
+1.5 standardised DVARS were annotated as motion outliers.
 All resamplings can be performed with *a single interpolation
 step* by composing all the pertinent transformations (i.e. head-motion
 transform matrices, susceptibility distortion correction when available,
 and co-registrations to anatomical and output spaces).
 Gridded (volumetric) resamplings were performed using `antsApplyTransforms` (ANTs),
 configured with Lanczos interpolation to minimize the smoothing
-effects of other kernels [[lanczos](./_static/CITATION.bib)].
+effects of other kernels [@lanczos].
 Non-gridded (surface) resamplings were performed using `mri_vol2surf`
 (FreeSurfer).
 
 
 Many internal operations of *fMRIPrep* use
-*Nilearn* 0.5.2 [[nilearn](./_static/CITATION.bib), RRID:SCR_001362],
+*Nilearn* 0.6.2 [@nilearn, RRID:SCR_001362],
 mostly within the functional processing workflow.
 For more details of the pipeline, see [the section corresponding
 to workflows in *fMRIPrep*'s documentation](https://fmriprep.readthedocs.io/en/latest/workflows.html "FMRIPrep's documentation").
+
+
+### Copyright Waiver
+
+The above boilerplate text was automatically generated by fMRIPrep
+with the express intention that users should copy and paste this
+text into their manuscripts *unchanged*.
+It is released under the [CC0](https://creativecommons.org/publicdomain/zero/1.0/) license.
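
Editor's note on the DERIVATIVES.md changes above: the file naming listed under "Outputs" and the 24-parameter motion model mentioned under "Recommended preprocessing strategy" can be illustrated with a small standard-library sketch. It is not part of this pull request and is not how nilearn's `load_confounds` is implemented. The helper names (`confounds_name`, `motion24_columns`, `select_columns`) are hypothetical. The column names follow the fMRIPrep confounds naming (`trans_x`, `rot_x`, with `_derivative1` and `_power2` suffixes), and replacing `n/a` entries with `0.0` is an assumption made only for illustration.

```python
import csv
import io
import re

# Six rigid-body motion estimates as named in fMRIPrep confounds tables.
MOTION_AXES = ("trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z")
# Raw value, temporal derivative, square, and squared derivative.
MOTION_SUFFIXES = ("", "_derivative1", "_power2", "_derivative1_power2")


def confounds_name(bold_name):
    """Map a preprocessed BOLD file name to its confounds table name."""
    suffix = "_desc-preproc_bold.nii.gz"
    if not bold_name.endswith(suffix):
        raise ValueError("not a preprocessed BOLD file: " + bold_name)
    stem = bold_name[: -len(suffix)]
    # The confounds table is stored once per run, without output-space
    # entities, so drop space/res/den from the name.
    stem = re.sub(r"_(space|res|den)-[A-Za-z0-9]+", "", stem)
    return stem + "_desc-confounds_timeseries.tsv"


def motion24_columns():
    """Return the 24 motion column names (6 estimates x 4 expansions)."""
    return [axis + sfx for axis in MOTION_AXES for sfx in MOTION_SUFFIXES]


def select_columns(tsv_text, columns):
    """Read the chosen columns of a confounds TSV as rows of floats.

    Assumption: 'n/a' entries (e.g. the first row of derivatives)
    are replaced with 0.0.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append([0.0 if row[c] == "n/a" else float(row[c]) for c in columns])
    return rows
```

For example, with a made-up file name, `confounds_name("sub-01_ses-001_task-demo_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz")` yields the confounds file of the same run, whose 24 motion columns can then be read with `select_columns(text, motion24_columns())`. For real analyses, nilearn's `load_confounds` remains the recommended route.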