
Initialize derivative data set with a copy of the raw data #78

Closed
Remi-Gau opened this issue Oct 29, 2020 · 4 comments · Fixed by #171
Labels
enhancement New feature or request

Comments

@Remi-Gau (Collaborator) commented Oct 29, 2020

There are a few circumstances (see below) where I would like to initialize the derivatives with a complete or partial copy of the original full BIDS dataset. I have been working on a prepare_derivatives.m for that specific purpose, with features such as the following (a hypothetical call is sketched after this list):

  • copy the data of all or a sub-list of subjects;
  • if there are multiple sessions, copy the data from all or only some sessions;
  • copy all or a sub-list of modalities, and for func data possibly all or some tasks;
  • unzip the .nii.gz files and unpack the 4D func and dwi images into 3D volumes, if requested;
  • decide where that derivatives/tool subfolder will be created, i.e. not necessarily next to the "raw" data in the BIDS folder.
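
For illustration, a call to such a function might look like the sketch below. The function signature and every option name here are assumptions about a possible API, not an existing or finalized function:

```matlab
% Hypothetical sketch only: prepare_derivatives and all option names
% below are assumptions, not an existing function.
prepare_derivatives(bids_dir, ...
    'out_dir',    fullfile(bids_dir, 'derivatives', 'my_tool'), ...
    'subjects',   {'sub-01', 'sub-02'}, ... % sub-list of subjects
    'sessions',   {'ses-01'}, ...           % only some sessions
    'modalities', {'anat', 'func'}, ...     % sub-list of modalities
    'tasks',      {'auditory'}, ...         % func tasks to keep
    'unzip',      true, ...                 % gunzip the .nii.gz files
    'split_4d',   true);                    % unpack 4D into 3D volumes
```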

Why would I want to do that? Well, because it is convenient to work on a subset of data and/or in a "sandpit", for example:

  • to develop a processing pipeline with 5 subjects instead of 100, or simply to keep only the subjects retained for a specific analysis, i.e. leave out some "outliers";
  • an analysis can focus on a single session for some question, and the processing will therefore differ from one combining multiple sessions;
  • an analysis can also focus on a single modality (or task, for func data), so there is no need to bring along everything;
  • some operations in SPM's standard spatial processing pipelines, e.g. realignment and coregistration, do change the header of the images, so the images ought to be in a derivatives subfolder before any processing. Moreover, SPM and some other tools prefer dealing with a series of 3D images rather than a single 4D volume, e.g. ditching the first few functional images into a "dummy" subfolder (a sketch of this follows the list);
  • if the dataset is saved on external storage (like the centralized "mass-storage" system at ULiège), one cannot work directly on that server, so the data need to be copied locally before any processing is applied, i.e. I might as well call this my derivatives\some_tool_or_step data and start working on it. Afterwards, it is straightforward to copy that derivatives folder back to the external storage.
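
As a sketch of that 4D-to-3D step, SPM's spm_file_split can do the unpacking, after which the first volumes can be moved into a "dummy" subfolder. File names and the number of dummy volumes below are illustrative:

```matlab
% Illustrative file name; assumes SPM is on the MATLAB path.
func_4d = fullfile(deriv_dir, 'sub-01', 'func', ...
                   'sub-01_task-auditory_bold.nii');

% Split the 4D image into one 3D volume per time point.
vols = spm_file_split(func_4d);

% Move the first few "dummy" volumes aside (n_dummies is a placeholder).
n_dummies = 5;
dummy_dir = fullfile(fileparts(func_4d), 'dummy');
spm_mkdir(dummy_dir);
for i = 1:n_dummies
    movefile(vols(i).fname, dummy_dir);
end
```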

Originally posted by @ChristophePhillips in #60 (comment)

  • possibility to copy only part of the raw data by filtering by subject, session, modality, task;
  • possibility to unzip and split 4D into 3D (FYI: Rémi is not in favor of this. 😉 );
  • this derivatives folder can be anywhere relative to the raw data, but make the default path follow one of the recommended ways to store derivatives with respect to the raw data (see the BIDS specs here);
  • initialize a dataset_description.json in the root of that derivatives folder (a minimal sketch follows the list).
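
For that last point, a minimal sketch using base MATLAB's jsonencode; the field values are placeholders, and GeneratedBy follows the current BIDS derivatives spec:

```matlab
% Minimal dataset_description.json for a derivatives folder.
% All values below are placeholders.
ds.Name        = 'my_tool outputs';
ds.BIDSVersion = '1.6.0';
ds.DatasetType = 'derivative';
ds.GeneratedBy = {struct('Name', 'my_tool', 'Version', '0.1.0')}; % JSON array

fid = fopen(fullfile(deriv_dir, 'dataset_description.json'), 'w');
fwrite(fid, jsonencode(ds));
fclose(fid);
```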


@Remi-Gau (Collaborator, Author)

I also have something similar in our lab pipeline, though yours seems to have more features. I also know that spmup by @CPernet has something that does some of that.

So that seems like one of the obvious low-hanging fruits!

@Remi-Gau (Collaborator, Author)

Copied from original issue

@ChristophePhillips: spm_copy and spm_mkdir are tools to help you with what you want to do:
https://en.wikibooks.org/wiki/SPM/BIDS#Formatting_datasets_into_BIDS
that we used here:
https://github.com/spm/MultimodalScripts/blob/master/code/scripted/master_script.m#L49-L76
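
For context, the pattern from the linked script looks roughly like this. Paths are illustrative; the 'gunzip' option is as documented for spm_copy in SPM12:

```matlab
% Mirror one subject's anatomical data into a derivatives folder,
% gunzipping the image on the way (illustrative paths).
deriv_dir = fullfile(bids_dir, 'derivatives', 'my_tool');

spm_mkdir(deriv_dir, 'sub-01', 'anat');
spm_copy(fullfile(bids_dir, 'sub-01', 'anat', 'sub-01_T1w.nii.gz'), ...
         fullfile(deriv_dir, 'sub-01', 'anat'), 'gunzip', true);
```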

@Remi-Gau (Collaborator, Author)

I had actually opened an issue on one of our repos to get spm_mkdir and spm_copy out of SPM, so I would love to have them as part of bids-matlab.

One headache to keep in mind: datasets curated with DataLad have their content stored with git-annex (I need to finish a PR about that on the DataLad handbook). So a simple call to copyfile will not follow the symbolic links, and you end up with a bunch of broken links.

Two options:

  • the user must make sure they have run datalad unlock on the files to copy;
  • try a system call to cp -L and fall back to copyfile if it fails (this is the hacky way of doing things we are currently using: see here).

The second option is going to make Windows users cry though... if they use DataLad: not a huge user base at the moment, but that too could grow.

Any other ways around this that would make everyone happy?

@Remi-Gau (Collaborator, Author)

Originally posted by @gllmflndn:

I think we should make a system() call only out of necessity. We could test for symlinks within an isunix condition and only use cp -L for those?
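
A sketch of that idea; the helper name is made up, and the symlink check assumes a POSIX test -L:

```matlab
function copy_dereferenced(src, dest)
% Hypothetical helper sketching the suggestion above: only shell out
% to cp -L when SRC is a symlink on a Unix system; otherwise (or if
% the system call fails) fall back to MATLAB's copyfile.
    if isunix && system(sprintf('test -L "%s"', src)) == 0
        % SRC is a symlink (e.g. a git-annex'ed file): cp -L copies
        % the target of the link rather than the link itself.
        if system(sprintf('cp -L "%s" "%s"', src, dest)) ~= 0
            copyfile(src, dest);
        end
    else
        copyfile(src, dest);  % Windows, or a regular file
    end
end
```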

@Remi-Gau linked a pull request on Feb 19, 2021 that will close this issue
@Remi-Gau linked a pull request on Apr 17, 2021 that will close this issue