
Initialize derivative data set with a copy of the raw data #78

Closed
Remi-Gau opened this issue Oct 29, 2020 · 4 comments · Fixed by #171
Labels
enhancement New feature or request

Comments

@Remi-Gau (Collaborator) commented Oct 29, 2020

There are a few circumstances (see below) where I would like to initialize the derivatives with a complete or partial copy of the original full BIDS dataset. I have been working on a prepare_derivatives.m for that specific purpose, with features such as the following (a hypothetical call is sketched after this list):

  • copy the data of all or a sub-list of subjects;
  • if there are multiple sessions, copy the data from all or only some sessions;
  • copy all or a sub-list of modalities, and for func data possibly all or some tasks;
  • unzip the .nii.gz files and unpack the 4D func and dwi images into 3D volumes, if requested;
  • decide where that derivatives/tool subfolder will be created, i.e. not necessarily next to the "raw" data in the BIDS folder.
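
For illustration, a call to such a function might look like the sketch below. The function signature and every option name here are assumptions about a possible API, not an existing or finalized function:

```matlab
% Hypothetical sketch only: prepare_derivatives and all option names
% below are assumptions, not an existing function.
prepare_derivatives(bids_dir, ...
    'out_dir',    fullfile(bids_dir, 'derivatives', 'my_tool'), ...
    'subjects',   {'sub-01', 'sub-02'}, ... % sub-list of subjects
    'sessions',   {'ses-01'}, ...           % only some sessions
    'modalities', {'anat', 'func'}, ...     % sub-list of modalities
    'tasks',      {'auditory'}, ...         % func tasks to keep
    'unzip',      true, ...                 % gunzip the .nii.gz files
    'split_4d',   true);                    % unpack 4D into 3D volumes
```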

Why would I want to do that? Well, because it is convenient to work on a subset of data and/or in a "sandpit", for example:

  • to develop a processing pipeline with 5 subjects instead of 100, or simply to keep only the subjects retained for a specific analysis, i.e. leave out some "outliers";
  • an analysis can focus on a single session for some question, and the processing will therefore differ from one combining multiple sessions;
  • an analysis can also focus on a single modality (or task, for func data), so there is no need to bring along everything;
  • some operations in SPM's standard spatial processing pipelines, e.g. realignment and coregistration, do change the header of the images, so the images ought to be in a derivatives subfolder before any processing. Moreover, SPM and some other tools prefer dealing with a series of 3D images rather than a single 4D volume, e.g. ditching the first few functional images into a "dummy" subfolder (a sketch of this follows the list);
  • if the dataset is saved on external storage (like the centralized "mass-storage" system at ULiège), one cannot work directly on that server, so the data need to be copied locally before any processing is applied, i.e. I might as well call this my derivatives\some_tool_or_step data and start working on it. Afterwards, it is straightforward to copy that derivatives folder back to the external storage.
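
As a sketch of that 4D-to-3D step, SPM's spm_file_split can do the unpacking, after which the first volumes can be moved into a "dummy" subfolder. File names and the number of dummy volumes below are illustrative:

```matlab
% Illustrative file name; assumes SPM is on the MATLAB path.
func_4d = fullfile(deriv_dir, 'sub-01', 'func', ...
                   'sub-01_task-auditory_bold.nii');

% Split the 4D image into one 3D volume per time point.
vols = spm_file_split(func_4d);

% Move the first few "dummy" volumes aside (n_dummies is a placeholder).
n_dummies = 5;
dummy_dir = fullfile(fileparts(func_4d), 'dummy');
spm_mkdir(dummy_dir);
for i = 1:n_dummies
    movefile(vols(i).fname, dummy_dir);
end
```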

Originally posted by @ChristophePhillips in #60 (comment)

  • possibility to copy only part of the raw data by filtering by subject, session, modality, task;
  • possibility to unzip and split 4D into 3D (FYI: Rémi is not in favor of this. 😉 );
  • this derivatives folder can be anywhere relative to the raw data, but make the default path follow one of the recommended ways to store derivatives with respect to the raw data (see the BIDS specs here);
  • initialize a dataset_description.json in the root of that derivatives folder (a minimal sketch follows the list).
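
For that last point, a minimal sketch using base MATLAB's jsonencode; the field values are placeholders, and GeneratedBy follows the current BIDS derivatives spec:

```matlab
% Minimal dataset_description.json for a derivatives folder.
% All values below are placeholders.
ds.Name        = 'my_tool outputs';
ds.BIDSVersion = '1.6.0';
ds.DatasetType = 'derivative';
ds.GeneratedBy = {struct('Name', 'my_tool', 'Version', '0.1.0')}; % JSON array

fid = fopen(fullfile(deriv_dir, 'dataset_description.json'), 'w');
fwrite(fid, jsonencode(ds));
fclose(fid);
```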


@Remi-Gau (Collaborator, Author)

I also have something similar in our lab pipeline, though yours seems to have more features. I also know that spmup by @CPernet has something that does some of that.

So that seems like one of the obvious low-hanging fruits!

@Remi-Gau (Collaborator, Author)

Copied from original issue

@ChristophePhillips: spm_copy and spm_mkdir are tools to help you with what you want to do:
https://en.wikibooks.org/wiki/SPM/BIDS#Formatting_datasets_into_BIDS
that we used here:
https://github.com/spm/MultimodalScripts/blob/master/code/scripted/master_script.m#L49-L76
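
For context, the pattern from the linked script looks roughly like this. Paths are illustrative; the 'gunzip' option is as documented for spm_copy in SPM12:

```matlab
% Mirror one subject's anatomical data into a derivatives folder,
% gunzipping the image on the way (illustrative paths).
deriv_dir = fullfile(bids_dir, 'derivatives', 'my_tool');

spm_mkdir(deriv_dir, 'sub-01', 'anat');
spm_copy(fullfile(bids_dir, 'sub-01', 'anat', 'sub-01_T1w.nii.gz'), ...
         fullfile(deriv_dir, 'sub-01', 'anat'), 'gunzip', true);
```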

@Remi-Gau (Collaborator, Author)

I had actually opened an issue on one of our repos to get spm_mkdir and spm_copy out of SPM, so I would love to have them as part of bids-matlab.

One headache to keep in mind: datasets curated with DataLad have their content stored with git-annex (I need to finish a PR about that on the DataLad handbook). So a simple call to copyfile will not follow the symbolic links, and you end up with a bunch of broken links.

Two options:

  • the user must make sure they have run datalad unlock on the files to copy;
  • try a system call to cp -L and fall back to copyfile if it fails (this is the hacky way of doing things we are currently using: see here).

The second option is going to make Windows users cry though... if they use DataLad: not a huge user base at the moment, but that too could grow.

Any other ways around this that would make everyone happy?

@Remi-Gau (Collaborator, Author)

Originally posted by @gllmflndn:

I think we should make a system() call only out of necessity. We could test for symlinks within an isunix condition and only use cp -L for those?
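
A sketch of that idea; the helper name is made up, and the symlink check assumes a POSIX test -L:

```matlab
function copy_dereferenced(src, dest)
% Hypothetical helper sketching the suggestion above: only shell out
% to cp -L when SRC is a symlink on a Unix system; otherwise (or if
% the system call fails) fall back to MATLAB's copyfile.
    if isunix && system(sprintf('test -L "%s"', src)) == 0
        % SRC is a symlink (e.g. a git-annex'ed file): cp -L copies
        % the target of the link rather than the link itself.
        if system(sprintf('cp -L "%s" "%s"', src, dest)) ~= 0
            copyfile(src, dest);
        end
    else
        copyfile(src, dest);  % Windows, or a regular file
    end
end
```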

@Remi-Gau linked a pull request on Feb 19, 2021 that will close this issue
@Remi-Gau linked a pull request on Apr 17, 2021 that will close this issue