"Idealized" versus "good-enough" processing stream #26

Open · jdkent opened this issue May 1, 2019 · 2 comments

jdkent commented May 1, 2019

I'm curious what would be considered an "idealized" reproducible processing stream versus a "good enough" one, and I'd like to identify the tools/skills needed to complete a "good enough" reproducible analysis. Below I list some hypothesized steps and the tools to complete them.

Sparse Learner's Profile

Start from the top: a PI (or someone) hands you a bunch of dicoms and asks you to get subcortical volumes from the structural scans (but there are other, currently irrelevant dicoms as well). The PI also wants to be able to run your analysis and wants the data to be publicly available (assuming all IRB/data sharing agreements are satisfied).

An Idealized Processing Pipeline

I imagine we would be using datalad to record all our data/code/processing steps, and always be using/developing containers from the beginning. I'm not exactly sure where/how to place NIDM annotations of data/results or what tool I should use (PyNIDM?).
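
For concreteness, a minimal datalad-first start might look something like the sketch below (the dataset name and paths are placeholders I made up, not anything settled):

```sh
# Minimal sketch of a datalad-first start; names and paths are hypothetical.
datalad create subcortical-analysis
cd subcortical-analysis

# Bring the dicoms in and record them, so the starting state is under version control.
cp -r /path/to/incoming/dicoms sourcedata
datalad save -m "Add raw dicoms as received"
```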

  • search through and find the relevant dicoms
    • nibabel
    • afni
  • version control the relevant dicoms
    • datalad
    • git-annex
  • convert the dicoms to NIfTI files named according to the BIDS standard (a sketch of this and the defacing step follows this list)
    • heudiconv (via docker/singularity)
    • datalad
  • deface and rename the files
    • pydeface (via docker/singularity)
    • shell
    • datalad
  • write a script that calculates subcortical volumes
    • niflows (via pip/conda env)
    • fsl
    • datalad
  • place the script in a container with all the requisite software installed
    • neurodocker
  • upload the container to a hub (docker and/or singularity)
    • docker
    • singularity
  • run the script on the data and output data in a derivatives directory
    • docker
    • singularity
  • upload the BIDS organized nifti files to some online database
    • openneuro
  • upload the code/outputs to an online repository
    • git
    • github
  • test your code against that uploaded data
    • testkraken
    • circleci
    • travisci
    • shell
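
To make the conversion and defacing steps concrete, here is a hedged sketch; the heudiconv Docker image, heuristic file, subject label, and paths are assumptions for illustration, and the flags should be checked against the versions actually used:

```sh
# Hypothetical sketch: dicom -> BIDS conversion with heudiconv (run via Docker),
# followed by defacing the T1w image with pydeface.
# The image tag, heuristic, subject label, and paths are placeholders.
docker run --rm -v "$PWD":/base nipy/heudiconv:latest \
    -d '/base/sourcedata/{subject}/*/*.dcm' \
    -s 01 \
    -f /base/code/heuristic.py \
    -c dcm2niix -b \
    -o /base/bids

# pydeface writes <input>_defaced.nii.gz by default; rename it back so the
# BIDS file name is preserved.
pydeface bids/sub-01/anat/sub-01_T1w.nii.gz
mv bids/sub-01/anat/sub-01_T1w_defaced.nii.gz \
   bids/sub-01/anat/sub-01_T1w.nii.gz
```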

Good Enough Processing Pipeline

This version removes datalad from the processing stream, removes testing, and removes niflows, but still runs the desired software from within containers.

  • search through and find the relevant dicoms
    • nibabel
    • afni
  • convert the dicoms to NIfTI files named according to the BIDS standard
    • heudiconv (via docker/singularity)
  • deface and rename the files
    • pydeface (via docker/singularity)
    • shell
  • write a script that calculates subcortical volumes (sketched after this list)
    • shell
    • fsl
    • datalad
  • place the script in a container with all the requisite software installed
    • neurodocker
  • upload the container to a hub (docker and/or singularity)
    • docker
    • singularity
  • run the script on the data and output data in a derivatives directory
    • docker
    • singularity
  • upload the BIDS organized nifti files to some online database
    • openneuro
  • upload the code/outputs to an online repository and link to the containers you used
    • git
    • github
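
As a sketch of the "script that calculates subcortical volumes" step, something along these lines could work; it assumes FSL's run_first_all and fslstats are available (e.g. inside the container), and the FIRST label values shown (17/53 for left/right hippocampus) are an assumption to double-check:

```sh
#!/bin/bash
# Hypothetical sketch of a subcortical-volume script built on FSL FIRST.
# Arguments, paths, and the structure labels are placeholders to adapt.
set -euo pipefail

t1=$1        # e.g. bids/sub-01/anat/sub-01_T1w.nii.gz
outdir=$2    # e.g. derivatives/first/sub-01
mkdir -p "$outdir"

# Segment the subcortical structures.
run_first_all -i "$t1" -o "$outdir/first"

# Report voxel count and volume (mm^3) for a couple of structures by
# thresholding the label image around their integer codes.
seg="$outdir/first_all_fast_firstseg"
echo "left hippocampus:"  && fslstats "$seg" -l 16.5 -u 17.5 -V
echo "right hippocampus:" && fslstats "$seg" -l 52.5 -u 53.5 -V
```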

I would like feedback on both the "Idealized" and "Good Enough" analyses, since I am not as knowledgeable as I would like to be about designing processing pipelines. I may not be up to date on which tools are hot/new versus which will simply get the job done.

Once we pin down what we would like workshop attendees to be able to do (and hopefully this matches what they wish to do as well), I think we will have an easier time elucidating the necessary skills and modifying episodes to make sure they help build those skills.

yarikoptic (Member) commented

A fun exercise, thanks! Very much in line with our now-elderly https://github.com/ReproNim/simple_workflow container, which I have recently reused locally for "a script that calculates subcortical volumes" ;) It also aligns well with http://www.repronim.org/5steps .
Instead of a hard split between the two (Idealized/Good enough), it might be better to annotate the steps in the full list with some kind of "importance for reproducibility" score. We could also imagine a pre-crafted workflow (e.g. that simple_workflow, just generalized) which takes care of consuming a bunch of dicoms and performs all actions as a single "unitary step", so particular inner steps might no longer be individually relevant but would still be reflected in the result.

As for particulars, I think a custom heudiconv heuristic could perform the "search" and conversion. So overall a simplified, datalad-centric workflow could be something like

  • datalad create analysis-for-the-pi; cd analysis-for-the-pi
  • datalad create -d . sourcedata && cp ALL_DICOMS sourcedata/
  • datalad install -d . https://github.com/ReproNim/containers/
  • work out a heuristic for heudiconv under code/heudiconv-heuristic.py
  • datalad create -d . -c bids bids # -c bids is coming with 0.12 release of datalad and datalad-neuroimaging some time soonish
  • datalad create -d . -c text2git results
  • datalad containers-run -n containers/heudiconv -f code/heudiconv-heuristic -o bids --files sourcedata (TODO - container: add repronim/ and other additional commonly used images containers#2)
  • Deface! Apparently there is no "official" bids-app yet, but there are a number of defacers available, thus TODO: streamline (bids-app, container, etc.)
  • datalad containers-run -n containers/simple_workflow -i bids -o results + whatever params it consumes (TODO - container: add repronim/ and other additional commonly used images containers#2)
  • when all is good, look into uploading to wherever (datalad create-sibling*, datalad publish) ;)
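
One payoff of recording each step with containers-run is that the command, container, inputs, and outputs end up in the git history, so the PI can re-execute a step verbatim. A rough sketch (the container name follows the list above; the wrapped command is a placeholder, and flag spellings should be checked against the installed datalad-container version):

```sh
# Hypothetical sketch: record the analysis step so it can be replayed later.
# The container name and wrapped command are placeholders.
datalad containers-run \
    -n containers/simple_workflow \
    -m "Compute subcortical volumes" \
    --input bids --output results \
    "bash code/run_subcortical_volumes.sh {inputs} {outputs}"

# Anyone with the dataset (e.g. the PI) can later replay the recorded step:
datalad rerun
```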

satra (Contributor) commented May 1, 2019

@jdkent - continuing on the datalad theme, one place where the nidm model could be integrated is how datalad stores the input, process, output relationships. or as an exporter from the git log.
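
for instance (a rough sketch, assuming datalad run / containers-run keep embedding a machine-readable record in the commit message), an exporter could start by harvesting those records from the git log:

```sh
# rough sketch: list the commits that carry a datalad run record; each
# [DATALAD RUNCMD] commit message embeds a JSON description of the command,
# inputs, and outputs that a NIDM exporter could parse.
git log --grep='DATALAD RUNCMD' --format='%h %s'

# dump the full messages (including the embedded JSON records) for parsing:
git log --grep='DATALAD RUNCMD' --format='%B'
```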

regarding the workflows themselves, reproducibility would come from making them niflows, as you started with the simple1 example.

more broadly, the same data typically gets used for many experiments. different pieces are used to test different hypotheses. thus the graph model of data does make a lot of sense.

perhaps the idealized to good-enough spectrum can be refactored a bit through the lens of the goal of the workflow, highlighting points where things can make a difference. as an example, there is a piece of software that kevin (in my group) is using that only works if the dicoms are converted via spm rather than dcm2niix.
