Initial sketch for the mriqc/fmriprep singularity based workflow #438

Open. Wants to merge 8 commits into master.
Conversation

yarikoptic (Member) commented Jul 18, 2019

An initial local attempt was slowed down by the issue described in ReproNim/containers#23.

codecov bot commented Jul 18, 2019

Codecov Report

Merging #438 into master will increase coverage by 0.45%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #438      +/-   ##
==========================================
+ Coverage   89.03%   89.49%   +0.45%     
==========================================
  Files         148      148              
  Lines       11863    12114     +251     
==========================================
+ Hits        10562    10841     +279     
+ Misses       1301     1273      -28
Impacted Files Coverage Δ
reproman/interface/tests/test_run.py 99.56% <0%> (-0.44%) ⬇️
reproman/interface/retrace.py 95.23% <0%> (-0.05%) ⬇️
reproman/distributions/vcs.py 95.83% <0%> (-0.02%) ⬇️
reproman/interface/run.py 100% <0%> (ø) ⬆️
reproman/distributions/conda.py 94.16% <0%> (ø) ⬆️
reproman/tests/test_utils.py 93.36% <0%> (+0.06%) ⬆️
reproman/utils.py 86.84% <0%> (+0.08%) ⬆️
reproman/interface/jobs.py 98.95% <0%> (+1.04%) ⬆️
reproman/resource/ssh.py 89.16% <0%> (+1.87%) ⬆️
reproman/support/jobs/tests/test_orchestrators.py 93.06% <0%> (+2.18%) ⬆️
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c52c5e3...57b6b6c.


# Sample run without any parallelization, and doing both levels (participant and group)
reproman run --follow -r "${RM_RESOURCE}" --sub "${RM_SUB}" --orc "${RM_ORC}" \
--jp container=containers/bids-mriqc data/bids data/mriqc participant,group

yarikoptic (Author, Member) commented Aug 1, 2019

so the original datalad command would be

datalad containers-run -n containers/bids-mriqc data/bids data/mriqc participant,group

TODO: add inputs/outputs specification
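
A hedged sketch of what that TODO might look like (the --input/--output paths here are assumptions, mirroring the datasets used elsewhere in this thread):

datalad containers-run -n containers/bids-mriqc \
    --input data/bids --output data/mriqc \
    data/bids data/mriqc participant,group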

# - datalad-container

RM_RESOURCE=smaug
RM_SUB=condor

yarikoptic (Author, Member) commented Aug 1, 2019

so for local execution it could be

RM_RESOURCE=localshell
RM_SUB=local
reproman run --follow -r "${RM_RESOURCE}" --sub "${RM_SUB}" --orc "${RM_ORC}" \
--bp 'thing=thing-*' \
--input '{p[thing]}' \
sh -c 'cat {p[thing]} {p[thing]} >doubled-{p[thing]}'
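
Note that RM_ORC is left unset in the snippet above; one assumption (not stated in the comment) would be to reuse the orchestrator that appears later in this thread:

RM_ORC=datalad-pair-run   # assumed value; the orchestrator used in the condor runs below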

kyleam (Contributor) commented Aug 6, 2019

With the latest push to run-subjobs (ac14277) checked out, try

reproman run --follow -r "${RM_RESOURCE}" --sub "${RM_SUB}" --orc "${RM_ORC}" \
  --jp container=containers/bids-mriqc \
  --bp 'pl=02,13' \
  --input data/bids \
  data/bids data/mriqc participant --participant_label '{p[pl]}'

I was able to get that [*] to successfully run via condor on smaug. As you've already experienced, the management of existing datasets is a bit rough, so you may want to use a fresh dataset.

[*] Or more specifically, this script:

#!/bin/sh
set -eu

cd $(mktemp -d --tmpdir=. ds-XXXX)
datalad create -c text2git .
datalad install -d . ///repronim/containers
datalad install -d . -s https://github.com/ReproNim/ds000003-demo data/bids

mkdir licenses/
echo freesurfer.txt > licenses/.gitignore
cat > licenses/README.md <<EOF

Freesurfer
----------

Place your FreeSurfer license into freesurfer.txt file in this directory.
Visit https://surfer.nmr.mgh.harvard.edu/registration.html to obtain one if
you don't have it yet - it is free.

EOF
datalad save -m "DOC: licenses/ directory stub" licenses/

datalad create -d . data/mriqc

reproman run --resource sm --follow \
         --sub condor --orc datalad-pair-run \
         --jp container=containers/bids-mriqc --bp 'pl=02,13' \
         -i data/bids \
         data/bids data/mriqc participant --participant_label '{p[pl]}'
Unfortunately, the initial run has failed with

	2019-08-15 14:32:13,311 [INFO   ] Waiting on job 1848: running
	2019-08-15 14:32:23,478 [INFO   ] Fetching results for 20190815-142046-33ea
	2019-08-15 14:35:51,720 [INFO   ] Creating run commit in /home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out7
	2019-08-15 14:36:06,509 [INFO   ] Unregistered job 20190815-142046-33ea
	+ reproman_run --jp container=containers/bids-mriqc --input data/bids --output data/mriqc "{inputs}" "{outputs}" group
	+ reproman run --follow -r smaug --sub condor --orc datalad-pair-run --jp container=containers/bids-mriqc --input data/bids --output data/mriqc "{inputs}" "{outputs}" group
	2019-08-15 14:36:10,588 [INFO   ] No root directory supplied for smaug; using "/home/yoh/.reproman/run-root"
	[INFO   ] Publishing <Dataset path=/home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out7/data/mriqc> to smaug
	ECDSA host key for IP address "129.170.233.9" not in list of known hosts.
	[INFO   ] Publishing <Dataset path=/home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out7> to smaug
	[ERROR  ] failed to push to smaug: master -> smaug/master [rejected] (non-fast-forward); pushed: ["d145d97..97de059"] [publish(/home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out7)]
	2019-08-15 14:36:59,238 [ERROR  ] "datalad publish" failed. Try running "datalad update -s smaug --merge --recursive" first [orchestrators.py:prepare_remote:792] (OrchestratorError)
	CONTAINERS_REPO=~/proj/repronim/containers INPUT_DATASET_REPO=    70.57s user 22.71s system 9% cpu 16:50.41 total

and stderr.1 on the remote end showed that tar failed to find some output files:

    $> tail -n 3 stderr.1
    tar: ./work/workflow_enumerator/anatMRIQCT1w/ComputeIQMs/_in_file_..home..yoh...reproman..run-root..44671e06-bf85-11e9-95c1-8019340ce7f2..data..bids..sub-02..anat..sub-02_T1w.nii.gz/ComputeQI2/_0x9713a172faade86794f9c56a3080a44e_unfinished.json: Cannot stat: No such file or directory
    tar: ./work/workflow_enumerator/anatMRIQCT1w/ComputeIQMs/_in_file_..home..yoh...reproman..run-root..44671e06-bf85-11e9-95c1-8019340ce7f2..data..bids..sub-02..anat..sub-02_T1w.nii.gz/ComputeQI2/error.svg: Cannot stat: No such file or directory
    tar: Exiting with failure status due to previous errors
yarikoptic (Member, Author) commented Aug 16, 2019

@kyleam I have rerun the script and got the same failure due to tar -- could you please confirm that you get the same?

kyleam (Contributor) commented Aug 16, 2019

could you please confirm that you get the same?

Sure, I'll give it a try this afternoon.

kyleam (Contributor) commented Aug 16, 2019

I've triggered it, though I don't yet have an explanation of what's going on.

kyleam added a commit to kyleam/niceman that referenced this pull request Aug 16, 2019
After a command completes, it writes to "status.$subjob".  If, after
completing its command, a subjob sees that the status files for all
the other subjobs are in, it claims responsibility for the
post-processing step.  For the datalad-run orchestrators,
post-processing includes calling `find` to get a list of newly added
files and then calling `tar` with these files as input.

Given that the above procedure waits until each command exits, the
hope is that all the output files are created and any temporary files
will have been cleaned up.  But we're hitting cases [*] where intermediate
files are apparently present for the `find` call but gone by the time `tar`
is called.  This leads to `tar` exiting with a
non-zero status and the post-processing being aborted.

Until someone has a better idea of how to deal with this, instruct
`tar` to exit with zero even if an expected file isn't present.  This
allows post-processing to succeed and the incident will still show up
in the captured stderr.

[*] ReproNim#438 (comment)
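
A minimal sketch of the kind of change the commit describes (the actual patch lives in gh-451; the togethome file list name comes from the comments below, while the exact tar options and archive name here are assumptions):

# stand-in for the orchestrator's own call that collects newly added files
find . -type f > togethome
# --ignore-failed-read asks GNU tar to warn rather than exit non-zero when a
# listed file has vanished in the meantime; the warnings still land on stderr
tar --ignore-failed-read -cf outputs.tar --files-from=togethome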
kyleam (Contributor) commented Aug 16, 2019

I've triggered it, though I don't yet have an explanation of what's going on.

Hmm, with several attempts, I was able to trigger the failure only once. Looking at the successful runs, the togethome file does not include the files that tar is complaining about in the failed runs. The only explanation I have for that is that these are temporary files that, based on the timing of things, might end up getting removed between the find ... >togethome call and the tar call. I've submitted gh-451 as a workaround.

For completeness: In all the above tries, I was using this script, which is a stripped-down version of 5b95ded:docs/usecases/bids-fmriprep-workflow-NP.sh.

set -eu

cd $(mktemp -d --tmpdir=. ds-XXXX)
datalad create -c text2git .
datalad install -d . ///repronim/containers
datalad install -d . -s https://github.com/ReproNim/ds000003-demo data/bids

mkdir licenses/
echo freesurfer.txt > licenses/.gitignore
cat > licenses/README.md <<EOF

Freesurfer
----------

Place your FreeSurfer license into freesurfer.txt file in this directory.
Visit https://surfer.nmr.mgh.harvard.edu/registration.html to obtain one if
you don't have it yet - it is free.

EOF
datalad save -m "DOC: licenses/ directory stub" licenses/

datalad create -d . -c text2git data/mriqc

reproman run --resource sm --follow \
         --sub condor --orc datalad-pair-run \
         --jp container=containers/bids-mriqc --bp 'pl=02,13' \
         -i data/bids -o data/mriqc \
         '{inputs}' '{outputs}' participant --participant_label '{p[pl]}'
yarikoptic (Member, Author) commented Aug 20, 2019

The only explanation I have for that is that these are temporary files that, based on the timing of things, might end up getting removed between the find ... >togethome call and the tar call.

Indeed, with NFS etc. we could probably see even more such cases. But I am still wondering what exactly is happening here, beyond condor possibly returning a "complete" job status before it (and all of its child processes) actually finished, and whether we would miss some results if we rush into collecting/tarring them up. Maybe adding some fuser call to check whether any process is still holding onto that output path, or something similar, would help. I will try to look into it when I get a moment.
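
A rough sketch of that idea (not part of the PR; the output path and retry count are assumptions, and lsof is used here instead of fuser so that the whole directory tree is checked):

outdir=data/mriqc      # assumed output path
tries=30
# wait until no process has files under $outdir open before collecting results
while [ "$tries" -gt 0 ] && lsof +D "$outdir" >/dev/null 2>&1; do
    sleep 2
    tries=$((tries - 1))
done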

kyleam (Contributor) commented Aug 20, 2019

Indeed, with NFS etc. we could probably see even more such cases. But I am still wondering what exactly is happening here

I am still wondering too :)

beyond condor possibly returning a "complete" job status before it (and all of its child processes) actually finished

This status isn't coming from condor. Its creation is chained after the run of the command:

/bin/sh -c "$cmd" && \
echo "succeeded" >"$metadir/status.$subjob" || \
(echo "failed: $?" >"$metadir/status.$subjob";
mkdir -p "$metadir/failed" && touch "$metadir/failed/$subjob")
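
A hedged illustration (not reproman's actual code) of the "last subjob claims post-processing" logic described in the referenced commit message above, assuming the same $metadir variable and a $num_subjobs count:

# count the subjobs that have written their status file so far
n_done=$(ls "$metadir"/status.* 2>/dev/null | wc -l)
if [ "$n_done" -eq "$num_subjobs" ]; then
    # every subjob is done; this one takes over the find/tar post-processing
    echo "all $num_subjobs subjobs finished; starting post-processing"
fi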
