Modify the batch and slurm scripts to support additional per-subject parameters #434
Conversation
Force-pushed 35dacc0 to 06a1c76
The PR is tested and can be reviewed now.
Thanks for the PR, it's working well out of the box now.
I believe I found a bug: the surface pipeline cannot run, because --parallel_subjects is passed to brun_fastsurfer.sh without an argument (Line 669 in 0d4dbb5).
--parallel_subjects (without an argument) is supposed to be a valid setting, equivalent to max, with regard to your earlier comments.
There seems to be a "not" missing on Line 148 in 0d4dbb5.
I am a bit surprised about the zombification of jobs, though. brun_fastsurfer is supposed to always return 0 (success), see Line 466 in 0d4dbb5.
I'll have to read up on the slurm docs on whether a non-zero return code from a child process supersedes a zero return code of the main process.
I think this is broken at the moment; I can keep an eye on it when I run some batches with failing subjects and let you know what happens currently. IIRC the jobs stayed IDLE indefinitely with "dependency not met" (zombie processes), but I didn't look into it.
The issue is not that files used by the script can't be modified, but that the script essentially blocks the HPC work directory, so no other scripts can use it (e.g. what if I want to also run FreeSurfer on slurm at the same time as FastSurfer?). Why not only check and delete the folders that are actually used?
For proper scheduling between job arrays, there needs to be a dependency between the corresponding seg and surf tasks.
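One way to express such a per-task dependency is slurm's aftercorr dependency type, which makes array task N of one job wait for array task N of another. The sketch below is an illustration, not the PR's actual code; seg_array.sh and surf_array.sh are hypothetical batch scripts, and the submit command can be substituted for a dry run:

```shell
# Hedged sketch: chain two slurm job arrays so that surf task N starts only
# after the corresponding seg task N completed successfully ("aftercorr").
# seg_array.sh / surf_array.sh are hypothetical batch scripts.
submit_seg_surf_arrays() {
  local sbatch_cmd=${1:-sbatch}   # pass "echo" to dry-run without slurm
  local seg_jobid
  # --parsable makes sbatch print only the job id
  seg_jobid=$("$sbatch_cmd" --parsable --array=1-10 seg_array.sh)
  # aftercorr: array task N here waits for task N of $seg_jobid
  "$sbatch_cmd" --array=1-10 --dependency="aftercorr:${seg_jobid}" surf_array.sh
}
```

On a cluster you would call submit_seg_surf_arrays directly; submit_seg_surf_arrays echo prints the constructed commands for inspection.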
In case this pops up during testing, just FYI: I ran into an issue on Line 438 in 0d4dbb5, where there seemed to be some issue with the formatting of the srun command. I did a quick and dirty workaround by building the command into a string and calling eval on it.
I don't particularly know what you are running into, but #438 identifies an issue in exactly that line.
Currently, the default option for … Also, the cleanup job should just delete cases, logs, scripts, and images, not *.
I can confirm that slurm inherits the return code from job steps, i.e. if a srun call fails, for example a surface or a segmentation command, this interrupts the whole processing pipeline. This can be avoided by manually setting the derived return code. It would probably be best to have a verbose error message at the end of the script, but always return 0, so subsequent tasks are always executed.
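The "log verbosely, but always return 0" pattern could look like the following. This is a minimal sketch, not the PR's actual code; run_step is a hypothetical helper, and false stands in for a real srun segmentation/surface call:

```shell
# Hedged sketch: run a job step, log a verbose error on failure, but always
# return 0 so slurm does not cancel dependent tasks (which could otherwise
# stay pending, e.g. with a never-satisfied dependency).
run_step() {
  "$@"
  local rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "ERROR: step '$*' failed with exit code $rc (see logs above)" >&2
  fi
  return 0   # never propagate the failure to slurm
}

run_step false   # a failing step is logged but does not abort the batch
echo "pipeline continues"
```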
Force-pushed fe10262 to d5e725b
Force-pushed 613070d to 235e422
Force-pushed 235e422 to 4824f38
- switch the subject_list_delim separator to = from : (more consistency with brun_fastsurfer and its subject_list format)

brun_fastsurfer.sh
- add the capability to add more FastSurfer parameters in the subject_list file

stools.sh
- adapt the awk script so that paths in additional parameters are also mapped to the singularity container
- fix debug printing of parallel_subjects
- fix --parallel_subjects surf= parsing, max
- add subject_id to all parallel processing
- fix debug output of FastSurfer parameters
- copy stools.sh script
- change the (very expert) call interface of --subject_list_awk_code to --subject_list_awk_code_sid and --subject_list_awk_code_args

stools.sh
- fix translate_cases according to the split of the awk_code snippets
- fix variable escaping in translate_cases awk code
- add debug output for the translated subjects

stools.sh
- fix awk code snippets
- add redundant parameter names --subjects_list* for all --subject_list* parameters
- fix the run_fastsurfer variable
- add a slurm_task_id option in --batch to fix the --jobarray functionality
- simplify the --parallel_subjects argument parsing
- comment parallel_surf == true ==> seg_surf_only == ""
- fix the argument setting of --statusfile
- fix double usage of variable i, and make sure it gets incremented each loop (do this first)
- remove quotes around $seg_surf_only, so no empty arguments get passed to run_fastsurfer.sh

srun_fastsurfer.sh
- add doc to --work (should be empty and 'owned by this script')
- add comments for $delete_hpc_work
- make sure the slurm segmentation script returns exit code 0
- reduce default values for memory allocation
- add log and debug functions
- add a log of the slurm submit script (run_fastsurfer) to the logs
- add the --batch argument to the brun_fastsurfer call where necessary

stools.sh
- remove explicit folders in hpc_work, not *
- exit if no --sd is defined in check_out_dir
- add logfile to both make_cleanup_job and make_copy_job

run_fastsurfer.sh
- do not crash on empty arguments (just ignore)
- fix some formatting
- change the delimiter between t1 and additional args to <space>, to make it more consistent

srun_fastsurfer.sh
- fix cleanup: if redoing cases, mv fails because it does not replace data/folders, so we ask if folders should be replaced and then copy (cp + rm -R)

stools.sh
- check_cases_in_out_dir: checks if subject data already exists, to ask if it should be overwritten
- make_cleanup_job: added cleanup mode: mv or cp
- export cleanup_jobid and copy_jobid
- clean up the initial log text

brun_fastsurfer.sh
- streamline IFS
- make subject_list work with different newline characters
- use an array to store subjects rather than a string
- fix default values of task_id and task_count (now "not specified")
- clean up the for loop (now only iterates through the batch, not all subjects)

srun_fastsurfer.sh
- add parameters for mem
- update time limit for surface pipeline (1mm can take more than 1h on 1 core)
Force-pushed 4824f38 to cafe4fd
A current limitation of both srun_fastsurfer and brun_fastsurfer is that it is not possible to pass additional subject-specific parameters to run_fastsurfer.sh.
This PR adds the feature to add such options to both srun_fastsurfer and brun_fastsurfer.
In particular, brun_fastsurfer.sh can have additional parameters (format =,). These are also passed through to srun_fastsurfer, for example:

Example 1:
--subject_list_delim "," --subject_list_awk_code_args '\$2 ",--vox_size," \$4'
maps the subject_list line
subject-101,raw/T1w-101A.nii.gz,study-1,0.9
to
--sid subject-101 --t1 <data-path>/raw/T1w-101A.nii.gz --vox_size 0.9

Example 2:
--subject_list_delim "," --subject_list_awk_code '\$2' --subject_list_awk_code_args '\$4,--asegdkt_segfile,\$3'
maps the subject_list line
4,sub_A1,mri/asegdkt.mgz,412463-t1.nii.gz
to
--sid sub_A1 --t1 <data-path>/412463-t1.nii.gz --asegdkt_segfile mri/asegdkt.mgz
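The field mapping of Example 1 can be illustrated with a plain awk one-liner. This is a simplified sketch only; the actual awk code in stools.sh is more involved (it also rewrites paths for the singularity container):

```shell
# Simplified sketch of Example 1's field mapping: split the subject_list line
# on the delimiter and emit run_fastsurfer.sh arguments from selected fields.
line='subject-101,raw/T1w-101A.nii.gz,study-1,0.9'
echo "$line" | awk -F',' '{ print "--sid " $1 " --t1 " $2 " --vox_size " $4 }'
# prints: --sid subject-101 --t1 raw/T1w-101A.nii.gz --vox_size 0.9
```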