Error in finding parcellation atlases when running pipeline in parallel #1064

Closed
psychelzh opened this issue Feb 25, 2024 · 8 comments · Fixed by #1066 or #1073
Labels
bug Issues noting problems and PRs fixing those problems.

Comments

@psychelzh
Contributor

psychelzh commented Feb 25, 2024

Summary

The output atlases folder has now been moved out of the subject directory, which is very reasonable. However, for a reason I have not identified, when the pipeline runs for several subjects in parallel, some nodes fail because they cannot find the corresponding parcellation atlas file. The crash report follows:

Node: xcpd_wf.single_subject_SICNU028_wf.cifti_postprocess_11_wf.connectivity_wf.parcellate_reho
Working directory: /seastor/CAMP/tmp/xcpd_wf/single_subject_SICNU028_wf/cifti_postprocess_11_wf/connectivity_wf/parcellate_reho

Node inputs:

atlas = ['/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S1056Parcels/space-fsLR_atlas-4S1056Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S156Parcels/space-fsLR_atlas-4S156Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S256Parcels/space-fsLR_atlas-4S256Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S356Parcels/space-fsLR_atlas-4S356Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S456Parcels/space-fsLR_atlas-4S456Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S556Parcels/space-fsLR_atlas-4S556Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S656Parcels/space-fsLR_atlas-4S656Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S756Parcels/space-fsLR_atlas-4S756Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S856Parcels/space-fsLR_atlas-4S856Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S956Parcels/space-fsLR_atlas-4S956Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-Glasser/space-fsLR_atlas-Glasser_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-Gordon/space-fsLR_atlas-Gordon_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-HCP/space-fsLR_atlas-HCP_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-Tian/space-fsLR_atlas-Tian_den-91k_dseg.dlabel.nii']
atlas_labels = <undefined>
data_file = /seastor/CAMP/tmp/xcpd_wf/single_subject_SICNU028_wf/cifti_postprocess_11_wf/reho_wf/merge_cifti/reho_converted.dscalar.nii
min_coverage = 0.0
parcellated_atlas = <undefined>

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 292, in _send_procs_to_workers
    num_subnodes = self.procs[jobid].num_subnodes()
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 1308, in num_subnodes
    self._get_inputs()
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 1321, in _get_inputs
    self._inputs.trait_set(**old_inputs)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/has_traits.py", line 1520, in trait_set
    setattr(self, name, value)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 424, in validate
    value = super(MultiObject, self).validate(objekt, name, newvalue)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_types.py", line 2699, in validate
    return TraitListObject(self, object, name, value)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_list_object.py", line 582, in __init__
    super().__init__(
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_list_object.py", line 213, in __init__
    super().__init__(self.item_validator(item) for item in iterable)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_list_object.py", line 213, in <genexpr>
    super().__init__(self.item_validator(item) for item in iterable)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_list_object.py", line 865, in _item_validator
    return trait_validator(object, self.name, value)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
    value = super(File, self).validate(objekt, name, value, return_pathlike=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
    self.error(objekt, name, str(value))
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/base_trait_handler.py", line 74, in error
    raise TraitError(
traits.trait_errors.TraitError: Each element of the 'atlas_labels' trait of a DynamicTraitedSpec instance must be a pathlike object or string representing an existing file, but a value of '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S1056Parcels/atlas-4S1056Parcels_dseg.tsv' <class 'str'> was specified.


When creating this crashfile, the results file corresponding
to the node could not be found.

Additional details

  • xcp_d version: 0.6.1
  • Docker version: N/A
  • Singularity version: 3.7.0

What were you trying to do?

Running several subjects in parallel.

What did you expect to happen?

The atlas files should be found for all subjects.

What actually happened?

For some subjects, the pipeline cannot find an atlas file.

Reproducing the bug

None. The pipeline was run with the default settings.

@psychelzh psychelzh added the bug Issues noting problems and PRs fixing those problems. label Feb 25, 2024
@tsalo
Member

tsalo commented Feb 25, 2024

This seems like it could be stemming from a race condition. Namely, XCP-D might be in the process of writing out a new copy of the atlas in one node while another node is trying to access it. I will look into it tomorrow, but my guess is that we can fix this by checking if the output atlas already exists in whichever node typically writes it out. If the atlas exists, then that node should just not try to write out a new copy at all.
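A minimal sketch of the "skip the write if the atlas already exists" idea, assuming a plain file copy. The helper name `copy_atlas_if_missing` is hypothetical and is not XCP-D's actual function, and this is not necessarily what the eventual fix looks like. Note that a check followed by a write still leaves a small window in which two processes can both see the file as missing.

```python
import shutil
from pathlib import Path


def copy_atlas_if_missing(src, dst):
    """Copy src to dst unless dst already exists (hypothetical helper).

    Skipping the write means a process that arrives later never
    truncates a file that another process may be reading.
    """
    src, dst = Path(src), Path(dst)
    # exist_ok=True lets several processes create the folder concurrently.
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists():
        # Another subject's workflow already wrote the atlas; reuse it.
        return str(dst)
    shutil.copyfile(src, dst)
    return str(dst)
```

With this pattern, only the first workflow to reach the node writes the shared atlas, and every later call returns the existing path untouched.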

@psychelzh
Contributor Author

That is probably correct. Could XCP-D first check whether the file exists and, if so, skip writing and use it directly?

@psychelzh
Contributor Author

A possibly related, though not identical, issue involves the desc-linc_qc.json file. Is it possible to fix it in a similar way?

Node: xcpd_wf.single_subject_TJNU007N_wf.cifti_postprocess_0_wf.qc_report_wf.ds_qc_metadata
Working directory: /seastor/CAMP/tmp/xcpd_wf/single_subject_TJNU007N_wf/cifti_postprocess_0_wf/qc_report_wf/ds_qc_metadata

Node inputs:

acquisition = <undefined>
atlas = <undefined>
base_directory = /seastor/CAMP/derivatives/xcpd_no_gsr
ceagent = <undefined>
check_hdr = True
chunk = <undefined>
cohort = <undefined>
compress = <undefined>
data_dtype = <undefined>
datatype = <undefined>
den = <undefined>
desc = linc
direction = <undefined>
dismiss_entities = ['suffix', 'task', 'measure', 'mode', 'roi', 'atlas', 'tracksys', 'ceagent', 'modality', 'space', 'flip', 'fmap', 'staining', 'tracer', 'chunk', 'mt', 'den', 'hemi', 'inv', 'recording', 'res', 'cohort', 'sample', 'reconstruction', 'desc', 'session', 'part', 'echo', 'acquisition', 'from', 'datatype', 'direction', 'model', 'extension', 'scans', 'proc', 'subject', 'label', 'subset', 'to', 'run']
echo = <undefined>
extension = .json
flip = <undefined>
fmap = <undefined>
from = <undefined>
hemi = <undefined>
in_file = ['/seastor/CAMP/tmp/xcpd_wf/single_subject_TJNU007N_wf/cifti_postprocess_0_wf/qc_report_wf/qc_report/filtered_denoisedqc_bold.json']
inv = <undefined>
label = <undefined>
measure = <undefined>
meta_dict = <undefined>
modality = <undefined>
mode = <undefined>
model = <undefined>
mt = <undefined>
part = <undefined>
proc = <undefined>
reconstruction = <undefined>
recording = <undefined>
res = <undefined>
roi = <undefined>
run = <undefined>
sample = <undefined>
scans = <undefined>
session = <undefined>
source_file = ['/seastor/CAMP/derivatives/fmriprep/sub-TJNU007N/ses-1/func/sub-TJNU007N_ses-1_task-am_dir-PA_run-1_space-fsLR_den-91k_bold.dtseries.nii']
space = <undefined>
staining = <undefined>
subject = <undefined>
subset = <undefined>
suffix = qc
task = <undefined>
to = <undefined>
tracer = <undefined>
tracksys = <undefined>

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 344, in _send_procs_to_workers
    self.procs[jobid].run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node ds_qc_metadata.

Traceback:
	Traceback (most recent call last):
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 397, in run
	    runtime = self._run_interface(runtime)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/niworkflows/interfaces/bids.py", line 732, in _run_interface
	    _copy_any(orig_file, str(out_file))
	  File "/usr/local/miniconda/lib/python3.10/site-packages/niworkflows/utils/misc.py", line 288, in _copy_any
	    copyfile(src, dst, copy=True, use_hardlink=True)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/utils/filemanip.py", line 447, in copyfile
	    copyfile(
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/utils/filemanip.py", line 386, in copyfile
	    elif posixpath.samefile(newfile, originalfile):
	  File "/usr/local/miniconda/lib/python3.10/genericpath.py", line 100, in samefile
	    s1 = os.stat(f1)
	FileNotFoundError: [Errno 2] No such file or directory: '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/desc-linc_qc.json'

@tsalo
Member

tsalo commented Feb 27, 2024

The atlas race condition falls under XCP-D's purview, because I use a custom function to copy the files, but the new error stems from the DerivativesDataSink class imported from Niworkflows. I think the best move forward would be to open an issue in the Niworkflows repo (https://github.com/nipreps/niworkflows) or open a NeuroStars post with the niworkflows tag. Would you be willing to do that?

@psychelzh
Contributor Author

psychelzh commented Feb 27, 2024

I have filed a new issue there, but I am not sure whether I provided enough details.

@psychelzh
Contributor Author

psychelzh commented Feb 28, 2024

It is a little strange. I tried xcp_d 0.6.2, which contains the patch, but it still gives these errors. Maybe this time it is related to DerivativesDataSink?

Node: xcpd_wf.single_subject_TJNU066N_wf.load_atlases_wf.ds_atlas_metadata
Working directory: /seastor/CAMP/tmp/xcpd_wf/single_subject_TJNU066N_wf/load_atlases_wf/ds_atlas_metadata

Node inputs:

acquisition = <undefined>
atlas = ['4S1056Parcels', '4S156Parcels', '4S256Parcels', '4S356Parcels', '4S456Parcels', '4S556Parcels', '4S656Parcels', '4S756Parcels', '4S856Parcels', '4S956Parcels', 'Glasser', 'Gordon', 'HCP', 'Tian']
base_directory = /seastor/CAMP/derivatives/xcpd_no_gsr
ceagent = <undefined>
check_hdr = False
chunk = <undefined>
cohort = <undefined>
compress = <undefined>
data_dtype = <undefined>
datatype = <undefined>
den = <undefined>
desc = <undefined>
direction = <undefined>
dismiss_entities = ['datatype', 'subject', 'session', 'task', 'run', 'desc', 'space', 'res', 'den', 'cohort']
echo = <undefined>
extension = .json
flip = <undefined>
fmap = <undefined>
from = <undefined>
hemi = <undefined>
in_file = [['/AtlasPack/tpl-fsLR_atlas-4S1056Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S156Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S256Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S356Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S456Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S556Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S656Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S756Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S856Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S956Parcels_dseg.json'], ['/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/data/atlases/tpl-fsLR_atlas-Glasser_dseg.json'], ['/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/data/atlases/tpl-fsLR_atlas-Gordon_dseg.json'], ['/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/data/atlases/tpl-fsLR_atlas-HCP_dseg.json'], ['/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/data/atlases/tpl-fsLR_atlas-Tian_dseg.json']]
inv = <undefined>
label = <undefined>
measure = <undefined>
meta_dict = <undefined>
modality = <undefined>
mode = <undefined>
model = <undefined>
mt = <undefined>
part = <undefined>
proc = <undefined>
reconstruction = <undefined>
recording = <undefined>
res = <undefined>
roi = <undefined>
run = <undefined>
sample = <undefined>
scans = <undefined>
session = <undefined>
source_file = <undefined>
space = <undefined>
staining = <undefined>
subject = <undefined>
subset = <undefined>
suffix = dseg
task = <undefined>
to = <undefined>
tracer = <undefined>
tracksys = <undefined>

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 344, in _send_procs_to_workers
    self.procs[jobid].run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 1380, in _run_interface
    result = self._collate_results(
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 1249, in _collate_results
    for i, nresult, err in nodes:
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/utils.py", line 94, in nodelist_runner
    result = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node _ds_atlas_metadata6.

Traceback:
	Traceback (most recent call last):
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 397, in run
	    runtime = self._run_interface(runtime)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/niworkflows/interfaces/bids.py", line 732, in _run_interface
	    _copy_any(orig_file, str(out_file))
	  File "/usr/local/miniconda/lib/python3.10/site-packages/niworkflows/utils/misc.py", line 288, in _copy_any
	    copyfile(src, dst, copy=True, use_hardlink=True)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/utils/filemanip.py", line 447, in copyfile
	    copyfile(
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/utils/filemanip.py", line 406, in copyfile
	    os.unlink(newfile)
	FileNotFoundError: [Errno 2] No such file or directory: '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S656Parcels/atlas-4S656Parcels_dseg.json'

@psychelzh
Contributor Author

@tsalo Given my last comment, should we reopen this issue? Or just wait for Niworkflows?

@tsalo
Member

tsalo commented Feb 29, 2024

It turns out both of the failures (first the tsv, then the json) you mentioned were due to the DerivativesDataSink, so it seems like #1066 didn't fix anything (though it's entirely possible the atlas files were susceptible to the race condition issue).

I will reopen this and attempt to use a similar file-copying approach instead of DerivativesDataSink for these two outputs.
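Both tracebacks above fail between a check and a follow-up call on the shared output path (`os.stat` in one, `os.unlink` in the other), which suggests another process removed the file in between. One way a file-copying approach can avoid that window is to write to a temporary file and rename it into place. The sketch below assumes this pattern; `atomic_copy` is a hypothetical name, not the function used in XCP-D or Niworkflows, and atomic rename behaviour on a network filesystem is an assumption that would need checking.

```python
import os
import shutil
import tempfile
from pathlib import Path


def atomic_copy(src, dst):
    """Copy src to dst via a temporary file and a rename (hypothetical helper)."""
    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Create the temporary file next to dst so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dst.parent, prefix=dst.name + ".", suffix=".tmp")
    os.close(fd)
    try:
        # shutil.copy copies the data and the permission bits of src.
        shutil.copy(src, tmp_name)
        # os.replace overwrites dst in one step, so dst is never unlinked first.
        os.replace(tmp_name, dst)
    except BaseException:
        # Clean up the temporary file; tolerate it already being gone.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return str(dst)
```

Because the destination is never deleted before the new copy is in place, concurrent workflows writing the same output should each see either the old file or the new one, not a missing path.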
