Error in finding parcellation atlases when running pipeline in parallel #1064

psychelzh · 2024-02-25T09:43:44Z

Summary

The output atlases folder has now been moved out of the subject directory, which is very reasonable. However, when running the pipeline for several subjects in parallel, some sub-tasks fail because they cannot find the corresponding parcellation atlas file, and I do not know why. The crash report follows:

Node: xcpd_wf.single_subject_SICNU028_wf.cifti_postprocess_11_wf.connectivity_wf.parcellate_reho
Working directory: /seastor/CAMP/tmp/xcpd_wf/single_subject_SICNU028_wf/cifti_postprocess_11_wf/connectivity_wf/parcellate_reho

Node inputs:

atlas = ['/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S1056Parcels/space-fsLR_atlas-4S1056Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S156Parcels/space-fsLR_atlas-4S156Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S256Parcels/space-fsLR_atlas-4S256Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S356Parcels/space-fsLR_atlas-4S356Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S456Parcels/space-fsLR_atlas-4S456Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S556Parcels/space-fsLR_atlas-4S556Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S656Parcels/space-fsLR_atlas-4S656Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S756Parcels/space-fsLR_atlas-4S756Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S856Parcels/space-fsLR_atlas-4S856Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S956Parcels/space-fsLR_atlas-4S956Parcels_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-Glasser/space-fsLR_atlas-Glasser_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-Gordon/space-fsLR_atlas-Gordon_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-HCP/space-fsLR_atlas-HCP_den-91k_dseg.dlabel.nii', '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-Tian/space-fsLR_atlas-Tian_den-91k_dseg.dlabel.nii']
atlas_labels = <undefined>
data_file = /seastor/CAMP/tmp/xcpd_wf/single_subject_SICNU028_wf/cifti_postprocess_11_wf/reho_wf/merge_cifti/reho_converted.dscalar.nii
min_coverage = 0.0
parcellated_atlas = <undefined>

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 292, in _send_procs_to_workers
    num_subnodes = self.procs[jobid].num_subnodes()
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 1308, in num_subnodes
    self._get_inputs()
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 1321, in _get_inputs
    self._inputs.trait_set(**old_inputs)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/has_traits.py", line 1520, in trait_set
    setattr(self, name, value)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 424, in validate
    value = super(MultiObject, self).validate(objekt, name, newvalue)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_types.py", line 2699, in validate
    return TraitListObject(self, object, name, value)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_list_object.py", line 582, in __init__
    super().__init__(
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_list_object.py", line 213, in __init__
    super().__init__(self.item_validator(item) for item in iterable)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_list_object.py", line 213, in <genexpr>
    super().__init__(self.item_validator(item) for item in iterable)
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/trait_list_object.py", line 865, in _item_validator
    return trait_validator(object, self.name, value)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
    value = super(File, self).validate(objekt, name, value, return_pathlike=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
    self.error(objekt, name, str(value))
  File "/usr/local/miniconda/lib/python3.10/site-packages/traits/base_trait_handler.py", line 74, in error
    raise TraitError(
traits.trait_errors.TraitError: Each element of the 'atlas_labels' trait of a DynamicTraitedSpec instance must be a pathlike object or string representing an existing file, but a value of '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S1056Parcels/atlas-4S1056Parcels_dseg.tsv' <class 'str'> was specified.


When creating this crashfile, the results file corresponding
to the node could not be found.

Additional details

xcp_d version: 0.6.1
Docker version: N/A
Singularity version: 3.7.0

What were you trying to do?

Running several subjects in parallel.

What did you expect to happen?

The atlas files should be found for every subject.

What actually happened?

Some subjects' pipelines cannot find the atlas file.

Reproducing the bug

No special steps. The pipeline was run with the default settings.


tsalo · 2024-02-25T14:42:23Z

This seems like it could be stemming from a race condition. Namely, XCP-D might be in the process of writing out a new copy of the atlas in one node while another node is trying to access it. I will look into it tomorrow, but my guess is that we can fix this by checking if the output atlas already exists in whichever node typically writes it out. If the atlas exists, then that node should just not try to write out a new copy at all.
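
A minimal sketch of that check, for illustration only: `copy_if_missing` is a hypothetical helper, not XCP-D's actual code, and it assumes the atlas can be written with a plain file copy.

```python
import shutil
from pathlib import Path


def copy_if_missing(src, dst):
    """Copy src to dst unless dst already exists; return the destination path.

    Hypothetical helper for illustration; not part of XCP-D.
    """
    src, dst = Path(src), Path(dst)
    if dst.exists():
        # Another subject's workflow already wrote this file, so reuse it
        # instead of rewriting it while other nodes may be reading it.
        return dst
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst
```

The existence check narrows the window but does not close it: two processes can both see the file as missing and both start copying, so a reader may still observe a partially written file.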

psychelzh · 2024-02-25T15:03:40Z

That is probably correct. Could XCP-D first check whether the file exists and, if so, skip writing and use the existing file directly?

psychelzh · 2024-02-27T14:08:33Z

A related, though not identical, issue occurs with the desc-linc_qc.json file. Could it be fixed in a similar way?

Node: xcpd_wf.single_subject_TJNU007N_wf.cifti_postprocess_0_wf.qc_report_wf.ds_qc_metadata
Working directory: /seastor/CAMP/tmp/xcpd_wf/single_subject_TJNU007N_wf/cifti_postprocess_0_wf/qc_report_wf/ds_qc_metadata

Node inputs:

acquisition = <undefined>
atlas = <undefined>
base_directory = /seastor/CAMP/derivatives/xcpd_no_gsr
ceagent = <undefined>
check_hdr = True
chunk = <undefined>
cohort = <undefined>
compress = <undefined>
data_dtype = <undefined>
datatype = <undefined>
den = <undefined>
desc = linc
direction = <undefined>
dismiss_entities = ['suffix', 'task', 'measure', 'mode', 'roi', 'atlas', 'tracksys', 'ceagent', 'modality', 'space', 'flip', 'fmap', 'staining', 'tracer', 'chunk', 'mt', 'den', 'hemi', 'inv', 'recording', 'res', 'cohort', 'sample', 'reconstruction', 'desc', 'session', 'part', 'echo', 'acquisition', 'from', 'datatype', 'direction', 'model', 'extension', 'scans', 'proc', 'subject', 'label', 'subset', 'to', 'run']
echo = <undefined>
extension = .json
flip = <undefined>
fmap = <undefined>
from = <undefined>
hemi = <undefined>
in_file = ['/seastor/CAMP/tmp/xcpd_wf/single_subject_TJNU007N_wf/cifti_postprocess_0_wf/qc_report_wf/qc_report/filtered_denoisedqc_bold.json']
inv = <undefined>
label = <undefined>
measure = <undefined>
meta_dict = <undefined>
modality = <undefined>
mode = <undefined>
model = <undefined>
mt = <undefined>
part = <undefined>
proc = <undefined>
reconstruction = <undefined>
recording = <undefined>
res = <undefined>
roi = <undefined>
run = <undefined>
sample = <undefined>
scans = <undefined>
session = <undefined>
source_file = ['/seastor/CAMP/derivatives/fmriprep/sub-TJNU007N/ses-1/func/sub-TJNU007N_ses-1_task-am_dir-PA_run-1_space-fsLR_den-91k_bold.dtseries.nii']
space = <undefined>
staining = <undefined>
subject = <undefined>
subset = <undefined>
suffix = qc
task = <undefined>
to = <undefined>
tracer = <undefined>
tracksys = <undefined>

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 344, in _send_procs_to_workers
    self.procs[jobid].run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node ds_qc_metadata.

Traceback:
	Traceback (most recent call last):
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 397, in run
	    runtime = self._run_interface(runtime)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/niworkflows/interfaces/bids.py", line 732, in _run_interface
	    _copy_any(orig_file, str(out_file))
	  File "/usr/local/miniconda/lib/python3.10/site-packages/niworkflows/utils/misc.py", line 288, in _copy_any
	    copyfile(src, dst, copy=True, use_hardlink=True)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/utils/filemanip.py", line 447, in copyfile
	    copyfile(
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/utils/filemanip.py", line 386, in copyfile
	    elif posixpath.samefile(newfile, originalfile):
	  File "/usr/local/miniconda/lib/python3.10/genericpath.py", line 100, in samefile
	    s1 = os.stat(f1)
	FileNotFoundError: [Errno 2] No such file or directory: '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/desc-linc_qc.json'

tsalo · 2024-02-27T14:20:46Z

The atlas race condition falls under XCP-D's purview, because I use a custom function to copy the files, but the new error stems from the DerivativesDataSink class imported from Niworkflows. I think the best way forward would be to open an issue in the Niworkflows repo (https://github.com/nipreps/niworkflows) or open a NeuroStars post with the niworkflows tag. Would you be willing to do that?

psychelzh · 2024-02-27T14:44:47Z

I have filed a new issue there, but I am not sure whether I provided enough details.

psychelzh · 2024-02-28T08:55:16Z

It is a little strange. I have tried xcp_d 0.6.2, which contains the patch, but it still gives these errors. Maybe this time it is related to DerivativesDataSink?

Node: xcpd_wf.single_subject_TJNU066N_wf.load_atlases_wf.ds_atlas_metadata
Working directory: /seastor/CAMP/tmp/xcpd_wf/single_subject_TJNU066N_wf/load_atlases_wf/ds_atlas_metadata

Node inputs:

acquisition = <undefined>
atlas = ['4S1056Parcels', '4S156Parcels', '4S256Parcels', '4S356Parcels', '4S456Parcels', '4S556Parcels', '4S656Parcels', '4S756Parcels', '4S856Parcels', '4S956Parcels', 'Glasser', 'Gordon', 'HCP', 'Tian']
base_directory = /seastor/CAMP/derivatives/xcpd_no_gsr
ceagent = <undefined>
check_hdr = False
chunk = <undefined>
cohort = <undefined>
compress = <undefined>
data_dtype = <undefined>
datatype = <undefined>
den = <undefined>
desc = <undefined>
direction = <undefined>
dismiss_entities = ['datatype', 'subject', 'session', 'task', 'run', 'desc', 'space', 'res', 'den', 'cohort']
echo = <undefined>
extension = .json
flip = <undefined>
fmap = <undefined>
from = <undefined>
hemi = <undefined>
in_file = [['/AtlasPack/tpl-fsLR_atlas-4S1056Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S156Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S256Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S356Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S456Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S556Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S656Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S756Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S856Parcels_dseg.json'], ['/AtlasPack/tpl-fsLR_atlas-4S956Parcels_dseg.json'], ['/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/data/atlases/tpl-fsLR_atlas-Glasser_dseg.json'], ['/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/data/atlases/tpl-fsLR_atlas-Gordon_dseg.json'], ['/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/data/atlases/tpl-fsLR_atlas-HCP_dseg.json'], ['/usr/local/miniconda/lib/python3.10/site-packages/xcp_d/data/atlases/tpl-fsLR_atlas-Tian_dseg.json']]
inv = <undefined>
label = <undefined>
measure = <undefined>
meta_dict = <undefined>
modality = <undefined>
mode = <undefined>
model = <undefined>
mt = <undefined>
part = <undefined>
proc = <undefined>
reconstruction = <undefined>
recording = <undefined>
res = <undefined>
roi = <undefined>
run = <undefined>
sample = <undefined>
scans = <undefined>
session = <undefined>
source_file = <undefined>
space = <undefined>
staining = <undefined>
subject = <undefined>
subset = <undefined>
suffix = dseg
task = <undefined>
to = <undefined>
tracer = <undefined>
tracksys = <undefined>

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 344, in _send_procs_to_workers
    self.procs[jobid].run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 1380, in _run_interface
    result = self._collate_results(
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 1249, in _collate_results
    for i, nresult, err in nodes:
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/utils.py", line 94, in nodelist_runner
    result = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node _ds_atlas_metadata6.

Traceback:
	Traceback (most recent call last):
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 397, in run
	    runtime = self._run_interface(runtime)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/niworkflows/interfaces/bids.py", line 732, in _run_interface
	    _copy_any(orig_file, str(out_file))
	  File "/usr/local/miniconda/lib/python3.10/site-packages/niworkflows/utils/misc.py", line 288, in _copy_any
	    copyfile(src, dst, copy=True, use_hardlink=True)
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/utils/filemanip.py", line 447, in copyfile
	    copyfile(
	  File "/usr/local/miniconda/lib/python3.10/site-packages/nipype/utils/filemanip.py", line 406, in copyfile
	    os.unlink(newfile)
	FileNotFoundError: [Errno 2] No such file or directory: '/seastor/CAMP/derivatives/xcpd_no_gsr/xcp_d/atlases/atlas-4S656Parcels/atlas-4S656Parcels_dseg.json'
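
One plausible reading of the last frames above is a check-then-act race: the destination file disappears (for example, because another process is replacing it) between an existence check and the `os.unlink` call. That reading is an assumption and is not confirmed in this thread. A removal step that tolerates an already-missing file could be sketched as follows; `remove_if_present` is a hypothetical name, not a nipype or Niworkflows function.

```python
import os
from contextlib import suppress


def remove_if_present(path):
    """Delete path, treating "already gone" as success.

    Hypothetical helper for illustration only.
    """
    # Skip the separate existence check: attempt the removal and ignore the
    # error raised when another process has deleted the file first.
    with suppress(FileNotFoundError):
        os.unlink(path)
```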

psychelzh · 2024-02-29T15:28:17Z

@tsalo Given my last comment, should we reopen this issue? Or just wait for Niworkflows?

tsalo · 2024-02-29T15:33:19Z

It turns out both of the failures (first the tsv, then the json) you mentioned were due to the DerivativesDataSink, so it seems like #1066 didn't fix anything (though it's entirely possible the atlas files were susceptible to the race condition issue).

I will reopen this and attempt to use a similar file-copying approach instead of DerivativesDataSink for these two outputs.
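
For illustration, one common way to make such a copy safe for concurrent readers is to write to a temporary file in the destination directory and then rename it into place. This is a sketch under assumptions, not the approach taken in the actual fix: `atomic_copy` is a hypothetical helper, and it assumes the output directory sits on a filesystem where `os.replace` is atomic (true for local POSIX filesystems; network filesystems may behave differently).

```python
import os
import shutil
import tempfile
from contextlib import suppress
from pathlib import Path


def atomic_copy(src, dst):
    """Copy src to dst so that readers never see a partially written dst.

    Hypothetical helper for illustration; not XCP-D's implementation.
    """
    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Create the temporary file next to the destination so the final rename
    # stays within a single filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(dst.parent), prefix=dst.name + ".", suffix=".tmp"
    )
    os.close(fd)
    try:
        shutil.copy(src, tmp_name)  # copies data and permission bits
        os.replace(tmp_name, dst)   # swaps the complete file into place
    finally:
        # After a successful replace the temp name is gone; after a failure
        # this removes the leftover temporary file.
        with suppress(FileNotFoundError):
            os.unlink(tmp_name)
    return dst
```

With this pattern, parallel subject workflows may still copy the same file more than once, but each one replaces the destination with a complete file rather than deleting or truncating it first.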

psychelzh added the bug label Feb 25, 2024

tsalo mentioned this issue Feb 26, 2024

Fix potential atlas file race condition #1066

Merged

psychelzh mentioned this issue Feb 27, 2024

Handle file read/write race in parallel computing nipreps/niworkflows#856

Open

tsalo closed this as completed in #1066 Feb 27, 2024

tsalo reopened this Feb 29, 2024

tsalo mentioned this issue Feb 29, 2024

Use copy_atlas to write out atlas tsv and json files #1073

Merged

tsalo closed this as completed in #1073 Feb 29, 2024
