Error with PersistentDataset in pytorch distributed setting #2079
jpcenteno80 started this conversation in General
Replies: 2 comments
-
Hi @yiheng-wang-nv, could you please help verify? Thanks.
-
Hi @jpcenteno80, I think this PR can fix your issue: #2086. Thanks.
-
I am using `.nrrd` files and was getting errors with `CacheDataset` (probably related to the note under the `ITKReader` class about the reading process not being thread safe; I even updated ITK to version 5.2.0). So I switched to `PersistentDataset`, and it worked great on a single GPU. However, when I follow the setup in the `dynunet_pipeline` tutorial using 1 node with 8 GPUs, I get the error `No such file or directory: 'persistent_data_cache/9fd3bad5a1225c76284263dc0bcbb196.temp_write_cache'`. This is the temporary file generated while the final `.pt` cache file is being written. I notice that the GPUs start processing while the persistent dataset is still being written to disk. How can I make sure the persistent dataset is fully created before letting the GPUs start their work, without diverging too far from the template in the `dynunet_pipeline` tutorial?
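One common workaround (a sketch, not the official `dynunet_pipeline` approach) is to let a single rank warm up the cache and make every rank wait at a barrier before training starts; in a real DDP job the barrier would be `torch.distributed.barrier()` and the warm-up would iterate the `PersistentDataset` once on rank 0. The illustration below uses only the standard library, with threads standing in for ranks; `build_cache`, `worker`, and the file names are hypothetical stand-ins.

```python
# Sketch of the "warm the cache on rank 0, then barrier" pattern.
# Threads stand in for distributed ranks; threading.Barrier stands in
# for torch.distributed.barrier(). All names below are illustrative.
import tempfile
import threading
from pathlib import Path


def build_cache(cache_dir: Path) -> None:
    # Stand-in for iterating a PersistentDataset once so its .pt cache
    # files are fully written before any GPU starts training.
    (cache_dir / "sample0.pt").write_bytes(b"cached-tensor-bytes")


def worker(rank: int, cache_dir: Path,
           barrier: threading.Barrier, results: dict) -> None:
    if rank == 0:
        build_cache(cache_dir)  # only rank 0 materializes the cache
    barrier.wait()              # every rank blocks until rank 0 is done
    # Past the barrier, the cache is guaranteed to exist for all ranks.
    results[rank] = (cache_dir / "sample0.pt").exists()


def run(world_size: int = 4) -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)
        barrier = threading.Barrier(world_size)
        results: dict = {}
        threads = [
            threading.Thread(target=worker,
                             args=(r, cache_dir, barrier, results))
            for r in range(world_size)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
```

Calling `run()` returns `True` for every rank, because no rank touches the cache until the barrier releases; without the barrier, ranks 1..N could race ahead and hit the missing `.temp_write_cache` file, as in the error above.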