Error with PersistentDataset in pytorch distributed setting #2079
jpcenteno80 started this conversation in General
Replies: 2 comments
-
Hi @yiheng-wang-nv, could you please help verify? Thanks.
-
Hi @jpcenteno80, I think this PR can fix your issue: #2086. Thanks.
-
I am using `.nrrd` files and was getting errors with `CacheDataset` (probably related to the note under the `ITKReader` class about the reading process not being thread safe; I even updated ITK to version 5.2.0). So I switched to `PersistentDataset`, and it worked great on a single GPU. However, when I follow the setup in the `dynunet_pipeline` tutorial using 1 node with 8 GPUs, I get the error `No such file or directory: 'persistent_data_cache/9fd3bad5a1225c76284263dc0bcbb196.temp_write_cache'`. This is the temporary file generated while the final `.pt` cache file is being written. I notice that the GPUs start processing while the persistent dataset is still being written to disk. How can I make sure the persistent dataset is fully created before letting the GPUs start their work, without diverging too far from the template in the `dynunet_pipeline` tutorial?
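One common workaround (a sketch, not the official `dynunet_pipeline` approach) is to let a single rank warm up the cache and make every rank wait at a barrier before training starts; in a real DDP job the barrier would be `torch.distributed.barrier()` and the warm-up would iterate the `PersistentDataset` once on rank 0. The illustration below uses only the standard library, with threads standing in for ranks; `build_cache`, `worker`, and the file names are hypothetical stand-ins.

```python
# Sketch of the "warm the cache on rank 0, then barrier" pattern.
# Threads stand in for distributed ranks; threading.Barrier stands in
# for torch.distributed.barrier(). All names below are illustrative.
import tempfile
import threading
from pathlib import Path


def build_cache(cache_dir: Path) -> None:
    # Stand-in for iterating a PersistentDataset once so its .pt cache
    # files are fully written before any GPU starts training.
    (cache_dir / "sample0.pt").write_bytes(b"cached-tensor-bytes")


def worker(rank: int, cache_dir: Path,
           barrier: threading.Barrier, results: dict) -> None:
    if rank == 0:
        build_cache(cache_dir)  # only rank 0 materializes the cache
    barrier.wait()              # every rank blocks until rank 0 is done
    # Past the barrier, the cache is guaranteed to exist for all ranks.
    results[rank] = (cache_dir / "sample0.pt").exists()


def run(world_size: int = 4) -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)
        barrier = threading.Barrier(world_size)
        results: dict = {}
        threads = [
            threading.Thread(target=worker,
                             args=(r, cache_dir, barrier, results))
            for r in range(world_size)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
```

Calling `run()` returns `True` for every rank, because no rank touches the cache until the barrier releases; without the barrier, ranks 1..N could race ahead and hit the missing `.temp_write_cache` file, as in the error above.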