RuntimeError: received 0 items of ancdata #701
Comments
Hi @petteriTeikari, thanks for your bug report. Could you please provide a test program to reproduce this issue? Thanks.
I will try to put together a minimal example for you, @Nic-Ma (my codebase has grown and is hard to share as-is), but I assume the problem is somewhere deeper: the case where the data is inaccessible for a while (for an unknown reason on my end) is not handled properly.
I haven't seen this problem before; did you make any progress with it, @petteriTeikari?
I "solved" this by using the standard |
@Nic-Ma: I came across this error again when trying a custom loss outside the
Apparently it is the multiprocessing that is causing the headaches, which makes the problem somewhat local to my machine and hard to reproduce. I tried some of the fixes from pytorch/pytorch#973 and got at least past the first epoch; we'll see how robust these workarounds are.

pytorch/pytorch#973 (comment):

```python
torch.multiprocessing.set_sharing_strategy('file_system')
```

pytorch/pytorch#973 (comment):

```python
pool = torch.multiprocessing.Pool(torch.multiprocessing.cpu_count(), maxtasksperchild=1)
```

pytorch/pytorch#973 (comment):

```python
import resource

rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
```

@sampathweb also had a suggestion for debugging in pytorch/pytorch#973 (comment).
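In practice the sharing-strategy switch and the file-descriptor-limit bump can be combined at the top of the training script, before any DataLoader workers are started. A minimal sketch combining the linked snippets (not taken verbatim from pytorch/pytorch#973):

```python
import resource

import torch.multiprocessing

# Share tensors between worker processes through the file system instead of
# passing file descriptors, which is what eventually runs out.
torch.multiprocessing.set_sharing_strategy('file_system')

# And/or raise the soft limit on open file descriptors (clamped to the hard limit).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard_limit), hard_limit))
```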
Hi @petteriTeikari, I see. So what's the latest status, have you solved this issue? Thanks.
@Nic-Ma Yes, I started training last night and it has not crashed so far, so in that sense the fix seems to be working.
Sounds good!
Thanks, it solved my problem.
The issue is reproducible with this script (pytorch/pytorch#973 (comment)):

```python
import torch
import torch.multiprocessing as multiprocessing

torch.multiprocessing.set_sharing_strategy('file_descriptor')


def _worker_loop(data_queue):
    while True:
        t = torch.FloatTensor(1)
        data_queue.put(t)


if __name__ == '__main__':
    data_queue = multiprocessing.Queue(maxsize=1)
    p = multiprocessing.Process(
        target=_worker_loop,
        args=(data_queue,))
    p.daemon = True
    p.start()

    lis = []
    for i in range(10000):
        try:
            lis.append(data_queue.get())
        except:
            print('i = {}'.format(i))
            raise
```

when the `file_descriptor` sharing strategy is used.
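For context (an explanation of the mechanism, assumed here rather than spelled out in the thread): under the `file_descriptor` sharing strategy each tensor received from the queue keeps an open file descriptor alive, so once `lis` holds more tensors than the process is allowed open files, `multiprocessing` cannot receive the descriptor and raises `RuntimeError: received 0 items of ancdata`. The relevant limit can be checked with:

```python
import resource

# Soft/hard limits on open file descriptors for the current process; the loop
# above typically starts failing once roughly `soft` tensors are held at once.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('RLIMIT_NOFILE: soft={}, hard={}'.format(soft, hard))
```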
"Stochastic" issue happening with training at some point. Training starts okay for x number of epochs and at some point this often happens with
Pytorch Lightning
(quite close still to the Build a segmentation workflow (with PyTorch Lightning))
, and is probably propagating from Pytorch code? (e.g. fastai/fastai#23)
Which I thought was happening first with the
CacheDataset
as it was quite RAM-intensive?:but the same behavior was happening with the vanilla loader
with the following transformation
I guess this depends on environment in which the code is run, but do you have any ideas how to get rid of this?
Full trace: