Out of IDs after reading many files #63
Hmm, that's a pain. It's probably possible to use a similar workaround to the one described in the thread you linked. Did you find any way around it in the end?
No, I moved over to https://github.com/jonathantompson/torchzlib instead and it has worked for me with no problems.
Has anyone found a fix for this? I am using HDF5 1.8.17 with thread safety enabled (enable-threadsafe), loading many HDF5 files from multiple threads during network training. Memory never seems to be released, and after a few hours the program crashes with 'too many open files', even though I have checked many times that every file I open is closed.
The same code doesn't crash with HDF5 1.8.12, but it leaks memory in the same way.
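To tell a real file-descriptor leak apart from HDF5 running out of its own internal IDs, one option is to count the process's open descriptors before and after a batch of open/read/close cycles. Below is a minimal Python sketch, assuming a Unix-like system that exposes /proc/self/fd (Linux) or /dev/fd (macOS). open_fd_count, fd_limit and leaked_fds are hypothetical helpers, not part of torch-hdf5 or HDF5; the `work` callable would be whatever loads one file in your pipeline.

```python
import os
import resource
import tempfile


def open_fd_count() -> int:
    """Count file descriptors currently open in this process.

    Assumes /proc/self/fd (Linux) or /dev/fd (macOS/BSD) exists. Listing the
    directory briefly holds one descriptor, but it does so on every call, so
    the difference between two calls is still meaningful.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise OSError("no per-process fd directory found")


def fd_limit() -> int:
    """Soft limit on open files; exceeding it gives 'too many open files'."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def leaked_fds(work, iterations: int = 100) -> int:
    """Call `work` repeatedly and return how many descriptors it left open."""
    before = open_fd_count()
    for _ in range(iterations):
        work()
    return open_fd_count() - before


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)

    def well_behaved():
        with open(path, "rb") as f:  # closed when the with-block exits
            f.read()

    print("fd soft limit:", fd_limit())
    print("descriptors leaked over 100 iterations:", leaked_fds(well_behaved))
    os.remove(path)
```

If the count stays flat while the crash still happens, the files themselves are probably being closed and the exhaustion may be happening inside the library instead.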
I'm having what appears to be a similar issue, but with an earlier version of HDF5:
This happens after many iterations of open/read/close on an HDF5 file. Usually the program just hangs forever; I feel like I was "lucky" to even see the error message.
I created a fork of torch-hdf5 that works with HDF5 1.10 (https://github.com/anibali/torch-hdf5/tree/hdf5-1.10), installed HDF5 1.10 with 1.8 API compatibility, and reran the sample program provided by the OP. The program now finishes successfully, whereas before it did not. So either a) the issue is properly fixed in newer versions of HDF5, or b) the new 64-bit IDs in 1.10 have increased the number of available IDs, but they will still eventually run out given enough time.
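Option b) can be made concrete with some arithmetic. The sketch below assumes that 8 bits of each hid_t are reserved (for the ID type and sign), leaving 24 usable bits in a 32-bit hid_t and 56 in a 64-bit one. The 8-bit split is an assumption here, chosen because it reproduces the 2^24 figure reported in the original post; the leak rate is hypothetical.

```python
# Assumption: 8 bits of every hid_t are reserved, the rest number the IDs.
RESERVED_BITS = 8
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def ids_per_type(hid_bits: int, reserved_bits: int = RESERVED_BITS) -> int:
    """Distinct IDs available for one ID type with a hid_bits-wide hid_t."""
    return 2 ** (hid_bits - reserved_bits)


def seconds_until_exhausted(hid_bits: int, ids_per_second: float) -> float:
    """Time until the ID space runs out if consumed IDs are never reused."""
    return ids_per_type(hid_bits) / ids_per_second


if __name__ == "__main__":
    rate = 1000.0  # hypothetical: 1,000 IDs consumed per second
    for label, bits in (("32-bit hid_t (1.8)", 32), ("64-bit hid_t (1.10)", 64)):
        secs = seconds_until_exhausted(bits, rate)
        print(f"{label}: {ids_per_type(bits)} IDs, "
              f"{secs / 3600:.1f} hours, {secs / SECONDS_PER_YEAR:.3g} years")
```

At that hypothetical rate the 32-bit space lasts about 4.7 hours, the same order as the "few hours" reported in this thread, while the 64-bit space lasts roughly 2.3 million years. So even if b) is the explanation, the limit would be unlikely to be reached during a training run.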
I'm training a neural network on many data files in HDF5 format. After running for many epochs over a few hours, it crashes with an error.
It seems to be a known(?) bug, and it exists in both 1.8.14 and 1.8.16:
https://stackoverflow.com/questions/35522633/hdf5-consumes-all-resource-ids-for-dataspaces-and-exits-c-api
I can reproduce it with this if I just let it run for a while (around 2^24 = 16,777,216 iterations, to be precise).
Any ideas? Should I just not use hdf5?
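Failing after about 2^24 iterations even though every file is closed is what you would expect if each open consumes a fresh ID from a fixed-width counter and closed IDs are never handed out again. The toy model below is hypothetical (MonotonicIDAllocator and cycles_until_failure are illustrative names, not HDF5's H5I code). It only shows that, under that assumption, closing every handle does not help: the counter still runs out after 2^bits cycles, with at most one ID live at a time.

```python
class OutOfIDs(Exception):
    """Raised when the hypothetical ID counter has no numbers left."""


class MonotonicIDAllocator:
    """Toy model of an ID space whose numbers are never recycled.

    Hypothetical: models a counter that hands out fresh IDs and never
    reuses closed ones; it is not HDF5's actual implementation.
    """

    def __init__(self, id_bits: int):
        self.capacity = 2 ** id_bits  # e.g. 2**24 for 24 usable bits
        self.next_id = 0
        self.live = set()

    def open(self) -> int:
        if self.next_id >= self.capacity:
            raise OutOfIDs(f"exhausted after {self.next_id} IDs, "
                           f"{len(self.live)} still open")
        new_id = self.next_id
        self.next_id += 1
        self.live.add(new_id)
        return new_id

    def close(self, handle: int) -> None:
        # The object is released, but its number is never handed out again.
        self.live.remove(handle)


def cycles_until_failure(allocator: MonotonicIDAllocator) -> int:
    """Run open/close cycles, closing every handle, until allocation fails."""
    cycles = 0
    while True:
        try:
            handle = allocator.open()
        except OutOfIDs:
            return cycles
        allocator.close(handle)
        cycles += 1


if __name__ == "__main__":
    # A small ID width keeps the demo fast; with 24 bits the same loop would
    # stop after 2**24 = 16777216 cycles.
    alloc = MonotonicIDAllocator(id_bits=12)
    print(cycles_until_failure(alloc), "cycles completed before running out")
```

A real fix under this model would be for the allocator to reuse freed numbers; a caller-side workaround can only slow the counter down, not stop it.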