Build libhdf5 with the --enable-threadsafe flag #776
@ZanSara: Good point. I agree; I guess I will add `--enable-threadsafe`. For reference, the HDF5 config of the current wheels is:

Thank you for your help!
MRG: Build HDF5 with thread safety enabled. This builds HDF5 with the `--enable-threadsafe` flag. It was brought to my attention by @ZanSara over at PyTables/PyTables#776. It seems a good idea to enable thread safety for the HDF5 library. This makes a lot of sense, as the conda-forge [hdf5 package is also built](https://github.com/conda-forge/hdf5-feedstock/blob/master/recipe/build.sh) with this flag.

Once this gets merged, I'll rebuild the wheels for 3.6.1 and let you know here.
Hi, I'm sorry if I'm posting my question in the wrong way, but I've been really struggling with this problem for a few days now and came across this bug fix, which really got my hopes up. However, I've had no success with the 3.6.1 version I installed with pip. I'm having the same problems with multiple threads with release 3.6.1 on Windows. My program is very simple. Actually, I've ultra-simplified it and still have issues: open different HDF5 files in each thread and just retrieve the objects at the keys.

Was the problem only fixed for Linux and Mac in release 3.6.1? Thank you for your help.
@tdagnino: The current wheels vendor an HDF5 lib that is not yet compiled with the `--enable-threadsafe` flag. Still working on this over at matthew-brett/multibuild#277. Will report here when finished.
@ZanSara: threadsafe wheels are ready for testing. You can install them from TestPyPI using:

This should install a manylinux wheel (on Linux) with HDF5 compiled with `--enable-threadsafe`. Please let me know if you are able to test it.
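The install command itself was lost in this scrape. For reference, the standard way to install a package from TestPyPI while still resolving dependencies from the real PyPI looks like the following (the package name `tables` is real; the rest is generic TestPyPI usage, not necessarily the exact command from this comment):

```sh
pip install --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ tables
```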
I'll have a try now and let you know soon 👍
My original test code was written with Pandas, not directly with PyTables, and with it, recompiling the library solved the bug just fine. However, installing the test version of tables you linked (before installing Pandas, of course) doesn't fix the Pandas bug. For reference, my test code is taken from one of the issues I referenced above:

```python
import pandas as pd
import numpy as np
from multiprocessing import Pool
import warnings

# To avoid natural name warnings
warnings.filterwarnings('ignore')

def init(hdf_store):
    global hdf_buff
    hdf_buff = hdf_store

def reader(name):
    df = hdf_buff[name]
    return (name, df)

def main():
    # Creating the store
    with pd.HDFStore('storage.h5', 'w') as store:
        for i in range(100):
            df = pd.DataFrame(np.random.rand(5, 3), columns=list('abc'))
            store.append(str(i), df, index=False, expectedrows=5)

    # Reading concurrently with one connection
    with pd.HDFStore('storage.h5', 'r') as store:
        with Pool(4, initializer=init, initargs=(store,)) as p:
            ret = pd.concat(dict(p.map(reader, [str(i) for i in range(100)])))

if __name__ == '__main__':
    main()
```

For reference, on a vanilla version of Pandas the code crashes with one of these two error messages at random:

or

On the recompiled libraries instead it works fine. I am now writing a simple test that uses tables directly to see if the problem persists. Maybe it's also a problem with my setup. I'll keep you updated.
Ok, here is the modified test (I post it so you can double-check for mistakes):

```python
import tables
import numpy as np
from multiprocessing import Pool
import warnings

# To avoid natural name warnings
warnings.filterwarnings('ignore')

class Particle(tables.IsDescription):
    name = tables.StringCol(16)   # 16-character string
    idnumber = tables.Int64Col()  # Signed 64-bit integer

def init(hdf_store):
    global hdf_buff
    hdf_buff = hdf_store

def reader(name):
    table = hdf_buff.root.readout
    return (name, table)

def main():
    # Create test file
    with tables.open_file("storage.h5", mode="w", title="Test file") as store:
        table = store.create_table("/", 'readout', Particle, "Readout example")
        particle = table.row
        for i in range(100):
            particle['name'] = 'Particle: %6d' % (i)
            particle['idnumber'] = i * (2 ** 34)
            particle.append()
        table.flush()

    # Simply read the table - no problem
    with tables.open_file("storage.h5", mode="r", title="Test file") as store:
        cols = []
        init(store)
        for i in range(100):
            cols.append(reader(str(i)))
        ret = np.column_stack(cols)

    # Reading concurrently with one connection - fails
    with tables.open_file("storage.h5", mode="r", title="Test file") as store:
        with Pool(4, initializer=init, initargs=(store,)) as p:
            ret = np.column_stack(dict(p.map(reader, [str(i) for i in range(100)])).values())
        print(ret)

if __name__ == '__main__':
    main()
```

If you comment out the concurrent read code, the script above works fine. If you run it all, it will crash with one of these two errors at random:

or

This is what happened with the current PyTables, so not with your test ones yet. Now I'll set up with your version and let you know.
I can confirm that the new test version does not fix the bug.
@ZanSara thanks, this helps a lot! I will look into it and come back to you (probably asking for more help).
Hi, I am still dealing with this exact issue. I access H5 files for reading using Python's multiprocessing in two separate environments. It works fine in one but fails in the other, and I can't pinpoint why. Any tips for solving this?
@eriniocentric: if you are using multiprocessing, it is probably not a threading issue, IMHO.
Any idea what these errors are telling me? It doesn't happen when I run the command alone, but it does when I use the multiprocessing Pool class. I am basically doing a read_where in each thread on the same H5 file table.

```
Exception ignored in: 'tables.indexesextension.IndexArray._g_read_sorted_slice'
File "H5Dio.c", line 199, in H5Dread
End of HDF5 error back trace
Problems reading chunk records.
```
It is hard to say what can be the problem. |
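One common workaround in this situation (an assumption on my part, not a fix confirmed in this thread) is to pass each worker process the *filename* and open the file inside the worker, rather than sharing one handle opened in the parent across the fork boundary. The sketch below shows the shape of that pattern with plain file I/O so it runs without HDF5 installed; with PyTables the worker would call `tables.open_file(path, mode="r")` in the same spot:

```python
import multiprocessing
import os
import tempfile

# fork keeps worker startup simple on POSIX; on Windows a different start
# method (and importable worker functions) would be required
mp = multiprocessing.get_context("fork")

def reader(path, key, out):
    # Each worker opens its own handle, so no open-file state is shared
    # between processes (with PyTables: tables.open_file(path, mode="r")).
    with open(path) as f:
        rows = f.read().splitlines()
    out.put((key, rows[key]))

def main():
    # Build a small test file: one value per line.
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(str(i * i) for i in range(10)))

    out = mp.Queue()
    workers = [mp.Process(target=reader, args=(path, i, out)) for i in range(10)]
    for w in workers:
        w.start()
    # Drain the queue before joining to avoid blocking on a full pipe.
    results = dict(out.get() for _ in workers)
    for w in workers:
        w.join()
    os.remove(path)
    return results

if __name__ == "__main__":
    print(main())
```

The key design point is that the only thing crossing the process boundary is a path string and plain result data, both of which pickle safely, while every HDF5 (or file) handle stays private to one process.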
Did `--enable-threadsafe` get added at all?
HDF5 for wheels is built using https://github.com/PyTables/PyTables/blob/master/ci/github/get_hdf5.sh |
Hello,

It seems to me that PyTables relies on the version of `libhdf5.so` found in the system rather than building its own, and that most systems have versions of `libhdf5.so` that are compiled without the `--enable-threadsafe` flag (please correct me if I'm wrong). This causes some annoying concurrency issues even while reading files (probably issues #700 and #593, Pandas issue #12236, and duplicates). Rebuilding the HDF5 library with the proper flag and building PyTables over it seems to solve most of these issues, at least in the tests I've done so far.

Do you think it is possible to bundle a version of `libhdf5.so` compiled with that flag, or to build it when installing PyTables? Unfortunately I am not an expert in this matter. For now I am doing the whole process from a bash script, but it would be amazing to have it done somehow when installing PyTables.

The exact flags I use for the build are:

`./configure --prefix=/usr/local/hdf5 --disable-hl --enable-threadsafe`
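For completeness, a sketch of the manual process this describes (a hedged outline, not a tested recipe: it assumes an already-unpacked HDF5 source tree and a writable `/usr/local/hdf5`; `HDF5_DIR` is the environment variable PyTables' setup honors for a custom HDF5 location):

```sh
# From inside an unpacked HDF5 source tree, build a thread-safe libhdf5:
./configure --prefix=/usr/local/hdf5 --disable-hl --enable-threadsafe
make -j"$(nproc)"
sudo make install

# Then build PyTables from source against that library:
HDF5_DIR=/usr/local/hdf5 pip install --no-binary=tables tables
```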