Segfault in H5FD__free_cls with netcdf 4.9.1 #2617
|
You should try with the newly released hdf5-1.14.0. It fixes a lot of bugs; perhaps this is one of them. |
|
Interesting; is this happening on particular hardware? I'm trying to get our big-endian machine back online to try to track down the other issue you've reported, @opoplawski, (#1338) and if this is also on specific hardware, let me know. Thanks! |
|
Seems to happen on all - and definitely on x86_64. |
|
Thanks! That will make the issue easier to track down. Looking at the traces you provided, I'm trying to pin down at what point a function in libnetcdf is being called. I'm trying to grab the latest octave release; I assume this test is failing as part of the regular suite of tests, so I will see if I can replicate it locally. |
|
Presumably netcdf calls are made during the tests, but at the point of the crash I think we're just closing down the HDF5 library. I'm afraid this is multiple levels deep: it's not the tests from octave, but from the octave-netcdf package (https://gnu-octave.github.io/packages/netcdf/). Later I'll try to see if it's sensitive to updating hdf5. |
|
Could it be that both netcdf-c and octave are shutting down the HDF5 library? The netcdf-c one succeeds, and then the octave attempt fails? |
|
I set a breakpoint on H5close(), but I only see it called once, leading to the segfault. |
|
I've at least stripped it down to a simple reproducer in octave with the netcdf package installed: So it really does seem to be something to do with library tear down in the octave environment as we aren't doing much else. It also crashed without the I've also just discovered that I've still been building netcdf with |
|
Any chance of running your program with valgrind? |
|
The valgrind output is the same as before - see the initial comment. |
|
Dropping |
|
Some time ago I changed the netcdf-c code so that it does not depend on H5_USE_110_API or any other macro redefinition schemes that they use at HDF5. Simpler to just call the desired functions directly, and not worry about the redefines of their APIs (a misguided approach, IMO). Is there a way of telling if the HDF5 library has been closed down? |
|
Still seeing this with netcdf 4.9.2 |
|
Investigating. Thanks! |
|
The same (or very similar?) bug has just been reported for Debian (and reproduced on Ubuntu 23.10), which are using older netcdf and hdf libraries: On my Ubuntu 23.10 (when compiled with -Og -ggdb3): This is with Dmitri. |
|
Not sure how much help it is, but I notice that netcdf built with -DENABLE_BYTERANGE=OFF does not seem to have the issue |
|
Is there a test program that produces this error but does not use octave? |
|
The issue is that octave loads and unloads HDF5 itself, but when netcdf is unloaded (before octave's HDF5 unload) it has not unregistered the new class it registered, so HDF5 is left holding a class that is no longer in memory. Adding a finalize call for the HTTP class fixes the crash. |
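To illustrate the failure mode described above, here is a minimal sketch in plain C. It does not use HDF5 or netcdf: `driver_class`, the `registry_*` functions, and the `plugin_*` functions are all hypothetical stand-ins for a long-lived host library that stores pointers to class records, and a shorter-lived library that owns those records. The point is only the ordering: the owner has to unregister before its memory goes away, otherwise the host's shutdown pass walks a dangling pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver class record (not the real H5FD_class_t). */
typedef struct {
    const char *name;
} driver_class;

#define MAX_DRIVERS 8

/* Registry owned by the long-lived "host" library. It stores pointers only,
 * so the memory behind each entry still belongs to whoever registered it. */
static const driver_class *registry[MAX_DRIVERS];

/* Register a class record; returns a slot id, or -1 if the registry is full. */
int registry_register(const driver_class *cls) {
    for (int i = 0; i < MAX_DRIVERS; i++) {
        if (registry[i] == NULL) {
            registry[i] = cls;
            return i;
        }
    }
    return -1;
}

/* Drop a registration; returns 0 on success, -1 for an invalid or empty slot. */
int registry_unregister(int id) {
    if (id < 0 || id >= MAX_DRIVERS || registry[id] == NULL)
        return -1;
    registry[id] = NULL;
    return 0;
}

/* Number of classes currently registered. */
int registry_count(void) {
    int n = 0;
    for (int i = 0; i < MAX_DRIVERS; i++)
        if (registry[i] != NULL)
            n++;
    return n;
}

/* Host shutdown: clears every remaining entry and returns how many it found.
 * A real host would dereference each entry here, which is where a stale
 * pointer left behind by an unloaded library would crash. */
int registry_close(void) {
    int n = 0;
    for (int i = 0; i < MAX_DRIVERS; i++) {
        if (registry[i] != NULL) {
            n++;
            registry[i] = NULL;
        }
    }
    return n;
}

/* "Plugin" side: the class record lives in the plugin's own memory. */
static driver_class *plugin_cls = NULL;
static int plugin_id = -1;

/* Allocate the class record and register it with the host. */
int plugin_initialize(void) {
    plugin_cls = malloc(sizeof *plugin_cls);
    if (plugin_cls == NULL)
        return -1;
    plugin_cls->name = "http";
    plugin_id = registry_register(plugin_cls);
    if (plugin_id < 0) {
        free(plugin_cls);
        plugin_cls = NULL;
        return -1;
    }
    return 0;
}

/* Unregister first, then release the memory the host was pointing at. */
void plugin_finalize(void) {
    if (plugin_id >= 0) {
        registry_unregister(plugin_id);
        plugin_id = -1;
    }
    free(plugin_cls);
    plugin_cls = NULL;
}

/* Full cycle; returns how many stale entries host shutdown still had to visit. */
int run_demo(void) {
    if (plugin_initialize() != 0)
        return -1;
    plugin_finalize();
    return registry_close();
}
```

With the finalize step in place, `run_demo()` leaves nothing for the host's shutdown pass to visit. If `plugin_finalize()` only freed the record without unregistering it, `registry_close()` would still find the entry, which corresponds to the situation reported in this thread.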
|
Something about this bothers me. It seems to me the problem is caused by octave, and it should |
|
Also, I have a feeling that this is going to be an ongoing issue because there is at least one other |
I can make octave unload netcdf before closing; however, since netcdf isn't cleaning up after itself on unload (i.e. unregistering the things it registered), the issue isn't fixed. When octave (and hdf5) closes later on, hdf5 still has references to the netcdf class, which is no longer in memory. |
|
HDFCore is loaded within the hdf5 library, isn't it? It is therefore owned by hdf5, so it will be there for as long as the hdf5 library is loaded. |
|
For hdf5filter.c, it looks like the code already makes an unregister call within nc4_global_filter_action. |
|
Thank you for fixing this! Looks good here. |
I'm testing updating Fedora to netcdf 4.9.1 and I'm seeing a new failure when running the tests for the octave-netcdf package. This might be tricky to track down, but I'm not seeing the failure with the current netcdf 4.9.0 package. HDF5 is remaining constant at 1.12.1.
The segfault occurs when exiting:
valgrind:
There are quite a lot of iterations through the loop in H5I_clear_type before the segfault. I have no idea what other information would be useful for tracking this down.