unit test fails #137

Open
shaomeng opened this issue Feb 25, 2024 · 42 comments · May be fixed by #140

@shaomeng

I was following the installation steps to compile HDF5, ZFP, and H5Z-ZFP on my system. However, make check fails with a datatype error:

If using HDF5-1.8, make sure you have patched repack
cc -c print_h5repack_farg.c -o print_h5repack_farg.o -fPIC -I../src
cc print_h5repack_farg.o -o print_h5repack_farg 
 
h5repack -n     -f UD=32013,0,4,3,0,3539053052,1062232653 ...HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: /home/shaomeng/Git/hdf5/CMake-hdf5-1.14.3/hdf5-1.14.3/src/H5T.c line 2052 in H5Tget_class(): not a datatype
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: /home/shaomeng/Git/hdf5/CMake-hdf5-1.14.3/hdf5-1.14.3/src/H5T.c line 2052 in H5Tget_class(): not a datatype
    major: Invalid arguments to routine
    minor: Inappropriate type

(Note: all tests succeed except for this one. The full make check output is attached: h5z-zfp.log.)

I have HDF5 version 1.14.3, ZFP from the main branch as of 2/24/2024, and H5Z-ZFP from the main branch as of 2/24/2024. I'm not sure if the failed checks are consequential or not.

@lindstro
Member

I can confirm that h5repack gives this error (with 8-bit word size, as needed). I've not patched my h5repack, but presumably one shouldn't have to with recent versions of HDF5; I, too, have 1.14.3 installed.

@markcmiller86 Any ideas?

@markcmiller86
Member

The patch is only necessary for 1.8. Probably something is wrong with the testing logic itself. I will take a look, but not this week.

@markcmiller86
Member

So, after a quick look, I don't see anything wrong with the testing logic. The test is intended to h5repack the test file, mesh.h5, producing mesh_repack.h5, which should be smaller than the original by at least 20%. The fact that h5repack itself is reporting an issue suggests that it doesn't like a datatype it's encountered in mesh.h5. But all previous versions of HDF5 have been fine with that file. What does h5dump mesh.h5 or h5ls -vlrd mesh.h5 produce on your systems, @lindstro and @shaomeng, using the HDF5 1.14.3 you each have installed?

@shaomeng
Author

Hi Mark, I have the following output from h5dump mesh.h5 (attached txt file).
mesh.h5.txt

@markcmiller86
Member

markcmiller86 commented Feb 27, 2024

So, something has me confused. In test/Makefile here...

H5Z-ZFP/test/Makefile

Lines 382 to 385 in 092190c

-l XYZ2:CHUNK=1617x1 \
-l XYZ3:CHUNK=33x1 \
mesh.h5 mesh_repack.h5 2>&1 1>/dev/null; \
if [[ $$? -ne 0 ]]; then \

there is a redirection of both stdout and stderr with 2>&1 1>/dev/null which means we should NOT SEE any HDF5 error logging anywhere. But we are. Why? It might be worth removing that redirection, doing make test again, and seeing what else it indicates is going wrong.
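Incidentally, shell redirections are applied left to right, so 2>&1 1>/dev/null first points stderr at the current stdout (the terminal) and only then sends stdout to /dev/null; stderr therefore stays visible. A quick bash illustration:

( echo out; echo err >&2 ) 2>&1 1>/dev/null    # prints "err": stderr was duped to the tty before stdout was redirected
( echo out; echo err >&2 ) 1>/dev/null 2>&1    # prints nothing: both streams end up in /dev/null

That would explain why the HDF5 error logging still shows up despite the redirection.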

Next, it appears to complete the h5repack with a 0 return value because it is NOT exiting the test here...

H5Z-ZFP/test/Makefile

Lines 385 to 389 in 092190c

if [[ $$? -ne 0 ]]; then \
printf " [$(ERROR_COLOR)FAILED$(NO_COLOR)]\n"; \
touch check-failed; \
exit 0; \
fi; \

Instead, it is exiting at the check of the size ratio. So, I think h5repack has encountered some unknown issue with the datatypes in mesh.h5 and refused to compress the file. You could confirm by looking for a mesh_repack.h5 and seeing whether it is smaller than mesh.h5 (or differs from it, using h5diff).
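For example, something like:

ls -l mesh.h5 mesh_repack.h5        # compare the raw file sizes
h5diff -v mesh.h5 mesh_repack.h5    # compare contents dataset by dataset

If the repacked file is essentially the same size and h5diff reports no differences, the filter was never applied.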

@lindstro
Member

It produces the kind of output you might expect, with no warnings or errors. Output attached.

h5dump.txt.gz
h5ls.txt.gz

@lindstro
Member

Could it be that H5T_STD_U8LE and H5T_STD_I8LE are not handled? zfp supports only int32, int64, float, and double.

Here's part of the output I'm seeing:

h5repack -n     -f UD=32013,0,4,3,0,3539053052,1062232653 ...HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: /tmp/hdf5-20240118-4051-ziphrl/hdf5-1.14.3/src/H5T.c line 2052 in H5Tget_class(): not a datatype
    major: Invalid arguments to routine
    minor: Inappropriate type
...
 [FAILED (size ratio) ]

@markcmiller86
Member

Hmm... I think you could be right. In mesh.h5, most of the datasets are int, float, or double. However, there are a few that are not...

[scratlantis:zfp_filter/H5Z-ZFP/test] miller86% h5ls -vlr mesh.h5 | grep -B 4 signed
/Pressure                Dataset {10/10, 20/20, 30/30}
    Location:  1:42412
    Links:     1
    Storage:   24000 logical bytes, 24000 allocated bytes, 100.00% utilization
    Type:      native unsigned int
--
/Pressure_2D             Dataset {20/20, 30/30}
    Location:  1:1400
    Links:     1
    Storage:   600 logical bytes, 600 allocated bytes, 100.00% utilization
    Type:      native unsigned char
--
/VelocityX_2D            Dataset {21/21, 31/31}
    Location:  1:1672
    Links:     1
    Storage:   651 logical bytes, 651 allocated bytes, 100.00% utilization
    Type:      native signed char

What happens for you if you adjust the Makefile to read...

             -l X,Y,Z,Indexes:CHUNK=217 \
             -l Indexes2:CHUNK=1517 \
             -l Pressure2,Pressure3:CHUNK=10x20x5 \
             -l Stress,Velocity,Stress2,Velocity2,Stress3,Velocity3,VelocityZ,VelocityZ2,VelocityZ3:CHUNK=11x21x1x1 \
             -l XY:CHUNK=651x1 \
             -l XYZ:CHUNK=217x1 \
             -l XYZ2:CHUNK=1617x1 \
             -l XYZ3:CHUNK=33x1 \

which removes the request to repack Pressure, Pressure_2D and VelocityX_2D?

@markcmiller86
Member

Well, I think it's just VelocityX_2D and Pressure_2D that are the culprits. Interesting thing... I think earlier versions of HDF5 were silently skipping these, or maybe h5repack continued past them, but now it stops upon encountering the error?

@lindstro
Member

What happens for you if you adjust the Makefile to read...

Same errors as far as I can tell.

@lindstro
Member

FWIW, the test passes if I leave out HDF5_PLUGIN_PATH=../src/plugin. Not sure what HDF5 does in this case when requesting zfp compression.

@markcmiller86
Member

So, I just tried with master on HDF5-1.14.3 with gcc on macOS and do not see this error. I did a make clean first.

@markcmiller86
Member

markcmiller86 commented Feb 27, 2024

FWIW, the test passes if I leave out HDF5_PLUGIN_PATH=../src/plugin. Not sure what HDF5 does in this case when requesting zfp compression.

It should fail the size ratio if you do that. Mine does....

Plugin precision tests ...................................... [PASSED] 
If using HDF5-1.8, make sure you have patched repack
 
h5repack -n     -f UD=32013,0,4,3,0,3539053052,1062232653 ... [FAILED (size ratio) ]
 
./test_write_plugin zfpmode=5 ............................... [PASSED] 
 
h5diff -v -d 0.00001 test_zfp_le.h5 test_zfp_be.h5 compressed compressed ........ [PASSED] 
 
h5dump bigendian.h5 ............................................................. [PASSED] 

I wonder if it is somehow finding an older version of the plugin filter code somewhere in your path, LD_LIBRARY_PATH, or something?
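One rough way to check for stray copies, assuming bash and HDF5's default Unix plugin directory /usr/local/hdf5/lib/plugin (used when HDF5_PLUGIN_PATH is unset):

echo "$HDF5_PLUGIN_PATH"
find ${HDF5_PLUGIN_PATH//:/ } /usr/local/hdf5/lib/plugin -name 'libh5zzfp*' 2>/dev/null

Any hit outside ../src/plugin would be a candidate for a stale copy of the filter.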

@markcmiller86
Member

BTW...I am working from H5Z-ZFP master branch and ZFP develop branch.

@shaomeng
Author

So, I just tried with master on HDF5-1.14.3 with gcc on macOS and do not see this error. I did a make clean first.

I tried make clean and also removing the request to repack Pressure, Pressure_2D, and VelocityX_2D; it's still giving the same error.

In my case, I'm pretty sure that this is the only version of H5Z-ZFP I have; I just started to interact with this piece of software a few days ago ;) (And yes, I'm also using H5Z-ZFP master, ZFP develop.) I'm having this problem on Ubuntu 22.04, though.

@lindstro
Member

lindstro commented Feb 27, 2024

Some potential differences: I'm building H5Z-ZFP using GNU make and AppleClang 15.0.0. I'm using the master branch of both H5Z-ZFP and zfp (I had some vague recollection that there was a weird interaction between H5Z-ZFP and the zfp develop branch, but maybe that was pre 1.0.0).

I wonder if it is somehow finding an older version of the plugin filter code somewhere in your path, LD_LIBRARY_PATH, or something?

LD_LIBRARY_PATH is not set, and I can confirm, using DYLD_PRINT_LIBRARIES, that the correct path to the plugin is used.
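For anyone else wanting to verify the same thing on macOS, something like:

DYLD_PRINT_LIBRARIES=1 h5repack ... 2>&1 | grep -E 'libh5zzfp|libhdf5|libzfp'

prints every dylib the process loads; on Linux the rough analogue is LD_DEBUG=libs.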

@markcmiller86
Member

@lindstro or @shaomeng, when you get a chance, can you attach a copy of the mesh_repack.h5 you get (assuming you do indeed get one... I think you do, because it's failing the size ratio check, not the first step in performing the repack)?

@shaomeng
Author

@lindstro or @shaomeng, when you get a chance, can you attach a copy of the mesh_repack.h5 you get (assuming you do indeed get one... I think you do, because it's failing the size ratio check, not the first step in performing the repack)?

Hi Mark, I do get one! Please see the attachment
mesh_repack.h5.tar.gz

@lindstro
Member

So my mesh_repack.h5 does not contain any zfp magic numbers (0x0570667a in little endian), suggesting that compression is not even being invoked.
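For reference, one way to scan the raw bytes for that magic with GNU grep (the word is stored on disk, little endian, as 7a 66 70 05):

grep -obUaP '\x7a\x66\x70\x05' mesh_repack.h5 || echo "no zfp streams found"

Each match prints its byte offset; no matches means no zfp-compressed chunks made it into the file.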

@markcmiller86
Member

So my mesh_repack.h5 does not contain any zfp magic numbers (0x0570667a in little endian), suggesting that compression is not even being invoked.

Well, is the mesh_repack.h5 you are looking at the one you generated by disabling the setting of HDF5_PLUGIN_PATH, as you mentioned above?

@markcmiller86
Member

Hi Mark, I do get one! Please see the attachment mesh_repack.h5.tar.gz

Thanks. That file appears to be an exact copy of mesh.h5 with no datasets compressed. So, the repack operation produced an output, but no compression was employed. I don't think h5repack will return a non-zero value in this case. I will have to check. But I'm pretty sure it returned 0, implying the operation succeeded. So, it went on to compare the sizes of the files and then failed the size ratio, as @shaomeng's original log shows (e.g. [FAILED (size ratio) ]).

So, the question is, why isn't it applying the compression? The "not a datatype" warning messages are certainly a clue. I am still trying to get to the bottom of it.

@lindstro, is your Mac Intel or Apple silicon?

@markcmiller86
Member

I am thinking the difference in behavior may have to do with how we each compiled HDF5 1.14.3. Can each of you also paste the CMake or Autoconf command line you used to configure HDF5?

@lindstro
Member

Well, is the mesh_repack.h5 you are looking at the one you generated by disabling the setting of HDF5_PLUGIN_PATH, as you mentioned in #137 (comment)?

No, the path is correct and the plugin is loaded:

dyld[60726]: <FBD8E492-073B-3FDF-A6D2-3FE5D0E53BEA> /.../H5Z-ZFP/src/plugin/libh5zzfp.so
dyld[60726]: <1B084037-309A-3541-8B17-20022829C01E> /opt/homebrew/Cellar/hdf5/1.14.3/lib/libhdf5.310.3.0.dylib
dyld[60726]: <BC330C52-DB91-346D-830B-514BDC7F1043> /.../H5Z-ZFP/zfp/lib/libzfp.1.0.1.dylib

@lindstro, is your Mac Intel or Apple silicon?

I have an Apple M2 Max.

I am thinking the difference in behavior may have to do with how we each compiled HDF5 1.14.3. Can each of you also paste the CMake or Autoconf command line you used to configure HDF5?

I couldn't get CMake to override the path to zfp (I tried ZFP_DIR); I don't know how it's finding zfp under my home directory. So I used GNU make:

make ZFP_HOME=`pwd`/zfp HDF5_HOME=/opt/homebrew/Cellar/hdf5/1.14.3 HDF5_PLUGIN_PATH=`pwd`/src/plugin all

@markcmiller86
Member

markcmiller86 commented Feb 28, 2024

@jhendersonHDF wanted to ask if you could have a quick looksee here and see if anything looks familiar. It kinda sounds like this HDF forum issue, but we're not seg-faulting. It's as if h5repack is deciding it cannot invoke the requested filtering operations due to some issue in recognizing the datatype.

For reference, here is the command that works for me but fails for the other two users here...

env \
LD_LIBRARY_PATH=/Users/miller86/silo/hdf5-1.14.3/build_serial/myinstall/lib: \
HDF5_PLUGIN_PATH=../src/plugin \
/Users/miller86/silo/hdf5-1.14.3/build_serial/myinstall/bin/h5repack \
-n -f UD=32013,1,4,3,0,3539053052,1062232653 \
-l X,Y,Z,Indexes:CHUNK=217 \
-l Indexes2:CHUNK=1517 \
-l Pressure,Pressure2,Pressure3:CHUNK=10x20x5 \
-l Pressure_2D:CHUNK=10x20 \
-l Stress,Velocity,Stress2,Velocity2,Stress3,Velocity3,VelocityZ,VelocityZ2,VelocityZ3:CHUNK=11x21x1x1 \
-l VelocityX_2D:CHUNK=21x31 \
-l XY:CHUNK=651x1 \
-l XYZ:CHUNK=217x1 \
-l XYZ2:CHUNK=1617x1 \
-l XYZ3:CHUNK=33x1 \
mesh.h5 mesh_repack.h5

@jhendersonHDF
Contributor

jhendersonHDF commented Feb 28, 2024

@markcmiller86 Is it possible that you built a shared-library-only version of HDF5, while others here have built either a static-only version or a static + shared version? This looks suspiciously similar to problems I've seen in the past, where using an HDF5 tool that is statically linked to HDF5, along with a dynamically loaded plugin that is linked to the shared version of the library, causes two HDF5s to be loaded into memory and results in very odd issues. If an HDF5 build was built with both static and shared, there will be two versions of each tool, e.g. h5dump and h5dump-shared, and the shared versions are the correct ones to use in this case. This has been a problem often enough that we're finally correcting it in HDFGroup/hdf5#4046 by only ever building one set of tools and linking to the shared library by default if available.

@markcmiller86
Member

@jhendersonHDF thanks for the quick look. I do indeed have a shared and static build.

otool -L ~/silo/hdf5-1.14.3/build_serial/myinstall/bin/h5repack 
/Users/miller86/silo/hdf5-1.14.3/build_serial/myinstall/bin/h5repack:
	/Users/miller86/silo/hdf5-1.14.3/build_serial/myinstall/lib/libhdf5.310.dylib (compatibility version 314.0.0, current version 314.0.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)

@shaomeng
Author

I am thinking the difference in behavior may have to do with how we each compiled HDF5 1.14.3. Can each of you also paste the CMake or Autoconf command line you used to configure HDF5?

Hi Mark, I followed these instructions to build HDF5, and I didn't change anything besides the installation prefix. I do see that I have both h5repack and h5repack-shared in the installed bin directory, so that might be the issue?
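On Linux, one quick way to tell how the installed tool is linked (using ldd):

ldd $(which h5repack) | grep libhdf5 || echo "no shared libhdf5 linked"

If no shared libhdf5 shows up, the plain h5repack is statically linked to HDF5, which matches the situation @jhendersonHDF describes.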

@markcmiller86
Member

@lindstro I have an Intel Mac, but I do have an M1 I can try to test on. Also, my 1.0.0 zfp lib builds static by default, libzfp.a. When H5Z-ZFP is built, it pulls that statically into the .so file that gets created for the HDF5 filter/plugin. That's how it is "finding" zfp: it's statically built into the plugin.

@jhendersonHDF
Contributor

@markcmiller86 Interesting that your h5repack seems to be linking to the shared library version of HDF5 if you have a static + shared build; I would've expected it to be linking to the static library. But since your version is linking to the shared library, that's likely why you have no issues. For others here, I would try editing what's necessary to make the tests use h5repack-shared instead and see if that works. By default, HDF5's CMake should build both static and shared libs and you'll end up with this problem since the expected name for the tool (h5repack, for instance) actually points to the statically-linked tool.

@lindstro
Member

I used homebrew to install HDF5.

otool -L `where h5repack`
/opt/homebrew/bin/h5repack:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.12)
	/opt/homebrew/opt/libaec/lib/libsz.2.dylib (compatibility version 2.0.0, current version 2.0.1)

Evidently it was linked to the static library.

@markcmiller86
Member

Evidently it was linked to the static library.

Ok, so that's a problem. A statically linked tool cannot load a plugin, so the test cannot be properly executed. We should detect that and just fail the associated tests.

This must be the explanation for the failed test here.

@jhendersonHDF
Contributor

So, we should detect that and just fail associated tests.

You might also be able to detect if the "-shared" version of a tool exists and use that if available.

@lindstro
Member

So, we should detect that and just fail associated tests.

You might also be able to detect if the "-shared" version of a tool exists and use that if available.

Indeed, the test passes with h5repack-shared.

@markcmiller86 linked a pull request Feb 29, 2024 that will close this issue
@markcmiller86
Member

@shaomeng see if this branch, mcm86-28feb24-static-hdf5-failures, works for you.

@markcmiller86
Member

@jhendersonHDF thanks for your attention and time...probably saved me a few hours 💪🏻

@shaomeng
Author

@shaomeng see if this branch, mcm86-28feb24-static-hdf5-failures, works for you.

Hi Mark, the new branch still fails, with the same errors.

If I manually change the command in make check from h5repack to h5repack-shared, then the tests pass.

@jhendersonHDF
Contributor

@markcmiller86 You probably need to check whether the static lib is present rather than checking whether the shared lib is missing. If static is present at all, then the regularly named tools should end up linked statically. If the static lib is present, you should either skip the test, as in that PR, or prefer the -shared tool when the static lib is present and the regular tool when it's not.
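A minimal shell sketch of that preference logic (variable names illustrative, not from the linked PR; chunking options elided):

H5REPACK="$HDF5_HOME/bin/h5repack"
if [ -f "$HDF5_HOME/lib/libhdf5.a" ] && [ -x "$HDF5_HOME/bin/h5repack-shared" ]; then
    # static lib present => the plain tool is likely statically linked; prefer the shared one
    H5REPACK="$HDF5_HOME/bin/h5repack-shared"
fi
"$H5REPACK" -n -f UD=32013,0,4,3,0,3539053052,1062232653 mesh.h5 mesh_repack.h5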

@jhendersonHDF
Contributor

That is, I guess you should aim for HDF5_IS_SHARED_ONLY rather than HDF5_IS_STATIC_ONLY

@markcmiller86
Member

Ok, thanks for the info, @jhendersonHDF and @shaomeng. I will return to this next week. Other fires to put out now.

@markcmiller86
Member

Something else that really troubles me here is that many of the tests use HDF5 tools (h5diff, h5dump, and h5repack) to confirm tests passed or failed. So, IMHO, if the test involving h5repack failed for @shaomeng because it was a statically linked executable, then many of the other tests should have failed for the same reason. Yet, his .log file shows them all passing. I worry that somehow the testing logic is getting defeated...that those tests also should have failed but the detection logic isn't catching that.

@markcmiller86
Member

Ok, yeah, I am seeing some issues with the test logic.

  • Tests involving h5dump mostly just check dataset metadata and do not actually require the filter to do their work. Nonetheless, we specify the HDF5_PLUGIN_PATH env. variable for those tests. That is unnecessary and confusing. So, those can indeed pass even with a statically linked executable, and that is ok. However, there is one h5dump-based test designed to check endianness handling, and the pass/fail logic for that test is not robust enough.
  • I notice that the test-accuracy make target (which uses H5Z-ZFP as a plugin with ZFP in accuracy mode) and the test-lib-accuracy make target (which uses H5Z-ZFP as a lib with ZFP in accuracy mode), which basically test the exact same cases except for using H5Z-ZFP as a plugin vs. as a lib, wind up detecting incorrect results differently. The plugin tests use h5diff, whereas the lib tests read the data back in with test_read_lib, which likely uses different arithmetic to test accuracy.
  • If a user has installed HDF5 with --disable-shared, there is no way any of the testing features using H5Z-ZFP as a plugin can work. Only the lib-based tests can work. The reverse situation (--disable-static) does not have this constraint. It still links H5Z-ZFP statically for the lib tests but winds up using a shared HDF5 lib for them, and of course the plugin tests operate fine.
  • If a user has installed HDF5 without --disable-shared, then the HDF5 lib might be shared only, or both static and shared. If it's both, the shared-lib versions of the tools have -shared appended to their names (see the sketch after this list).
    • @jhendersonHDF do you know whether Autotools and CMake builds behave identically in this regard? I use Autotools, and on my Mac the default build produced both static and shared libraries but only shared tools, WITHOUT -shared in their names.
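A rough way to see which flavors a given HDF5 install provides (paths assumed):

ls "$HDF5_HOME"/lib/libhdf5.a 2>/dev/null && echo "static libhdf5 present"
ls "$HDF5_HOME"/lib/libhdf5.so* "$HDF5_HOME"/lib/libhdf5*.dylib 2>/dev/null && echo "shared libhdf5 present"
ls "$HDF5_HOME"/bin/*-shared 2>/dev/null && echo "separate -shared tools present (CMake-style build)"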

@jhendersonHDF
Contributor

@markcmiller86 I mostly use CMake these days, but as far as I can tell the default in Autotools is to always build both shared and static (same as CMake), but link to the shared library by default if available. I believe the "-shared" tools concept has always been unique to the CMake builds, which complicates things further. When HDFGroup/hdf5#4046 is merged, CMake and Autotools should have identical behavior, but that doesn't really help the current situation.
