
test-h5repack fails on big endian architectures #100

Closed
helmutg opened this issue Jan 31, 2023 · 15 comments · Fixed by #107

Comments

@helmutg
Contributor

helmutg commented Jan 31, 2023

We recently reported test failures on s390x, see #95. While @spanezz initially reported two failing test cases, the whole bug discussion focused on only one of them. I also figured that these issues are technically independent, so I am now forking the issue.

The second issue (not discussed in #95) is that test-h5repack fails with the message size-ratio: it expects a size ratio >= 200 and gets 99. Together with @spanezz, I tried to get to the bottom of this, but our progress was limited, so we report what we know here:

  • This issue affects big endian architectures in general (e.g. powerpc 32bit, ppc64, sparc64, ...).
  • The repacked mesh_repack.h5 file is slightly larger than the original while it should be significantly smaller.
  • If you try to h5dump mesh_repack.h5 on a little-endian machine without the h5z-zfp plugin, it fails (expected). If you do the same on a big-endian machine, it succeeds. This indicates that the repacking step did not actually use h5z-zfp.
  • Changing the filter id (UD=...) on the big-endian machine to some nonsense does not change its behavior, whereas we would have expected it to error out.

Given these symptoms, do you already have a guess as to where the cause may be located, or how we could narrow it down?

@spanezz
Contributor

spanezz commented Jan 31, 2023

Indeed, this command on s390x exits with status 0, produces no output, and yields a result larger than the input:

h5repack -f UD=32013,0,4,3,0,3539053052,1062232653 -l X,Y,Z,Indexes:CHUNK=217 -l Indexes2:CHUNK=1517 -l Pressure,Pressure2,Pressure3:CHUNK=10x20x5 -l Pressure_2D:CHUNK=10x20 -l Stress,Velocity,Stress2,Velocity2,Stress3,Velocity3,VelocityZ,VelocityZ2,VelocityZ3:CHUNK=11x21x1x1 -l VelocityX_2D:CHUNK=21x31 -l XY:CHUNK=651x1 -l XYZ:CHUNK=217x1 -l XYZ2:CHUNK=1617x1 -l XYZ3:CHUNK=33x1 test/mesh.h5 mesh-testcase1.h5

I wonder if those parameters are computed in a way that may be dependent on endianness?

@helmutg
Contributor Author

helmutg commented Feb 6, 2023

You nailed it. I tried h5repack -f UD=32013,0,4,1,0,0,1074528256 in.h5 out.h5 and with a little debugging I learned a bit about those values. The first 3 numbers get used in some way that I don't understand. The remaining numbers are parsed by H5Z_zfp_set_local and are stored in a 32-bit array mem_cd_values. The value 1 is H5Z_ZFP_MODE_RATE, and in Enrico's example the 3 is H5Z_ZFP_MODE_ACCURACY. In both cases this causes the last two values to be interpreted as a single double value. In itself, this is an unaligned memory access and thus bad. But it also swaps the value around. Python can help us understand this better:

>>> struct.unpack("<d", struct.pack("<II", 0, 1074528256))
(3.5,)
>>> struct.unpack(">d", struct.pack(">II", 0, 1074528256))
(5.30887497e-315,)
>>> struct.unpack(">II", struct.pack(">d", 3.5))
(1074528256, 0)
>>> struct.unpack("<d", struct.pack("<II", 3539053052, 1062232653))
(0.001,)

And once we swap the last two numbers, it actually works for my simplified array and H5Z_ZFP_MODE_RATE, but not for the actual test case.

It is not entirely clear how to properly fix this unaligned memory access that parses the values in an architecture-dependent way. Any fix looks likely to break some existing users.
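For reference, the interactive experiments above can be condensed into a small self-contained script (plain Python struct, no HDF5 or H5Z-ZFP required):

```python
import struct

# cd_values as passed on the command line: UD=32013,0,4,1,0,0,1074528256
# The last two unsigned ints are type-punned into one IEEE 754 double.
lo, hi = 0, 1074528256

# Little-endian interpretation: low word first, then high word.
(le_val,) = struct.unpack("<d", struct.pack("<II", lo, hi))
assert le_val == 3.5  # the intended rate

# Big-endian interpretation of the same two words yields a denormal.
(be_val,) = struct.unpack(">d", struct.pack(">II", lo, hi))
assert be_val != 3.5  # garbage, roughly 5.3e-315

# Swapping the two words restores the value in a big-endian layout.
(swapped,) = struct.unpack(">d", struct.pack(">II", hi, lo))
assert swapped == 3.5
```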

@spanezz
Contributor

spanezz commented Feb 10, 2023

After some research, here's an analysis of the current situation of endianness in cd_values arguments to
H5Z-ZFP.

General context: cd_values

The HDF5 filters interface allows providing and storing an arbitrarily long sequence of integer values (cd_values) as arguments to the filter itself.

This can be used to customize encoding and decoding, and the values used for encoding are stored in the HDF5 datastream and made available to the filter during decoding.

cd_values is documented as part of the HDF5 API as a sequence of arbitrary, architecture-dependent unsigned int values. It is reasonable to assume that the HDF5 library takes care of encoding and decoding those integers in a machine-independent way.

H5Z-ZFP use of cd_values

H5Z-ZFP needs to control zfp's compression modes, which in some cases require arguments expressed as double.

In order to fit double values into a sequence of integers, H5Z-ZFP uses type punning to pack each double into a pair of unsigned integers.

The problem arises when IEEE 754 doubles, which are encoded in a machine-independent way, are copied verbatim via type punning into integer values whose interpretation is machine-dependent.
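A minimal sketch of this failure mode, simulating both architectures with Python's struct module (the helper names are illustrative, not part of H5Z-ZFP):

```python
import struct

def pun_double_to_uints(value, endian):
    """Split a double into two unsigned ints via type punning,
    as seen on a machine with the given native byte order."""
    return struct.unpack(endian + "II", struct.pack(endian + "d", value))

def unpun_uints_to_double(cd1, cd2, endian):
    """Reassemble the double from two unsigned ints on a machine
    with the given native byte order."""
    return struct.unpack(endian + "d", struct.pack(endian + "II", cd1, cd2))[0]

accuracy = 0.001
# A writer on little endian produces these cd_values...
cd1, cd2 = pun_double_to_uints(accuracy, "<")
# ...HDF5 faithfully preserves the two integer values, but a reader
# with the opposite endianness reassembles them in its own layout:
assert unpun_uints_to_double(cd1, cd2, "<") == accuracy   # same endianness: OK
assert unpun_uints_to_double(cd1, cd2, ">") != accuracy   # cross-endian: garbage
```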

Affected places

The conversion from doubles to integers, or the transmission of the integers with different endianness behaviour, happens in a number of places:

In C code:

In command line arguments:

  • h5repack: note that if the decimal cd_values are endianness-dependent, command-line arguments computed on a machine with one endianness will give different results on a machine with the other

In Python code:

Possible solution

Assumptions:

  • assume the underlying HDF5 library is correct (I tested with a simple custom plugin, and I can confirm that the cd_values are preserved correctly across little/big-endian machines)
  • assume that the vast majority of existing usage is on little-endian machines

The only way I see to address all this is to include a specific definition of
endianness in the specification of cd_values for H5Z-ZFP, and therefore of
its interface.

The easiest way may be to define the affected cd_values as:

The above two points represent the current practice on little-endian machines. On big-endian machines, the order of the two cd_values is currently reversed.

Example C code for the conversion:

#include <endian.h>
#include <stdint.h>
#include <string.h>
double dval = 0.1;
uint64_t le_val;
memcpy(&le_val, &dval, sizeof le_val);  /* avoid UB from pointer punning */
le_val = htole64(le_val);               /* pin the byte layout to little endian */
unsigned int cd_value_1 = le32toh(((uint32_t*)&le_val)[0]);  /* low 32 bits */
unsigned int cd_value_2 = le32toh(((uint32_t*)&le_val)[1]);  /* high 32 bits */

Example Python code for the conversion:

cd1, cd2 = struct.unpack('<II', struct.pack('<d', float(dval)))

On little-endian machines, this has the same behaviour as the current code. On big-endian machines, it should become compatible with little-endian machines.
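The decoding counterpart would then be a sketch like the following, under the same assumption that the two cd_values are defined to hold the little-endian halves of the double:

```python
import struct

def decode_cd_double(cd1, cd2):
    """Reassemble a double from two cd_values defined to hold the
    little-endian halves, regardless of the host's byte order."""
    return struct.unpack('<d', struct.pack('<II', cd1, cd2))[0]

# Round trip with the encoding proposed above:
cd1, cd2 = struct.unpack('<II', struct.pack('<d', 0.001))
assert decode_cd_double(cd1, cd2) == 0.001
```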

However, this would mean a change in the H5Z-ZFP's interface specification, so your opinion as authors of H5Z-ZFP is important.

@markcmiller86
Member

Apologies everyone for delays in responding. I hope to get to this and a few other issues here next week.

@lindstro
Member

It is somewhat unfortunate that H5Z-ZFP uses a different encoding of compression parameters than the zfp library does. All compression modes zfp recognizes are given by the four integer parameters minbits, maxbits, maxprec, and minexp (see the section on expert mode and the zfp_stream struct). Only minexp is a signed integer; all parameters in practice require 16 or fewer bits, so it would be natural to store them as (32-bit) unsigned integers in cd_values rather than resort to unsafe type punning. At the end of the day, double-precision parameters such as rate and tolerance have to be translated to these four integer parameters. Even reversible mode uses a particular setting of the four integer parameters.

Going forward, I wonder if new versions of H5Z-ZFP could use this alternate encoding in a backwards compatible manner, i.e., files written with newer versions of H5Z-ZFP would use zfp's internal encoding, while the filter would still be able to read files using the old encoding. In fact, H5Z-ZFP already supports the 4-parameter "expert mode," and so we could enforce this mode for all writes. zfp also provides a function, zfp_stream_compression_mode(), that infers the compression mode from the four expert-mode parameters.

Starting with zfp 1.0.0, there is also a new struct, zfp_config, that uses a union to encode compression parameters in a manner similar to (yet different from) how H5Z-ZFP currently uses cd_values. I just realized that there is no function that translates between zfp_stream and zfp_config (in either direction), which would be useful. I'll make a point to add that in the future.

This approach to encoding compression parameters does not address the endianness issue, but it does avoid type punning for fixed-rate and -accuracy modes.

@markcmiller86
Member

It is somewhat unfortunate that H5Z-ZFP uses a different encoding of compression parameters than the zfp library does.

I am not sure I understand...we DO use ZFP's stream header to capture compression params. We don't do any "special encoding".

It's just that we write ZFP's stream header to the dataset's object header cd_values, not to each individual chunk. And, IMHO, that makes a lot of sense because it eliminates per-chunk ZFP header overhead.

@lindstro
Member

I am not sure I understand...we DO use ZFP's stream header to capture compression params. We don't do any "special encoding"

True, this header is embedded, but the point I was making is how zfp's four integer compression parameters are encoded differently in cd_values (separate from the header) rather than simply as those four integers (except when expert mode is used). There's no particular benefit to encoding the rate as an enum plus a double split into two unsigned ints when that compression setting--indeed, all compression settings--already has an internal representation as four integers. When you call zfp_stream_set_rate(), the rate gets converted to the four expert-mode parameters--you may as well just store them in cd_values and instead call zfp_stream_set_params().

@markcmiller86
Member

markcmiller86 commented Feb 28, 2023

There's no particular benefit to encoding the rate as an enum plus a double split into two unsigned ints

Hmmm....that only happens in the case where an HDF5 caller wants to use the generic interface, and then only when passing params from the HDF5 caller to the H5Z-ZFP plugin. And there is just NO WAY around this given HDF5's generic filter interface.

That said, once the parameters passed in cd_values are decoded in the H5Z-ZFP filter, they are handled the way ZFP would...

  • a rate request calls
    zfp_stream_set_rate(dummy_zstr, *((double*) &mem_cd_values[2]), zt, ndims_used, 0);
  • a precision request calls
    zfp_stream_set_precision(dummy_zstr, mem_cd_values[2]);
  • an accuracy request calls
    zfp_stream_set_accuracy(dummy_zstr, *((double*) &mem_cd_values[2]));
  • a reversible request calls
    zfp_stream_set_reversible(dummy_zstr);
  • an expert request calls (H5Z-ZFP/src/H5Zzfp.c, lines 386 to 387 in c76f847)
    zfp_stream_set_params(dummy_zstr, mem_cd_values[2], mem_cd_values[3], mem_cd_values[4], (int) mem_cd_values[5]);

If a user is suspicious of or otherwise doesn't want to use this generic interface, they should instead use the plugin as a library and then use the properties interface...which winds up not having to go through this unsigned int cd_values[] thingy.

@markcmiller86
Member

@lindstro btw...wasn't sure if you saw, but other work last week has fixed this issue: #101 and #102

@helmutg
Contributor Author

helmutg commented Mar 1, 2023

This issue was specifically created about the test case test-h5repack failing on big-endian systems. #101 fixes tests, but not this one, and #102 adds another test, but does not affect this test either. As such, this problem remains unsolved. Would you mind reopening the issue? Possibly the solution is adapting the test case?

@spanezz
Contributor

spanezz commented Mar 1, 2023

With regards to a proposal for a solution for this issue, #100 (comment) makes a compelling case for how to encode cd_values in a clean way, that is, to always encode cd_values as if expert mode were used.

#100 (comment) makes the point that the current situation is needed from a user interface point of view, to avoid making h5repack usable only by users who can juggle expert values.

I wondered whether there's a way to extract/compute the expert-level values from the zfp_stream_set_{rate|precision|accuracy|reversible} zfp calls. I checked what they do, and indeed, they seem to simply compute and set the 4 "expert mode" integers in the zfp header. So it's possible to turn all other zfpmode inputs into cd_values encoded in expert mode.

A patch proposal would thus be to:

  • Document that only HDF5 files encoded in expert mode are portable across different architectures
  • Change print_h5repack_farg.c to always generate cd_values with zfpmode=4 (-f UD=32013,0,5,4,…)
  • Update the examples and tests accordingly
  • Possibly print a warning about portability during encoding/decoding on big-endian machines, if zfpmode in cd_values is not 4/expert

This would:

  • always generate files compatible with old versions of H5Z-ZFP, which support expert-mode cd_values
  • stop generating new files that cannot be ported across architectures
  • require no change to the h5repack API: only changes to print_h5repack_farg.c and documentation

@markcmiller86 markcmiller86 reopened this Mar 1, 2023
@lindstro
Member

lindstro commented Mar 2, 2023

I think there are possibly three independent issues at hand here. I've followed this discussion only from a distance and may not be up to speed on all the gory details, so please accept my apologies if I'm off here or am repeating what's already been said.

One issue is that cd_values are multibyte integers and therefore subject to byte swapping, no matter what is stored in them. In this sense, whether we store expert-mode parameters or other data matters not (but see below)--we're at the mercy of the HDF5 library's handling of endianness. From what I can glean from the discussion here and elsewhere, HDF5 does this in a consistent manner; cd_values are always stored internally (and on disk) as little endian and are converted to/from native byte ordering as needed. So I think that we're fine as long as we're storing integer values in cd_values, one at a time.

The second issue I see has to do with type punning between double types and a pair of unsigned ints (with the assumption that sizeof(double) = 2 * sizeof(unsigned int) = 8). First of all, the C/C++ standards provide no guarantees here, as an unsigned int can have trap representations that make such conversions unsafe--basically, any such conversion is undefined behavior. Really, an array of unsigned char [sizeof(double)] and memcpy() are your only safe bet.

Ignoring this, I suspect one potential issue is how one should safely "stuff" a double into two unsigned ints. The code above *((double*) &mem_cd_values[2]) for reading a double is problematic to me. The reason here is that on a little endian machine, the bottom 32 bits of the double are stored in cd_values[2], while the top 32 bits are stored in cd_values[3], with the bottom 8 bits of the double stored in the first byte (at address &cd_values[2]). When processed on a big-endian machine, the bytes within each unsigned int are swapped, but the two unsigned ints are themselves not swapped, as would be needed for type punning (i.e., all 8 bytes of the double would have to be swapped). In other words, the top 8 bits of the double on a big-endian machine ought to reside in cd_values[2], but they're stored in cd_values[3] when written on a little-endian machine. Is this perhaps what's going wrong here?
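This mismatch (per-word byte swapping versus the full 8-byte swap that type punning would need) can be demonstrated without HDF5:

```python
import struct

rate = 3.5
raw = struct.pack('<d', rate)   # 8 bytes as a little-endian writer lays them out

# What happens when each 4-byte unsigned int is byte-swapped individually,
# as HDF5 does when converting cd_values to a big-endian host:
per_word = raw[3::-1] + raw[7:3:-1]

# What correct type punning would need: all 8 bytes reversed.
full_swap = raw[::-1]

assert per_word != full_swap
# The per-word-swapped bytes, read as a native (big-endian) double,
# do not recover the original rate; the full swap does:
assert struct.unpack('>d', per_word)[0] != rate
assert struct.unpack('>d', full_swap)[0] == rate
```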

The potential third issue is that zfp writes a header as a sequence of bytes to cd_values. This is a contiguous byte sequence that nevertheless is treated by HDF5 as a set of unsigned ints, which then are byte swapped as needed. This byte swapping of the header should never happen as the native byte order has no impact on the raw binary header. I believe I saw a suggestion of manually byte swapping the cd_values storing the header whenever the header is deemed malformed (i.e., when the zfp "magic" is not recognized) to see if that undoes the undesired byte swapping done by HDF5. I think that's a reasonable strategy.
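The recovery strategy for this third issue can be sketched as follows; the magic constant and helper names here are illustrative only and do not reflect the filter's actual header layout:

```python
import struct

MAGIC = b"zfp"  # illustrative marker; the real zfp header layout differs

def unswap_words(data):
    """Undo a per-uint byte swap: reverse the bytes of each 4-byte word."""
    words = struct.unpack("<%dI" % (len(data) // 4), data)
    return struct.pack(">%dI" % len(words), *words)

def recover_header(data):
    """If the magic is not recognized, try word-wise unswapping once."""
    if data.startswith(MAGIC):
        return data
    fixed = unswap_words(data)
    if fixed.startswith(MAGIC):
        return fixed
    raise ValueError("not a recognizable header")

# A writer stores the raw header bytes; a foreign-endian reader sees the
# word-swapped version, which recover_header() fixes up:
stored = b"zfp\x05" + b"\x00" * 4   # 8 bytes, word-aligned
assert recover_header(unswap_words(stored)) == stored
```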

Finally, I think my proposal to use expert-mode coding for all compression modes gets around the second issue. As the first issue appears to be a non-issue and the third issue has a potential solution, we ought to be fine.

@markcmiller86
Member

@lindstro will take a deeper look at your and @spanezz's comments (above) by early next week.

That said, it was taking me a really long time to understand why I wasn't getting information from h5repack about where/why things fail. h5repack returns 0 exit status and still produces an output file whether or not the requested compression works. This puzzled me for quite some time.

What is happening is that the logic in the filter to explicitly disallow endian targeting (that is, writing an endianness that is different from the machine's native endianness) is getting triggered in the current design of the h5repack test on big-endian systems.

Why? Because the data file it uses is little-endian. The test would work fine if we had the same data file in big-endian format and then used that file when testing on big-endian systems.

I'll follow up with other issues raised here by next week.

@markcmiller86
Member

I've confirmed adding the -n option to h5repack in that test causes the test to work on both little-endian and big-endian machines. Why?

When h5repack is creating output datasets, it has a small choice to make...maintain the original types in the file being copied from or select the equivalent native type. The -n option tells h5repack to do the latter. So, instead of trying to write little-endian data from a big-endian machine, it selects a native type that is big-endian.

Then, everything works.

@markcmiller86
Member

Regarding type punning @lindstro and @spanezz mention above. This is a result of the set of goals of...

  • ...ensuring filter can be used like any other HDF5 filter via HDF5 library's plugin interface and in particular...
  • ...ensuring caller is not required to link to some additional library if it so chooses
  • ...ensuring caller can manipulate filter using ZFP's native concepts of rate and accuracy both of which are handled as floating point values

As I have studied this more, the punning is highly constrained (constrained enough IMHO to be manageable) to within the passing of HDF5 property list content from HDF5 caller to the plugin in a single executable. Indeed, this happens at H5Dcreate time. When the dataset is created, whatever was punned is read, interpreted, and then tossed out, and instead the ZFP stream header is encoded in its place. There is no persisting of punned data to a file that a later, potentially incompatible consumer may need to read. There is no exchange of punned data across an endianness interface. There is no exchange of punned data across a floating point format interface.

Long story short, the punned data lives only long enough for the filter settings set in H5Pset_filter() to be read and interpreted in a subsequent H5Dcreate() where the property list is used.

I believe I saw a suggestion of manually byte swapping the cd_values storing the header whenever the header is deemed malformed (i.e., when the zfp "magic" is not recognized) to see if that undoes the undesired byte swapping done by HDF5. I think that's a reasonable strategy.

This is in fact how the filter has been designed from the start. However, when I introduced logic to tease out just the ZFP library magic and version numbers, I neglected to include it in the byte-swapping logic. That was fixed in #101 and more is explained here
