
Added new endianness test #102

Merged (1 commit) · Feb 17, 2023

Conversation

@spanezz (Contributor) commented Jan 31, 2023

I added to tests/ a bigendian.h5 file generated on an s390x big-endian machine using:

    yes 12345 | head -n 5000 | h5import /dev/stdin -dims 5000 -type TEXTIN -size 32 -o unpacked.h5
    h5repack -f UD=32013,0,4,1,0,0,1074921472 unpacked.h5 bigendian.h5

I then added a test-endian1 test that checks whether the h5dump output for that file contains the 12345 values.

This test fails on little-endian machines without #101 and succeeds with the patch from #101 applied.

@brtnfld self-requested a review on February 16, 2023
@brtnfld (Collaborator) left a comment

Looks good to me. @markcmiller86, I wonder if we should add a testfiles directory to house all the HDF5 files used in the tests?

@markcmiller86 (Member) commented

FYI, I am working on this. I don't have easy access to a big-endian system, so I've found an s390x Docker container that works. I am trying to reproduce the bad behavior that triggered this code change and so far haven't been able to, in spite of now having a big-endian system.

For a minute, I was worried that the two test files test_zfp_be.h5 and test_zfp_le.h5 were mistakenly both little endian, but they do not appear to be. I verified this with od -j <skip> -e, printing the original dataset values from both files and confirming that the two come back byte-swapped relative to each other. Finding the correct skip value, the file offset of the beginning of the original dataset's first chunk, was a tad difficult, though: h5ls emits a dataset location as a pair of ints (e.g. 1:800), which is certainly not an offset into the file where the dataset starts.

@markcmiller86 (Member) commented

Ok, well, I guess I should have read the initial comment here.

First, thanks for adding this test!

That is exactly what I was looking for.

Ok, let me just confirm it's behaving as expected for me too.

@markcmiller86 (Member) commented

@brtnfld I just want to check... how does the HDF5 library handle the endianness of the cd_values array stored in the dataset header? These values are consumed (read and written) by the HDF5 caller but are also stored in the file. That is, does the library pick an endianness for the cd_values data stored in the file and automatically byte-swap when cd_values are read or written in a cross-endian context?

@brtnfld (Collaborator) commented Feb 16, 2023

From Jordan:

Looks like the library does a UINT32ENCODE on the cd_values when writing out the filter pipeline message

        /* Filter parameters */
        for (j = 0; j < filter->cd_nelmts; j++)
            UINT32ENCODE(p, filter->cd_values[j]);

So I think that means the values will get stored in the endianness of whatever machine the file is written from, which may not be ideal

@markcmiller86 (Member) commented

> From Jordan:
>
> Looks like the library does a UINT32ENCODE on the cd_values when writing out the filter pipeline message
>
>         /* Filter parameters */
>         for (j = 0; j < filter->cd_nelmts; j++)
>             UINT32ENCODE(p, filter->cd_values[j]);
>
> So I think that means the values will get stored in the endianness of whatever machine the file is written from, which may not be ideal

Well, in looking at the definition of that macro in H5Fprivate.h...

#  define UINT32ENCODE(p, i) {                              \
   *(p) = (uint8_t)( (i)        & 0xff); (p)++;                      \
   *(p) = (uint8_t)(((i) >>  8) & 0xff); (p)++;                      \
   *(p) = (uint8_t)(((i) >> 16) & 0xff); (p)++;                      \
   *(p) = (uint8_t)(((i) >> 24) & 0xff); (p)++;                      \
}

That writes the least significant byte to the lowest memory location and the most significant byte to the highest memory location... little endian. So, it's storing little endian to the file.

But, it basically reverses this process when returning cd_values to a reader here...

https://github.com/HDFGroup/hdf5/blob/966454aac1231da7209ef81c11055d3312181f99/src/H5Opline.c#L230-L239

using this logic...

#  define UINT32DECODE(p, i) {                              \
   (i)  =  (uint32_t)(*(p) & 0xff);        (p)++;                  \
   (i) |= ((uint32_t)(*(p) & 0xff) <<  8); (p)++;                  \
   (i) |= ((uint32_t)(*(p) & 0xff) << 16); (p)++;                  \
   (i) |= ((uint32_t)(*(p) & 0xff) << 24); (p)++;                  \
}

So, my read is that it's always storing little endian to the file but also correctly byte-swapping the result when the consumer or producer is NOT little endian.
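
To check this concretely, here is a small standalone program (an illustrative sketch that copies the two macros above; it is not HDF5 source) showing that the encoded buffer is byte-for-byte little endian on any host and that the round trip recovers the original value:

    /* Illustrative sketch (not HDF5 source): copies of the two macros above,
     * showing the encoded bytes are little endian regardless of host order
     * and that decode exactly reverses encode. */
    #include <stdint.h>
    #include <stdio.h>

    #define UINT32ENCODE(p, i) {                                       \
       *(p) = (uint8_t)( (i)        & 0xff); (p)++;                    \
       *(p) = (uint8_t)(((i) >>  8) & 0xff); (p)++;                    \
       *(p) = (uint8_t)(((i) >> 16) & 0xff); (p)++;                    \
       *(p) = (uint8_t)(((i) >> 24) & 0xff); (p)++;                    \
    }

    #define UINT32DECODE(p, i) {                                       \
       (i)  =  (uint32_t)(*(p) & 0xff);        (p)++;                  \
       (i) |= ((uint32_t)(*(p) & 0xff) <<  8); (p)++;                  \
       (i) |= ((uint32_t)(*(p) & 0xff) << 16); (p)++;                  \
       (i) |= ((uint32_t)(*(p) & 0xff) << 24); (p)++;                  \
    }

    int main(void)
    {
        uint32_t in = 0x11223344, out = 0;
        uint8_t  buf[4], *p;

        p = buf;
        UINT32ENCODE(p, in);
        /* prints "44 33 22 11" on big AND little endian hosts */
        printf("%02x %02x %02x %02x\n", buf[0], buf[1], buf[2], buf[3]);

        p = buf;
        UINT32DECODE(p, out);
        printf("round trip %s\n", in == out ? "ok" : "FAILED");
        return 0;
    }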

@jhendersonHDF (Contributor) commented

@markcmiller86 Ah yes, thanks for the correction. Doing too many things at once...

I agree, it appears we're writing the filter's cd_values to the file as little endian and then byte-swapping on read.

@spanezz (Contributor, author) commented Feb 17, 2023

This is probably of interest, although a bit late: I did try to verify that cd_values are preserved correctly across machines of different endianness, by creating a mock filter plugin that sets cd_values and leaves data unchanged.

Here it is: h5zcat.zip, together with an HDF5 file repacked with h5repack on a little-endian machine and one repacked on a big-endian machine.
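
For reference, a pass-through filter plugin of this sort can be quite small. The sketch below is hypothetical (the actual code is in h5zcat.zip; the filter ID 256, taken from HDF5's test range, and all names are made up for illustration): it registers a filter whose callback prints the cd_values the library hands it and returns the data unchanged, so repacking a file through it exercises only the cd_values storage path.

    /* Hypothetical sketch of a pass-through HDF5 filter plugin; the real
     * implementation is in h5zcat.zip. Filter ID 256 (HDF5's test range)
     * and the names here are illustrative. */
    #include <stdio.h>
    #include "hdf5.h"
    #include "H5PLextern.h"

    #define H5Z_FILTER_PASSTHRU 256  /* test-range filter ID: an assumption */

    static size_t
    passthru_filter(unsigned flags, size_t cd_nelmts, const unsigned cd_values[],
                    size_t nbytes, size_t *buf_size, void **buf)
    {
        size_t i;

        (void)flags; (void)buf_size; (void)buf;

        /* Print the cd_values as the library delivers them, so files written
         * on machines of different endianness can be compared. */
        for (i = 0; i < cd_nelmts; i++)
            fprintf(stderr, "cd_values[%zu] = %u\n", i, cd_values[i]);

        return nbytes;  /* leave the data untouched; 0 would signal failure */
    }

    static const H5Z_class2_t PASSTHRU_CLASS[1] = {{
        H5Z_CLASS_T_VERS,      /* H5Z_class_t version */
        H5Z_FILTER_PASSTHRU,   /* filter ID */
        1, 1,                  /* encoder and decoder present */
        "pass-through (sketch)",
        NULL,                  /* can_apply */
        NULL,                  /* set_local */
        passthru_filter,       /* the filter callback */
    }};

    /* entry points the HDF5 dynamic plugin loader looks for */
    H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }
    const void *H5PLget_plugin_info(void) { return PASSTHRU_CLASS; }

Repacking a file with such a plugin on machines of both endiannesses and comparing the printed cd_values on read-back is then enough to confirm the library normalizes their byte order.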

@markcmiller86 (Member) commented

Thanks so much @spanezz for the added test; just what we needed. Cool way to generate the big-endian data too, much easier than what I was doing: I actually installed development tools and HDF5 on an s390x Docker image and then built and ran the filter and its tests there.
