fix reading of byte-swapped input files (#95) #101

helmutg · 2023-01-31T06:03:15Z

When reading a byte-swapped file, the input is grouped into 4-byte words and each word is swapped individually. Before reading such a file, we first validate its header using zfp_read_header with the ZFP_HEADER_MAGIC flag. With this flag, it only checks that the first word is "zfp\x05" and gives up if it is not exactly that. Unfortunately, at this point the magic word may itself be byte-swapped. The actual byte-swapping code was only tried after reading the full header failed, a step that was never reached, so automatic byte swapping never worked.

Instead, when encountering a header with bad magic, swap it right away and retry, and only read the full header once the magic (normal or swapped) has been read successfully.
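The swap-and-retry idea can be illustrated with a minimal, self-contained sketch. This is not the filter's actual code: `swap_words`, `magic_ok`, and `normalize_header` are hypothetical names, and `magic_ok` is only a stand-in for calling zfp_read_header with the ZFP_HEADER_MAGIC flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the bytes of each 32-bit word in place. */
static void swap_words(uint32_t *words, size_t nwords)
{
    assert(words != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++) {
        uint32_t w = words[i];
        words[i] = (w >> 24) | ((w >> 8) & 0x0000ff00u) |
                   ((w << 8) & 0x00ff0000u) | (w << 24);
    }
}

/* Hypothetical stand-in for the magic-only header check:
 * the first four bytes must be exactly "zfp\x05". */
static int magic_ok(const uint32_t *words)
{
    return memcmp(words, "zfp\x05", 4) == 0;
}

/* Returns 0 if the header is usable as-is, 1 if it had to be
 * byte-swapped (the buffer is then left swapped), and -1 if neither
 * form has a valid magic (the buffer is restored to its input state). */
static int normalize_header(uint32_t *words, size_t nwords)
{
    if (nwords == 0)
        return -1;
    if (magic_ok(words))
        return 0;
    swap_words(words, nwords);   /* bad magic: try the swapped form */
    if (magic_ok(words))
        return 1;
    swap_words(words, nwords);   /* undo the swap before giving up */
    return -1;
}
```

In this sketch, the full header would only be read after `normalize_header` returns 0 or 1, which mirrors the ordering described above.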

Thanks to Mark C. Miller, Peter Lindstrom and Enrico Zini for doing most of the debugging to get here.

brtnfld

Looks good to me. The test is in #102.

markcmiller86 · 2023-02-17T19:09:18Z

OK, sorry for the long delay in assessing. I needed to fully understand why the original implementation was failing and confirm that the new approach is the best way to go.

What the filter is basically doing is using HDF5's cd_values stuff (an array of user-defined length of unsigned ints) to store a faux (or dummy) ZFP stream header. That is because that header captures everything the ZFP library needs to know about the compressed data. We use the ZFP library to write its header into a buffer that is later treated by HDF5 as the dataset's cd_values array, and we do the reverse when reading.

But ZFP expects to be able to do that in an endian-agnostic way. In other words, you get the same sequence of bytes in the ZFP stream regardless of whether it's done on a big-endian or little-endian machine. That's fine, but the cd_values array of unsigned ints is NOT endian-agnostic. It is designed to store little-endian to the file but nonetheless handle byte swapping of the cd_values data (on both write and read) when interacting with a big-endian caller.
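To make the mismatch concrete, here is a small simulation, not HDF5's actual code: `load_as_big_endian` and `store_little_endian` are hypothetical helpers that model how a big-endian host interprets four header bytes as one unsigned int, and what a little-endian serialization of that value looks like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of how a big-endian host interprets four in-memory bytes
 * as a single unsigned int value. */
static uint32_t load_as_big_endian(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Model of serializing an unsigned int value in little-endian
 * byte order, least significant byte first. */
static void store_little_endian(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(v & 0xffu);
    out[1] = (unsigned char)((v >> 8) & 0xffu);
    out[2] = (unsigned char)((v >> 16) & 0xffu);
    out[3] = (unsigned char)((v >> 24) & 0xffu);
}
```

Feeding the magic bytes 'z', 'f', 'p', 0x05 through these two steps yields 0x05, 'p', 'f', 'z': the byte stream is intact within each word but reversed, which is why the swap has to be applied per 4-byte word.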

So ZFP's header will wind up experiencing byte swapping from HDF5 on big-endian systems. The implementation was to treat a failure of ZFP to read its header as a hint that maybe the data got byte-swapped, then UNswap the data and try reading the header a second time. That worked fine until separate logic to read just ZFP's MAGIC was introduced (cd1de1c) that did not also handle the possible byte-swapping scenario.

Thanks so much to @helmutg for finding and fixing.

helmutg mentioned this pull request Jan 31, 2023

Test suite errors on s390x #95

Closed

spanezz mentioned this pull request Jan 31, 2023

Added new endianness test #102

Merged

brtnfld requested review from brtnfld and markcmiller86 February 16, 2023 00:37

brtnfld approved these changes Feb 16, 2023

brtnfld added Type - Bug Component - C Library Priority - Blocker ⛔ labels Feb 16, 2023

markcmiller86 merged commit 8acf824 into LLNL:master Feb 17, 2023

markcmiller86 mentioned this pull request Feb 28, 2023

test-h5repack fails on big endian architectures #100

Closed
