[C++] Use signed arithmetic for frame of reference in DeltaBitPackEncoder #37939

etseidl · 2023-09-28T18:01:36Z

Describe the enhancement requested

The current implementation of DeltaBitPackEncoder uses unsigned arithmetic to handle possible overflow when calculating deltas. This has unfortunate consequences when encoding small negative deltas: a delta of -1 is treated as the unsigned value 0xFFFFFFFF, so the range between the minimum and maximum delta looks enormous. As an example, writing a vector with values {1, 0, -1, 0, 1, 0, -1, 0, 1} produces the following output (starting at the delta binary header):

00000030:                          8001 0409 0202  ................
00000040: 2000 0000 feff ffff feff ffff 0000 0000   ...............
00000050: 0000 0000 feff ffff feff ffff 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0000 0000

The encoder uses a bit width of 32 for every delta (the 0x20 byte at offset 0x40), so the packed deltas alone occupy 128 bytes. If signed values are used instead, then the result is:

00000030:                     8001 0409 0201 0200  ................
00000040: 0000 a0a0 0000 0000 0000

Here the encoder can use 2 bits per value, so the packed deltas take 8 bytes instead of 128. This can result in much smaller files, especially in cases where the logical type is narrower than 32 bits.
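To make the difference concrete, here is a minimal sketch of how the bit width falls out of the min/max comparison. This is not the Arrow implementation; the function names (BitsRequired, WrappedDeltas, BitWidthUnsignedCompare, BitWidthSignedCompare) are hypothetical and only illustrate the idea for 32-bit values. Deltas are computed with wrapping unsigned subtraction in both cases; the only thing that changes is whether min and max are picked with an unsigned or a signed comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to represent x (0 needs 0 bits).
inline int BitsRequired(uint32_t x) {
  int bits = 0;
  while (x != 0) {
    ++bits;
    x >>= 1;
  }
  return bits;
}

// Deltas between consecutive values, computed with wrapping unsigned
// subtraction so that overflow is well defined.
inline std::vector<uint32_t> WrappedDeltas(const std::vector<int32_t>& values) {
  assert(values.size() >= 2);  // need at least one delta
  std::vector<uint32_t> deltas;
  for (std::size_t i = 1; i < values.size(); ++i) {
    deltas.push_back(static_cast<uint32_t>(values[i]) -
                     static_cast<uint32_t>(values[i - 1]));
  }
  return deltas;
}

// Bit width when min and max delta are chosen by unsigned comparison.
inline int BitWidthUnsignedCompare(const std::vector<int32_t>& values) {
  const std::vector<uint32_t> deltas = WrappedDeltas(values);
  auto mm = std::minmax_element(deltas.begin(), deltas.end());
  return BitsRequired(*mm.second - *mm.first);
}

// Bit width when min and max delta are chosen by signed comparison.
// The cast assumes two's complement, which C++20 guarantees.
inline int BitWidthSignedCompare(const std::vector<int32_t>& values) {
  const std::vector<uint32_t> deltas = WrappedDeltas(values);
  auto signed_less = [](uint32_t a, uint32_t b) {
    return static_cast<int32_t>(a) < static_cast<int32_t>(b);
  };
  auto mm = std::minmax_element(deltas.begin(), deltas.end(), signed_less);
  // The subtraction still wraps in the unsigned domain.
  return BitsRequired(*mm.second - *mm.first);
}
```

For the vector above, the unsigned comparison picks 1 as the minimum and 0xFFFFFFFF as the maximum, giving a width of 32. The signed comparison picks -1 and 1, giving a range of 2 and a width of 2, which matches the two dumps.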

Component(s)

C++


…oding DELTA_BINARY_PACKED (#37940)

Closes #37939.

### What changes are included in this PR?

This PR changes values used in the `DELTA_BINARY_PACKED` encoder to signed types. To gracefully handle overflow, arithmetic is still performed in the unsigned domain, but other operations such as computing the min and max deltas are done in the signed domain. Using signed types ensures the optimal number of bits is used when encoding the deltas, which was not the case before if any negative deltas were encountered (which is obviously common).

### Are these changes tested?

I've included two tests that result in overflow.

### Are there any user-facing changes?

No

* Closes: #37939

Authored-by: seidl <seidl2@llnl.gov>
Signed-off-by: Antoine Pitrou <antoine@python.org>
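The split described in the commit message (signed values, unsigned arithmetic) can be sketched as follows. This is an illustration under assumptions, not the code from the PR: WrappingSub and WrappingAdd are hypothetical helper names, and the sketch covers only 32-bit values. Signed overflow is undefined behavior in C++, so the subtraction and addition are done on uint32_t and the result is copied back into an int32_t.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: a - b with wraparound, returned as a signed value.
inline int32_t WrappingSub(int32_t a, int32_t b) {
  const uint32_t d = static_cast<uint32_t>(a) - static_cast<uint32_t>(b);
  int32_t out;
  std::memcpy(&out, &d, sizeof(out));  // reinterpret bits without UB
  return out;
}

// Hypothetical helper: a + b with wraparound, returned as a signed value.
inline int32_t WrappingAdd(int32_t a, int32_t b) {
  const uint32_t s = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  int32_t out;
  std::memcpy(&out, &s, sizeof(out));  // reinterpret bits without UB
  return out;
}
```

With these helpers, the delta from INT32_MIN to INT32_MAX wraps to -1, and adding -1 back to INT32_MIN with WrappingAdd recovers INT32_MAX, so a decoder that also wraps can reconstruct the original values. Because the delta is a signed -1 rather than 0xFFFFFFFF, min and max comparisons on it behave as expected.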


etseidl added the Type: enhancement label Sep 28, 2023

github-actions bot added the Component: C++ label Sep 28, 2023

etseidl mentioned this issue Sep 28, 2023

GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED #37940

Merged

github-actions bot assigned etseidl Sep 28, 2023

pitrou closed this as completed in #37940 Oct 3, 2023

pitrou added this to the 14.0.0 milestone Oct 3, 2023

abandy added a commit to abandy/arrow that referenced this issue Mar 16, 2024

apacheGH-37939: [Swift] initial impl of C Data interface

d1b0ba1
