[C++][Parquet] Encoding: DELTA_BYTE_ARRAY not memcpy when possible #37873

mapleFU · 2023-09-26T14:41:35Z

Describe the enhancement requested

The DELTA_BYTE_ARRAY decoder first decodes the prefix lengths and the suffixes, then Decode rebuilds each value with memcpy. If the prefix length is 0 or the suffix length is 0, we can skip the memcpy and point to the existing data instead.
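To make the idea concrete, here is a minimal, self-contained sketch of the zero-copy cases. It is not the Arrow decoder: the function name `DecodeDeltaByteArray`, its parameters, and the `scratch` buffer are hypothetical, and the input is assumed to be already split into prefix lengths and suffix views. A copy is only made when both the prefix and the suffix are non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified decoder (illustration only, not Arrow's API).
// prefix_lengths[i]: number of leading bytes value i shares with value i-1.
// suffixes[i]:       remaining bytes of value i, viewing the page buffer.
// scratch:           owns the bytes of values that had to be rebuilt; a deque
//                    keeps element references stable across emplace_back.
// num_copies:        optional counter of values that required a copy.
std::vector<std::string_view> DecodeDeltaByteArray(
    const std::vector<std::uint32_t>& prefix_lengths,
    const std::vector<std::string_view>& suffixes,
    std::deque<std::string>& scratch,
    std::size_t* num_copies = nullptr) {
  assert(prefix_lengths.size() == suffixes.size());
  std::vector<std::string_view> values;
  values.reserve(suffixes.size());
  std::string_view prev;  // previously decoded value (empty at the start)

  for (std::size_t i = 0; i < suffixes.size(); ++i) {
    const std::size_t prefix_len = prefix_lengths[i];
    const std::string_view suffix = suffixes[i];
    assert(prefix_len <= prev.size());

    std::string_view value;
    if (prefix_len == 0) {
      // Empty prefix: the value is exactly the suffix, so view it in place.
      value = suffix;
    } else if (suffix.empty()) {
      // Empty suffix: the value is a prefix of the previous value.
      value = prev.substr(0, prefix_len);
    } else {
      // General case: concatenate prefix and suffix into owned storage.
      std::string& owned = scratch.emplace_back();
      owned.reserve(prefix_len + suffix.size());
      owned.append(prev.data(), prefix_len);
      owned.append(suffix.data(), suffix.size());
      value = owned;
      if (num_copies != nullptr) ++*num_copies;
    }
    values.push_back(value);
    prev = value;
  }
  return values;
}
```

The returned views stay valid only while both the input buffer and `scratch` are alive, which is the lifetime trade-off that comes with skipping the copy.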

Component(s)

C++, Parquet


…ssible (#37874)

### Rationale for this change

When decoding DELTA_BYTE_ARRAY data, if the prefix (respectively suffix) is empty, we don't need to recreate the original string by copying the data into a new buffer; we can just point to the existing suffix (respectively prefix).

### What changes are included in this PR?

Avoid spurious memory copies in the DELTA_BYTE_ARRAY decoder (also reducing the memory footprint when decoding). Benchmark numbers show that decoding can be up to 2x faster.

### Are these changes tested?

Yes, already tested.

### Are there any user-facing changes?

No.

* Closes: #37873

Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>


mapleFU added the Type: enhancement label Sep 26, 2023

github-actions bot added Component: Parquet Component: C++ labels Sep 26, 2023

github-actions bot mentioned this issue Sep 26, 2023

GH-37873: [C++][Parquet] DELTA_BYTE_ARRAY: avoid copying data when possible #37874

Merged

github-actions bot assigned mapleFU Sep 26, 2023

pitrou closed this as completed in #37874 Oct 3, 2023

pitrou added this to the 14.0.0 milestone Oct 3, 2023
