[C++][Parquet] Encoding: DELTA_BYTE_ARRAY not memcpy when possible #37873
Comments
pitrou added a commit that referenced this issue on Oct 3, 2023

…ssible (#37874)

### Rationale for this change

When decoding DELTA_BYTE_ARRAY data, if the prefix (respectively suffix) is empty, we don't need to recreate the original string by copying the data into a new buffer; we can just point to the existing suffix (respectively prefix).

### What changes are included in this PR?

Avoid spurious memory copies in the DELTA_BYTE_ARRAY decoder (also reducing the memory footprint when decoding). Benchmark numbers show that decoding can be up to 2x faster.

### Are these changes tested?

Yes, already tested.

### Are there any user-facing changes?

No.

* Closes: #37873

Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue on Oct 23, 2023 (apache#37874)
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue on Nov 13, 2023 (apache#37874)
dgreiss pushed a commit to dgreiss/arrow that referenced this issue on Feb 19, 2024 (apache#37874)
Describe the enhancement requested
The DELTA_BYTE_ARRAY decoder first decodes the prefix lengths and the suffixes, then Decode rebuilds each value with memcpy. If the prefix length is 0 or the suffix length is 0, the memcpy can be avoided.
Component(s)
C++, Parquet
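To illustrate the idea, here is a minimal C++17 sketch of a decode loop that skips the copy when either part is empty. It is not the Arrow implementation: `DecodeResult`, `DecodeDeltaByteArray`, and the `copies` counter are hypothetical names, and the sketch assumes the prefix lengths and suffixes have already been decoded and that the suffix bytes outlive the result. With an empty prefix the value points straight at the suffix; with an empty suffix it points at the leading bytes of the previous value; only the mixed case allocates and copies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type for this sketch (not an Arrow/Parquet class).
struct DecodeResult {
  std::vector<std::string_view> values;  // views into the page or into scratch
  std::deque<std::string> scratch;       // owns rebuilt values; a deque keeps
                                         // element addresses stable on push_back
  std::size_t copies = 0;                // how many values needed a real copy
};

// Rebuilds values from already-decoded prefix lengths and suffixes.
// The suffix views must stay valid for as long as `out` is used.
void DecodeDeltaByteArray(const std::vector<std::uint32_t>& prefix_lengths,
                          const std::vector<std::string_view>& suffixes,
                          DecodeResult* out) {
  assert(prefix_lengths.size() == suffixes.size());
  std::string_view prev;  // previously decoded value
  for (std::size_t i = 0; i < suffixes.size(); ++i) {
    const std::size_t prefix_len = prefix_lengths[i];
    const std::string_view suffix = suffixes[i];
    if (prefix_len > prev.size()) {
      throw std::invalid_argument("prefix length exceeds previous value");
    }
    std::string_view value;
    if (prefix_len == 0) {
      // Empty prefix: the value is exactly the suffix, so point at it.
      value = suffix;
    } else if (suffix.empty()) {
      // Empty suffix: the value is a leading slice of the previous value.
      value = prev.substr(0, prefix_len);
    } else {
      // General case: concatenate prefix and suffix into owned storage.
      std::string& buf = out->scratch.emplace_back();
      buf.reserve(prefix_len + suffix.size());
      buf.append(prev.data(), prefix_len);
      buf.append(suffix.data(), suffix.size());
      value = std::string_view(buf);
      ++out->copies;
    }
    out->values.push_back(value);
    prev = value;
  }
}
```

The result is filled through a pointer rather than returned by value so that the views into `scratch` cannot be invalidated by an accidental copy of the struct.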