[C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer #32863

Closed
asfimport opened this issue Sep 5, 2022 · 5 comments · Fixed by #14341

Comments

@asfimport
Collaborator

asfimport commented Sep 5, 2022

See examples/decode_benchmark.cc

Reporter: Wes McKinney / @wesm
Assignee: Shan Huang / @shanhuuang

Related issues:

PRs and other links:

Note: This issue was originally created as PARQUET-492. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Ian Cook / @ianmcook:
@rok how do you think we would expose an option for whether to apply the DELTA_BYTE_ARRAY encoder when writing column(s) to Parquet? Do you think it would be in the same way we expose the option for dictionary encoding, e.g. here in PyArrow?

@asfimport
Collaborator Author

Rok Mihevc / @rok:
@ianmcook yes, I believe so.
While we already have the decoder, we would need to implement the encoder for this to work, though.
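
For readers unfamiliar with the encoding, here is a minimal pure-Python sketch of the idea behind DELTA_BYTE_ARRAY: each value is stored as the length of the prefix it shares with the previous value, plus the remaining suffix. This is an illustration only, not the Arrow C++ encoder. The function names are hypothetical, and the sketch omits the step in the actual Parquet format where prefix lengths are further encoded with DELTA_BINARY_PACKED and suffixes with DELTA_LENGTH_BYTE_ARRAY.

```python
from typing import List, Tuple


def delta_byte_array_encode(values: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """Split each value into (shared-prefix length, suffix) relative to the previous value.

    Hypothetical helper for illustration; it stops before the integer and
    byte-array packing that the real format applies to these two lists.
    """
    prefix_lengths: List[int] = []
    suffixes: List[bytes] = []
    previous = b""
    for value in values:
        # Count how many leading bytes match the previous value.
        shared = 0
        limit = min(len(previous), len(value))
        while shared < limit and previous[shared] == value[shared]:
            shared += 1
        prefix_lengths.append(shared)
        suffixes.append(value[shared:])
        previous = value
    return prefix_lengths, suffixes


def delta_byte_array_decode(prefix_lengths: List[int], suffixes: List[bytes]) -> List[bytes]:
    """Rebuild the original values from prefix lengths and suffixes."""
    values: List[bytes] = []
    previous = b""
    for shared, suffix in zip(prefix_lengths, suffixes):
        value = previous[:shared] + suffix
        values.append(value)
        previous = value
    return values


words = [b"axis", b"axle", b"babble", b"babyhood"]
prefix_lengths, suffixes = delta_byte_array_encode(words)
decoded = delta_byte_array_decode(prefix_lengths, suffixes)
```

The encoding pays off on sorted or otherwise similar strings, where most of each value is already present in its predecessor.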

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
There is actually already a column_encoding option in pyarrow's write_table (just encoding on the C++ side), where you can specify the exact encoding to use per column. So the APIs to expose this as an option are already there (which encoding is used by default is a separate question).

@asfimport
Collaborator Author

Apache Arrow JIRA Bot:
This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

@pitrou pitrou added this to the 14.0.0 milestone Aug 21, 2023
pitrou added a commit that referenced this issue Aug 21, 2023
…er (#14341)

This is to add DELTA_BYTE_ARRAY encoder.
* Closes: #32863

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Rok <rok@mihevc.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Co-authored-by: mwish <1506118561@qq.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>