[GO]: pqarrow (github.com/apache/arrow/go/v9/parquet/pqarrow) cannot handle arrow's DICTIONARY field #33466
Comments
Matthew Topol / @zeroshade:
### Rationale for this change

Implementing a kernel for computing the "unique" values in an arrow array, primarily for use in solving #33466.

### What changes are included in this PR?

Adds a "unique" function to the compute list and helper convenience functions.

### Are these changes tested?

Yes, unit tests are included.

### Are there any user-facing changes?

Just the new available functions.

* Closes: #34171

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
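For illustration only, a minimal sketch of calling the new kernel from user code. The helper name `compute.UniqueArray` and the v12 module path are assumptions based on the PR description, so check the compute package docs for the exact name and signature.

```go
package main

import (
	"context"
	"fmt"

	"github.com/apache/arrow/go/v12/arrow/array"
	"github.com/apache/arrow/go/v12/arrow/compute"
	"github.com/apache/arrow/go/v12/arrow/memory"
)

func main() {
	mem := memory.NewGoAllocator()

	// Build a small string array with repeated values.
	b := array.NewStringBuilder(mem)
	defer b.Release()
	b.AppendValues([]string{"a", "b", "a", "c", "b", "a"}, nil)
	arr := b.NewArray()
	defer arr.Release()

	// Compute the distinct values. The helper name/signature is an
	// assumption based on the PR description; see the compute docs.
	uniq, err := compute.UniqueArray(context.Background(), arr)
	if err != nil {
		panic(err)
	}
	defer uniq.Release()

	fmt.Println(uniq) // expected something like ["a" "b" "c"]
}
```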
…34342)

### Rationale for this change

The Parquet package should properly handle dictionary array types to allow consumers to efficiently read/write dictionary-encoded arrays for dictionary-encoded Parquet files.

### What changes are included in this PR?

Updates and fixes to allow Parquet read/write directly to/from dictionary arrays. Because it requires the `Unique` and `Take` compute functions, the dictionary handling requires go1.18+ just like the compute package does. Updates the schema to handle dictionary types when storing the arrow schema. This also adds some new methods to the `ColumnWriter` interface and the `BinaryRecordReader` for handling dictionaries.

### Are these changes tested?

Yes, unit tests are added in the change.

* Closes: #33466

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
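For context, a hedged sketch of writing a table with a dictionary-encoded column through pqarrow once this change is available. The module version (v12) and the specific writer-property options are assumptions; consult the pqarrow and parquet package docs for details.

```go
package main

import (
	"fmt"
	"os"

	"github.com/apache/arrow/go/v12/arrow"
	"github.com/apache/arrow/go/v12/arrow/array"
	"github.com/apache/arrow/go/v12/arrow/memory"
	"github.com/apache/arrow/go/v12/parquet"
	"github.com/apache/arrow/go/v12/parquet/pqarrow"
)

func main() {
	mem := memory.NewGoAllocator()

	// A schema with a single dictionary-encoded string column.
	dictType := &arrow.DictionaryType{
		IndexType: arrow.PrimitiveTypes.Int32,
		ValueType: arrow.BinaryTypes.String,
	}
	schema := arrow.NewSchema([]arrow.Field{
		{Name: "category", Type: dictType, Nullable: true},
	}, nil)

	// Build a dictionary array; string-valued dictionaries use the
	// BinaryDictionaryBuilder concrete type.
	db := array.NewDictionaryBuilder(mem, dictType).(*array.BinaryDictionaryBuilder)
	defer db.Release()
	for _, v := range []string{"red", "green", "red", "blue"} {
		if err := db.AppendString(v); err != nil {
			panic(err)
		}
	}
	col := db.NewArray()
	defer col.Release()

	rec := array.NewRecord(schema, []arrow.Array{col}, int64(col.Len()))
	defer rec.Release()
	tbl := array.NewTableFromRecords(schema, []arrow.Record{rec})
	defer tbl.Release()

	f, err := os.Create("dict_example.parquet")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Write the table; storing the Arrow schema lets a reader round-trip
	// the dictionary type instead of getting plain strings back.
	err = pqarrow.WriteTable(tbl, f, 1024,
		parquet.NewWriterProperties(parquet.WithDictionaryDefault(true)),
		pqarrow.NewArrowWriterProperties(pqarrow.WithStoreSchema()))
	if err != nil {
		panic(err)
	}
	fmt.Println("wrote dict_example.parquet")
}
```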
Hey, Arrow Go Dev:
I was trying to save some Arrow tables out to Parquet files with the help of the `github.com/apache/arrow/go/v9/parquet/pqarrow` package. By the way, it's generally a great design (of Arrow) and a great Go implementation.
However, one issue sticks out: in my original arrow Table I have some DICTIONARY fields, which pqarrow does NOT currently support.
I would assume supporting them would be quite straightforward: just "denormalize" the DICTIONARY values into their corresponding underlying values (string, Timestamp, etc.), and it's up to the Parquet layer to do the right thing (using DICTIONARY encoding, etc.).
I could have done this conversion on the fly myself by converting each DICTIONARY field into its underlying values. However, the Arrow table schema is dynamic and outside my control, and I would need to iterate through the fields (possibly including structs) to locate them, so it would be much better if pqarrow supported this natively.
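For illustration, a rough sketch of the kind of traversal described above: walking the schema (including struct fields) to find DICTIONARY columns. `findDictionaryFields` is a hypothetical helper, not part of arrow or pqarrow; the v9 module path simply matches the version mentioned in this issue.

```go
package main

import (
	"fmt"

	"github.com/apache/arrow/go/v9/arrow"
)

// findDictionaryFields walks a list of fields and records the path of every
// dictionary-typed field, recursing into struct fields.
func findDictionaryFields(fields []arrow.Field, prefix string, out *[]string) {
	for _, f := range fields {
		switch dt := f.Type.(type) {
		case *arrow.DictionaryType:
			// Record the path and the underlying value type the column
			// would need to be "denormalized" to.
			*out = append(*out, prefix+f.Name+" -> "+dt.ValueType.String())
		case *arrow.StructType:
			findDictionaryFields(dt.Fields(), prefix+f.Name+".", out)
		}
	}
}

func main() {
	schema := arrow.NewSchema([]arrow.Field{
		{Name: "id", Type: arrow.PrimitiveTypes.Int64},
		{Name: "label", Type: &arrow.DictionaryType{
			IndexType: arrow.PrimitiveTypes.Int32,
			ValueType: arrow.BinaryTypes.String,
		}},
	}, nil)

	var found []string
	findDictionaryFields(schema.Fields(), "", &found)
	fmt.Println(found) // e.g. [label -> utf8]
}
```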
Can anyone help? Thanks!
Reporter: Kevin Yang
Assignee: Matthew Topol / @zeroshade
Note: This issue was originally created as ARROW-18288. Please see the migration documentation for further details.