New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-33466: [Go][Parquet] Add support for Dictionary arrays to pqarrow #34342
Conversation
f045a1f
to
f0d5395
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Go-side changes look good to me, but I don't currently have the context for the Arrow-side changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. The arrow-go bits look good to me, but I unfortunately don't have the context for reviewing the Parquet bits right now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looked through the tests. Looks like you've covered the important cases there :) 👍
Co-authored-by: Will Jones <willjones127@gmail.com>
ba45bb6
to
c53b174
Compare
Benchmark runs are scheduled for baseline = 73e2b56 and contender = afe5514. afe5514 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Rationale for this change
The Parquet package should properly handle dictionary array types to allow consumers to efficiently read/write dictionary encoded arrays for Dictionary encoded parquet files.
What changes are included in this PR?
Updates and fixes to allow Parquet read/write directly to/from dictionary arrays. Because it requires the
Unique
andTake
compute functions, the dictionary handling requires go1.18+ just like the compute package does.Updates the schema to handle dictionary types when storing the arrow schema. This also adds some new methods to the
ColumnWriter
interface and theBinaryRecordReader
for handling Dictionaries.Are these changes tested?
Yes, unit tests are added in the change.