[Go][Parquet] Handle Boolean RLE encoding/decoding #38462

Closed
zeroshade opened this issue Oct 25, 2023 · 0 comments · Fixed by #38367

Comments

@zeroshade
Member

Describe the enhancement requested

In addition to the plain boolean encoding (a bitmap), the Go Parquet implementation should also support encoding and decoding boolean columns with the RLE/bit-packed hybrid encoding. This matters more now because several files in the parquet-testing repo have been updated to use boolean columns with the RLE encoding type.
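
To make the request concrete, here is a minimal sketch of decoding RLE/bit-packed hybrid data at a bit width of 1. It is not the Arrow Go implementation: `decodeRLEBooleans` is a hypothetical name, and the layout it assumes (a 4-byte little-endian length prefix, followed by runs that each start with a ULEB128 header whose low bit selects RLE or bit-packed) reflects my reading of the Parquet format's description of the RLE encoding for boolean data pages.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeRLEBooleans is a hypothetical helper, not part of the Arrow Go API.
// It assumes buf starts with a 4-byte little-endian payload length, followed
// by RLE/bit-packed hybrid runs with a bit width of 1.
func decodeRLEBooleans(buf []byte, numValues int) ([]bool, error) {
	if len(buf) < 4 {
		return nil, errors.New("buffer too short for length prefix")
	}
	n := int(binary.LittleEndian.Uint32(buf))
	if n > len(buf)-4 {
		return nil, errors.New("length prefix exceeds buffer size")
	}
	data := buf[4 : 4+n]

	out := make([]bool, 0, numValues)
	for len(out) < numValues {
		header, read := binary.Uvarint(data)
		if read <= 0 {
			return nil, errors.New("truncated or invalid run header")
		}
		data = data[read:]

		if header&1 == 0 {
			// RLE run: header>>1 repetitions of one value stored in a single byte.
			count := int(header >> 1)
			if len(data) < 1 {
				return nil, errors.New("truncated RLE run")
			}
			v := data[0]&1 == 1
			data = data[1:]
			for i := 0; i < count && len(out) < numValues; i++ {
				out = append(out, v)
			}
		} else {
			// Bit-packed run: header>>1 groups of 8 values. At bit width 1,
			// each group is one byte, least significant bit first.
			groups := int(header >> 1)
			if len(data) < groups {
				return nil, errors.New("truncated bit-packed run")
			}
			for _, b := range data[:groups] {
				for bit := uint(0); bit < 8 && len(out) < numValues; bit++ {
					out = append(out, (b>>bit)&1 == 1)
				}
			}
			data = data[groups:]
		}
	}
	return out, nil
}

func main() {
	// Payload: an RLE run of 5 x true (0x0A, 0x01), then one bit-packed
	// group (0x03) holding 0b00001010.
	buf := []byte{4, 0, 0, 0, 0x0A, 0x01, 0x03, 0x0A}
	vals, err := decodeRLEBooleans(buf, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [true true true true true false true false true false]
}
```

The caller passes the expected value count because a bit-packed run always covers a multiple of 8 values, so the final group may carry padding bits that must be ignored.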

Component(s)

Go, Parquet

zeroshade added a commit that referenced this issue Oct 30, 2023
### Rationale for this change
The parquet-testing repo files have been updated and now include boolean columns that use the RLE encoding type. This causes the Go Parquet library to fail verification tests when it pulls the most recent commits of the parquet-testing repository. The fix is to implement the RleBoolean encoder and decoder.

### What changes are included in this PR?
Adding `RleBooleanEncoder` and `RleBooleanDecoder` and updating the `parquet-testing` repo.
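
For illustration only, the encoding direction can be sketched as follows. This is not the code added by the PR: `encodeRLEBooleans` is a hypothetical function, it emits only RLE runs (a real encoder may also emit bit-packed runs when values alternate frequently), and it assumes the same 4-byte little-endian length prefix in front of the payload.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeRLEBooleans is a hypothetical helper, not the PR's RleBooleanEncoder.
// It writes each stretch of equal values as one RLE run:
// a ULEB128 header (runLength << 1) followed by a one-byte value.
func encodeRLEBooleans(values []bool) []byte {
	var payload []byte
	var tmp [binary.MaxVarintLen64]byte

	for i := 0; i < len(values); {
		// Find the end of the current run of identical values.
		j := i
		for j < len(values) && values[j] == values[i] {
			j++
		}
		// The low bit of the header is 0, which marks an RLE run.
		n := binary.PutUvarint(tmp[:], uint64(j-i)<<1)
		payload = append(payload, tmp[:n]...)

		var v byte
		if values[i] {
			v = 1
		}
		payload = append(payload, v)
		i = j
	}

	// Assumed layout: 4-byte little-endian payload length, then the runs.
	out := make([]byte, 4, 4+len(payload))
	binary.LittleEndian.PutUint32(out, uint32(len(payload)))
	return append(out, payload...)
}

func main() {
	encoded := encodeRLEBooleans([]bool{true, true, true, false, false})
	fmt.Printf("% x\n", encoded) // 04 00 00 00 06 01 04 00
}
```

Emitting only RLE runs keeps the sketch short but wastes space on alternating input, where every value costs two bytes.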

### Are these changes tested?
Unit tests are added, and this is also tested via the `parquet-testing` golden files.

* Closes: #38345
* Closes: #38462

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
zeroshade added this to the 15.0.0 milestone Oct 30, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024