Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go][Parquet] DeltaLengthByteArray encoding breaks with null values in go library #36318

Closed
convoi opened this issue Jun 27, 2023 · 0 comments · Fixed by #36322
Closed

[Go][Parquet] DeltaLengthByteArray encoding breaks with null values in go library #36318

convoi opened this issue Jun 27, 2023 · 0 comments · Fixed by #36322
Assignees
Milestone

Comments

@convoi
Copy link
Contributor

convoi commented Jun 27, 2023

Describe the bug, including details regarding any error messages, version, and platform.

The DeltaLengthByteArray encoding cannot handle null values in the go library.
It fails on
a) all nulls in one batch - here it reaches an index out of bounds error when initializing the delta bit packing decoder
b) some nulls in one batch - here it continues to decode the lengths for all rows and interprets actual data for lengths information. Data that later is missing when it wants to decode it.

I have a fix and will update the Issue as soon as the PR is created.

Component(s)

Go

@zeroshade zeroshade changed the title DeltaLengthByteArray encoding breaks with null values in go library [Go][Parquet] DeltaLengthByteArray encoding breaks with null values in go library Jun 27, 2023
zeroshade pushed a commit that referenced this issue Jul 17, 2023
… not for all nvalues. (#36322)

### Rationale for this change

Fixes issue 36318.
DeltaLengthBinaryArray Encoding fails to handle null values.

### What changes are included in this PR?

Instead of decoding lengths for "all" values (even the undefined ones), we only decode the lengths for the actually set values.
The Go Version of arrow was unable to read parquet files it produced itself if the mentioned encoding was used but the values contain nulls.

### Are these changes tested?

Tests are included.

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".**
* Closes: #36318

Authored-by: Justin Heesemann <jheesemann@argo.ai>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@zeroshade zeroshade added this to the 14.0.0 milestone Jul 17, 2023
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this issue Jul 20, 2023
…alues, not for all nvalues. (apache#36322)

### Rationale for this change

Fixes issue 36318.
DeltaLengthBinaryArray Encoding fails to handle null values.

### What changes are included in this PR?

Instead of decoding lengths for "all" values (even the undefined ones), we only decode the lengths for the actually set values.
The Go Version of arrow was unable to read parquet files it produced itself if the mentioned encoding was used but the values contain nulls.

### Are these changes tested?

Tests are included.

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".**
* Closes: apache#36318

Authored-by: Justin Heesemann <jheesemann@argo.ai>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this issue Aug 20, 2023
…alues, not for all nvalues. (apache#36322)

### Rationale for this change

Fixes issue 36318.
DeltaLengthBinaryArray Encoding fails to handle null values.

### What changes are included in this PR?

Instead of decoding lengths for "all" values (even the undefined ones), we only decode the lengths for the actually set values.
The Go Version of arrow was unable to read parquet files it produced itself if the mentioned encoding was used but the values contain nulls.

### Are these changes tested?

Tests are included.

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".**
* Closes: apache#36318

Authored-by: Justin Heesemann <jheesemann@argo.ai>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants