Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go] [Parquet] panic writing with DeltaBinaryPacked when column only has nulls #35718

Closed
mslevine opened this issue May 23, 2023 · 4 comments · Fixed by #37112 or #39497
Closed

[Go] [Parquet] panic writing with DeltaBinaryPacked when column only has nulls #35718

mslevine opened this issue May 23, 2023 · 4 comments · Fixed by #37112 or #39497

Comments

@mslevine
Copy link

Describe the bug, including details regarding any error messages, version, and platform.

The writer panics if DeltaBinaryPacked encoding is requested for a column that is all null. The following small change to an existing test case demonstrates the issue.

In case it matters, I'm using go version go1.20.4 darwin/amd64
I ran into this issue using arrow v12.0.0, but this changed testcase fails using the current head of main.

diff --git a/go/parquet/pqarrow/encode_arrow_test.go b/go/parquet/pqarrow/encode_arrow_test.go
index 877f584f2..0b83ec696 100644
--- a/go/parquet/pqarrow/encode_arrow_test.go
+++ b/go/parquet/pqarrow/encode_arrow_test.go
@@ -380,6 +380,8 @@ func TestWriteEmptyLists(t *testing.T) {
 
        props := parquet.NewWriterProperties(
                parquet.WithVersion(parquet.V1_0),
+               parquet.WithDictionaryDefault(false),
+               parquet.WithEncoding(parquet.Encodings.DeltaBinaryPacked),
        )
        arrprops := pqarrow.DefaultWriterProps()
        var buf bytes.Buffer

Component(s)

Go

@mslevine
Copy link
Author

Reading more code, it seems possible the fix could be as simple as

diff --git a/go/parquet/internal/encoding/delta_bit_packing.go b/go/parquet/internal/encoding/delta_bit_packing.go
index 2ebe6ad98..d68044a07 100644
--- a/go/parquet/internal/encoding/delta_bit_packing.go
+++ b/go/parquet/internal/encoding/delta_bit_packing.go
@@ -458,7 +458,11 @@ func (enc *deltaBitPackEncoder) FlushValues() (Buffer, error) {
 
 // EstimatedDataEncodedSize returns the current amount of data actually flushed out and written
 func (enc *deltaBitPackEncoder) EstimatedDataEncodedSize() int64 {
-       return int64(enc.bitWriter.Written())
+       if enc.bitWriter != nil {
+               return int64(enc.bitWriter.Written())
+       } else {
+               return 0
+       }
 }
 
 // DeltaBitPackInt32Encoder is an encoder for the delta bitpacking encoding for int32 data.

(although I could easily believe that the correct response when nil is some constant other than 0)

@zeroshade
Copy link
Member

@mslevine Thanks for filing this! Would you be interested in filing a PR with a new test case and your proposed fix? 😄 It should automatically tag me as a codeowner to review it if you do.

If you're not interested, I'll try to get to this when I have some time.

zeroshade pushed a commit that referenced this issue Aug 15, 2023
…Bytes (#37112)

### Rationale for this change

See #37102. Also fixes #35718

Golang BitWriter didn't reserve when `ReserveBytes` called. So it make it wrong.

### What changes are included in this PR?

Change `ReserveBytes` impl

### Are these changes tested?

Currently not

### Are there any user-facing changes?

bugfix

* Closes: #37102

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@abhimanyusinghgaur
Copy link

abhimanyusinghgaur commented Sep 20, 2023

Hi @zeroshade @mapleFU, This issue isn't fixed by #37112. Can you please reopen it?

I have tested it locally on the current main and it fails. The proposed fix in this issue seems to work fine as a solution.

@zeroshade
Copy link
Member

@abhimanyusinghgaur I can reopen this issue, after which can you file a PR with your proposed fix and add a unit test case that can replicate the crash without the fix?

@zeroshade zeroshade reopened this Sep 21, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…eserveBytes (apache#37112)

### Rationale for this change

See apache#37102. Also fixes apache#35718

Golang BitWriter didn't reserve when `ReserveBytes` called. So it make it wrong.

### What changes are included in this PR?

Change `ReserveBytes` impl

### Are these changes tested?

Currently not

### Are there any user-facing changes?

bugfix

* Closes: apache#37102

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
MagicBoost pushed a commit to MagicBoost/arrow that referenced this issue Jan 8, 2024
MagicBoost pushed a commit to MagicBoost/arrow that referenced this issue Jan 8, 2024
zeroshade added a commit that referenced this issue Jan 9, 2024
### Rationale for this change

closes: #35718 

### What changes are included in this PR?

Fix painc writing with DeltaBinaryPacked or DeltaByteArray when column only has nulls

### Are these changes tested?

Yes

- add a test writing nulls to columns with DeltaBinaryPacked / DeltaByteArray / DeltaLengthByteArray encodings

### Are there any user-facing changes?

No

* Closes: #35718

Lead-authored-by: yufanmo <yufan.mo@transwarp.io>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@zeroshade zeroshade added this to the 16.0.0 milestone Jan 9, 2024
clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
…e#39497)

### Rationale for this change

closes: apache#35718 

### What changes are included in this PR?

Fix painc writing with DeltaBinaryPacked or DeltaByteArray when column only has nulls

### Are these changes tested?

Yes

- add a test writing nulls to columns with DeltaBinaryPacked / DeltaByteArray / DeltaLengthByteArray encodings

### Are there any user-facing changes?

No

* Closes: apache#35718

Lead-authored-by: yufanmo <yufan.mo@transwarp.io>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…e#39497)

### Rationale for this change

closes: apache#35718 

### What changes are included in this PR?

Fix painc writing with DeltaBinaryPacked or DeltaByteArray when column only has nulls

### Are these changes tested?

Yes

- add a test writing nulls to columns with DeltaBinaryPacked / DeltaByteArray / DeltaLengthByteArray encodings

### Are there any user-facing changes?

No

* Closes: apache#35718

Lead-authored-by: yufanmo <yufan.mo@transwarp.io>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…e#39497)

### Rationale for this change

closes: apache#35718 

### What changes are included in this PR?

Fix painc writing with DeltaBinaryPacked or DeltaByteArray when column only has nulls

### Are these changes tested?

Yes

- add a test writing nulls to columns with DeltaBinaryPacked / DeltaByteArray / DeltaLengthByteArray encodings

### Are there any user-facing changes?

No

* Closes: apache#35718

Lead-authored-by: yufanmo <yufan.mo@transwarp.io>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants