Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go] DeltaBitPack deltaBitWidths overflow #38399

Closed
Hattonuri opened this issue Oct 23, 2023 · 7 comments · Fixed by #38413
Closed

[Go] DeltaBitPack deltaBitWidths overflow #38399

Hattonuri opened this issue Oct 23, 2023 · 7 comments · Fixed by #38413
Assignees
Milestone

Comments

@Hattonuri
Copy link
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

I wrote some int64 using c++ ::parquet::Encoding::DELTA_BINARY_PACKED
and then tried to read this with go but got:

panic: runtime error: index out of range [4] with length 4

goroutine 199 [running]:
github.com/apache/arrow/go/v14/parquet/internal/encoding.(*DeltaBitPackInt64Decoder).unpackNextMini(0xc1aee302a0)
        /home/dstasenko/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.0-20231023113027-37935604bf16/parquet/internal/encoding/delta_bit_packing.go:230 +0x2da
github.com/apache/arrow/go/v14/parquet/internal/encoding.(*DeltaBitPackInt64Decoder).Decode(0xc1aee302a0, {0xc1cc73b008?, 0xa7d50d?, 0xc1adb08360?})
        /home/dstasenko/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.0-20231023113027-37935604bf16/parquet/internal/encoding/delta_bit_packing.go:267 +0x118
github.com/apache/arrow/go/v14/parquet/file.(*Int64ColumnChunkReader).ReadBatch.func1(0x15a201, 0x167d01)
        /home/dstasenko/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.0-20231023113027-37935604bf16/parquet/file/column_reader_types.gen.go:93 +0xa3
github.com/apache/arrow/go/v14/parquet/file.(*columnChunkReader).readBatch(0xc1adb08360, 0x3c9c00, {0xc1d3f26000, 0x3c9c00, 0x3c9c00}, {0x0, 0x0, 0x0}, 0xc1aeb73c80)
        /home/dstasenko/go/pkg/mod/github.com/apache/arrow/go/v14@v14.0.0-20231023113027-37935604bf16/parquet/file/column_reader.go:508 +0xc2
github.com/apache/arrow/go/v14/parquet/file.(*Int64ColumnChunkReader).ReadBatch(0xc1aee2e3d0?, 0xc1aee36960?, {0xc1cbc6a000?, 0x2b21d60?, 0xc0001a9d58?}, {0xc1d3f26000?, 0x1a49d55?, 0xa?}, {0x0, 0x0, ...})

I assume that this line lacks boundaries check

d.deltaBitWidth = d.deltaBitWidths.Bytes()[int(d.miniBlockIdx)]

Component(s)

Go

@mapleFU
Copy link
Member

mapleFU commented Oct 23, 2023

Would you mind provide the test data which might trigger the bug here?

@Hattonuri
Copy link
Contributor Author

storage_000000000.parquet.txt
added .txt ending because github does not allow to upload parquet files

@Hattonuri
Copy link
Contributor Author

}
if d.currentMiniBlockVals == 0 {

probably you forgot "if err != nil return"

@mapleFU
Copy link
Member

mapleFU commented Oct 23, 2023

I've found if err != nil return, but it's still propogate an error

C++ is able to read file like this, let me investigate it.

@mapleFU
Copy link
Member

mapleFU commented Oct 23, 2023

I've find out the reason. Let me submit a fixing

@mapleFU
Copy link
Member

mapleFU commented Oct 23, 2023

@Hattonuri I've submit a fixing here: #38413

You can have a try at the new code.

@Hattonuri
Copy link
Contributor Author

Now it works! Thank you

zeroshade pushed a commit that referenced this issue Oct 24, 2023
…tData (#38413)

### Rationale for this change

As #38399 says. DeltaBitPack will corrupt when we meet a column chunk
with more than one page. During first page decoding, it works well. But when the second page comes, the
`d.usedFirst` haven't been reset, which cause the bug.

### What changes are included in this PR?

1. Some style enhancement
2. Bug fix

### Are these changes tested?

Currently not

### Are there any user-facing changes?

bugfix

* Closes: #38399

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@zeroshade zeroshade added this to the 15.0.0 milestone Oct 24, 2023
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 25, 2023
…ter SetData (apache#38413)

### Rationale for this change

As apache#38399 says. DeltaBitPack will corrupt when we meet a column chunk
with more than one page. During first page decoding, it works well. But when the second page comes, the
`d.usedFirst` haven't been reset, which cause the bug.

### What changes are included in this PR?

1. Some style enhancement
2. Bug fix

### Are these changes tested?

Currently not

### Are there any user-facing changes?

bugfix

* Closes: apache#38399

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…ter SetData (apache#38413)

### Rationale for this change

As apache#38399 says. DeltaBitPack will corrupt when we meet a column chunk
with more than one page. During first page decoding, it works well. But when the second page comes, the
`d.usedFirst` haven't been reset, which cause the bug.

### What changes are included in this PR?

1. Some style enhancement
2. Bug fix

### Are these changes tested?

Currently not

### Are there any user-facing changes?

bugfix

* Closes: apache#38399

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…ter SetData (apache#38413)

### Rationale for this change

As apache#38399 says. DeltaBitPack will corrupt when we meet a column chunk
with more than one page. During first page decoding, it works well. But when the second page comes, the
`d.usedFirst` haven't been reset, which cause the bug.

### What changes are included in this PR?

1. Some style enhancement
2. Bug fix

### Are these changes tested?

Currently not

### Are there any user-facing changes?

bugfix

* Closes: apache#38399

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants