[Go] delta_bit_packing Decode may panic #33483
Comments
Matthew Topol / @zeroshade: Here's the code I used; let me know if you're able to replicate the panic, as I can't reproduce it on v9 either.

```go
func TestDeltaBitPacking(t *testing.T) {
	f, err := os.Open("timestamp.data")
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	values := make([]int64, 0)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		v, err := strconv.ParseInt(scanner.Text(), 10, 64)
		if err != nil {
			t.Fatal(err)
		}
		values = append(values, v)
	}
	if err := scanner.Err(); err != nil {
		t.Fatal(err)
	}

	col := schema.NewColumn(schema.MustPrimitive(schema.NewPrimitiveNode("foo", parquet.Repetitions.Required,
		parquet.Types.Int64, -1, -1)), 0, 0)
	enc := encoding.NewEncoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, false, col, memory.DefaultAllocator).(encoding.Int64Encoder)
	enc.Put(values)
	buf, err := enc.FlushValues()
	if err != nil {
		t.Fatal(err)
	}
	defer buf.Release()

	dec := encoding.NewDecoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, col, memory.DefaultAllocator).(encoding.Int64Decoder)
	dec.SetData(len(values), buf.Bytes())
	out := make([]int64, len(values))
	n, err := dec.Decode(out)
	if err != nil {
		t.Fatal(err)
	}
	assert.EqualValues(t, len(values), n)
	assert.Equal(t, values, out)
}
```
jun wang:

```go
func TestDeltaBitPacking(t *testing.T) {
	f, err := os.Open("timestamp.data")
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	values := make([]int64, 0)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		v, err := strconv.ParseInt(scanner.Text(), 10, 64)
		if err != nil {
			t.Fatal(err)
		}
		values = append(values, v)
	}
	if err := scanner.Err(); err != nil {
		t.Fatal(err)
	}

	col := schema.NewColumn(schema.MustPrimitive(schema.NewPrimitiveNode("foo", parquet.Repetitions.Required,
		parquet.Types.Int64, -1, -1)), 0, 0)
	enc := encoding.NewEncoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, false, col, memory.DefaultAllocator).(encoding.Int64Encoder)
	enc.Put(values)
	buf, err := enc.FlushValues()
	if err != nil {
		t.Fatal(err)
	}
	defer buf.Release()

	// Decode in a streaming fashion, 1024 values per batch.
	dec := encoding.NewDecoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, col, memory.DefaultAllocator).(encoding.Int64Decoder)
	dec.SetData(len(values), buf.Bytes())
	ll := len(values)
	for i := 0; i < ll; i += 1024 {
		out := make([]int64, 1024)
		n, err := dec.Decode(out)
		if err != nil {
			t.Fatal(err)
		}
		// Compare only the n values actually decoded; the final
		// batch may contain fewer than 1024 values.
		assert.Equal(t, values[:n], out[:n])
		values = values[n:]
	}
	assert.Equal(t, 0, dec.ValuesLeft())
}
```
https://github.com/apache/arrow/blob/master/go/parquet/internal/encoding/delta_bit_packing.go
The Decode methods of DeltaBitPackInt32 and DeltaBitPackInt64 do not subtract the number of decoded values from d.nvals at the end, which causes streaming decodes to panic.
Also, when copying the decoded values into out, the end index should be
shared_utils.MinInt(int(d.valsPerMini), start + len(out))
When encoding 68610 timestamp values and decoding them in batches of 1024, we hit the panic.
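A minimal sketch of the bookkeeping described above: each Decode call must clamp its copy bound with a MinInt-style helper and subtract the decoded count from nvals, otherwise the next streaming call sees a stale remaining-values count. The `decoder` struct, `buffered` field, and `minInt` helper below are simplified stand-ins for illustration, not the real types in delta_bit_packing.go.

```go
package main

import "fmt"

// minInt mirrors the shared_utils.MinInt helper mentioned in the description.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// decoder is a hypothetical, simplified stand-in for the real
// DeltaBitPackInt64Decoder: nvals counts the values still to be decoded,
// and buffered holds values already unpacked.
type decoder struct {
	nvals    int
	buffered []int64
}

// Decode copies up to len(out) values into out. The two fixes from the
// description are the minInt clamp on the copy bound and the final
// subtraction from d.nvals, without which a subsequent streaming call
// would try to decode values that no longer exist.
func (d *decoder) Decode(out []int64) int {
	end := minInt(d.nvals, len(out))
	n := copy(out, d.buffered[:end])
	d.buffered = d.buffered[n:]
	d.nvals -= n // subtract the decoded count so streaming callers see the remainder
	return n
}

func main() {
	// Decode 10 values in batches of 4, mimicking the 1024-value batches
	// in the streaming test above.
	d := &decoder{nvals: 10, buffered: []int64{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}}
	out := make([]int64, 4)
	total := 0
	for d.nvals > 0 {
		total += d.Decode(out)
	}
	fmt.Println(total, d.nvals) // 10 0
}
```

With the subtraction removed, the loop above would never terminate (nvals stays at 10), which is the streaming failure mode the description points at.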
Environment: all release version
Reporter: jun wang
Assignee: Matthew Topol / @zeroshade
Note: This issue was originally created as ARROW-18309. Please see the migration documentation for further details.