Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go] delta_bit_packing Decode may panic #33483

Closed
asfimport opened this issue Nov 11, 2022 · 7 comments
Closed

[Go] delta_bit_packing Decode may panic #33483

asfimport opened this issue Nov 11, 2022 · 7 comments

Comments

@asfimport
Copy link

https://github.com/apache/arrow/blob/master/go/parquet/internal/encoding/delta_bit_packing.go

The  DeltaBitPackInt32 and DeltaBitPackInt64 Decode method did not use d.nvals subtract decoded number at end, which lead streaming decode panic. 

Also, when copy the decoded value to out, the end value should be 
shared_utils.MinInt(int(d.valsPerMini), start + len(out))

When encode 68610 timestamp data, and decode 1024 value a batch, we encounter the panic

Environment: all release version
Reporter: jun wang
Assignee: Matthew Topol / @zeroshade

Original Issue Attachments:

PRs and other links:

Note: This issue was originally created as ARROW-18309. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Matthew Topol / @zeroshade:
I'll take a look at this, could you provide some sample code which encountered the panic?

@asfimport
Copy link
Author

Matthew Topol / @zeroshade:
Also, can you try testing with Arrow v10. There were some changes in the delta bit packing in v10 and so far I've been unable to replicate the panic you've mentioned using the data file you provided.

Here's the code I used, let me know if you're able to replicate the panic as I can't replicate it on v9 either.

func TestDeltaBitPacking(t *testing.T) {
	f, err := os.Open("timestamp.data")
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	values := make([]int64, 0)

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		v, err := strconv.ParseInt(scanner.Text(), 10, 64)
		if err != nil {
			t.Fatal(err)
		}
		values = append(values, v)
	}

	if err := scanner.Err(); err != nil {
		t.Fatal(err)
	}

	col := schema.NewColumn(schema.MustPrimitive(schema.NewPrimitiveNode("foo", parquet.Repetitions.Required,
		parquet.Types.Int64, -1, -1)), 0, 0)
	enc := encoding.NewEncoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, false, col, memory.DefaultAllocator).(encoding.Int64Encoder)

	enc.Put(values)
	buf, err := enc.FlushValues()
	if err != nil {
		t.Fatal(err)
	}
	defer buf.Release()

	dec := encoding.NewDecoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, col, memory.DefaultAllocator).(encoding.Int64Decoder)
	dec.SetData(len(values), buf.Bytes())

	out := make([]int64, len(values))
	n, err := dec.Decode(out)
	if err != nil {
		t.Fatal(err)
	}
	assert.EqualValues(t, len(values), n)
	assert.Equal(t, values, out)
}

@asfimport
Copy link
Author

jun wang:
I tested Arrow V10,  it panic now, and the ValuesLeft() is broken. Here is the code I used

func TestDeltaBitPacking(t *testing.T) {
   f, err := os.Open("timestamp.data")
   if err != nil {
      t.Fatal(err)
   }
   defer f.Close()

   values := make([]int64, 0)

   scanner := bufio.NewScanner(f)
   for scanner.Scan() {
      v, err := strconv.ParseInt(scanner.Text(), 10, 64)
      if err != nil {
         t.Fatal(err)
      }
      values = append(values, v)
   }

   if err := scanner.Err(); err != nil {
      t.Fatal(err)
   }

   col := schema.NewColumn(schema.MustPrimitive(schema.NewPrimitiveNode("foo", parquet.Repetitions.Required,
      parquet.Types.Int64, -1, -1)), 0, 0)
   enc := encoding.NewEncoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, false, col, memory.DefaultAllocator).(encoding.Int64Encoder)

   enc.Put(values)
   buf, err := enc.FlushValues()
   if err != nil {
      t.Fatal(err)
   }
   defer buf.Release()

   dec := encoding.NewDecoder(parquet.Types.Int64, parquet.Encodings.DeltaBinaryPacked, col, memory.DefaultAllocator).(encoding.Int64Decoder)
   dec.SetData(len(values), buf.Bytes())

   ll := len(values)
   for i := 0; i < ll; i += 1024 {
      out := make([]int64, 1024)
      n, err := dec.Decode(out)
      if err != nil {
         t.Fatal(err)
      }
      assert.Equal(t, values[:n], out)
      values = values[n:]
   }
   assert.Equal(t, dec.ValuesLeft(), 0) 
}

 

@asfimport
Copy link
Author

Matthew Topol / @zeroshade:
Thanks! I'll take a look

@asfimport
Copy link
Author

Matthew Topol / @zeroshade:
[~lntotk] I've put up a PR to fix this issue. Please give it a try and verify for yourself. Thanks!

@asfimport
Copy link
Author

jun wang:
@zeroshade I verified this PR,  the issue was fixed. Thanks!

@asfimport
Copy link
Author

Matthew Topol / @zeroshade:
Issue resolved by pull request 14649
#14649

@asfimport asfimport added this to the 11.0.0 milestone Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants