Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go] Panic reading nested empty list #35202

Closed
minyoung opened this issue Apr 18, 2023 · 7 comments · Fixed by #35276
Closed

[Go] Panic reading nested empty list #35202

minyoung opened this issue Apr 18, 2023 · 7 comments · Fixed by #35276
Assignees
Milestone

Comments

@minyoung
Copy link
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

We have some data that is causing a panic to happen. I've stripped the schema and data as much as I could and created a test case that panics:

func (ps *ParquetIOTestSuite) TestNestedEmptyList() {
	bldr := array.NewStructBuilder(memory.DefaultAllocator, arrow.StructOf(
		arrow.Field{
			Name:     "root",
			Type: arrow.StructOf(
				arrow.Field{
					Name:     "child1",
					Type: arrow.ListOf(arrow.StructOf(
						arrow.Field{
							Name:     "child2",
							Type: arrow.ListOf(arrow.StructOf(
								arrow.Field{
									Name:     "name",
									Type:     arrow.BinaryTypes.String,
								},
							)),
						},
					)),
				},
			),
		},
	))
	defer bldr.Release()

	rootBldr := bldr.FieldBuilder(0).(*array.StructBuilder)
	child1Bldr := rootBldr.FieldBuilder(0).(*array.ListBuilder)
	child1ElBldr := child1Bldr.ValueBuilder().(*array.StructBuilder)
	child2Bldr := child1ElBldr.FieldBuilder(0).(*array.ListBuilder)

	// target structure
	// {
	//   "root": {
	//     "child1": [
	//       { "child2: [] }
	//     ]
	//   }
	// }

	bldr.Append(true)
	rootBldr.Append(true)
	child1Bldr.Append(true)

	child1ElBldr.Append(true)
	child2Bldr.Append(true)

	arr := bldr.NewArray()
	defer arr.Release()

	field := arrow.Field{Name: "x", Type: arr.DataType(), Nullable: true}
	expected := array.NewTable(
		arrow.NewSchema([]arrow.Field{field}, nil),
		[]arrow.Column{*arrow.NewColumn(field, arrow.NewChunked(field.Type, []arrow.Array{arr}))},
		-1,
	)
	defer expected.Release()

	ps.roundTripTable(expected, false)
}

Test output

$ go test ./parquet/pqarrow -run TestParquetArrowIO/TestNestedEmptyList
panic: runtime error: index out of range [0] with length 0

goroutine 41 [running]:
github.com/apache/arrow/go/v11/parquet/internal/utils.NewFirstTimeBitmapWriter(...)
        /Users/minyoung/repo/arrow/go/parquet/internal/utils/bitmap_writer.go:83
github.com/apache/arrow/go/v11/parquet/file.defLevelsToBitmapInternal({0x140003eca00, 0x1, 0x20}, {0x4f008e0?, 0x1?, 0x0?, 0x5108?}, 0x14000155160, 0xc8?)
        /Users/minyoung/repo/arrow/go/parquet/file/level_conversion.go:173 +0x1dc
github.com/apache/arrow/go/v11/parquet/file.DefLevelsToBitmap(...)
        /Users/minyoung/repo/arrow/go/parquet/file/level_conversion.go:186
github.com/apache/arrow/go/v11/parquet/pqarrow.(*structReader).BuildArray(0x140003fdec0, 0x0)
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/column_readers.go:279 +0x148
github.com/apache/arrow/go/v11/parquet/pqarrow.(*listReader).BuildArray(0x1400040e140, 0x1)
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/column_readers.go:396 +0x334
github.com/apache/arrow/go/v11/parquet/pqarrow.(*structReader).BuildArray(0x14000414000, 0x1)
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/column_readers.go:289 +0x3b0
github.com/apache/arrow/go/v11/parquet/pqarrow.(*listReader).BuildArray(0x1400040e180, 0x1)
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/column_readers.go:396 +0x334
github.com/apache/arrow/go/v11/parquet/pqarrow.(*structReader).BuildArray(0x14000414120, 0x1)
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/column_readers.go:289 +0x3b0
github.com/apache/arrow/go/v11/parquet/pqarrow.(*structReader).BuildArray(0x14000414240, 0x1)
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/column_readers.go:289 +0x3b0
github.com/apache/arrow/go/v11/parquet/pqarrow.(*ColumnReader).NextBatch(0x140003df490, 0x1400039c6b0?)
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/file_reader.go:134 +0x68
github.com/apache/arrow/go/v11/parquet/pqarrow.(*FileReader).ReadColumn(0x1400039c780?, {0x140003dc8a8?, 0x14000125b60?, 0x0?}, 0x1400040e0c0?)
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/file_reader.go:247 +0x78
github.com/apache/arrow/go/v11/parquet/pqarrow.(*FileReader).ReadRowGroups.func1()
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/file_reader.go:341 +0xa4
created by github.com/apache/arrow/go/v11/parquet/pqarrow.(*FileReader).ReadRowGroups
        /Users/minyoung/repo/arrow/go/parquet/pqarrow/file_reader.go:332 +0x27c
FAIL    github.com/apache/arrow/go/v11/parquet/pqarrow  0.407s
FAIL

Component(s)

Go

@minyoung
Copy link
Contributor Author

Could I be building an empty list incorrectly?

@zeroshade
Copy link
Member

@minyoung looks like you're building it just fine, I've figured out the issue and put up a PR to fix it which has been linked here. Please take a look and confirm that this fix works for your data that caused the original issue. (I used your example to add a unit test to the code in the PR which passes, but it would still be good to confirm the original cause is solved too)

@minyoung
Copy link
Contributor Author

Thanks for having a look @zeroshade!

Some of the panics are gone now 🎉
Unsure if related or not but we do get a panic with a slightly more populated version of our dataset:

func (ps *ParquetIOTestSuite) TestNestedEmptyList() {
	bldr := array.NewStructBuilder(memory.DefaultAllocator, arrow.StructOf(
		arrow.Field{
			Nullable: true,
			Name:     "root",
			Type: arrow.StructOf(
				arrow.Field{
					Nullable: true,
					Name:     "child1",
					Type: arrow.ListOf(arrow.StructOf(
						arrow.Field{
							Nullable: true,
							Name:     "child2",
							Type: arrow.ListOf(arrow.StructOf(
								arrow.Field{
									Nullable: true,
									Name:     "name",
									Type:     arrow.BinaryTypes.String,
								},
							)),
						},
					)),
				},
			),
		},
	))
	defer bldr.Release()

	rootBldr := bldr.FieldBuilder(0).(*array.StructBuilder)
	child1Bldr := rootBldr.FieldBuilder(0).(*array.ListBuilder)
	child1ElBldr := child1Bldr.ValueBuilder().(*array.StructBuilder)
	child2Bldr := child1ElBldr.FieldBuilder(0).(*array.ListBuilder)
	leafBldr := child2Bldr.ValueBuilder().(*array.StructBuilder)
	nameBldr := leafBldr.FieldBuilder(0).(*array.StringBuilder)

	// target structure 8 times
	// {
	//   "root": {
	//     "child1": [
	//       { "child2": [{ "name": "foo" }] },
	//       { "child2": [] }
	//     ]
	//   }
	// }

	for i := 0; i < 8; i++ {
		bldr.Append(true)
		rootBldr.Append(true)
		child1Bldr.Append(true)

		child1ElBldr.Append(true)
		child2Bldr.Append(true)
		leafBldr.Append(true)
		nameBldr.Append("foo")

		child1ElBldr.Append(true)
		child2Bldr.Append(true)
	}

	arr := bldr.NewArray()
	defer arr.Release()

	field := arrow.Field{Name: "x", Type: arr.DataType(), Nullable: true}
	expected := array.NewTable(
		arrow.NewSchema([]arrow.Field{field}, nil),
		[]arrow.Column{*arrow.NewColumn(field, arrow.NewChunked(field.Type, []arrow.Array{arr}))},
		-1,
	)
	defer expected.Release()

	ps.roundTripTable(expected, false)
}

@zeroshade
Copy link
Member

@minyoung Have you tested that version with the branch that fixes the panics in the PR?

When I run the sample code you've provided (or update the test in the branch to match) I'm not able to replicate the panic in the branch with the fixes that is attached to this issue.

@minyoung
Copy link
Contributor Author

@zeroshade I do still encounter a panic. I've created a PR (based on the referenced branch) and the automated tests there also panics

@zeroshade
Copy link
Member

@minyoung Sorry for the delay here, and thanks for providing that PR with the failing test. I've updated the pr #35276 and found the solution to that panic. Please give it a try to confirm it works with your data, or we can just continue our game of whack-a-panic 😄

@minyoung
Copy link
Contributor Author

minyoung commented May 4, 2023

No worries @zeroshade, thanks for having a look. Yes, we no longer get a panic with our dataset from before 🎉
Thanks so much!

zeroshade added a commit that referenced this issue May 5, 2023
### What changes are included in this PR?
A simple check in the struct reader to validate that we're trying to read something with at least one child before we start trying to build a nullbitmap that wouldn't exist if there are 0 elements in the actual array.

### Are these changes tested?
Yes a unit test is included in this change.

* Closes: #35202

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@zeroshade zeroshade added this to the 13.0.0 milestone May 5, 2023
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
…ache#35276)

### What changes are included in this PR?
A simple check in the struct reader to validate that we're trying to read something with at least one child before we start trying to build a nullbitmap that wouldn't exist if there are 0 elements in the actual array.

### Are these changes tested?
Yes a unit test is included in this change.

* Closes: apache#35202

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…ache#35276)

### What changes are included in this PR?
A simple check in the struct reader to validate that we're trying to read something with at least one child before we start trying to build a nullbitmap that wouldn't exist if there are 0 elements in the actual array.

### Are these changes tested?
Yes a unit test is included in this change.

* Closes: apache#35202

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…ache#35276)

### What changes are included in this PR?
A simple check in the struct reader to validate that we're trying to read something with at least one child before we start trying to build a nullbitmap that wouldn't exist if there are 0 elements in the actual array.

### Are these changes tested?
Yes a unit test is included in this change.

* Closes: apache#35202

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants