feat(parquet/pqarrow): support writing LARGE_LIST types by lidavidm · Pull Request #838 · apache/arrow-go

lidavidm · 2026-06-03T07:51:42Z

Rationale for this change

Arrow LARGE_LIST columns cannot currently be written to a Parquet file through pqarrow.

What changes are included in this PR?

Implement support for large list in pqarrow.

Are these changes tested?

Yes

Are there any user-facing changes?

No

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

Closes apache#834. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds end-to-end support for Arrow LARGE_LIST types in the parquet/pqarrow integration so large-list arrays can be written to (and read back from) Parquet, including schema handling and regression tests for #834.

Changes:

- Extend schema/type handling to preserve LARGE_LIST during (de)serialization and nested type reconstruction.
- Add write-path traversal support for array.LargeList and read-path support by expanding list offsets to int64.
- Add regression/round-trip tests covering nullable large lists, empty lists, and stored schema behavior.
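To make the read-path change concrete, here is a minimal sketch of what "expanding list offsets to int64" amounts to. It uses only the standard library; `widenOffsets` is a hypothetical name for illustration and is not a function from pqarrow, which works on Arrow buffers rather than plain slices.

```go
package main

import "fmt"

// widenOffsets copies 32-bit list offsets into a newly allocated 64-bit
// slice. Hypothetical helper: the real reader operates on Arrow buffers.
func widenOffsets(src []int32) []int64 {
	dst := make([]int64, len(src))
	for i, v := range src {
		dst[i] = int64(v)
	}
	return dst
}

func main() {
	// Offsets describing three lists: [a b], [], [c d e].
	fmt.Println(widenOffsets([]int32{0, 2, 2, 5})) // prints [0 2 2 5]
}
```

Note that this approach allocates a second buffer and copies every offset, which is the cost raised in the review comment further down.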

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| parquet/pqarrow/schema.go | Teach nested-type reconstruction to rebuild `LargeList` when the original schema used it. |
| parquet/pqarrow/path_builder.go | Add visitor support for `array.LargeList` by using `int64` offsets in path building. |
| parquet/pqarrow/path_builder_test.go | Add regression test validating def/rep levels for nullable large-list scenarios. |
| parquet/pqarrow/file_writer.go | Normalize `LargeList` element field names to `element` when storing Arrow schema metadata. |
| parquet/pqarrow/file_reader.go | Ensure `LARGE_LIST` fields are routed through list reader construction. |
| parquet/pqarrow/column_readers.go | When reading `LARGE_LIST`, convert computed `int32` offsets buffer into an `int64` offsets buffer. |
| parquet/pqarrow/encode_arrow_test.go | Add round-trip + store-schema regression tests for large lists, plus minor whitespace cleanup. |
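For readers unfamiliar with what the path builder produces, the following is a rough, standard-library-only sketch of deriving definition and repetition levels from `int64` list offsets. It assumes a nullable list whose elements are non-nullable (so the maximum definition level is 2) and a single level of nesting. `levelsForLargeList` is a hypothetical name, not the pqarrow implementation.

```go
package main

import "fmt"

// levelsForLargeList derives Parquet definition/repetition levels for a
// nullable list with required elements, given int64 offsets and a
// per-list validity flag. Hypothetical sketch, not pqarrow code.
//   def 0: null list, def 1: empty list, def 2: an element is present
//   rep 0: starts a new list, rep 1: continues the current list
func levelsForLargeList(offsets []int64, valid []bool) (def, rep []int16) {
	for i := range valid {
		start, end := offsets[i], offsets[i+1]
		switch {
		case !valid[i]:
			def, rep = append(def, 0), append(rep, 0)
		case start == end:
			def, rep = append(def, 1), append(rep, 0)
		default:
			for j := start; j < end; j++ {
				def = append(def, 2)
				if j == start {
					rep = append(rep, 0)
				} else {
					rep = append(rep, 1)
				}
			}
		}
	}
	return def, rep
}

func main() {
	// Four lists: [a b], null, [], [c].
	def, rep := levelsForLargeList(
		[]int64{0, 2, 2, 2, 3},
		[]bool{true, false, true, true},
	)
	fmt.Println(def) // prints [2 2 0 1 2]
	fmt.Println(rep) // prints [0 1 0 0 0]
}
```

The only difference from a regular list in this sketch is the offset width; the level logic itself is unchanged.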

+	cnk := arrow.NewChunked(field.Type, []arrow.Array{arr})
+	defer arr.Release()
+
+	tbl := array.NewTable(arrow.NewSchema([]arrow.Field{field}, nil), []arrow.Column{*arrow.NewColumn(field, cnk)}, -1)
+	defer cnk.Release()
+	defer tbl.Release()


zeroshade · 2026-06-05T18:15:13Z

 		buffers[0] = validityBuffer
 	}

+	if lr.field.Type.ID() == arrow.LARGE_LIST {


Could we modify DefRepLevelsToListInfo to be generic and take []int64 directly to avoid having to first allocate and create the []int32 and then allocate again and copy everything over?
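One way to picture the suggestion is a level-to-offset routine that is generic over the offset width, so `int64` offsets are written directly with no intermediate `[]int32`. The sketch below is an illustration only: `offsetsFromLevels` and the `offset` constraint are hypothetical, the actual signature of `DefRepLevelsToListInfo` is not shown in this thread, and the code assumes a nullable list with required elements (maximum definition level 2).

```go
package main

import "fmt"

// offset constrains the element type of the offsets slice.
// Hypothetical constraint for this sketch.
type offset interface{ ~int32 | ~int64 }

// offsetsFromLevels rebuilds list offsets and validity from def/rep
// levels, writing offsets of type T directly. Hypothetical sketch; it
// does not mirror the real DefRepLevelsToListInfo signature.
func offsetsFromLevels[T offset](def, rep []int16) (offsets []T, valid []bool) {
	var n T // number of elements seen so far
	for i := range def {
		if rep[i] == 0 { // a new list starts here
			offsets = append(offsets, n)
			valid = append(valid, def[i] >= 1)
		}
		if def[i] == 2 { // an actual element is present
			n++
		}
	}
	// The closing offset marks the end of the last list.
	return append(offsets, n), valid
}

func main() {
	def := []int16{2, 2, 0, 1, 2}
	rep := []int16{0, 1, 0, 0, 0}
	off64, valid := offsetsFromLevels[int64](def, rep)
	fmt.Println(off64) // prints [0 2 2 2 3]
	fmt.Println(valid) // prints [true false true true]
}
```

Instantiating the same function with `int32` would serve regular lists, which is what makes a single generic implementation attractive here.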

feat(parquet/pqarrow): support writing LARGE_LIST types

8c2ab5b


lidavidm requested a review from zeroshade as a code owner June 3, 2026 07:51

lidavidm marked this pull request as draft June 3, 2026 07:51

lidavidm marked this pull request as ready for review June 3, 2026 22:32

zeroshade requested a review from Copilot June 5, 2026 18:07

Copilot started reviewing on behalf of zeroshade June 5, 2026 18:08

Copilot AI reviewed Jun 5, 2026

zeroshade reviewed Jun 5, 2026

lidavidm wants to merge 1 commit into apache:main from lidavidm:gh-834

3 participants
