feat(parquet/pqarrow): support writing LARGE_LIST types #838
Open
lidavidm wants to merge 1 commit into
Closes apache#834. Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR adds end-to-end support for Arrow LARGE_LIST types in the parquet/pqarrow integration so large-list arrays can be written to (and read back from) Parquet, including schema handling and regression tests for #834.
Changes:
- Extend schema/type handling to preserve `LARGE_LIST` during (de)serialization and nested type reconstruction.
- Add write-path traversal support for `array.LargeList` and read-path support by expanding list offsets to `int64`.
- Add regression/round-trip tests covering nullable large lists, empty lists, and stored schema behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| parquet/pqarrow/schema.go | Teach nested-type reconstruction to rebuild LargeList when the original schema used it. |
| parquet/pqarrow/path_builder.go | Add visitor support for array.LargeList by using int64 offsets in path building. |
| parquet/pqarrow/path_builder_test.go | Add regression test validating def/rep levels for nullable large-list scenarios. |
| parquet/pqarrow/file_writer.go | Normalize LargeList element field names to element when storing Arrow schema metadata. |
| parquet/pqarrow/file_reader.go | Ensure LARGE_LIST fields are routed through list reader construction. |
| parquet/pqarrow/column_readers.go | When reading LARGE_LIST, convert computed int32 offsets buffer into an int64 offsets buffer. |
| parquet/pqarrow/encode_arrow_test.go | Add round-trip + store-schema regression tests for large lists, plus minor whitespace cleanup. |
Comment on lines +2027 to +2032

    cnk := arrow.NewChunked(field.Type, []arrow.Array{arr})
    defer arr.Release()

    tbl := array.NewTable(arrow.NewSchema([]arrow.Field{field}, nil), []arrow.Column{*arrow.NewColumn(field, cnk)}, -1)
    defer cnk.Release()
    defer tbl.Release()
zeroshade reviewed Jun 5, 2026

        buffers[0] = validityBuffer
    }

    if lr.field.Type.ID() == arrow.LARGE_LIST {

Member
Could we make `DefRepLevelsToListInfo` generic so it can take `[]int64` directly? That would avoid allocating and filling the `[]int32` offsets first and then allocating again to copy everything over.
Rationale for this change
pqarrow currently cannot write Arrow LARGE_LIST arrays to a Parquet file (see #834).
What changes are included in this PR?
Implement LARGE_LIST support in pqarrow, covering both writing large-list arrays and reading them back.
Are these changes tested?
Yes
Are there any user-facing changes?
No
Assisted-by: Claude Opus 4.6 noreply@anthropic.com