feat(parquet): selective null padding for list child readers #9848
Draft
HippoBaro wants to merge 5 commits into apache:main from
Conversation
Extend the existing `arrow_reader` runtime benchmarks with `Int32` and `FixedBinary32` list columns alongside the existing `StringList`, with parameterized null density (0%, 50%, 90%, 99%). The prior benchmarks only covered string lists, which didn't surface costs specific to fixed-width and primitive element types.

Add a new `arrow_reader_peak_memory` benchmark that measures peak heap usage during `ListArrayReader::consume_batch` using a thread-local tracking allocator. It captures how RSS-efficient we are when materializing a column into its final Arrow in-memory representation.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
`InMemoryArrayReader` coupled a column's in-memory representation (an Arrow array) with its storage representation (def/rep levels) by assuming a 1:1 mapping between array elements and levels. This held when list readers consumed fully-padded child arrays — one element per level, nulls included. Upcoming work pushes null filtering from `ListArrayReader` down into the child reader at the storage level, breaking that 1:1 assumption: the child returns fewer array elements than levels, and the mapping between them depends on the filtering logic itself. Keeping the mock would mean reimplementing that logic: testing filtered output against a second, hand-rolled filter.

Replace `InMemoryArrayReader` with real `PrimitiveArrayReader` instances backed by in-memory Parquet pages. Tests now accept raw non-null values and levels (matching what Parquet actually stores) and exercise the production `RecordReader` path.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
When reading nested list columns, the leaf `RecordReader` previously padded the values buffer at every null definition level, including positions corresponding to null or empty parent lists. The parent `ListArrayReader` then had to scan the child array with `MutableArrayData` to filter out those list-level padding entries before building offsets, copying the entire child array in the process.

This commit pushes the filtering decision down into the `RecordReader` via a new `padding_threshold` parameter. When set (to the parent list's `def_level`), the `RecordReader` only pads item-level nulls (def >= threshold) and skips list-level entries, producing a compact child array. A compact null bitmap is accumulated alongside the values buffer so leaf readers can consume it directly.

A `max_def_level()` method is added to the `ArrayReader` trait so that `MapArrayReader` can derive the struct definition level from the key reader's schema rather than a hardcoded formula.

This reduces peak memory usage and eliminates a full-array copy for list columns with any null density. Perf improves most for fully dense columns, because the `MutableArrayData` overhead dominated. For sparse columns, the bitmap construction that drives the filtering becomes load-bearing.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
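The padding decision can be illustrated with a small standalone sketch (hypothetical function and parameter names, not the actual `RecordReader` code): definition levels below the threshold produce no child slot at all, levels at or above the threshold but below the max pad a null slot, and the max definition level consumes a real value.

```rust
/// Sketch of selective null padding for one leaf column inside a list.
///   def <  padding_threshold -> null/empty parent list (no child slot emitted)
///   def >= padding_threshold -> a child slot exists; it holds a value only
///                               when def == max_def, otherwise it is a null pad
fn pad_selectively(
    defs: &[i16],
    values: &mut std::vec::IntoIter<i32>, // decoded non-null values, in order
    padding_threshold: i16,               // the parent list's def_level
    max_def: i16,
) -> (Vec<i32>, Vec<bool>) {
    let mut out = Vec::new();
    let mut validity = Vec::new();
    for &d in defs {
        if d < padding_threshold {
            // list-level null/empty: previously padded, now skipped entirely
            continue;
        }
        if d == max_def {
            out.push(values.next().expect("a value for each max-def level"));
            validity.push(true);
        } else {
            out.push(0); // item-level null: pad a placeholder slot
            validity.push(false);
        }
    }
    (out, validity)
}

fn main() {
    // defs for list<int32>, both levels nullable:
    // 0 = null list, 1 = empty list, 2 = null item, 3 = value
    let defs = [3i16, 0, 2, 3, 1, 3];
    let mut vals = vec![10, 20, 30].into_iter();
    let (out, validity) = pad_selectively(&defs, &mut vals, 2, 3);
    assert_eq!(out, vec![10, 0, 20, 30]);
    assert_eq!(validity, vec![true, false, true, true]);
    println!("{out:?} {validity:?}");
}
```

The fully-padded behavior corresponds to a threshold of zero: every level gets a slot, and the parent must filter the list-level entries back out afterwards.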
The previous commit introduced selective null padding for list child readers, which makes bitmap construction from definition levels a load-bearing hot path: the leaf `RecordReader` builds a compact bitmap on every batch, and the `StructArrayReader` builds a validity bitmap from the same level data. Both were calling `BooleanBufferBuilder::append` per bit, which dominated CPU profiles at ~30% of list-read time.

Add `BooleanBufferBuilder::append_word(word, count)` for bulk-appending pre-packed bits, and optimise `append()` to avoid the per-bit `advance(1)` overhead. Introduce `build_level_bitmap()` as a shared, word-at-a-time implementation that processes 64 definition levels per iteration: two comparison passes produce filter/value masks, a portable compress (PEXT equivalent) extracts the relevant bits, and `append_word` writes them in one operation. The comparison loops are straightforward enough for the compiler to auto-vectorise, benefiting from wider SIMD execution units where available.

Recovers the runtime regressions from the selective-padding commit and adds further gains across all list types.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
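The word-at-a-time shape can be sketched as follows. This is a simplified standalone model, not the PR's `build_level_bitmap()`: `compress_bits` is a portable software stand-in for the PEXT instruction, and both function names here are hypothetical.

```rust
/// Portable PEXT equivalent: gather the bits of `value` at the positions
/// where `mask` is set, packed contiguously from the LSB.
fn compress_bits(value: u64, mask: u64) -> u64 {
    let mut result = 0u64;
    let mut out_pos = 0;
    let mut m = mask;
    while m != 0 {
        let bit = m & m.wrapping_neg(); // lowest set bit of the mask
        if value & bit != 0 {
            result |= 1 << out_pos;
        }
        out_pos += 1;
        m &= m - 1; // clear that bit
    }
    result
}

/// Process one word of up to 64 definition levels: two comparison passes
/// build a filter mask (positions that produce a child slot) and a value
/// mask (positions holding real values), then the value bits are
/// compressed through the filter to yield the compact validity word.
fn level_bitmap_word(defs: &[i16], padding_threshold: i16, max_def: i16) -> (u64, u32) {
    let mut filter = 0u64;
    let mut value = 0u64;
    for (i, &d) in defs.iter().enumerate().take(64) {
        if d >= padding_threshold { filter |= 1 << i; }
        if d == max_def { value |= 1 << i; }
    }
    (compress_bits(value, filter), filter.count_ones())
}

fn main() {
    // 0 = null list, 1 = empty list, 2 = null item, 3 = value
    let defs = [3i16, 0, 2, 3, 1, 3];
    let (word, n) = level_bitmap_word(&defs, 2, 3);
    assert_eq!(n, 4);          // four compact child slots
    assert_eq!(word, 0b1101);  // validity [true, false, true, true], LSB first
    println!("word={word:#b} bits={n}");
}
```

On x86 with BMI2, the loop in `compress_bits` collapses to a single `pext` instruction; the two mask-building comparison loops are the part left for the compiler to auto-vectorise.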
Selective null padding for list child readers avoids materializing child slots for list-level NULLs, but the `ArrayReaderBuilder` batch-size hint made `RecordReader` preallocate value buffers from the parent batch size before decoding definition levels. This reintroduced a large fixed memory cost for sparse list columns, especially fixed-width children.

Initialize value buffers lazily and reserve capacity after repetition and definition levels have been decoded. For selective padding, reserve from the compact item count (def >= padding threshold); otherwise reserve from the decoded level count. Add a `ValuesBuffer::reserve_exact` hook so primitive, offset, view, dictionary, and fixed-length byte-array buffers can reserve the logical output capacity for the next read. Remove the internal `GenericRecordReader` capacity plumbing now that capacity is derived at read time. Keep the public batch-size setting as an allocation hint, not an exact internal capacity contract, and preserve the existing full-padding reservation behavior for readers without a padding threshold.

This keeps sparse list memory proportional to the child values that will actually be emitted without changing the public batch-size API semantics.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
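The reservation rule reduces to counting levels at read time. A minimal sketch (hypothetical function name, not the actual `ValuesBuffer` API):

```rust
/// Sketch: derive the value-buffer reservation from the decoded definition
/// levels rather than from the batch-size hint. With a padding threshold,
/// only compact item slots (def >= threshold) need capacity; without one,
/// fall back to one slot per decoded level (full padding).
fn reserve_len(defs: &[i16], padding_threshold: Option<i16>) -> usize {
    match padding_threshold {
        Some(t) => defs.iter().filter(|&&d| d >= t).count(),
        None => defs.len(),
    }
}

fn main() {
    // A 99%-null list column: only 1% of levels become child slots.
    let defs: Vec<i16> = (0..1000).map(|i| if i % 100 == 0 { 3 } else { 0 }).collect();
    assert_eq!(reserve_len(&defs, Some(2)), 10);   // compact reservation
    assert_eq!(reserve_len(&defs, None), 1000);    // full-padding reservation
    println!(
        "compact={} padded={}",
        reserve_len(&defs, Some(2)),
        reserve_len(&defs, None)
    );
}
```

For a fixed-width child like `FixedBinary32`, the difference between the two reservations is 32 bytes per skipped level, which is where the fixed memory cost for sparse columns came from.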
Contributor
Author
Although the code is ready for review, this is a draft PR because it depends on #9846 and #9847. The contents of both dependencies are included here as well so reviewers can better understand the direction of these changes, which may be hard to appreciate in isolation. I'll rebase and undraft once those have been merged 🙇 Feedback is appreciated!
This was referenced Apr 29, 2026
alamb pushed a commit that referenced this pull request on May 5, 2026
…er` in tests (#9847)

# Which issue does this PR close?

- Contributes to #9731
- Dependency of #9848

# Rationale for this change

`InMemoryArrayReader` couples a column's in-memory representation (an Arrow array) with its storage representation (def/rep levels) and assumes a 1:1 mapping between array elements and levels. This holds when list readers consume fully-padded child arrays — one element per level, nulls included. Upcoming work (see #9848) pushes null filtering from `ListArrayReader` down into the child reader at the storage level, breaking that 1:1 assumption: the child returns fewer array elements than levels, and the mapping between them depends on the filtering logic itself. Keeping the mock would mean reimplementing that logic: testing filtered output against a second, hand-rolled filter.

# What changes are included in this PR?

Replace `InMemoryArrayReader` with real `PrimitiveArrayReader` instances backed by in-memory Parquet pages. Tests now accept raw non-null values and levels (matching what Parquet actually stores) and exercise the production `RecordReader` path.

# Are these changes tested?

This is a net positive in test coverage. The existing tests now exercise real readers instead of in-memory, already-dense representations.

# Are there any user-facing changes?

None.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
Which issue does this PR close?
- Replace `InMemoryArrayReader` with `PrimitiveArrayReader` in tests #9847
- `ListArray` benchmarks for runtime and peak memory #9846

Rationale for this change
Parquet list decoding currently materializes padding for null or empty parent lists and then copies the child array to filter that padding back out. This is expensive for nested list columns, especially sparse lists and fixed-width children, where memory can scale with decoded levels instead of actual emitted child values.
What changes are included in this PR?
This PR makes list child readers emit compact child arrays directly by pushing selective null padding into the leaf `RecordReader`. It also builds definition-level validity bitmaps word-at-a-time, sizes child buffers after levels are decoded, and adds list runtime and peak-memory benchmarks across element types and null densities.

Are these changes tested?
Extensive test coverage for the new logic through existing list reader tests, which now exercise the production `PrimitiveArrayReader` path with in-memory Parquet pages (see #9847), and `BooleanBufferBuilder::append_word` has targeted unit coverage.

Benchmark results:
Are there any user-facing changes?
None.