
docs(arrow-select): document FilterSelection / FilterPredicate::selection (docs for #9755)#10056

Draft: alamb wants to merge 2 commits into apache:main from alamb:alamb/filter-selection-docs



alamb commented Jun 3, 2026

Which issue does this PR close?

Documentation follow-up for #9755 (arrow-select: fuse inline Utf8View/BinaryView filter coalescing).

Note: This branch is stacked on top of #9755, which is not yet merged to main, so the diff here also shows its feature commit. This PR's contribution is the single docs commit on top (touching only arrow-select/src/filter.rs); the intent is to fold it into #9755 or merge it alongside.

Rationale for this change

The pub(crate) filtering APIs added alongside the fused inline-view path (FilterSelection, FilterIterator, and FilterPredicate::selection) had little explanation of why they exist or how to use them. This PR documents that rationale and how the APIs are meant to be used.

What changes are included in this PR?

Comments only

Are there any user-facing changes?

No (the documented items are pub(crate)).

ClSlaid and others added 2 commits June 3, 2026 14:08
Teach BatchCoalescer to reuse a FilterPredicate when coalescing filtered batches whose non-primitive columns are inline Utf8View/BinaryView values. This avoids materializing an intermediate filtered RecordBatch for sparse filters and copies inline views and nulls directly into the in-progress arrays.

Keep materialized filtering for dense filters, batches that do not fit the coalescer buffer, and byte-view arrays with external buffers. Use a looser dense threshold for multi-column batches, where sharing the row selection across columns pays for itself.

Add shared FilterSelection iterators so primitive and byte-view coalescers can consume materialized or lazy row selections without matching per row.

Signed-off-by: cl <cailue@apache.org>
…ection

Add rationale and usage docs for the new `pub(crate)` filtering APIs
introduced alongside the fused inline-view coalescing path: explain that
`FilterSelection` borrows the predicate's internal indices/slices so the
same predicate can drive several arrays without cloning, document each
`FilterSelection` variant, the `FilterIterator` materialized/lazy split,
its `for_each`/`try_for_each` helpers, and the `strategy` field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
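The docs commit above explains that `FilterSelection` borrows the predicate's internal indices/slices so one predicate can drive several arrays without cloning, and that `FilterIterator` splits into materialized and lazy paths. Below is a minimal, self-contained sketch of that pattern in plain Rust. It is not the arrow-select code: `Predicate`, `Strategy`, `Selection`, and `gather` are hypothetical names, the three variants (index list, `[start, end)` runs, lazy boolean mask) are assumptions chosen for illustration, and `Vec<bool>` stands in for Arrow's packed validity/filter bitmaps. It shows the variant being matched once per column rather than once per row, and one borrowed selection being reused for two columns of different types.

```rust
// Hypothetical sketch, NOT arrow-select's API: a predicate that owns its
// precomputed selection, plus a borrowed view that several columns can share.

/// How the (hypothetical) predicate decided to iterate selected rows.
enum Strategy {
    /// Materialized list of selected row indices.
    Indices(Vec<usize>),
    /// Materialized list of contiguous `[start, end)` runs of selected rows.
    Slices(Vec<(usize, usize)>),
    /// Nothing precomputed: scan the boolean mask on demand (lazy path).
    Lazy,
}

struct Predicate {
    /// One entry per input row; `true` means keep. Stands in for an Arrow bitmap.
    mask: Vec<bool>,
    strategy: Strategy,
}

/// Borrowed view of a predicate's selection; creating it clones nothing.
enum Selection<'a> {
    Indices(&'a [usize]),
    Slices(&'a [(usize, usize)]),
    Mask(&'a [bool]),
}

impl Predicate {
    /// Borrow the precomputed indices/slices (or the mask) without copying.
    fn selection(&self) -> Selection<'_> {
        match &self.strategy {
            Strategy::Indices(idx) => Selection::Indices(idx.as_slice()),
            Strategy::Slices(runs) => Selection::Slices(runs.as_slice()),
            Strategy::Lazy => Selection::Mask(self.mask.as_slice()),
        }
    }
}

impl Selection<'_> {
    /// Call `f` with every selected row index.
    /// The variant is matched once here, not once per row.
    fn for_each(&self, mut f: impl FnMut(usize)) {
        match self {
            Selection::Indices(idx) => idx.iter().for_each(|&i| f(i)),
            Selection::Slices(runs) => {
                for &(start, end) in runs.iter() {
                    (start..end).for_each(&mut f);
                }
            }
            Selection::Mask(mask) => {
                for (i, &keep) in mask.iter().enumerate() {
                    if keep {
                        f(i);
                    }
                }
            }
        }
    }
}

/// Hypothetical helper: copy the selected values of one column into a new Vec.
fn gather<T: Copy>(values: &[T], sel: &Selection<'_>) -> Vec<T> {
    let mut out = Vec::new();
    sel.for_each(|i| out.push(values[i]));
    out
}
```

In this sketch, because `Selection` only borrows from `Predicate`, producing it is cheap and the same value can be handed to each column's gather loop in turn; the work of building indices or runs is paid once, when the predicate is constructed.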
github-actions bot added the arrow label (Changes to the arrow crate) on Jun 3, 2026
