feat(parquet): skip RowFilter evaluation for fully matched row groups #9694
xudong963 wants to merge 1 commit into apache:main
Conversation
…ully matched row groups When row group statistics prove that all rows in a row group satisfy the filter predicate, the RowFilter evaluation can be skipped entirely for those row groups. This avoids the cost of decoding filter columns and evaluating the predicate expression. Adds `with_fully_matched_row_groups(Vec<usize>)` to ArrowReaderBuilder which flows through to RowGroupReaderBuilder. When processing a fully matched row group, the Start state transitions directly to StartData, bypassing all filter evaluation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When row group statistics prove that ALL rows satisfy the filter predicate, skip both RowFilter evaluation (late materialization) and page index pruning for those row groups. This avoids wasted work decoding filter columns and evaluating predicates that produce no useful filtering. Depends on apache/arrow-rs#9694 for the `with_fully_matched_row_groups()` builder API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
I've opened a related PR in DataFusion, which currently uses my arrow-rs fork. If we agree the two PRs are headed in the right direction, we can review the arrow-rs PR first; after it's merged and lands in DataFusion, I'll update the DataFusion PR to the released arrow-rs version and we can merge it.
alamb
left a comment
Thank you @xudong963 -- this is a neat idea, and I think not evaluating pushdown filters when DataFusion can prove they won't filter anything out is totally the right way to go.
However, I am not sure this is the right level to do this filtering. I think it might keep the APIs simpler if the push down filter removal happens in DataFusion itself -- for example, DataFusion could make different ParquetPushDecoders for each row group, and the ones where the filters don't filter anything can be disabled
In the context of the "morsel" work -- I think we are heading towards a scan in DataFusion where each row group (or collection of row groups) is a morsel, and then we can treat the morsels individually for IO, pruning, and even moving them around cores.
Any chance you want to help explore that option in DataFusion?
ahah, this is cleaner. I'd like to have a try.
apache/datafusion@d6c3879 makes the refactor on the DataFusion side, which no longer needs the changes from the arrow-rs side.
This is neat. What do you think about the idea of an API to "split" decoders or something 🤔
Which issue does this PR close?
Rationale for this change
When DataFusion evaluates a Parquet scan with filter pushdown, it uses row group statistics to determine which row groups to scan. In many real-world queries, the predicate matches all rows in some (or all) row groups — for example, a time-range filter where entire row groups fall within the range, or a `WHERE status != 'DELETED'` filter on data that contains no deleted rows.

Today, even when row group statistics prove that every row satisfies the predicate, the `RowFilter` is still evaluated row-by-row during decoding. This means the filter columns are decoded and the predicate expression is evaluated for every row — work that produces no useful filtering and can be expensive, especially when filter columns are large (e.g., strings) or the predicate is complex.

This PR adds a mechanism to skip `RowFilter` evaluation entirely for row groups that are known to be "fully matched" based on statistics. The caller (e.g., DataFusion) determines which row groups are fully matched during row group pruning and passes that information to the reader builder. During decoding, fully matched row groups skip straight to data materialization, bypassing filter column decoding and predicate evaluation.

What changes are included in this PR?
- New builder method `with_fully_matched_row_groups(Vec<usize>)` on `ArrowReaderBuilder` — allows callers to specify which row groups have all rows matching the filter predicate.
- Skip filter in `RowGroupReaderBuilder::try_transition()` — when a row group is in the fully-matched set, the `Start` state transitions directly to `StartData`, bypassing the `Filters`/`WaitingOnFilterData` states entirely. The filter is preserved (put back into `self.filter`) for subsequent non-fully-matched row groups.
- Plumbed through all decoder paths — the field is propagated through `ParquetPushDecoderBuilder` and `ParquetRecordBatchStreamBuilder` (async), and ignored in the sync reader (which processes one row group at a time).

Design choices:

- The fully-matched set is stored as a `HashSet<usize>` on `RowGroupReaderBuilder` for O(1) lookup, rather than on `RowGroupDecoderState`, so the state enum size is unchanged (preserving the existing 200-byte size test).
- The builder stores an `Option<Vec<usize>>` at the builder level and converts it to a `HashSet` internally.
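The fast path described above can be sketched as a self-contained state machine. Note this is a simplified illustration with hypothetical types, not the real arrow-rs `RowGroupDecoderState` (which has more states and carries per-state data):

```rust
use std::collections::HashSet;

// Simplified stand-ins for the decoder states discussed above (hypothetical).
#[derive(Debug, PartialEq)]
enum State {
    Start { row_group_idx: usize },
    Filters { row_group_idx: usize },   // filter-evaluation path
    StartData { row_group_idx: usize }, // data-materialization path
}

struct RowGroupReader {
    // Stored on the reader, not in the state enum, for O(1) lookup
    // without growing the state type.
    fully_matched: HashSet<usize>,
    has_filter: bool,
}

impl RowGroupReader {
    fn transition(&self, state: State) -> State {
        match state {
            State::Start { row_group_idx } => {
                // Skip filter evaluation when there is no filter, or when
                // statistics proved every row in this row group matches.
                if !self.has_filter || self.fully_matched.contains(&row_group_idx) {
                    State::StartData { row_group_idx }
                } else {
                    State::Filters { row_group_idx }
                }
            }
            other => other,
        }
    }
}

fn main() {
    let reader = RowGroupReader {
        fully_matched: HashSet::from([0, 2]),
        has_filter: true,
    };
    // Row group 0 is fully matched: jump straight to data decoding.
    assert_eq!(
        reader.transition(State::Start { row_group_idx: 0 }),
        State::StartData { row_group_idx: 0 }
    );
    // Row group 1 is not fully matched: take the filter path.
    assert_eq!(
        reader.transition(State::Start { row_group_idx: 1 }),
        State::Filters { row_group_idx: 1 }
    );
    println!("transitions ok");
}
```

The key property, as in the PR, is that the decision is a cheap set lookup made before any filter column IO or decoding happens.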
The optimization is exercised by an end-to-end benchmark in DataFusion that uses `ParquetPushDecoder` directly (the same code path used by DataFusion's async Parquet opener). The benchmark verifies correctness by asserting the expected row count.

Unit tests can be added if reviewers prefer — happy to add tests that verify:
(I'll open a draft PR on the DataFusion side tomorrow)
Are there any user-facing changes?
Yes — a new public method `ArrowReaderBuilder::with_fully_matched_row_groups()` is added. This is a purely additive, non-breaking change. Existing code is unaffected since the default is `None` (no row groups are marked as fully matched).
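To illustrate the shape of the builder API and the `Option<Vec<usize>>` to `HashSet` conversion described above, here is a self-contained mock (not the actual arrow-rs `ArrowReaderBuilder`; method name taken from this PR, everything else is a stand-in):

```rust
use std::collections::HashSet;

// Mock builder illustrating the described API shape (not the real arrow-rs type).
#[derive(Default)]
struct MockReaderBuilder {
    fully_matched_row_groups: Option<Vec<usize>>,
}

impl MockReaderBuilder {
    // Mirrors the PR's `with_fully_matched_row_groups(Vec<usize>)`:
    // purely additive; the default `None` leaves behavior unchanged.
    fn with_fully_matched_row_groups(mut self, row_groups: Vec<usize>) -> Self {
        self.fully_matched_row_groups = Some(row_groups);
        self
    }

    // At build time the list is converted to a HashSet for O(1) lookups
    // during decoding; `None` becomes the empty set.
    fn build_fully_matched_set(self) -> HashSet<usize> {
        self.fully_matched_row_groups
            .map(|v| v.into_iter().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let set = MockReaderBuilder::default()
        .with_fully_matched_row_groups(vec![0, 2, 2])
        .build_fully_matched_set();
    assert_eq!(set, HashSet::from([0, 2]));

    // Default: no row groups marked, so no filter evaluation is skipped.
    assert!(MockReaderBuilder::default().build_fully_matched_set().is_empty());
    println!("ok");
}
```

A caller such as DataFusion would compute the fully-matched indices during its statistics-based row group pruning and pass them to the real builder in the same chained style.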