Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
With reusable Predicate types in place, the next high-value optimization is partition-level pruning during scan planning.
Today TableScan still reads and plans all manifest entries for a partitioned table even when the user filter references only partition columns. For a table partitioned by day, a query such as dt = '2024-01-01' should only plan splits for the matching partition instead of scanning all partitions first.
Solution
Add ReadBuilder::with_filter(Predicate) and wire it into TableScan::plan_snapshot to prune manifest entries by partition values.
The core approach is split-then-evaluate (aligned with Java Paimon's splitPartitionPredicatesAndDataPredicates):
- Split the user predicate at top-level AND boundaries into conjuncts
- Classify each conjunct: if all leaves reference partition keys → partition predicate; otherwise → data predicate (saved for future phases)
- Remap partition predicates from table schema index space to partition row index space
- Evaluate only the remapped partition predicates against each entry's partition BinaryRow
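The four steps above can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the `Predicate` enum here is a simplified stand-in with a single equality leaf, and `split_conjuncts`, `remap`, and `split_partition_predicates` are hypothetical helper names. The `mapping` argument is assumed to map a table schema field index to its position in the partition row.

```rust
use std::collections::HashMap;

// Simplified stand-in for the real Predicate type (assumption: one leaf kind only).
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    // Leaf: field at `index` equals a string literal.
    Eq { index: usize, value: String },
    And(Vec<Predicate>),
    Or(Vec<Predicate>),
    Not(Box<Predicate>),
}

// Step 1: flatten top-level AND nodes into a list of conjuncts.
fn split_conjuncts(p: &Predicate, out: &mut Vec<Predicate>) {
    match p {
        Predicate::And(children) => {
            for child in children {
                split_conjuncts(child, out);
            }
        }
        other => out.push(other.clone()),
    }
}

// Steps 2 and 3: rewrite leaf indices from table schema space to partition
// row space. Returns None as soon as any leaf references a non-partition field.
fn remap(p: &Predicate, mapping: &HashMap<usize, usize>) -> Option<Predicate> {
    match p {
        Predicate::Eq { index, value } => mapping.get(index).map(|&i| Predicate::Eq {
            index: i,
            value: value.clone(),
        }),
        Predicate::And(children) => children
            .iter()
            .map(|c| remap(c, mapping))
            .collect::<Option<Vec<_>>>()
            .map(Predicate::And),
        Predicate::Or(children) => children
            .iter()
            .map(|c| remap(c, mapping))
            .collect::<Option<Vec<_>>>()
            .map(Predicate::Or),
        Predicate::Not(inner) => remap(inner, mapping).map(|q| Predicate::Not(Box::new(q))),
    }
}

// Returns (remapped partition predicates, untouched data predicates).
fn split_partition_predicates(
    p: &Predicate,
    mapping: &HashMap<usize, usize>,
) -> (Vec<Predicate>, Vec<Predicate>) {
    let mut conjuncts = Vec::new();
    split_conjuncts(p, &mut conjuncts);
    let mut partition = Vec::new();
    let mut data = Vec::new();
    for conjunct in conjuncts {
        match remap(&conjunct, mapping) {
            Some(remapped) => partition.push(remapped),
            None => data.push(conjunct),
        }
    }
    (partition, data)
}
```

For a table with fields `(id, dt, name)` partitioned by `dt`, the mapping would be `{1 -> 0}`. A filter `dt = '2024-01-01' AND id = 10` then yields one partition predicate on partition index 0 and one data predicate, while `NOT(dt = ... AND id = ...)` yields no partition predicate at all.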
This is deliberately not "replace non-partition leaves with true and evaluate the whole tree". That approach produces false negatives under NOT/OR (partitions that may contain matching rows are pruned) and causes data loss:
NOT(dt = '2024-01-01' AND id > 10)

Wrong (non-partition leaf → true):
  NOT(dt = '2024-01-01' AND true) = NOT(dt = '2024-01-01')
  → skips dt=2024-01-01 partition ❌ DATA LOSS

Correct (split-then-evaluate):
  This NOT expression mixes partition/non-partition columns
  → cannot safely extract a partition-only predicate
  → no partition filtering applied ✅ SAFE
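The counterexample can be reproduced with a small, self-contained model. Everything here is an assumption made for illustration: `Expr` collapses all non-partition leaves into a single `DataLeaf` variant, the partition is a single `dt` string, and the function names are hypothetical.

```rust
// Toy expression tree: one partition column (dt) and an opaque data leaf.
#[derive(Debug, Clone)]
enum Expr {
    PartitionEq(String), // dt = <value>
    DataLeaf,            // any non-partition leaf, e.g. id > 10
    And(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

// The unsafe approach: evaluate the whole tree with data leaves set to true.
// Returns whether the partition would be kept.
fn eval_with_data_leaves_true(e: &Expr, dt: &str) -> bool {
    match e {
        Expr::PartitionEq(v) => v.as_str() == dt,
        Expr::DataLeaf => true,
        Expr::And(a, b) => eval_with_data_leaves_true(a, dt) && eval_with_data_leaves_true(b, dt),
        Expr::Not(a) => !eval_with_data_leaves_true(a, dt),
    }
}

// True when no data leaf appears anywhere in the subtree.
fn is_partition_only(e: &Expr) -> bool {
    match e {
        Expr::PartitionEq(_) => true,
        Expr::DataLeaf => false,
        Expr::And(a, b) => is_partition_only(a) && is_partition_only(b),
        Expr::Not(a) => is_partition_only(a),
    }
}

// Split-then-evaluate: descend only through top-level ANDs, evaluate
// conjuncts that are partition-only, and keep the partition for anything else.
fn keep_split_then_evaluate(e: &Expr, dt: &str) -> bool {
    match e {
        Expr::And(a, b) => keep_split_then_evaluate(a, dt) && keep_split_then_evaluate(b, dt),
        // No data leaves in this conjunct, so plain evaluation is exact.
        other if is_partition_only(other) => eval_with_data_leaves_true(other, dt),
        // Mixed or data-only conjunct: cannot prune on it.
        _ => true,
    }
}
```

With `NOT(dt = '2024-01-01' AND <data leaf>)` and partition `dt = 2024-01-01`, the first function returns `false` (the partition is dropped even though rows with `id <= 10` match), while the second returns `true`.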
Error handling: eval_row returns Result<bool> and errors propagate as scan errors. Only BinaryRow decode failures are treated as fail-open (never skip data due to corrupt metadata).
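A sketch of that error policy, under stated assumptions: `ScanError`, `DecodeError`, `decode_partition`, and `keep_entry` are hypothetical names, the partition is modelled as UTF-8 bytes rather than a real BinaryRow, and this `eval_row` signature is a stand-in that only mirrors the `Result<bool>` shape described above.

```rust
// Hypothetical scan-level error type.
#[derive(Debug, PartialEq)]
enum ScanError {
    Eval(String),
}

// Hypothetical marker for a partition row that could not be decoded.
#[derive(Debug)]
struct DecodeError;

// Stand-in for BinaryRow decoding: here, the partition is just UTF-8 bytes.
fn decode_partition(bytes: &[u8]) -> Result<String, DecodeError> {
    String::from_utf8(bytes.to_vec()).map_err(|_| DecodeError)
}

// Stand-in evaluator: errors on an empty literal to model an evaluation failure.
fn eval_row(expected: &str, partition: &str) -> Result<bool, ScanError> {
    if expected.is_empty() {
        return Err(ScanError::Eval("empty literal".to_string()));
    }
    Ok(expected == partition)
}

// Decide whether a manifest entry is kept.
fn keep_entry(expected: &str, raw_partition: &[u8]) -> Result<bool, ScanError> {
    let partition = match decode_partition(raw_partition) {
        Ok(p) => p,
        // Fail-open: an undecodable partition is kept rather than skipped.
        Err(_) => return Ok(true),
    };
    // Evaluation errors are not swallowed; they surface as scan errors.
    eval_row(expected, &partition)
}
```

The asymmetry is the point of the design: a decode failure keeps the entry so corrupt metadata never hides data, whereas an evaluation failure aborts planning instead of silently choosing either outcome.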
Anything else?
Depends on the preceding Predicate data structure issue, #154.
Willingness to contribute
- I'm willing to submit a PR!