feat: Partition predicate evaluation in TableScan #155

@QuakeWang

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

With reusable Predicate types in place, the next high-value optimization is partition-level pruning during scan planning.

Today TableScan still reads and plans all manifest entries for a partitioned table, even when the user filter references only partition columns. For a table partitioned by day, a filter such as dt = '2024-01-01' should plan splits only for the matching partition instead of planning every partition first.

Solution

Add ReadBuilder::with_filter(Predicate) and wire it into TableScan::plan_snapshot to prune manifest entries by partition values.

The core approach is split-then-evaluate (aligned with Java Paimon's splitPartitionPredicatesAndDataPredicates):

  1. Split the user predicate at top-level AND boundaries into conjuncts
  2. Classify each conjunct: if all leaves reference partition keys → partition predicate; otherwise → data predicate (saved for future phases)
  3. Remap partition predicates from table schema index space to partition row index space
  4. Evaluate only the remapped partition predicates against each entry's partition BinaryRow
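The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: `Pred`, `split_partition_and_data`, `remap`, and `keep_partition` are hypothetical names, the only leaf is an equality test on string values, and a `&[String]` slice stands in for the partition BinaryRow.

```rust
/// Toy predicate tree (hypothetical; the crate's real Predicate type will differ).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    /// Leaf: the field at `index` equals `value`.
    Eq { index: usize, value: String },
    And(Vec<Pred>),
    Or(Vec<Pred>),
    Not(Box<Pred>),
}

/// Convenience constructor for an equality leaf.
fn eq(index: usize, value: &str) -> Pred {
    Pred::Eq { index, value: value.to_string() }
}

/// Step 1: flatten top-level AND nodes into a list of conjuncts.
fn split_and(p: Pred, out: &mut Vec<Pred>) {
    match p {
        Pred::And(children) => children.into_iter().for_each(|c| split_and(c, out)),
        other => out.push(other),
    }
}

/// Steps 2 and 3 in one pass: rewrite every leaf from a table-schema index to a
/// partition-row index. Returns None as soon as a leaf references a
/// non-partition column, which classifies the conjunct as a data predicate.
/// `partition_fields[i]` is the table-schema index of the i-th partition key.
fn remap(p: &Pred, partition_fields: &[usize]) -> Option<Pred> {
    match p {
        Pred::Eq { index, value } => {
            let pos = partition_fields.iter().position(|f| f == index)?;
            Some(Pred::Eq { index: pos, value: value.clone() })
        }
        Pred::And(cs) => cs
            .iter()
            .map(|c| remap(c, partition_fields))
            .collect::<Option<Vec<_>>>()
            .map(Pred::And),
        Pred::Or(cs) => cs
            .iter()
            .map(|c| remap(c, partition_fields))
            .collect::<Option<Vec<_>>>()
            .map(Pred::Or),
        Pred::Not(c) => remap(c, partition_fields).map(|c| Pred::Not(Box::new(c))),
    }
}

/// Returns (remapped partition predicates, untouched data predicates).
fn split_partition_and_data(filter: Pred, partition_fields: &[usize]) -> (Vec<Pred>, Vec<Pred>) {
    let mut conjuncts = Vec::new();
    split_and(filter, &mut conjuncts);
    let (mut part, mut data) = (Vec::new(), Vec::new());
    for c in conjuncts {
        match remap(&c, partition_fields) {
            Some(remapped) => part.push(remapped),
            None => data.push(c),
        }
    }
    (part, data)
}

/// Evaluate one predicate against a partition row (stand-in for BinaryRow).
fn eval(p: &Pred, row: &[String]) -> bool {
    match p {
        Pred::Eq { index, value } => row[*index] == *value,
        Pred::And(cs) => cs.iter().all(|c| eval(c, row)),
        Pred::Or(cs) => cs.iter().any(|c| eval(c, row)),
        Pred::Not(c) => !eval(c, row),
    }
}

/// Step 4: keep the entry only if every partition predicate accepts its row.
/// An empty predicate list keeps everything.
fn keep_partition(part_preds: &[Pred], row: &[String]) -> bool {
    part_preds.iter().all(|p| eval(p, row))
}
```

For a table with schema (id, name, dt) partitioned by dt, `partition_fields` would be `[2]`. The filter `dt = '2024-01-01' AND id = '10'` then splits into one partition predicate (with dt remapped to index 0) and one data predicate. A conjunct that mixes both kinds of columns under NOT or OR goes to the data list, so no pruning is derived from it.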

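This is deliberately not "replace non-partition leaves with true and evaluate the whole tree". That substitution is only safe for leaves in a positive position. Once a leaf sits under NOT (directly or inside a nested AND/OR), the rewritten tree can evaluate to false for a partition that still contains matching rows, so the partition is wrongly pruned and data is lost: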

NOT(dt = '2024-01-01' AND id > 10)

Wrong (non-partition leaf → true):
  NOT(dt = '2024-01-01' AND true) = NOT(dt = '2024-01-01')
  → skips dt=2024-01-01 partition ❌ DATA LOSS

Correct (split-then-evaluate):
  This NOT expression mixes partition/non-partition columns
  → cannot safely extract a partition-only predicate
  → no partition filtering applied ✅ SAFE
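The counterexample can be checked mechanically. The sketch below is illustrative only: `Expr`, `eval`, `naive_partition_only`, and `example_filter` are hypothetical names, and the two leaf kinds are hard-coded to the dt and id columns of this example.

```rust
/// Toy expression tree for the example filter (hypothetical types).
#[derive(Clone)]
enum Expr {
    True,
    /// Partition leaf: dt = value.
    DtEq(&'static str),
    /// Data leaf: id > value.
    IdGt(i64),
    And(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

/// Evaluate the full expression against one row (dt, id).
fn eval(e: &Expr, dt: &str, id: i64) -> bool {
    match e {
        Expr::True => true,
        Expr::DtEq(v) => dt == *v,
        Expr::IdGt(v) => id > *v,
        Expr::And(a, b) => eval(a, dt, id) && eval(b, dt, id),
        Expr::Not(a) => !eval(a, dt, id),
    }
}

/// The unsafe approach: replace every non-partition leaf with True.
fn naive_partition_only(e: &Expr) -> Expr {
    match e {
        Expr::IdGt(_) => Expr::True,
        Expr::And(a, b) => Expr::And(
            Box::new(naive_partition_only(a)),
            Box::new(naive_partition_only(b)),
        ),
        Expr::Not(a) => Expr::Not(Box::new(naive_partition_only(a))),
        other => other.clone(),
    }
}

/// NOT(dt = '2024-01-01' AND id > 10)
fn example_filter() -> Expr {
    Expr::Not(Box::new(Expr::And(
        Box::new(Expr::DtEq("2024-01-01")),
        Box::new(Expr::IdGt(10)),
    )))
}
```

A row with dt = '2024-01-01' and id = 5 satisfies the original filter, because the inner AND is false. The naively rewritten filter evaluates to false for the whole dt = '2024-01-01' partition, so that row would never be read.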

Error handling: eval_row returns Result<bool>, and evaluation errors propagate as scan errors. The only fail-open case is a BinaryRow decode failure: the entry is kept rather than pruned, so corrupt metadata never causes data to be skipped.
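A rough sketch of how these two error paths could differ is shown below. Everything here is an assumption made for illustration: `ScanError`, `DecodeError`, `decode_partition`, and `keep_entry` are hypothetical, the toy decoder treats the raw bytes as a single UTF-8 string column rather than a real BinaryRow, and this `eval_row` signature is simplified to one equality check.

```rust
/// Hypothetical scan-level error type.
#[derive(Debug, PartialEq)]
enum ScanError {
    Eval(String),
}

/// Hypothetical marker for a partition row that could not be decoded.
#[derive(Debug)]
struct DecodeError;

/// Toy decoder: a single string partition column stored as UTF-8 bytes.
fn decode_partition(raw: &[u8]) -> Result<Vec<String>, DecodeError> {
    std::str::from_utf8(raw)
        .map(|s| vec![s.to_string()])
        .map_err(|_| DecodeError)
}

/// Toy predicate evaluation: row[index] = value. Fails if the index is out of range.
fn eval_row(index: usize, value: &str, row: &[String]) -> Result<bool, ScanError> {
    let field = row
        .get(index)
        .ok_or_else(|| ScanError::Eval(format!("field index {} out of range", index)))?;
    Ok(field.as_str() == value)
}

/// Decide whether a manifest entry survives partition pruning.
fn keep_entry(raw: &[u8], index: usize, value: &str) -> Result<bool, ScanError> {
    let row = match decode_partition(raw) {
        Ok(row) => row,
        // Fail-open: an undecodable partition row keeps the entry.
        Err(_) => return Ok(true),
    };
    // Evaluation errors are not swallowed; they surface as scan errors.
    eval_row(index, value, &row)
}
```

The asymmetry is the point of the design: a decode failure says nothing about whether the entry matches, so keeping it is the conservative choice, while an evaluation error indicates a problem with the predicate itself and is reported to the caller.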

Anything else?

Depends on the preceding Predicate data structure issue (#154).

Willingness to contribute

  • I'm willing to submit a PR!
