[python] Add per-partition bucket pruning for HASH_FIXED tables by TheR1sing3un · Pull Request #7804 · apache/paimon

TheR1sing3un · 2026-05-10T07:44:45Z

Background

PR-5.4 (#7744) added bucket pruning for HASH_FIXED tables but only on
the bucket-key dimension. Predicates that mix a partition column and
a bucket column under a top-level OR — e.g.
(part='a' AND id=1) OR (part='b' AND id=2) — couldn't be pruned:
the OR mixes two dimensions, so the existing logic gave up and read
every bucket in both partitions. PR-5.4 left this as a TODO in the
module docstring.

Effect

Same query now reads exactly one bucket per partition (the bucket
holding id=1 in part='a', the bucket holding id=2 in
part='b'). The selector evaluates the predicate per partition
value first — the OR collapses to a single AND inside each partition
— and bucket selection runs on that simplified form.

Soundness contract is unchanged: the bucket set remains a superset
of the buckets that contain matching rows; any error falls open to
"all buckets accept", never drops a bucket with matches.
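A minimal sketch of that contract, under loud assumptions: predicates are encoded as plain tuples rather than pypaimon's `Predicate` class, `toy_bucket` is a stand-in for the real bucket hash (which is not reproduced here), and `candidate_buckets` / `accepts` are hypothetical names. It only illustrates the "superset or fall open" shape, not the actual implementation.

```python
from typing import Optional, Set

# Toy predicate encoding (hypothetical, NOT pypaimon's Predicate class):
#   ("eq", field, value) | ("in", field, [values])
#   ("and", [children])  | ("or", [children])
# Every other node kind is treated as unsupported.


def toy_bucket(value: int, total_buckets: int) -> int:
    # Stand-in for the real bucket hash; an assumption for illustration only.
    return value % total_buckets


def candidate_buckets(pred, bucket_key: str, total_buckets: int) -> Optional[Set[int]]:
    """Return a superset of the buckets that may hold matches.

    None means "all buckets accept" (no constraint could be derived).
    """
    try:
        kind = pred[0]
        if kind == "eq" and pred[1] == bucket_key:
            return {toy_bucket(pred[2], total_buckets)}
        if kind == "in" and pred[1] == bucket_key:
            return {toy_bucket(v, total_buckets) for v in pred[2]}
        if kind == "and":
            # Intersect constrained children; unconstrained ones add nothing.
            result = None
            for child in pred[1]:
                s = candidate_buckets(child, bucket_key, total_buckets)
                if s is not None:
                    result = s if result is None else result & s
            return result
        if kind == "or":
            # A single unconstrained branch makes the whole OR unconstrained.
            result = set()
            for child in pred[1]:
                s = candidate_buckets(child, bucket_key, total_buckets)
                if s is None:
                    return None
                result |= s
            return result
        return None  # unsupported node: fall open
    except Exception:
        return None  # any error: fall open, never drop a bucket


def accepts(buckets: Optional[Set[int]], bucket: int) -> bool:
    return buckets is None or bucket in buckets
```

With four buckets, `candidate_buckets(("eq", "id", 1), "id", 4)` yields `{1}`, while an OR containing an unsupported leaf such as `("gt", "id", 5)` yields `None`, so every bucket is still read.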

Two commits — helper + FileScanner wiring. 9 unit tests cover the
predicate-fold walker and the per-partition cache; one e2e test on a
2-partition × 4-bucket table proves the mixed-OR query reads ≤ 2
splits instead of one per (partition, bucket).

Adds the predicate-replace + AND/OR fold infrastructure that lets the bucket selector specialise itself per concrete partition value, the piece called out as a TODO at the bottom of the bucket_select_converter module docstring.

Three pieces ship in this commit, all internal:

* ``replace_partition_predicate(predicate, partition_field_names, partition_values)``: walker that substitutes partition leaves with their evaluated truth value and folds AND/OR. Three-way return: ``None`` (cleared / always true), ``False`` (always false), or the simplified ``Predicate``.
* ``_Selector`` is now keyed by ``(partition_tuple, total_buckets)`` and accepts a third positional ``partition`` arg in ``__call__``. Two-arg legacy callers (early manifest filter) still work; they get the partition-agnostic over-approximation.
* ``create_bucket_selector`` now takes an optional ``partition_fields`` list. The selector built without it (or with a predicate that does not touch any partition column) keeps the existing shape and result.

This commit does not yet wire the partition into ``FileScanner``; ``_filter_manifest_entry`` still calls the selector with two args, so all existing pushdown_bucket tests stay green.

Tests: nine new unit cases covering ``replace_partition_predicate`` folding, the per-partition cache, fall-through when partition is unknown, and the empty-bucket-set result for an unsatisfiable partition.
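The three-way fold described in this commit message can be sketched roughly as follows. This is an illustration only: `fold_partition` is a hypothetical name, predicates are plain tuples instead of pypaimon's `Predicate`, only equality leaves are handled, and every partition column is assumed to have a known value.

```python
from typing import Dict, Set

# Toy predicate encoding (hypothetical, NOT pypaimon's Predicate class):
#   ("eq", field, value) | ("and", [children]) | ("or", [children])


def fold_partition(pred, partition_fields: Set[str], partition_values: Dict[str, object]):
    """Return None (always true), False (always false), or a simplified predicate."""
    kind = pred[0]
    if kind == "eq":
        _, field, value = pred
        if field in partition_fields:
            # Partition leaf: replace it with its truth value for this partition.
            return None if partition_values[field] == value else False
        return pred  # non-partition leaf is kept untouched
    if kind in ("and", "or"):
        # Absorbing value: False short-circuits AND, "always true" short-circuits OR.
        absorbing = False if kind == "and" else None
        kept = []
        for child in pred[1]:
            folded = fold_partition(child, partition_fields, partition_values)
            if folded is absorbing:
                return absorbing
            if folded is None or folded is False:
                continue  # neutral element for this connective: drop it
            kept.append(folded)
        if not kept:
            # Every child was neutral: AND of nothing is true, OR of nothing is false.
            return None if kind == "and" else False
        return kept[0] if len(kept) == 1 else (kind, kept)
    return pred  # unknown node kinds are left as they are


query = ("or", [
    ("and", [("eq", "part", "a"), ("eq", "id", 1)]),
    ("and", [("eq", "part", "b"), ("eq", "id", 2)]),
])

print(fold_partition(query, {"part"}, {"part": "a"}))  # ('eq', 'id', 1)
print(fold_partition(query, {"part"}, {"part": "c"}))  # False
```

For partition `a` the mixed OR collapses to the single leaf `id = 1`, which is the form on which bucket selection can then run; for a partition that matches neither branch the fold yields `False`.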

Switches ``_filter_manifest_entry`` to call the bucket selector with the entry's partition row, and passes the table's partition fields into ``create_bucket_selector`` so the selector can specialise the predicate per concrete partition value.

The early manifest filter (``_build_early_bucket_filter``) still uses the two-arg form because the partition row hasn't been deserialised at that stage; the selector internally falls back to a sound partition-agnostic over-approximation there. Per-partition tightening runs on the late filter once the entry is fully decoded.

End-to-end test: ``(part='a' AND id=1) OR (part='b' AND id=2)`` on a two-partition four-bucket table, asserting both correctness (only the two matching rows come back) and pruning effectiveness (≤ 2 splits instead of one per (partition, bucket) combination).
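The split between the two-arg early filter and the three-arg late filter might look roughly like the sketch below. `ToySelector` and `compute_for_query` are hypothetical, the `(bucket, total_buckets, partition)` argument order is an assumption, and `id % total_buckets` stands in for the real bucket hash. The per-partition results are hard-coded for the example query instead of being derived from a predicate.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Partition = Optional[Tuple[str, ...]]


class ToySelector:
    """Caches one bucket set per (partition, total_buckets) key."""

    def __init__(self, compute: Callable[[Partition, int], Optional[Set[int]]]):
        # compute returns a set of admissible buckets, or None for "all buckets".
        self._compute = compute
        self._cache: Dict[Tuple[Partition, int], Optional[Set[int]]] = {}

    def __call__(self, bucket: int, total_buckets: int, partition: Partition = None) -> bool:
        key = (partition, total_buckets)
        if key not in self._cache:
            try:
                self._cache[key] = self._compute(partition, total_buckets)
            except Exception:
                self._cache[key] = None  # fall open on any error
        allowed = self._cache[key]
        return allowed is None or bucket in allowed


def compute_for_query(partition: Partition, total_buckets: int) -> Optional[Set[int]]:
    # Hard-coded for (part='a' AND id=1) OR (part='b' AND id=2).
    if partition is None:
        return None  # partition unknown: keep every bucket
    ids_by_partition = {("a",): [1], ("b",): [2]}
    # A partition matching neither branch gets an empty bucket set.
    return {i % total_buckets for i in ids_by_partition.get(partition, [])}


selector = ToySelector(compute_for_query)

early = [b for b in range(4) if selector(b, 4)]           # two-arg form
late_a = [b for b in range(4) if selector(b, 4, ("a",))]  # three-arg form
late_b = [b for b in range(4) if selector(b, 4, ("b",))]
print(early, late_a, late_b)  # [0, 1, 2, 3] [1] [2]
```

The early call keeps all four buckets because the partition is unknown, while the late calls keep one bucket per partition, matching the "≤ 2 splits" expectation of the end-to-end test.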

leaves12138

Reviewed the change against the Java BucketSelectConverter / BucketSelector / PartitionValuePredicateVisitor flow.

The important semantics line up: the selector specializes the predicate with the concrete partition value before extracting finite Equal/In bucket-key constraints, keeps the existing MAX_VALUES / fail-open behavior for unsupported bucket predicates, and keeps the early manifest filter conservative when the partition is still unknown.

One difference from Java is intentional and correctness-safe: when partition specialization folds the predicate to false, Python returns an empty bucket set, while Java's BucketSelector itself falls open and relies on the scan-level partition filter to drop the entry. Since no row in that concrete partition can satisfy the complete predicate, the stricter Python pruning does not introduce false negatives.

The added tests cover the partition predicate folding, per-partition cache keying, unknown-partition fallback, and the mixed partition/bucket OR integration case. Looks good to me.

JingsongLi · 2026-05-11T08:11:36Z

+1

TheR1sing3un added 2 commits May 10, 2026 15:33

leaves12138 approved these changes May 11, 2026

JingsongLi merged commit 5eee480 into apache:master May 11, 2026
6 checks passed

JingsongLi merged 2 commits into apache:master from TheR1sing3un:py-bucket-pruning-per-partition
