Skip to content

[spark] Split non-partition and partition predicates from pushPredicates to limit pushdown#3397

Open
Yohahaha wants to merge 2 commits into
apache:mainfrom
Yohahaha:spark-pushdown-partition-filter
Open

[spark] Split non-partition and partition predicates from pushPredicates to limit pushdown#3397
Yohahaha wants to merge 2 commits into
apache:mainfrom
Yohahaha:spark-pushdown-partition-filter

Conversation

@Yohahaha
Copy link
Copy Markdown
Contributor

Purpose

Linked issue: close #xxx

Fix a bug where LIMIT is never pushed down when a query has partition filters (e.g., WHERE dt = 'x' LIMIT 1). The root cause is that pushPredicates returns all predicates including partition ones, causing Spark to keep a Filter node. Spark's pushDownLimit only invokes pushLimit when there are no filters (PhysicalOperation(_, Nil, ...)), so the Filter node blocks limit pushdown.

Brief change log

  • SparkPartitionPredicate.scala: Refactor extract() to return nonPartitionPredicates and partitionPredicate.
  • FlussScanBuilder.scala: Return only non-partition predicates from pushPredicates() across all scan builders (partition filters, ARROW filters, lake filters).

Tests

API and Format

Documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant