[spark] Support filter pushdown for log tables#3116

Merged
luoyuxia merged 4 commits into apache:main from fresh-borzoni:feat/spark-log-filter-pushdown
Apr 25, 2026

Conversation

@fresh-borzoni
Member

@fresh-borzoni fresh-borzoni commented Apr 17, 2026

closes #3117

Adds SupportsPushDownFilters to FlussAppendScanBuilder and a SparkPredicateConverter mirroring Flink's PredicateConverter.

Record-batch pushdown uses the server-side filter from #2951; Spark re-applies every filter as a safety net, so pushdown is a pure optimization.
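As a rough illustration of the converter pattern described above, here is a minimal sketch using hypothetical stand-in types (not the real Spark or Fluss APIs): supported operators translate to a Fluss predicate, while anything unsupported yields empty, so that filter simply stays on the Spark side.

```java
import java.util.Optional;

// Hypothetical sketch of the Spark -> Fluss predicate conversion pattern.
// SparkPredicate and FlussPredicate are simplified stand-ins, not real APIs.
public class PredicateConversionSketch {
    record SparkPredicate(String name, String field, Object value) {}
    record FlussPredicate(String expr) {}

    // Supported operators map to a Fluss predicate; unknown operators return
    // empty, meaning the filter is not pushed down and Spark evaluates it.
    static Optional<FlussPredicate> convert(SparkPredicate p) {
        return switch (p.name()) {
            case "=" -> Optional.of(new FlussPredicate(p.field() + " = " + p.value()));
            case ">" -> Optional.of(new FlussPredicate(p.field() + " > " + p.value()));
            case "IS_NULL" -> Optional.of(new FlussPredicate(p.field() + " IS NULL"));
            default -> Optional.empty();
        };
    }

    public static void main(String[] args) {
        System.out.println(convert(new SparkPredicate("=", "id", 5))
                .map(FlussPredicate::expr).orElse("not pushed"));   // id = 5
        System.out.println(convert(new SparkPredicate("CUSTOM", "id", 5))
                .map(FlussPredicate::expr).orElse("not pushed"));   // not pushed
    }
}
```

Returning `Optional.empty()` rather than throwing is what makes pushdown safe to layer on: a filter that cannot be converted is just left for Spark to evaluate.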

@fresh-borzoni
Member Author

@Yohahaha @YannByron @luoyuxia PTAL 🙏

Contributor

@Yohahaha Yohahaha left a comment


thank you! left some comments.

Contributor

@Yohahaha Yohahaha left a comment


LGTM!

@fresh-borzoni force-pushed the feat/spark-log-filter-pushdown branch from ee3d458 to 71a79c5 on April 21, 2026 08:50
@fresh-borzoni
Member Author

@Yohahaha @YannByron Ty for the review 👍

Redesigned to the newer API, PTAL 🙏

@YannByron
Contributor

LGTM. thanks @fresh-borzoni

@fresh-borzoni
Member Author

@luoyuxia Can you take a look, pls?

@luoyuxia luoyuxia requested review from Copilot and removed request for YannByron April 24, 2026 09:09
Contributor

Copilot AI left a comment


Pull request overview

This PR adds Spark-side filter pushdown for Fluss log (append) tables by converting Spark V2 predicates into Fluss predicates and applying them as server-side record-batch filters, while still letting Spark re-apply predicates for row-exact correctness.

Changes:

  • Introduce SparkPredicateConverter to translate Spark V2 Predicate expressions into Fluss Predicates.
  • Wire Spark V2 filter pushdown into FlussAppendScanBuilder/FlussAppendScan and apply the pushed predicate in FlussAppendPartitionReader via table.newScan().filter(...).
  • Add unit/integration tests validating predicate conversion and verifying pushdown shows up in Spark scan plans.
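The "pushdown plus safety net" flow above can be sketched with hypothetical stand-in code (not the actual Fluss API): a record-batch filter can only skip whole batches, so batches containing any matching row still arrive with non-matching rows in them, and the same predicate is re-applied per row on the Spark side for exact results.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: batch-granularity server-side filtering plus
// row-exact client-side re-application. Integers stand in for rows.
public class PushdownSafetyNetSketch {
    static List<Integer> read(List<List<Integer>> batches, Predicate<Integer> filter) {
        return batches.stream()
            // server-side analogue: skip record batches with no matching row
            .filter(batch -> batch.stream().anyMatch(filter))
            // client-side analogue: re-apply the predicate to each row
            .flatMap(batch -> batch.stream().filter(filter))
            .toList();
    }

    public static void main(String[] args) {
        var batches = List.of(List.of(1, 2, 3), List.of(10, 11), List.of(4, 20));
        // Batch [1,2,3] is skipped server-side; [4,20] survives the batch
        // filter but 4 is removed by the row-level re-application.
        System.out.println(read(batches, v -> v > 9)); // [10, 11, 20]
    }
}
```

Because the row-level pass always runs, a batch filter that is too coarse (or absent) never changes results, only how much data is transferred; that is what makes the pushdown a pure optimization.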

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Summary per file:

  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/utils/SparkPredicateConverter.scala: New converter from Spark predicates to Fluss predicates for pushdown.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussScanBuilder.scala: Add SupportsPushDownV2Filters mixin and pushdown plumbing for append scans.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussScan.scala: Extend FlussAppendScan to carry pushed predicates and include them in the scan description.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussBatch.scala: Thread the pushed predicate into append batch reader factory creation.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussPartitionReaderFactory.scala: Extend the append reader factory to accept an optional pushed predicate.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussAppendPartitionReader.scala: Apply the server-side batch filter via TableScan.filter(...) when reading.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussMicroBatchStream.scala: Update the append micro-batch reader factory call signature (currently still passes None).
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/lake/FlussLakeAppendBatch.scala: Update factory construction signature for the fallback path.
  • fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/lake/FlussLakePartitionReaderFactory.scala: Update append partition reader construction signature for log splits.
  • fluss-spark/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/utils/SparkPredicateConverterTest.scala: New unit tests covering predicate conversion semantics.
  • fluss-spark/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/SparkLogTableReadTest.scala: New tests asserting Spark plans show pushed predicates for log-table reads.


Contributor

@luoyuxia luoyuxia left a comment


@fresh-borzoni Thanks for the PR. Only one minor comment. Also, will the pushdown for the lake reader be done in a following PR?

@fresh-borzoni
Member Author

fresh-borzoni commented Apr 25, 2026

@luoyuxia Ty for the review, addressed comments, PTAL 🙏

Yes, pushdown for the lake reader is planned as a follow-up; this PR is primarily the converter introduction for further use cases.

@luoyuxia luoyuxia merged commit 4982aaf into apache:main Apr 25, 2026
7 checks passed
Ugbot pushed a commit to Ugbot/fluss that referenced this pull request Apr 26, 2026