[SPARK-55887][CONNECT] Special handling for CollectLimitExec/CollectTailExec to avoid full table scans #54685
Closed
LuciferYang wants to merge 6 commits into apache:master from
Conversation
Contributor
Author
test first
CollectLimitExec/CollectTailExec to avoid full table scans
LuciferYang
commented
Mar 9, 2026
        sendBatch(bytes, count, offset)
        offset += count
      }
    case collectLimit: CollectLimitExec =>
Contributor
Author
This fix might put additional memory pressure on the Connect server. However, I think we can land a simple fix first and look for a better solution later.
Contributor
Author
Let me test it in the production environment.
Contributor
Author
The test indicates that the function is working as expected.
dongjoon-hyun
approved these changes
Mar 10, 2026
Contributor
Author
Thank you @dongjoon-hyun
yikf
approved these changes
Mar 11, 2026
        offset += count
      }
    case collectLimit: CollectLimitExec =>
      SQLExecution.withNewExecutionId(dataframe.queryExecution, Some("collectArrow")) {
Contributor
nit: shall we use names like collectLimitArrow/collectTailArrow?
zhengruifeng
approved these changes
Mar 12, 2026
Member
Merged to master for Apache Spark 4.2.0. Thank you, @LuciferYang and all.
Contributor
Author
Thanks @dongjoon-hyun @zhengruifeng @yikf
What changes were proposed in this pull request?

This PR updates SparkConnectPlanExecution to use executeCollect() instead of execute() when processing CollectLimitExec and CollectTailExec physical plans.

In Spark Connect, operations like head(), take(), and tail() are translated into CollectLimitExec or CollectTailExec physical nodes. Previously, these were executed via the standard execute() path, which often resulted in scanning all partitions before reducing the results.

By switching to executeCollect(), Spark Connect now leverages the optimized executeTake() and executeTail() implementations already present in Spark Classic. These optimizations ensure that only the necessary partitions are scanned (e.g., scanning only the first partition for head(1)), significantly reducing I/O and task overhead.

Why are the changes needed?
Parity with Spark Classic behavior and performance optimization.
In Spark Classic, Dataset.collect() (and by extension head/take/tail) uses plan.executeCollect(). This path includes optimizations to avoid full table scans:

- CollectLimitExec uses executeTake(): it starts by scanning only the first partition and incrementally scans more only if the limit isn't met.
- CollectTailExec uses executeTail(): it starts scanning from the last partition backwards.

In Spark Connect (before this PR), SparkConnectPlanExecution used plan.execute(). For a limit(1) query on a 100-partition table, this would launch 100 tasks (one for each partition's LocalLimit), causing unnecessary computation and resource usage.

Example scenario: running spark.range(0, 10000, 1, 100).limit(1).collect().

Does this PR introduce any user-facing change?
No.
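The dispatch change described above can be sketched roughly as follows. This is a hypothetical, simplified shape (the method name collectRows is invented for illustration); the actual code in SparkConnectPlanExecution also converts rows to Arrow batches and streams them back to the client.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.{CollectLimitExec, CollectTailExec, SparkPlan}

// Sketch only: special-case the limit/tail nodes so their optimized
// executeCollect() overrides are used instead of the generic RDD path.
def collectRows(plan: SparkPlan): Array[InternalRow] = plan match {
  // executeCollect() on these nodes delegates to executeTake()/executeTail(),
  // which scan partitions incrementally instead of launching one task per partition.
  case limit: CollectLimitExec => limit.executeCollect()
  case tail: CollectTailExec   => tail.executeCollect()
  // Generic fallback: materializes every partition via the RDD path.
  case other => other.execute().map(_.copy()).collect()
}
```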
How was this patch tested?
Added new test cases in SparkConnectServiceSuite to verify the task count reduction:

- test("SPARK-55887: Use executeCollect for limit to avoid full scan"): verified that limit(1) on a 100-partition DataFrame triggers significantly fewer tasks than partitions (expected: 1 task).
- test("SPARK-55887: Use executeCollect for tail to avoid full scan"): verified that tail(1) on a 100-partition DataFrame triggers significantly fewer tasks than partitions (expected: 1 task).

Was this patch authored or co-authored using generative AI tooling?
Test cases were generated with the assistance of Gemini 3.
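The task-count check these tests perform can be sketched with a SparkListener. This is a hypothetical standalone version, not the actual suite code: the real tests live in SparkConnectServiceSuite and exercise the Connect execution path rather than a plain local session.

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of the assertion style used by the new tests.
object LimitTaskCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    val taskCount = new AtomicInteger(0)
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
        taskCount.incrementAndGet()
    })
    // 100 partitions, but via executeCollect() a limit(1) should only need
    // to scan the first partition rather than launching 100 tasks.
    spark.range(0, 10000, 1, 100).limit(1).collect()
    Thread.sleep(1000) // crude: let async listener events drain; tests do better
    assert(taskCount.get() < 100, s"expected few tasks, got ${taskCount.get()}")
    spark.stop()
  }
}
```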