fix: [branch-0.14] backport #3879 - skip Comet columnar shuffle for stages with DPP scans #3934
Open
andygrove wants to merge 1 commit into apache:branch-0.14
When a scan uses Dynamic Partition Pruning (DPP) and falls back to
Spark, Comet was still wrapping the stage with columnar shuffle,
creating inefficient row-to-columnar transitions:
CometShuffleWriter → CometRowToColumnar → SparkFilter →
SparkColumnarToRow → SparkScan
This adds a check in columnarShuffleSupported() that walks the child
plan tree to detect FileSourceScanExec nodes with dynamic pruning
filters. When found, the shuffle is not converted to Comet, allowing
the entire stage to fall back to Spark.
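The check described above can be illustrated with a small toy model. This is only a sketch of the tree walk, not Comet's actual Scala implementation: `PlanNode`, `contains_dpp_scan`, and the `has_dynamic_pruning_filter` flag are hypothetical stand-ins invented for illustration, and only the names `FileSourceScanExec` and `columnarShuffleSupported` (here `columnar_shuffle_supported`) come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan node (not a Spark class)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    # Assumption: True only for a scan that carries a dynamic pruning filter.
    has_dynamic_pruning_filter: bool = False


def contains_dpp_scan(plan: PlanNode) -> bool:
    """Walk the plan tree and report whether any file scan uses DPP."""
    if plan.name == "FileSourceScanExec" and plan.has_dynamic_pruning_filter:
        return True
    return any(contains_dpp_scan(child) for child in plan.children)


def columnar_shuffle_supported(shuffle_child: PlanNode) -> bool:
    """Toy version of the check: decline columnar shuffle above a DPP scan."""
    return not contains_dpp_scan(shuffle_child)


# A stage whose scan has a dynamic pruning filter, and one without.
dpp_stage = PlanNode(
    "FilterExec",
    [PlanNode("FileSourceScanExec", has_dynamic_pruning_filter=True)],
)
plain_stage = PlanNode("FilterExec", [PlanNode("FileSourceScanExec")])
```

In this model, `columnar_shuffle_supported(dpp_stage)` is false, so the shuffle would stay with Spark and no row-to-columnar transition is inserted, while `plain_stage` remains eligible for conversion.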
Which issue does this PR close?
Backport of #3879 to branch-0.14.
Rationale for this change
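PR #3879 is a performance optimization: when a stage contains a DPP scan that falls back to Spark, the stage's shuffle is no longer converted to Comet columnar shuffle, so the whole stage runs in Spark without extra row-to-columnar transitions. This has a large impact on TPC-DS benchmark performance.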
What changes are included in this PR?
Cherry-pick of commit acbdeac from main. The changes skip Comet columnar shuffle for stages that contain DPP scans.
How are these changes tested?
Tests are included in the original PR and were cherry-picked along with the fix.