fix: [branch-0.14] backport #3879 - skip Comet columnar shuffle for stages with DPP scans #3934

Open

andygrove wants to merge 1 commit into apache:branch-0.14 from andygrove:backport-3879-to-0.14

andygrove commented Apr 13, 2026

Which issue does this PR close?

Backport of #3879 to branch-0.14.

Rationale for this change

PR #3879 is a performance optimization that prevents the shuffle in stages containing DPP scans from being converted to Comet columnar shuffle. This has a large effect on TPC-DS benchmark performance.

What changes are included in this PR?

Cherry-pick of commit acbdeac from main. The changes skip Comet columnar shuffle for stages that contain DPP scans.

How are these changes tested?

Tests are included in the original PR and were cherry-picked along with the fix.

When a scan uses Dynamic Partition Pruning (DPP) and falls back to
Spark, Comet was still wrapping the stage with columnar shuffle,
creating inefficient row-to-columnar transitions:

  CometShuffleWriter → CometRowToColumnar → SparkFilter →
    SparkColumnarToRow → SparkScan

This adds a check in columnarShuffleSupported() that walks the child
plan tree to detect FileSourceScanExec nodes with dynamic pruning
filters. When found, the shuffle is not converted to Comet, allowing
the entire stage to fall back to Spark.
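The check described above can be sketched as a recursive tree walk. The following is a minimal, self-contained Scala model, not the code from #3879: `PlanNode`, `FileScan`, `Filter`, `Join`, `containsDppScan`, and `shuffleConvertibleToComet` are hypothetical names, and the boolean `hasDynamicPruningFilter` stands in for what, in a real Spark physical plan, typically appears as a `DynamicPruningExpression` among a `FileSourceScanExec`'s partition filters. The gate function plays the role that `columnarShuffleSupported()` plays in the PR.

```scala
object DppShuffleCheck {
  // Hypothetical, simplified plan model; these are NOT Spark classes.
  sealed trait PlanNode {
    def children: Seq[PlanNode]
  }

  // Stand-in for a file scan (FileSourceScanExec in Spark). The flag models
  // "at least one partition filter is a dynamic pruning filter".
  final case class FileScan(table: String, hasDynamicPruningFilter: Boolean) extends PlanNode {
    def children: Seq[PlanNode] = Nil
  }

  // Stand-in for a row-level filter sitting on top of a scan.
  final case class Filter(child: PlanNode) extends PlanNode {
    def children: Seq[PlanNode] = Seq(child)
  }

  // Stand-in for any binary operator, e.g. a join.
  final case class Join(left: PlanNode, right: PlanNode) extends PlanNode {
    def children: Seq[PlanNode] = Seq(left, right)
  }

  // Depth-first walk: true if any scan in the subtree carries a DPP filter.
  def containsDppScan(plan: PlanNode): Boolean = plan match {
    case FileScan(_, hasDpp) => hasDpp
    case other               => other.children.exists(containsDppScan)
  }

  // Hypothetical gate: convert the shuffle to Comet only when no DPP scan
  // feeds it, so a stage with a DPP scan falls back to Spark as a whole.
  def shuffleConvertibleToComet(shuffleChild: PlanNode): Boolean =
    !containsDppScan(shuffleChild)
}
```

For example, `shuffleConvertibleToComet(Filter(FileScan("store_sales", hasDynamicPruningFilter = true)))` returns `false`, so that shuffle would stay in Spark. Because `exists` stops at the first match, the walk returns as soon as a DPP scan is found; a plan without one is visited in full. The PR does not say whether the real check stops at existing stage boundaries, so this sketch simply walks the whole subtree.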
andygrove changed the title from "fix: backport #3879 - skip Comet columnar shuffle for stages with DPP scans" to "fix: [branch-0.14] backport #3879 - skip Comet columnar shuffle for stages with DPP scans" on Apr 13, 2026