mbutrovich commented Feb 8, 2026

Claude helped me with the PR description. After fact-checking a couple of hallucinations about tests that it thought were added, I think this is accurate now:

Which issue does this PR close?

Closes #3442.

Rationale for this change

Dynamic Partition Pruning (DPP) is an important optimization for star schema queries. Previously, native_datafusion scans fell back to Spark when DPP was present. This PR adds full DPP support by deferring partition serialization to execution time, after DPP subqueries resolve.

Even absent DPP, this change also reduces the serialization overhead of scanning large amounts of Parquet data: previously, every Spark partition received metadata about every other partition's Parquet files to read.

What changes are included in this PR?

Architecture:

  • CometNativeScanExec now defers partition serialization to execution time via lazy serializedPartitionData
  • At planning time, CometNativeScan.convert() creates a placeholder operator with just a scan_id
  • At execution time, serializePartitions() resolves DPP subqueries and serializes the filtered partitions
  • Uses originalPlan.partitionFilters instead of partitionFilters, because AQE's PlanDynamicPruningFilters transforms InSubqueryExec into Literal.TrueLiteral via makeCopy; originalPlan is not in the active plan tree, so it retains the original filters
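The deferral described in the first bullets boils down to a lazy field that is only computed on first access, at execution time. A minimal self-contained sketch of the pattern, not the actual Comet code (`PlaceholderScan`, `scanId`, and `serializationCount` are illustrative stand-ins):

```scala
// Sketch of deferring expensive partition serialization with a lazy val.
// Planning time only records a scan id; serialization runs once, on demand,
// at "execution" time (where DPP subqueries would have resolved by then).
object LazySerializationSketch {
  var serializationCount = 0 // track how many times we actually serialize

  case class PlaceholderScan(scanId: Int, partitions: Seq[String]) {
    // Not computed at construction (planning) time; computed once, on demand.
    lazy val serializedPartitionData: Array[Byte] = {
      serializationCount += 1
      // Imagine DPP filters being applied to `partitions` right here.
      partitions.mkString(",").getBytes("UTF-8")
    }
  }

  def main(args: Array[String]): Unit = {
    val scan = PlaceholderScan(scanId = 1, partitions = Seq("p=1", "p=2"))
    assert(serializationCount == 0)      // nothing serialized at "planning"
    val a = scan.serializedPartitionData // first access triggers it
    val b = scan.serializedPartitionData // cached thereafter
    assert(serializationCount == 1 && (a eq b))
    println(new String(a, "UTF-8"))
  }
}
```

The same caching behavior is what makes it safe to reference `serializedPartitionData` from multiple places without re-serializing.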

Config:

  • New spark.comet.scan.dpp.enabled (default: true) replaces spark.comet.dppFallback.enabled
| Scan Mode | COMET_DPP_ENABLED | Result |
| --- | --- | --- |
| native_datafusion | true (default) | CometNativeScanExec with DPP (this PR) |
| native_datafusion | false | Fall back to Spark (existing behavior) |
| Iceberg native | true (default) | CometIcebergNativeScanExec with DPP (#3349) |
| Iceberg native | false | Fall back to Spark (this PR) |
| auto | N/A | Fall back to Spark (existing behavior) |
| native_iceberg_compat | N/A | Fall back to Spark (existing behavior) |
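For reference, the new flag can be toggled like any other Spark config. The config key is from this PR; the rest of this spark-submit invocation is illustrative:

```shell
# Disable DPP for Comet native scans, forcing the Spark fallback path.
# (spark.comet.scan.dpp.enabled defaults to true.)
spark-submit \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.scan.dpp.enabled=false \
  my_job.py
```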

Shims:

  • Added getDppFilteredFilePartitions() and getDppFilteredBucketedFilePartitions() to ShimCometScanExec for Spark 3.4/3.5/4.0
  • Added resolveSubqueryAdaptiveBroadcast() to ShimSubqueryBroadcast for DPP subquery resolution
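To make the shim surface concrete, here is a hypothetical sketch. The method names come from this PR, but the parameter and return types are simplified stand-ins (the real shims work with Spark's FilePartition, PartitionedFile, and expression types):

```scala
// Hypothetical sketch of the shim methods named above; signatures are
// illustrative only, not the actual ShimCometScanExec API.
object ShimSketch {
  case class FilePartition(index: Int, files: Seq[String])

  trait ShimCometScanExec {
    // Returns only the partitions that survive the resolved DPP filters.
    def getDppFilteredFilePartitions(
        all: Seq[FilePartition],
        keep: FilePartition => Boolean): Seq[FilePartition] =
      all.filter(keep)

    // Bucketed variant: same idea, but partitions are keyed by bucket id.
    def getDppFilteredBucketedFilePartitions(
        byBucket: Map[Int, FilePartition],
        keepBucket: Int => Boolean): Seq[FilePartition] =
      byBucket.collect { case (b, p) if keepBucket(b) => p }.toSeq
  }

  def main(args: Array[String]): Unit = {
    val shim = new ShimCometScanExec {}
    val parts = Seq(FilePartition(0, Seq("a")), FilePartition(1, Seq("b")))
    val kept = shim.getDppFilteredFilePartitions(parts, _.index == 1)
    println(kept.map(_.index).mkString(","))
  }
}
```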

Other:

  • Removed custom equals/hashCode from CometNativeScanExec in favor of case class defaults to prevent incorrect AQE exchange reuse between scans with different projections
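The equals/hashCode issue is easy to reproduce in miniature: a hand-written equals that ignores a field makes two logically different operators compare equal, which is exactly what AQE's exchange reuse keys on. A self-contained illustration (`BadScan`/`GoodScan` are hypothetical, not Comet classes):

```scala
// Why custom equals/hashCode can cause incorrect AQE reuse: if equals
// ignores the projection, two scans with different output columns look
// identical to plan-reuse caching.
object EqualsSketch {
  class BadScan(val table: String, val projection: Seq[String]) {
    override def equals(other: Any): Boolean = other match {
      case o: BadScan => o.table == table // bug: projection ignored
      case _          => false
    }
    override def hashCode: Int = table.hashCode
  }

  // Case class defaults compare every field, including the projection.
  case class GoodScan(table: String, projection: Seq[String])

  def main(args: Array[String]): Unit = {
    val bad1 = new BadScan("t", Seq("a"))
    val bad2 = new BadScan("t", Seq("a", "b"))
    assert(bad1 == bad2) // wrongly considered interchangeable
    val good1 = GoodScan("t", Seq("a"))
    val good2 = GoodScan("t", Seq("a", "b"))
    assert(good1 != good2) // correctly distinguished
    println("ok")
  }
}
```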

How are these changes tested?

Comet tests:
New tests in CometExecSuite:

  • DPP with native_datafusion scan - join with dynamic partition pruning - verifies basic DPP with partition pruning
  • DPP with native_datafusion scan - multiple partition columns - verifies DPP with two partition columns
  • DPP with native_datafusion scan - SubqueryExec (non-broadcast DPP) - verifies DPP works with non-broadcast subqueries
  • DPP with native_datafusion scan - ReusedSubqueryExec (subquery reuse) - verifies DPP works with reused subqueries

New test in CometIcebergNativeSuite:

  • runtime filtering - DPP with non-broadcast join - verifies Iceberg DPP works with SubqueryExec

Spark SQL tests:
After implementing this, we had 24 Spark SQL test failures related to DPP. The updated Spark 3.5.8 diff now looks for CometNativeScanExec, and those tests pass.

We also had 22 test failures related to bucketed scans with DPP. After modifying the partitioning logic those tests pass as well, so I am confident that we are getting good coverage from the Spark SQL tests.

mbutrovich added this to the 0.14.0 milestone Feb 8, 2026