Skip to content

test: fix SparkToColumnar plan-shape assertions on Spark 4#4032

Merged
andygrove merged 1 commit intoapache:mainfrom
andygrove:investigate-spark4-ignored-tests
Apr 22, 2026
Merged

test: fix SparkToColumnar plan-shape assertions on Spark 4#4032
andygrove merged 1 commit intoapache:mainfrom
andygrove:investigate-spark4-ignored-tests

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #4031.

Rationale for this change

Two tests in CometExecSuite (SparkToColumnar eliminate redundant in AQE and SparkToColumnar override node name for row input) were skipped on Spark 4 via assume(!isSpark40Plus). When the guards are removed they both fail with List() had length 0 instead of expected length 1: the tests look for exactly one CometSparkToColumnarExec in the final AQE plan and find zero.

The CometSparkToColumnarExec insertion is actually working correctly. Spark 4 introduced ResultQueryStageExec, which extends LeafExecNode and now wraps the final plan exposed by AdaptiveSparkPlanExec.executedPlan. The tests used SparkPlan.collect, which treats a LeafExecNode as opaque and does not descend into QueryStageExec.plan, so the node was invisible to the assertion.

What changes are included in this PR?

  • Switch the two affected assertions to use collect from AdaptiveSparkPlanHelper (already mixed into CometTestBase). The helper's collect descends through AdaptiveSparkPlanExec and QueryStageExec.plan via allChildren, so it works identically on Spark 3.4 / 3.5 (where the top-level plan is not wrapped in a ResultQueryStageExec) and Spark 4.
  • Remove the assume(!isSpark40Plus) guards that were disabling the tests on Spark 4.

How are these changes tested?

The two previously skipped tests now run and pass on Spark 4. Verified with:

./mvnw test -Pspark-4.0 -Dtest=none -Dsuites='org.apache.comet.exec.CometExecSuite SparkToColumnar'

All 9 SparkToColumnar* tests pass. Also verified on the default Spark 3.5 profile; all 9 still pass.

Closes apache#4031.

Spark 4 wraps the final AQE plan in `ResultQueryStageExec`, a
`LeafExecNode` that `SparkPlan.collect` treats as opaque. The two
affected tests used `adaptivePlan.collect { case c: CometSparkToColumnarExec => c }`,
which therefore found zero matches on Spark 4 even though the node was
present in the materialized plan.

Switch to the `collect` method from `AdaptiveSparkPlanHelper` (already
mixed into `CometTestBase`), which descends through query stages. Drop
the `assume(!isSpark40Plus)` guards that disabled the tests on Spark 4.
Copy link
Copy Markdown
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@andygrove andygrove merged commit dcaaffa into apache:main Apr 22, 2026
118 of 119 checks passed
@andygrove andygrove deleted the investigate-spark4-ignored-tests branch April 22, 2026 19:26
@andygrove
Copy link
Copy Markdown
Member Author

Merged. Thanks @parthchandra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CometSparkToColumnarExec missing from AQE plan on Spark 4 (two skipped tests)

2 participants