[SPARK-33027][SQL] Add DisableUnnecessaryBucketedScan rule to AQE #30200

c21 · 2020-10-30T05:25:32Z

What changes were proposed in this pull request?

As a follow-up to the comment on #29804 (comment), this PR adds the physical plan rule DisableUnnecessaryBucketedScan to AQE's AdaptiveSparkPlanExec.queryStagePreparationRules, so that auto bucketed scan works with AQE.

The change is mostly in:

AdaptiveSparkPlanExec.scala: add physical plan rule DisableUnnecessaryBucketedScan
DisableUnnecessaryBucketedScan.scala: propagate the logical plan link for the file source scan exec operator. Otherwise the logical plan link information is lost when AQE is enabled, and we get an exception here (for example, for the query SELECT * FROM bucketed_table with AQE enabled).
DisableUnnecessaryBucketedScanSuite.scala: add a new test suite for the AQE-enabled case, DisableUnnecessaryBucketedScanWithoutHiveSupportSuiteAE, and change some of the tests to use AdaptiveSparkPlanHelper.find/collect, so that plan verification works when AQE is enabled.
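To illustrate why the logical plan link has to be propagated explicitly, here is a minimal sketch in plain Scala. It is a simplified model, not Spark's code: FileScan, LogicalRelation, and both disable* helpers are hypothetical stand-ins. The only assumption it encodes is that the link is kept as side state on the node rather than as a constructor argument, so copying the node to turn off the bucketed scan drops the link unless it is copied over by hand.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark classes.
final case class LogicalRelation(table: String)

final case class FileScan(table: String, bucketedScan: Boolean) {
  // The link lives outside the constructor, so copy() does not carry it over.
  private var link: Option[LogicalRelation] = None

  def logicalLink: Option[LogicalRelation] = link

  def setLogicalLink(l: LogicalRelation): Unit = {
    link = Some(l)
  }
}

object LogicalLinkSketch {
  // Naive rewrite: the new node has no logical link.
  def disableNaive(scan: FileScan): FileScan =
    scan.copy(bucketedScan = false)

  // Rewrite that carries the link from the old node to the new one.
  def disableKeepingLink(scan: FileScan): FileScan = {
    val replaced = scan.copy(bucketedScan = false)
    scan.logicalLink.foreach(replaced.setLogicalLink)
    replaced
  }

  def main(args: Array[String]): Unit = {
    val scan = FileScan("bucketed_table", bucketedScan = true)
    scan.setLogicalLink(LogicalRelation("bucketed_table"))

    println(disableNaive(scan).logicalLink)       // None
    println(disableKeepingLink(scan).logicalLink) // Some(LogicalRelation(bucketed_table))
  }
}
```

In this toy model, any later step that expects every scan node to carry a link would fail on the result of disableNaive, which mirrors the failure described above for SELECT * FROM bucketed_table.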

Why are the changes needed?

It is reasonable to support disabling unnecessary bucketed scans when AQE is enabled; this helps optimize queries that run with AQE.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit test in DisableUnnecessaryBucketedScanSuite.

c21 · 2020-10-30T05:26:18Z

cc @cloud-fan if you have time to take a look, thanks.

SparkQA · 2020-10-30T06:12:47Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35045/

cloud-fan · 2020-10-30T06:18:16Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

    removeRedundantSorts,
-    ensureRequirements
+    ensureRequirements,
+    disableUnnecessaryBucketedScan


Seems we can just put DisableUnnecessaryBucketedScan here. Same for the other rules.

@cloud-fan - agree, updated.

cloud-fan · 2020-10-30T06:26:17Z

sql/core/src/test/scala/org/apache/spark/sql/sources/DisableUnnecessaryBucketedScanSuite.scala

-        val partitioning = spark.table("t1").queryExecution.executedPlan
-          .outputPartitioning
-        assert(partitioning match {
+        val inMemoryScan = find(spark.table("t1").queryExecution.executedPlan)(


how about stripAQEPlan(spark.table("t1").queryExecution.executedPlan).outputPartitioning

@cloud-fan - sure, it's less verbose, updated.
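The idea behind the stripAQEPlan suggestion can be sketched with a small plain-Scala model. This is a hedged illustration, not Spark's implementation: Plan, InMemoryScan, and AdaptiveWrapper are hypothetical types, and the wrapper reporting "unknown" partitioning is an assumption made only to show why a test would want to inspect the inner plan.

```scala
// Hypothetical toy plan types; they only mimic the shape of the problem.
sealed trait Plan {
  def outputPartitioning: String
}

final case class InMemoryScan(outputPartitioning: String) extends Plan

// Stand-in for an adaptive wrapper node that hides the plan it executes.
final case class AdaptiveWrapper(executedPlan: Plan) extends Plan {
  // Assumption for this sketch: the wrapper does not expose the inner partitioning.
  val outputPartitioning: String = "unknown"
}

object StripSketch {
  // Unwrap the adaptive node if present, otherwise return the plan unchanged.
  def stripAQEPlan(p: Plan): Plan = p match {
    case AdaptiveWrapper(inner) => inner
    case other                  => other
  }

  def main(args: Array[String]): Unit = {
    val plain    = InMemoryScan("hash(i, 8)")
    val adaptive = AdaptiveWrapper(plain)

    println(adaptive.outputPartitioning)               // unknown
    println(stripAQEPlan(adaptive).outputPartitioning) // hash(i, 8)
    println(stripAQEPlan(plain).outputPartitioning)    // hash(i, 8)
  }
}
```

Because the helper is a no-op on plans that are not wrapped, the same assertion can run unchanged with AQE on or off, which is what makes the one-line form less verbose than searching the plan with find.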

SparkQA · 2020-10-30T06:33:28Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35045/

SparkQA · 2020-10-30T07:05:02Z

Test build #130445 has finished for PR 30200 at commit 36815fc.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-30T07:05:02Z

Test build #130440 has finished for PR 30200 at commit f721855.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

c21 · 2020-10-30T07:07:04Z

retest this please

SparkQA · 2020-10-30T07:28:48Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35050/

SparkQA · 2020-10-30T07:58:19Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35050/

SparkQA · 2020-10-30T08:24:46Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35052/

SparkQA · 2020-10-30T08:47:08Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35052/

SparkQA · 2020-10-30T11:39:45Z

Test build #130447 has finished for PR 30200 at commit 36815fc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-11-02T06:44:11Z

thanks, merging to master!

c21 · 2020-11-02T07:46:47Z

Thank you @cloud-fan for review!

Make auto bucketed scan work with AQE

f721855

cloud-fan reviewed Oct 30, 2020

Address comments

36815fc

cloud-fan approved these changes Oct 30, 2020

cloud-fan closed this in e52b858 Nov 2, 2020

c21 deleted the auto-bucket-aqe branch November 2, 2020 07:46
