Skip to content

Conversation

@c21
Copy link
Contributor

@c21 c21 commented Oct 30, 2020

What changes were proposed in this pull request?

As a followup comment from #29804 (comment) , here we add add the physical plan rule DisableUnnecessaryBucketedScan into AQE AdaptiveSparkPlanExec.queryStagePreparationRules, to make auto bucketed scan work with AQE.

The change is mostly in:

  • AdaptiveSparkPlanExec.scala: add physical plan rule DisableUnnecessaryBucketedScan
  • DisableUnnecessaryBucketedScan.scala: propagate logical plan link for the file source scan exec operator, otherwise we lose the logical plan link information when AQE is enabled, and will get exception here. (for example, for query SELECT * FROM bucketed_table with AQE is enabled)
  • DisableUnnecessaryBucketedScanSuite.scala: add new test suite for AQE enabled - DisableUnnecessaryBucketedScanWithoutHiveSupportSuiteAE, and changed some of tests to use AdaptiveSparkPlanHelper.find/collect, to make the plan verification work when AQE enabled.

Why are the changes needed?

It's reasonable to add the support to allow disabling unnecessary bucketed scan with AQE is enabled, this helps optimize the query when AQE is enabled.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit test in DisableUnnecessaryBucketedScanSuite.

@c21
Copy link
Contributor Author

c21 commented Oct 30, 2020

cc @cloud-fan if you have time to take a look, thanks.

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35045/

removeRedundantSorts,
ensureRequirements
ensureRequirements,
disableUnnecessaryBucketedScan
Copy link
Contributor

@cloud-fan cloud-fan Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we can just put DisableUnnecessaryBucketedScan. Same to other rules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - agree, updated.

val partitioning = spark.table("t1").queryExecution.executedPlan
.outputPartitioning
assert(partitioning match {
val inMemoryScan = find(spark.table("t1").queryExecution.executedPlan)(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about stripAQEPlan(spark.table("t1").queryExecution.executedPlan).outputPartitioning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - sure, it's less verbose, updated.

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35045/

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Test build #130445 has finished for PR 30200 at commit 36815fc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Test build #130440 has finished for PR 30200 at commit f721855.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@c21
Copy link
Contributor Author

c21 commented Oct 30, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35050/

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35050/

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35052/

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35052/

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Test build #130447 has finished for PR 30200 at commit 36815fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in e52b858 Nov 2, 2020
@c21
Copy link
Contributor Author

c21 commented Nov 2, 2020

Thank you @cloud-fan for review!

@c21 c21 deleted the auto-bucket-aqe branch November 2, 2020 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants