[SPARK-35874][SQL] AQE Shuffle should wait for its subqueries to finish before materializing #33058
Conversation
Kubernetes integration test starting
Kubernetes integration test status failure
@@ -58,7 +58,11 @@ trait BroadcastExchangeLike extends Exchange {
   * For registering callbacks on `relationFuture`.
   * Note that calling this method may not start the execution of broadcast job.
   */
  def completionFuture: scala.concurrent.Future[broadcast.Broadcast[Any]]
+ final def submitBroadcastJob: scala.concurrent.Future[broadcast.Broadcast[Any]] = executeQuery {
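The shape of this change is a template method: the new `final` entry point wraps the overridable future in `executeQuery`, so shared preparation (triggering and awaiting subqueries) can never be skipped. A minimal, self-contained sketch of that pattern, with simplified stand-in names (`BroadcastLike`, the `Seq[Int]` payload) that are not Spark's actual classes:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Simplified stand-in for the BroadcastExchangeLike contract: subclasses
// supply only the job's future; the public entry point is final and always
// routes through the shared preparation step first.
trait BroadcastLike {
  // Hook for subclasses: in Spark this future would come from the broadcast job.
  protected def completionFuture: Future[Seq[Int]]

  // Stand-in for SparkPlan.executeQuery: run preparation before the body.
  protected def executeQuery[T](body: => T): T = {
    waitForSubqueries()
    body
  }

  // No-op here; a real plan node would block on its subquery futures.
  protected def waitForSubqueries(): Unit = ()

  // AQE-side entry point, final so the preparation step cannot be bypassed.
  final def submitBroadcastJob: Future[Seq[Int]] = executeQuery {
    completionFuture
  }
}

val b = new BroadcastLike {
  protected def completionFuture = Future.successful(Seq(1, 2, 3))
}
val result = Await.result(b.submitBroadcastJob, 5.seconds)
```

Making the entry point `final` while keeping the hook overridable is what lets AQE call one method uniformly on shuffle and broadcast nodes.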
This is kind of the AQE version of `execute`, as AQE won't call `execute` of shuffle/broadcast.
Can we add some comments for this method?
Test build #140263 has finished for PR 33058 at commit
Force-pushed from 00952f2 to c964743
LGTM
…sh before materializing

### What changes were proposed in this pull request?

Currently, AQE uses a very tricky way to trigger and wait for the subqueries:
1. Submitting a stage calls `QueryStageExec.materialize`.
2. `QueryStageExec.materialize` calls `executeQuery`.
3. `executeQuery` does some preparation work, which goes to `QueryStageExec.doPrepare`.
4. `QueryStageExec.doPrepare` calls `prepare` on shuffle/broadcast, which triggers all the subqueries in this stage.
5. `executeQuery` then calls `waitForSubqueries`, which does nothing because `QueryStageExec` itself has no subqueries.
6. Then we submit the shuffle/broadcast job without waiting for subqueries.
7. `ShuffleExchangeExec.mapOutputStatisticsFuture` calls `child.execute`, which calls `executeQuery` and waits for subqueries in the query tree of `child`.
8. The only missing case: `ShuffleExchangeExec` itself may contain subqueries (in the repartition expression), and AQE doesn't wait for them.

A simple fix would be to override `waitForSubqueries` in `QueryStageExec` and forward the request to shuffle/broadcast, but this PR proposes a different and probably cleaner way: we follow `execute`/`doExecute` in `SparkPlan` and add similar APIs in the AQE version of "execute", which gets a future from shuffle/broadcast.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

A query that fails without the fix can run now.

### How was this patch tested?

New test.

Closes #33058 from cloud-fan/aqe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 2df67a1)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
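The missing case in step 8 can be modeled in miniature: if the exchange itself carries a subquery, requesting the job's future must go through the same preparation step that awaits subqueries. A hypothetical, self-contained sketch of this ordering guarantee (`ShuffleLike`, `startSubquery`, and the promised constant are illustrative names, not Spark's internals):

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical model of an exchange whose repartition expression carries a
// subquery. Routing job submission through executeQuery forces the subquery
// to finish before the shuffle job is materialized.
class ShuffleLike {
  // The subquery's eventual value; here just a promised constant.
  private val subqueryResult = Promise[Int]()

  // Kick off the subquery asynchronously, as stage submission would.
  def startSubquery(): Unit = Future { subqueryResult.success(42) }

  // Block until every subquery of this node has completed.
  private def waitForSubqueries(): Unit =
    Await.ready(subqueryResult.future, 10.seconds)

  private def executeQuery[T](body: => T): T = {
    waitForSubqueries()
    body
  }

  // The shuffle job is only created after subqueries finish, so the
  // subquery's value is guaranteed to be available inside the body.
  final def submitShuffleJob: Future[Int] = executeQuery {
    Future.successful(subqueryResult.future.value.get.get)
  }
}

val s = new ShuffleLike
s.startSubquery()
val stats = Await.result(s.submitShuffleJob, 10.seconds)
```

Without the `executeQuery` wrapper, `submitShuffleJob` could read the subquery's value before it exists, which is the race the PR title describes.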
thanks for the review, merging to master/3.2!