[SPARK-35874][SQL] AQE Shuffle should wait for its subqueries to finish before materializing #33058
Conversation
Kubernetes integration test starting
Kubernetes integration test status failure
@@ -58,7 +58,11 @@ trait BroadcastExchangeLike extends Exchange {
   * For registering callbacks on `relationFuture`.
   * Note that calling this method may not start the execution of broadcast job.
   */
  def completionFuture: scala.concurrent.Future[broadcast.Broadcast[Any]]
+ final def submitBroadcastJob: scala.concurrent.Future[broadcast.Broadcast[Any]] = executeQuery {
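The shape of this change is a template method: the new `final` entry point wraps the overridable future in `executeQuery`, so shared preparation (triggering and awaiting subqueries) can never be skipped. A minimal, self-contained sketch of that pattern, with simplified stand-in names (`BroadcastLike`, the `Seq[Int]` payload) that are not Spark's actual classes:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Simplified stand-in for the BroadcastExchangeLike contract: subclasses
// supply only the job's future; the public entry point is final and always
// routes through the shared preparation step first.
trait BroadcastLike {
  // Hook for subclasses: in Spark this future would come from the broadcast job.
  protected def completionFuture: Future[Seq[Int]]

  // Stand-in for SparkPlan.executeQuery: run preparation before the body.
  protected def executeQuery[T](body: => T): T = {
    waitForSubqueries()
    body
  }

  // No-op here; a real plan node would block on its subquery futures.
  protected def waitForSubqueries(): Unit = ()

  // AQE-side entry point, final so the preparation step cannot be bypassed.
  final def submitBroadcastJob: Future[Seq[Int]] = executeQuery {
    completionFuture
  }
}

val b = new BroadcastLike {
  protected def completionFuture = Future.successful(Seq(1, 2, 3))
}
val result = Await.result(b.submitBroadcastJob, 5.seconds)
```

Making the entry point `final` while keeping the hook overridable is what lets AQE call one method uniformly on shuffle and broadcast nodes.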
This is kind of the AQE version of `execute`, as AQE won't call `execute` of shuffle/broadcast.
Can we add some comments for this method?
Test build #140263 has finished for PR 33058 at commit
Force-pushed from 00952f2 to c964743
LGTM
…sh before materializing

### What changes were proposed in this pull request?

Currently, AQE uses a very tricky way to trigger and wait for the subqueries:
1. Submitting a stage calls `QueryStageExec.materialize`.
2. `QueryStageExec.materialize` calls `executeQuery`.
3. `executeQuery` does some preparation work, which goes to `QueryStageExec.doPrepare`.
4. `QueryStageExec.doPrepare` calls `prepare` on shuffle/broadcast, which triggers all the subqueries in this stage.
5. `executeQuery` then calls `waitForSubqueries`, which does nothing because `QueryStageExec` itself has no subqueries.
6. Then we submit the shuffle/broadcast job without waiting for subqueries.
7. `ShuffleExchangeExec.mapOutputStatisticsFuture` calls `child.execute`, which calls `executeQuery` and waits for subqueries in the query tree of `child`.
8. The only missing case: `ShuffleExchangeExec` itself may contain subqueries (in the repartition expression), and AQE doesn't wait for them.

A simple fix would be to override `waitForSubqueries` in `QueryStageExec` and forward the request to shuffle/broadcast, but this PR proposes a different and probably cleaner way: we follow `execute`/`doExecute` in `SparkPlan` and add similar APIs in the AQE version of "execute", which gets a future from shuffle/broadcast.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

A query that fails without the fix can run now.

### How was this patch tested?

New test.

Closes #33058 from cloud-fan/aqe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 2df67a1)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
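The missing case in step 8 can be modeled in miniature: if the exchange itself carries a subquery, requesting the job's future must go through the same preparation step that awaits subqueries. A hypothetical, self-contained sketch of this ordering guarantee (`ShuffleLike`, `startSubquery`, and the promised constant are illustrative names, not Spark's internals):

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical model of an exchange whose repartition expression carries a
// subquery. Routing job submission through executeQuery forces the subquery
// to finish before the shuffle job is materialized.
class ShuffleLike {
  // The subquery's eventual value; here just a promised constant.
  private val subqueryResult = Promise[Int]()

  // Kick off the subquery asynchronously, as stage submission would.
  def startSubquery(): Unit = Future { subqueryResult.success(42) }

  // Block until every subquery of this node has completed.
  private def waitForSubqueries(): Unit =
    Await.ready(subqueryResult.future, 10.seconds)

  private def executeQuery[T](body: => T): T = {
    waitForSubqueries()
    body
  }

  // The shuffle job is only created after subqueries finish, so the
  // subquery's value is guaranteed to be available inside the body.
  final def submitShuffleJob: Future[Int] = executeQuery {
    Future.successful(subqueryResult.future.value.get.get)
  }
}

val s = new ShuffleLike
s.startSubquery()
val stats = Await.result(s.submitShuffleJob, 10.seconds)
```

Without the `executeQuery` wrapper, `submitShuffleJob` could read the subquery's value before it exists, which is the race the PR title describes.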
thanks for the review, merging to master/3.2!