[SPARK-31475][SQL] Broadcast stage in AQE did not timeout #28250
maryannxue wants to merge 2 commits into apache:master
Test build #121429 has finished for PR 28250 at commit
    val e = intercept[Exception] {
      testDf.collect()
    }
    AdaptiveTestUtils.assertExceptionMessage(e, s"Could not execute broadcast in $timeout secs.")
So this test runs for 30 seconds? Can we make it a bit shorter?
    override def run(): Unit = {
      promise.tryFailure(new SparkException(s"Could not execute broadcast in $timeout secs. " +
        s"You can increase the timeout for broadcasts via ${SQLConf.BROADCAST_TIMEOUT.key} or " +
        s"disable broadcast join by setting ${SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key} to -1"))
Should this also cancel the job group, as BroadcastExchangeExec does?
This is done in the AQE mechanism already: after the timeout happens, this will become a StageFailure event in the AQE event queue, which will trigger a cleanup that calls the cancel() routine of each running query stage (including the broadcast stage that has timed out). And a broadcast stage's cancel() stops the broadcast thread as well as the job group.
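The cleanup flow described in this reply can be sketched roughly as follows. This is a minimal, self-contained illustration and not Spark's actual code: `StageEvent`, `StageSuccess`, `Stage`, and `runLoop` are hypothetical names, and only the idea (a failure event in the queue triggers `cancel()` on every stage still running, including the one that timed out) is taken from the comment above.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

object StageCleanupSketch {
  // Hypothetical event types; only the notion of a "StageFailure" event comes from the discussion.
  sealed trait StageEvent
  final case class StageSuccess(stageId: Int) extends StageEvent
  final case class StageFailure(stageId: Int, error: Throwable) extends StageEvent

  // Hypothetical stand-in for a running query stage.
  final class Stage(val id: Int) {
    @volatile var cancelled = false
    // A real broadcast stage would stop its broadcast thread and job group here.
    def cancel(): Unit = { cancelled = true }
  }

  // Consumes events until every stage has succeeded or one stage has failed.
  // On failure, cancels every stage that is still running (the failed one included)
  // and returns the error.
  def runLoop(stages: Seq[Stage], events: LinkedBlockingQueue[StageEvent]): Option[Throwable] = {
    val running = mutable.LinkedHashMap(stages.map(s => s.id -> s): _*)
    while (running.nonEmpty) {
      events.take() match {
        case StageSuccess(id) =>
          running.remove(id)
        case StageFailure(_, error) =>
          running.values.foreach(_.cancel())
          return Some(error)
      }
    }
    None
  }
}
```

In this sketch a timed-out broadcast stage would simply post a `StageFailure` to the queue; it does not need to cancel anything itself, because the loop that owns the queue performs the cleanup.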
I see, thanks for your explanation :)
Test build #121534 has finished for PR 28250 at commit
Thanks! Merged to master/3.0.
### What changes were proposed in this pull request?
This PR adds a timeout for the Future of a BroadcastQueryStageExec to make sure it can have the same timeout behavior as a non-AQE broadcast exchange.

### Why are the changes needed?
This is to make the broadcast timeout behavior in AQE consistent with that in non-AQE.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #28250 from maryannxue/aqe-broadcast-timeout.

Authored-by: Maryann Xue <maryann.xue@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(cherry picked from commit 44d370d)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
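The idea of putting a timeout on a Future, as described above, can be sketched with the Scala standard library alone. This is a hedged illustration rather than the code in this PR: `FutureTimeoutSketch` and `withTimeout` are hypothetical names, and a plain `TimeoutException` stands in for the `SparkException` used in the patch. What it shares with the diff excerpt is the pattern of a scheduled `Runnable` that calls `promise.tryFailure` when the deadline passes.

```scala
import java.util.concurrent.{Executors, TimeUnit, TimeoutException}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object FutureTimeoutSketch {
  // Hypothetical helper: the returned Future completes with the result of `work`,
  // or fails with a TimeoutException once `timeoutMs` milliseconds have elapsed.
  def withTimeout[T](work: Future[T], timeoutMs: Long)
                    (implicit ec: ExecutionContext): Future[T] = {
    val promise = Promise[T]()
    val scheduler = Executors.newSingleThreadScheduledExecutor()

    // Fail the promise when the deadline passes; tryFailure is a no-op if it already completed.
    val timeoutTask = scheduler.schedule(new Runnable {
      override def run(): Unit = {
        promise.tryFailure(new TimeoutException(s"Could not finish in $timeoutMs ms."))
      }
    }, timeoutMs, TimeUnit.MILLISECONDS)

    // If the work finishes first, cancel the pending timeout and forward the result.
    work.onComplete { result =>
      timeoutTask.cancel(false)
      promise.tryComplete(result)
    }

    // Release the scheduler thread once the outcome is decided either way.
    promise.future.onComplete(_ => scheduler.shutdown())
    promise.future
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val fast = withTimeout(Future(42), 1000)
    println(Try(Await.result(fast, 2.seconds)))
    val slow = withTimeout(Future { Thread.sleep(2000); 1 }, 100)
    println(Try(Await.result(slow, 1.second)))
  }
}
```

Note that failing the promise does not stop the underlying work. In the PR discussion that part is handled separately, by the cancellation that AQE triggers after the failure is reported.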