[SPARK-36414][SQL] Disable timeout for BroadcastQueryStageExec in AQE #33636

yaooqinn · 2021-08-04T10:58:16Z

What changes were proposed in this pull request?

This reverts SPARK-31475, because AQE mode always runs more concurrent jobs, especially when multiple queries run at the same time. Currently, the broadcast timeout does not measure only the execution time of the BroadcastQueryStageExec; it also includes the time spent waiting to be scheduled. If all resources are occupied materializing other stages, the broadcast stage times out without ever getting a chance to run.

The default value is 300s, and it is hard to choose a suitable timeout for AQE mode. Real-world cases usually need an extremely large value. In the example above, the timeout we used was 1800s, and it evidently needed to be roughly 3x larger.
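The scheduling effect described above can be sketched outside Spark. The following is a minimal simulation, not Spark code: a single-worker thread pool stands in for a fully occupied cluster, and the hypothetical function `simulate` starts the timeout clock at submission, so the time spent queued counts against the timeout. All names and durations here are assumptions chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def simulate(busy_secs: float, task_secs: float, timeout_secs: float) -> str:
    """Submit a short task behind a long one and wait with a timeout.

    The timeout clock starts at submission, so time spent queued behind
    the busy worker counts against it (the behavior the PR describes).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Stand-in for other stages occupying every available resource.
        pool.submit(time.sleep, busy_secs)
        # Stand-in for the broadcast stage: cheap to run, but queued.
        broadcast = pool.submit(time.sleep, task_secs)
        try:
            broadcast.result(timeout=timeout_secs)
            return "ok"
        except TimeoutError:
            return "timeout"


if __name__ == "__main__":
    # The task needs 0.01s, but waits 0.5s in the queue with a 0.2s timeout.
    print(simulate(busy_secs=0.5, task_secs=0.01, timeout_secs=0.2))
    # Same task and timeout, with the worker free almost immediately.
    print(simulate(busy_secs=0.0, task_secs=0.01, timeout_secs=0.2))
```

In this sketch the first call should report a timeout even though the task itself would finish well within the limit, while the second should succeed. Raising the timeout only helps if it exceeds the queueing delay, which is hard to predict when many queries share the same resources.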

Why are the changes needed?

AQE is enabled by default now, and this PR makes it more stable.

Does this PR introduce any user-facing change?

Yes, the broadcast timeout is no longer used in AQE mode.

How was this patch tested?

modified test

SparkQA · 2021-08-04T11:46:19Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46547/

SparkQA · 2021-08-04T12:37:40Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46547/

SparkQA · 2021-08-04T13:17:06Z

Test build #142037 has started for PR 33636 at commit b5870e6.

SparkQA · 2021-08-04T13:21:23Z

Test build #142035 has finished for PR 33636 at commit 88be9e9.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-08-04T14:15:05Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46549/

SparkQA · 2021-08-04T15:14:22Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46549/

HyukjinKwon · 2021-08-05T01:28:30Z

retest this please

SparkQA · 2021-08-05T02:50:30Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46568/

SparkQA · 2021-08-05T03:29:21Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46568/

cloud-fan · 2021-08-05T04:54:30Z

makes sense to me, cc @maryannxue

SparkQA · 2021-08-05T06:22:22Z

Test build #142057 has finished for PR 33636 at commit b5870e6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-08-05T13:15:30Z

thanks, merging to master/3.2!

### What changes were proposed in this pull request?

This reverts SPARK-31475, as there are always more concurrent jobs running in AQE mode, especially when running multiple queries at the same time. Currently, the broadcast timeout does not record accurately for the BroadcastQueryStageExec only, but also including the time waiting for being scheduled. If all the resources are currently being occupied for materializing other stages, it timeouts without a chance to run actually.

![image](https://user-images.githubusercontent.com/8326978/128169612-4c96c8f6-6f8e-48ed-8eaf-450f87982c3b.png)

The default value is 300s, and it's hard to adjust the timeout for AQE mode. Usually, you need an extremely large number for real-world cases. As you can see in the example, above, the timeout we used for it was 1800s, and obviously, it needed 3x more or something

### Why are the changes needed?

AQE is default now, we can make it more stable with this PR

### Does this PR introduce _any_ user-facing change?

yes, broadcast timeout now is not used for AQE

### How was this patch tested?

modified test

Closes #33636 from yaooqinn/SPARK-36414.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 0c94e47)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

dongjoon-hyun · 2021-08-05T16:39:42Z

Could you make a backport to branch-3.1 please, @yaooqinn ?

yaooqinn · 2021-08-06T03:17:54Z

> Could you make a backport to branch-3.1 please, @yaooqinn ?

OK @dongjoon-hyun

AngersZhuuuu · 2022-11-11T02:31:34Z

> > Could you make a backport to branch-3.1 please, @yaooqinn ?
>
> OK @dongjoon-hyun

It seems this was missed?

dongjoon-hyun · 2022-11-11T02:33:44Z

@AngersZhuuuu . It's too late if this is not backported already.
branch-3.1 is EOL as of today.

AngersZhuuuu · 2022-11-11T02:36:13Z

> @AngersZhuuuu . It's too late if this is not backported already.
> branch-3.1 is EOL as of today.

Yea

[SPARK-36414][SQL] Disable timeout for BroadcastQueryStageExec in AQE

88be9e9

github-actions bot added the SQL label Aug 4, 2021

nit

b5870e6

yaooqinn requested a review from cloud-fan August 5, 2021 02:01

cloud-fan closed this in 0c94e47 Aug 5, 2021
