[SPARK-29910][SQL] Optimize speculation performance by adding minimum runtime limit #26541

Deegue · 2019-11-15T06:44:36Z

What changes were proposed in this pull request?

The minimum runtime before a task becomes eligible for speculation used to be a fixed 100 ms. As a result, tasks that finish within seconds can also be speculated, which requires more executors.
To improve this, we add spark.speculation.minRuntime to control the minimum runtime limit for speculation.
Raising spark.speculation.minRuntime reduces the number of normal tasks that get speculated.
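As a rough illustration of the idea, the sketch below models the runtime check as the larger of `multiplier * median` and the minimum runtime. It is a simplified model under stated assumptions, not the scheduler code itself: the function names and the sample numbers are hypothetical, and `min_runtime_ms` stands in for the proposed spark.speculation.minRuntime.

```python
def speculation_threshold_ms(median_ms: float, multiplier: float, min_runtime_ms: float) -> float:
    """Runtime a task must exceed before it is considered for speculation.

    Assumption: the threshold is the larger of (multiplier * median runtime of
    successful tasks) and the configured minimum runtime.
    """
    return max(multiplier * median_ms, min_runtime_ms)


def is_speculatable(runtime_ms: float, median_ms: float, multiplier: float, min_runtime_ms: float) -> bool:
    """True if a running task has exceeded the speculation threshold."""
    return runtime_ms > speculation_threshold_ms(median_ms, multiplier, min_runtime_ms)


if __name__ == "__main__":
    # Hypothetical stage: the median successful task took 2 s and one task has run for 4 s.
    for min_runtime_ms in (100, 60_000):
        print(min_runtime_ms, is_speculatable(4_000, 2_000, 1.5, min_runtime_ms))
```

With these sample numbers, the 4 s task is speculated under the 100 ms minimum but not under a 60 s minimum.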

Example:
Tasks that don't need to be speculated:

[Screenshots not preserved]

Tasks that are more likely to go wrong and need to be speculated (especially shuffle tasks with large amounts of data, which can take minutes or even hours):

[Screenshot not preserved]

Why are the changes needed?

To improve speculation performance by reducing the number of speculated tasks that do not actually need speculation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

jiangxb1987 · 2019-11-15T22:07:23Z

What you really want is to make MIN_TIME_TO_SPECULATION configurable? I'm not sure whether it would be a good idea because it's a fixed global value, so changing the min time should affect all the jobs. Also, if we feel speculation is not necessary then we can set spark.speculation to false.

Deegue · 2019-11-18T03:29:56Z

> What you really want is to make MIN_TIME_TO_SPECULATION configurable? I'm not sure whether it would be a good idea because it's a fixed global value, so changing the min time should affect all the jobs. Also, if we feel speculation is not necessary then we can set spark.speculation to false.

Hi @jiangxb1987, thanks for your comment. This PR aims to improve the behavior when spark.speculation is true by making MIN_TIME_TO_SPECULATION configurable. The default value of spark.speculation.minRuntime is 100 (ms), the same as before.

Besides, increasing spark.speculation.minRuntime reduces the number of extra executors allocated for speculation. In other words, those executors go to the tasks that need speculation more.

Deegue · 2019-11-19T03:40:02Z

Gentle ping, @cloud-fan

cloud-fan · 2019-11-20T08:07:07Z

In general it's OK to make a hardcoded value configurable, but I doubt if it's really useful in this case. If you have different kinds of tasks that need different speculation min time, then a global config doesn't help. You need a per job(or even per stage) config.

Deegue · 2019-11-21T03:20:18Z

> In general it's OK to make a hardcoded value configurable, but I doubt if it's really useful in this case. If you have different kinds of tasks that need different speculation min time, then a global config doesn't help. You need a per job(or even per stage) config.

Thanks for your review, @cloud-fan.
Before this patch, the minimum runtime was fixed at 100 ms, which means almost every task that meets the other conditions will be speculated. After applying this patch and setting it to 60s, many tasks that finish within seconds are no longer speculated, and our cluster performs better. So I think it is useful in this case.

As for jobs and stages, spark.speculation.quantile and spark.speculation.multiplier help determine which tasks need to be speculated in different situations.
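To make the interaction between the three settings concrete, here is a minimal sketch of a per-stage check. It is a simplified model, not Spark's implementation: the function name, the task ids, and the durations are hypothetical. It assumes speculation is only considered once a `quantile` fraction of the stage's tasks has finished, and that a running task qualifies when its runtime exceeds the larger of `multiplier * median` and the minimum runtime.

```python
import math
from statistics import median


def speculatable_tasks(finished_ms, running_ms, total_tasks, quantile, multiplier, min_runtime_ms):
    """Return ids of running tasks that would be speculated in this simplified model.

    finished_ms: durations (ms) of tasks that already succeeded in the stage
    running_ms:  mapping of task id -> current runtime (ms) for running tasks
    """
    # Assumption: nothing is speculated until enough tasks have finished.
    min_finished = max(math.floor(quantile * total_tasks), 1)
    if len(finished_ms) < min_finished:
        return []
    # The median adapts per stage; the minimum runtime is a global floor.
    threshold = max(multiplier * median(finished_ms), min_runtime_ms)
    return [task_id for task_id, runtime in running_ms.items() if runtime > threshold]


if __name__ == "__main__":
    # Hypothetical stage with 8 tasks: 6 finished in about 1 s, 2 still running.
    finished = [1000, 1200, 900, 1100, 1000, 1300]
    running = {"t6": 5_000, "t7": 90_000}
    for min_runtime_ms in (100, 60_000):
        print(min_runtime_ms, speculatable_tasks(finished, running, 8, 0.75, 1.5, min_runtime_ms))
```

In this example both running tasks are speculated with a 100 ms floor, while only the 90 s task is speculated with a 60 s floor. The quantile and multiplier still adapt to each stage through the median, whereas the floor applies to every stage alike.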

AmplabJenkins · 2020-01-15T14:51:05Z

Can one of the admins verify this patch?

github-actions · 2020-04-25T00:10:10Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Deegue added 5 commits October 22, 2019 09:53

Merge pull request #1 from apache/master

90233e4

Merge

Merge pull request #2 from apache/master

7a246d5

merge

Merge pull request #3 from apache/master

783211f

Merge

Merge pull request #4 from apache/master

7625d44

Merge

Add spark.speculation.minRuntime

fbf0f0c

Deegue changed the title ~~[SPARK-29786][SQL] Optimize speculation performance by minimum runtime limit~~ [SPARK-29910][SQL] Optimize speculation performance by adding minimum runtime limit Nov 15, 2019

dongjoon-hyun added the SQL label Nov 15, 2019

github-actions bot added the Stale label Apr 25, 2020

github-actions bot closed this Apr 26, 2020
