@Deegue
Contributor

@Deegue Deegue commented Nov 15, 2019

What changes were proposed in this pull request?

The minimum runtime before a task becomes eligible for speculation used to be a fixed value of 100 ms. This means that tasks which finish within seconds can also be speculated, so more executors are required.
To improve this situation, we add spark.speculation.minRuntime to control the minimum runtime limit for speculation.
By raising spark.speculation.minRuntime, we can reduce the number of normal tasks that get speculated.
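
To make the effect of the floor concrete, here is a minimal sketch of the threshold as this thread describes it: a running task is only considered for speculation once its runtime exceeds the larger of the multiplier times the median duration of finished tasks and the minimum runtime. The function name and parameters are hypothetical, and the 1.5 multiplier default is an assumption taken from Spark's documented default for spark.speculation.multiplier at the time; this is not Spark's scheduler code.

```python
def speculation_threshold_ms(median_ms: float,
                             multiplier: float = 1.5,
                             min_runtime_ms: float = 100.0) -> float:
    """Runtime (ms) a task must exceed before it is considered for speculation.

    Simplified model: the larger of (multiplier * median duration of the
    finished tasks) and the configured minimum runtime.
    """
    return max(multiplier * median_ms, min_runtime_ms)


# A stage whose finished tasks have a median duration of 2 s.
# With the fixed 100 ms floor, only the multiplier term matters.
low_floor = speculation_threshold_ms(2_000.0)

# With the floor raised to 60 s, the floor dominates for this short stage.
high_floor = speculation_threshold_ms(2_000.0, min_runtime_ms=60_000.0)

print(low_floor, high_floor)
```

With a 100 ms floor the threshold for such a stage is set entirely by the median, so a task that is only a second or two slower than its peers already qualifies; with a 60 s floor it does not.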

Example:
Tasks that don't need to be speculated:
[image]
and
[image]

Tasks that are more likely to go wrong and need to be speculated (especially shuffle tasks that process large amounts of data and take minutes or even hours):
[image]

Why are the changes needed?

To improve speculation performance by reducing the number of tasks that are speculated unnecessarily.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

@Deegue changed the title from "[SPARK-29786][SQL] Optimize speculation performance by minimum runtime limit" to "[SPARK-29910][SQL] Optimize speculation performance by adding minimum runtime limit" on Nov 15, 2019
@jiangxb1987
Contributor

What you really want is to make MIN_TIME_TO_SPECULATION configurable? I'm not sure whether it would be a good idea because it's a fixed global value, so changing the min time should affect all the jobs. Also, if we feel speculation is not necessary then we can set spark.speculation to false.

@Deegue
Contributor Author

Deegue commented Nov 18, 2019

> What you really want is to make MIN_TIME_TO_SPECULATION configurable? I'm not sure whether it would be a good idea because it's a fixed global value, so changing the min time should affect all the jobs. Also, if we feel speculation is not necessary then we can set spark.speculation to false.

Hi @jiangxb1987, thanks for your comment. This PR aims to improve the behavior when spark.speculation is true by making MIN_TIME_TO_SPECULATION configurable. The default value of spark.speculation.minRuntime is 100 ms, which is the same as before.

Besides, increasing spark.speculation.minRuntime reduces the number of extra executors allocated for speculation. In other words, those executors are left available for the tasks that need speculation more.

@Deegue
Contributor Author

Deegue commented Nov 19, 2019

Gentle ping, @cloud-fan

@cloud-fan
Contributor

In general it's OK to make a hardcoded value configurable, but I doubt if it's really useful in this case. If you have different kinds of tasks that need different speculation min time, then a global config doesn't help. You need a per job(or even per stage) config.

@Deegue
Contributor Author

Deegue commented Nov 21, 2019

> In general it's OK to make a hardcoded value configurable, but I doubt if it's really useful in this case. If you have different kinds of tasks that need different speculation min time, then a global config doesn't help. You need a per job(or even per stage) config.

Thanks for your review, @cloud-fan.
Before this patch, the minimum runtime was fixed at 100 ms, which means almost every task that meets the other conditions gets speculated. After applying this patch and setting the value to 60 s, many tasks that finish within seconds are no longer speculated, and our cluster performs better. So I think it is useful in this case.

As for jobs and stages, spark.speculation.quantile and spark.speculation.multiplier already help to decide which tasks need to be speculated in different situations.
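
The interplay between the three settings can be sketched as follows. This is a simplified model of the check as described in this thread, not Spark's actual scheduler code: the function name, its parameters, and the sample durations are hypothetical, and the defaults of 0.75 and 1.5 are assumptions taken from Spark's documented defaults for spark.speculation.quantile and spark.speculation.multiplier at the time.

```python
import math
from statistics import median


def speculatable_tasks(finished_ms, running_ms, num_tasks,
                       quantile=0.75, multiplier=1.5, min_runtime_ms=100.0):
    """Return indices into running_ms of tasks that would be speculated.

    Simplified model of a per-stage check:
    1. Do nothing until at least `quantile` of the stage's tasks have finished.
    2. Threshold = max(multiplier * median finished duration, min_runtime_ms).
    3. A running task qualifies if its elapsed time exceeds the threshold.
    """
    min_finished = max(math.floor(quantile * num_tasks), 1)
    if len(finished_ms) < min_finished:
        return []
    threshold = max(multiplier * median(finished_ms), min_runtime_ms)
    return [i for i, elapsed in enumerate(running_ms) if elapsed > threshold]


# Short stage: 3 of 4 tasks finished in about 1 s, one task has run for 2 s.
short_default = speculatable_tasks([1000, 1200, 1100], [2000], num_tasks=4)
short_raised = speculatable_tasks([1000, 1200, 1100], [2000], num_tasks=4,
                                  min_runtime_ms=60_000)

# Long stage: 3 of 4 tasks finished in about 10 min, one has run for 20 min.
long_raised = speculatable_tasks([600_000, 660_000, 630_000], [1_200_000],
                                 num_tasks=4, min_runtime_ms=60_000)

print(short_default, short_raised, long_raised)
```

In this model the median is computed per stage, so the multiplier term already scales with the stage's typical task duration. The global minimum runtime only acts as a floor: it suppresses speculation for the short stage and leaves the long stage unaffected.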

@AmplabJenkins

Can one of the admins verify this patch?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 25, 2020
@github-actions github-actions bot closed this Apr 26, 2020