
[SPARK-15917][CORE] Added support for number of executors in Standalone [WIP] #15405

Closed · wants to merge 5 commits

Conversation

JonathanTaws

What changes were proposed in this pull request?

Currently, in standalone mode it is not possible to set the number of executors via --num-executors or the spark.executor.instances property. Instead, as many executors as possible are spawned based on the available resources and the properties set.
This patch corrects that by adding support for the number-of-executors property.

Here's the new behavior:

  • If the executor.cores property isn't set, we try to spawn one executor per worker, each taking all of that worker's available cores (the default behavior), until the number of executors launched reaches the number requested. If the specified number of executors can't be launched, a warning is logged.
  • If the executor.cores property is set (the same logic applies to executor.memory):
    • and executor.instances * executor.cores <= cores.max, then executor.instances executors are spawned;
    • and executor.instances * executor.cores > cores.max, then as many executors as possible are spawned (basically the previous behavior when only executor.cores was set), and a warning is logged saying the requested number of executors couldn't be spawned.

In the case where executor.memory is also set, all constraints are taken into account based on the cores and memory assigned per worker (the same logic as with cores).
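The core-budget rule described above can be sketched as a small calculation. This is a hypothetical helper written for illustration, not the actual Master.scheduleExecutorsOnWorkers code, and it ignores the memory constraint:

```scala
// Illustrative sketch of the decision described above (hypothetical helper,
// not the actual Master code): given the app's core budget, how many
// executors do we launch, and should we warn?
object ExecutorCountSketch {
  /** Returns (executorsToLaunch, shouldWarn). */
  def decide(coresMax: Int, executorCores: Int, executorInstances: Int): (Int, Boolean) = {
    if (executorInstances * executorCores <= coresMax) {
      (executorInstances, false)        // the requested count fits the core budget
    } else {
      (coresMax / executorCores, true)  // launch as many as possible and warn
    }
  }
}
```

For example, with cores.max = 16 and executor.cores = 4, requesting 3 executors succeeds, while requesting 5 falls back to 4 executors plus a warning.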

How was this patch tested?

I tested this patch by running a simple Spark app in standalone mode with --num-executors or spark.executor.instances set, and checking that the number of executors launched was coherent with the available resources and the requested count.
I plan to finish testing this patch by adding tests in MasterSuite and running the usual dev/run-tests.

@andrewor14
Contributor

add to whitelist

val numExecutorsLaunched = app.executors.size
// Check to see if we managed to launch the requested number of executors
if(numUsable != 0 && numExecutorsLaunched != app.executorLimit &&
numExecutorsScheduled != app.executorLimit) {
Contributor

How are numExecutorsLaunched and numExecutorsScheduled related to each other? Also, here we probably want to do an inequality check, just in case.

Also, style: need a space after if.
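For illustration, here is how the reviewed condition might look with both suggestions applied (a space after if, and < instead of != so an overshoot can't re-trigger the warning). This is a hedged sketch written as a self-contained predicate, not the code that was eventually committed; the names mirror the snippet above:

```scala
// Sketch of the warning guard with the reviewer's suggestions applied:
// `<` instead of `!=`, so the warning fires only while we are strictly
// below the executor limit, and never for an overshoot.
object WarnCheckSketch {
  def shouldWarn(numUsable: Int, numExecutorsLaunched: Int,
                 numExecutorsScheduled: Int, executorLimit: Int): Boolean =
    numUsable != 0 &&
      numExecutorsLaunched < executorLimit &&
      numExecutorsScheduled < executorLimit
}
```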

Contributor

Another thing is, how noisy is this? Do we log this if dynamic allocation is turned on (we shouldn't)?

Author

numExecutorsLaunched corresponds to the actual number of executors launched so far (literally, those registered in the executors list in the ApplicationInfo), whereas numExecutorsScheduled corresponds to the number of executors that have been scheduled/allocated by scheduleExecutorsOnWorkers. The distinction is needed because scheduleExecutorsOnWorkers is called multiple times while setting up the executors, and without this condition we would repeatedly log the same message with incorrect information (such as "0 executors launched" even though the executors had been launched previously).
Tell me if that doesn't make sense; I did a lot of trial and error before coming up with this condition.

Author

Regarding the noise produced, it should be quite minimal: when it's not possible to launch the requested number of executors, just one warning is logged.
With dynamic allocation on, a message is logged when the initial number of executors is specified and couldn't be satisfied. I don't think that's much of a problem, since there is currently no warning for that case, but I can add a check to suppress the warning when dynamic allocation is enabled if you prefer.

@andrewor14
Contributor

Thanks for working on this. It's great to see how small the patch turned out to be!

@SparkQA

SparkQA commented Oct 10, 2016

Test build #66675 has finished for PR 15405 at commit eed3ecd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 10, 2016

Test build #66681 has finished for PR 15405 at commit bffedac.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 11, 2016

Test build #3323 has finished for PR 15405 at commit bffedac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor

Are you still working on this? @JonathanTaws

@JonathanTaws
Author

JonathanTaws commented Jun 13, 2017 via email

@jiangxb1987
Contributor

I see this is WIP; when do you think it will be ready for review? Thanks!

@JonathanTaws
Author

JonathanTaws commented Jun 15, 2017 via email

@jiangxb1987
Contributor

ping @JonathanTaws Please let me know once this PR is ready for review, thanks!

@JonathanTaws
Author

JonathanTaws commented Jun 30, 2017 via email
