
[SPARK-15917][CORE] Added support for number of executors in Standalone [WIP] #15405

Closed · wants to merge 5 commits

Conversation

JonathanTaws

What changes were proposed in this pull request?

Currently, in standalone mode it is not possible to set the number of executors via --num-executors or the spark.executor.instances property. Instead, as many executors as possible are spawned based on the available resources and the properties set.
This patch corrects that by adding support for the number-of-executors property.

Here's the new behavior:

  • If the executor.cores property isn't set, we try to spawn one executor per worker, each taking all of that worker's available cores (the default behavior), until the number of executors launched reaches the number requested. If the specified number of executors can't be launched, a warning is logged.
  • If the executor.cores property is set (the same logic applies to executor.memory):
    • and executor.instances * executor.cores <= cores.max, then executor.instances executors are spawned;
    • and executor.instances * executor.cores > cores.max, then as many executors as possible are spawned (basically the previous behavior when only executor.cores was set), and a warning is logged saying the requested number of executors couldn't be spawned.

In the case where executor.memory is also set, all constraints are taken into account based on the cores and memory assigned per worker (the same logic as with cores).
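The core-budget rule described above can be sketched as a small calculation. This is a hypothetical helper written for illustration, not the actual Master.scheduleExecutorsOnWorkers code, and it ignores the memory constraint:

```scala
// Illustrative sketch of the decision described above (hypothetical helper,
// not the actual Master code): given the app's core budget, how many
// executors do we launch, and should we warn?
object ExecutorCountSketch {
  /** Returns (executorsToLaunch, shouldWarn). */
  def decide(coresMax: Int, executorCores: Int, executorInstances: Int): (Int, Boolean) = {
    if (executorInstances * executorCores <= coresMax) {
      (executorInstances, false)        // the requested count fits the core budget
    } else {
      (coresMax / executorCores, true)  // launch as many as possible and warn
    }
  }
}
```

For example, with cores.max = 16 and executor.cores = 4, requesting 3 executors succeeds, while requesting 5 falls back to 4 executors plus a warning.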

How was this patch tested?

I tested this patch by running a simple Spark app in standalone mode with --num-executors or spark.executor.instances set, and checking that the number of executors launched was coherent with the available resources and the requested count.
I plan to finish testing this patch by adding tests in MasterSuite and running the usual dev/run-tests.

@andrewor14
Contributor

add to whitelist

val numExecutorsLaunched = app.executors.size
// Check to see if we managed to launch the requested number of executors
if(numUsable != 0 && numExecutorsLaunched != app.executorLimit &&
numExecutorsScheduled != app.executorLimit) {
Contributor

How are numExecutorsLaunched and numExecutorsScheduled related to each other? Also, here we probably want to do an inequality check, just in case.

Also, style: need a space after if.
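For illustration, here is how the reviewed condition might look with both suggestions applied (a space after if, and < instead of != so an overshoot can't re-trigger the warning). This is a hedged sketch written as a self-contained predicate, not the code that was eventually committed; the names mirror the snippet above:

```scala
// Sketch of the warning guard with the reviewer's suggestions applied:
// `<` instead of `!=`, so the warning fires only while we are strictly
// below the executor limit, and never for an overshoot.
object WarnCheckSketch {
  def shouldWarn(numUsable: Int, numExecutorsLaunched: Int,
                 numExecutorsScheduled: Int, executorLimit: Int): Boolean =
    numUsable != 0 &&
      numExecutorsLaunched < executorLimit &&
      numExecutorsScheduled < executorLimit
}
```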

Contributor

Another thing is, how noisy is this? Do we log this if dynamic allocation is turned on (we shouldn't)?

Author

numExecutorsLaunched corresponds to the actual number of executors launched so far (literally, those registered in the executors list in the ApplicationInfo), whereas numExecutorsScheduled corresponds to the number of executors that have been scheduled/allocated by scheduleExecutorsOnWorkers. The distinction is needed because scheduleExecutorsOnWorkers is called multiple times while setting up the executors, and without this condition we would repeatedly log the same message with incorrect information (such as "0 executors launched" even though the executors had been launched previously).
Tell me if that doesn't make sense; I did a lot of trial and error before coming up with this condition.

Author

Regarding the noise produced, it should be quite minimal: when it's not possible to launch the requested number of executors, just one warning is logged.
With dynamic allocation on, a message is logged when the initial number of executors is specified and couldn't be satisfied. I don't think that's much of a problem, since there is currently no warning for that case, but I can add a check to suppress the warning when dynamic allocation is enabled if you prefer.

@andrewor14
Contributor

Thanks for working on this. It's great to see how small the patch turned out to be!

@SparkQA

SparkQA commented Oct 10, 2016

Test build #66675 has finished for PR 15405 at commit eed3ecd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 10, 2016

Test build #66681 has finished for PR 15405 at commit bffedac.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 11, 2016

Test build #3323 has finished for PR 15405 at commit bffedac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor

Are you still working on this? @JonathanTaws

@JonathanTaws
Author

JonathanTaws commented Jun 13, 2017 via email

@jiangxb1987
Contributor

I see this is WIP; when do you think it will be ready for review? Thanks!

@JonathanTaws
Author

JonathanTaws commented Jun 15, 2017 via email

@jiangxb1987
Contributor

ping @JonathanTaws Please let me know once this PR is ready for review, thanks!

@JonathanTaws
Author

JonathanTaws commented Jun 30, 2017 via email
