
[SPARK-22683][CORE] Add a executorAllocationRatio parameter to throttle the parallelism of the dynamic allocation #19881

Closed · wants to merge 9 commits

Conversation

jcuquemelle (Contributor) commented Dec 4, 2017

What changes were proposed in this pull request?

By default, dynamic allocation requests enough executors to maximize
parallelism according to the number of tasks to process. While this minimizes job
latency, with small tasks this can waste a lot of resources due to
executor allocation overhead, as some executors might not even do any work.
This setting allows specifying a ratio that reduces the number of
target executors w.r.t. full parallelism.

The number of executors computed with this setting is still bounded by
spark.dynamicAllocation.maxExecutors and spark.dynamicAllocation.minExecutors.
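
For reference, a minimal sketch of the computation being described, with the bounds made explicit (the function name and parameter list are illustrative, not the exact Spark code; in the actual code the min/max bounds are enforced when executors are requested, not when the target is computed):

    import scala.math.{ceil, max, min}

    // Illustrative sketch: how a ratio throttles the dynamic-allocation target.
    // Parameter names mirror the Spark settings they stand in for.
    def targetExecutors(
        pendingAndRunningTasks: Int,
        executorCores: Int,   // spark.executor.cores
        taskCpus: Int,        // spark.task.cpus
        ratio: Double,        // the proposed ratio, in (0, 1.0]
        minExecutors: Int,    // spark.dynamicAllocation.minExecutors
        maxExecutors: Int     // spark.dynamicAllocation.maxExecutors
    ): Int = {
      val slotsPerExecutor = executorCores / taskCpus // task slots per executor
      val fullParallelism = ceil(pendingAndRunningTasks * ratio / slotsPerExecutor).toInt
      max(minExecutors, min(maxExecutors, fullParallelism))
    }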

How was this patch tested?

Unit tests, plus runs on various actual workloads on a YARN cluster.

srowen (Member) commented Dec 4, 2017

Please see JIRA. I don't think this is worth doing.

conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)

private val tasksPerExecutorSlot = conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)
Contributor:

I think we should change the name of this config because Spark doesn't have the concept of slots, and it could be confusing to users who might expect exactly x tasks to be processed on each executor. I am thinking more along the lines of spark.dynamicAllocation.maxExecutorsPerStageDivisor = (max # of executors based on the # of tasks required for that stage) divided by this number. I'm open to other config names here, though.

I think we would also need to define its interaction with spark.dynamicAllocation.maxExecutors, as well as how it behaves as the number of running and yet-to-run tasks changes.

conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)

private val tasksPerExecutorSlot = conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)

private val tasksPerExecutor = tasksPerExecutorSlot * taskSlotPerExecutor
Contributor:

Since we aren't using the concept of slots, I think we should leave tasksPerExecutor alone and put this functionality into maxNumExecutorsNeeded().
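
A rough sketch of that suggestion, assuming the surrounding ExecutorAllocationManager members (listener, tasksPerExecutor) and a hypothetical divisor value standing in for whatever the config ends up being named:

    // Hypothetical sketch only: apply the throttle inside maxNumExecutorsNeeded()
    // instead of redefining tasksPerExecutor.
    private def maxNumExecutorsNeeded(): Int = {
      val numRunningOrPendingTasks = listener.totalPendingTasks + listener.totalRunningTasks
      math.ceil(numRunningOrPendingTasks.toDouble / (tasksPerExecutor * divisor)).toInt
    }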

tgravescs (Contributor):

ping @jcuquemelle, can you update this?

jcuquemelle (Contributor Author):

Sorry, I didn't see the ping, I will have a look shortly.

jcuquemelle force-pushed the AddTaskPerExecutorSlot branch 2 times, most recently from 2abd46f to 56c3f43 on March 12, 2018 09:30
jcuquemelle (Contributor Author):

The new semantics (throttling w.r.t. the max possible parallelism) is actually simpler to understand. I'm proposing another name that doesn't have any ambiguity with the existing maxExecutors param, but I'm open to any other name proposal.

tgravescs (Contributor):

jenkins, test this please

SparkQA commented Mar 12, 2018

Test build #88180 has finished for PR 19881 at commit 56c3f43.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

tgravescs (Contributor):

@jcuquemelle please fix the style

<td>
By default, the dynamic allocation will request enough executors to maximize the
parallelism according to the number of tasks to process. While this minimizes the
latency of the job, with small tasks this setting wastes a lot of resources due to
Contributor:

can waste.

Contributor Author:

done

executor allocation overhead, as some executor might not even do any work.
This setting allows to set a divisor that will be used to reduce the number of
executors w.r.t. full parallelism
Defaults to 1.0
Contributor:

I think we should define that maxExecutors trumps this setting.

If I have 10000 tasks and a divisor of 2, I would expect 5000 executors, but if maxExecutors is 1000, that is all I get.

We should add a test for this interaction as well.
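
The interaction in that example works out as follows (plain Scala arithmetic, not the actual suite code; one task slot per executor assumed):

    val pendingTasks = 10000
    val tasksPerExecutor = 1
    val divisor = 2.0
    val maxExecutors = 1000

    val uncapped = math.ceil(pendingTasks / (tasksPerExecutor * divisor)).toInt // 5000
    val target = math.min(uncapped, maxExecutors)                               // 1000: maxExecutors wins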

Contributor Author:

Done

latency of the job, with small tasks this setting wastes a lot of resources due to
executor allocation overhead, as some executor might not even do any work.
This setting allows to set a divisor that will be used to reduce the number of
executors w.r.t. full parallelism
Contributor:

add a period at the end of "parallelism"

Contributor Author:

done

@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
Lower bound for the number of executors if dynamic allocation is enabled.
</td>
</tr>
<tr>
<td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
Contributor:

Naming configs is really hard; there are lots of different opinions on it, and in the end someone is going to be confused, so I need to think about this some more. I see the reason to use Parallelism here rather than maxExecutors (maxExecutorsDivisor could be confusing if people think it applies to the maxExecutors config), but I also think parallelism would be confused with spark.default.parallelism; it's not defining the number of tasks but the number of executors to allocate based on the parallelism. Another one I thought of is executorAllocationDivisor. I'll think about it some more and get back.

jcuquemelle (Contributor Author) commented Mar 20, 2018:

How about something like fullAllocationDivisor or fullExecutorAllocationDivisor? I think the naming should reflect the fact that it is a divisor w.r.t. the full possible parallelism / number of executors.

Contributor:

Sorry I didn't get back to this earlier; I think fullExecutorAllocationDivisor would be fine.

Contributor Author:

Done

felixcheung (Member) left a comment:

could you update the PR title and description to fit the new approach?

@jcuquemelle jcuquemelle changed the title [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor parameter to throttle the parallelism of the dynamic allocation Mar 21, 2018
jcuquemelle (Contributor Author):

@felixcheung: updated PR title and description

tgravescs (Contributor):

jenkins, test this please

tgravescs (Contributor):

jenkins, ok to test

SparkQA commented Mar 21, 2018

Test build #88472 has finished for PR 19881 at commit a40d160.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

var manager = sc.executorAllocationManager.get
post(sc.listenerBus, SparkListenerStageSubmitted(createStageInfo(0, 20)))
for (i <- 0 to 5) {
  addExecutors(manager)
Contributor:

this loop isn't really needed, right? All we are checking is the target, not the number to add?

Contributor Author:

If we want to check the capping by max/min executors, we need to actually try to add executors. The max/min capping does not occur during the computation of the target number of executors, but at the time they are added.
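
A simplified sketch of that ordering (illustrative only; the real addExecutors also ramps the target up gradually rather than jumping straight to the bound, and the field names are assumed from the surrounding class):

    // The target from maxNumExecutorsNeeded() is unbounded; the min/max clamp
    // is applied here, when executors are actually requested.
    private def addExecutors(maxNumExecutorsNeeded: Int): Int = {
      val oldTarget = numExecutorsTarget
      numExecutorsTarget = math.max(
        math.min(maxNumExecutorsNeeded, maxNumExecutors), minNumExecutors)
      numExecutorsTarget - oldTarget // delta actually requested
    }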

Contributor:

ok

tgravescs (Contributor):

Just a minor comment about the test; otherwise looks good.

@@ -116,9 +120,12 @@ private[spark] class ExecutorAllocationManager(
// TODO: The default value of 1 for spark.executor.cores works right now because dynamic
// allocation is only supported for YARN and the default number of cores per executor in YARN is
// 1, but it might need to be attained differently for different cluster managers
private val tasksPerExecutor =
private val tasksPerExecutorForFullParallelism =
Contributor:

We don't really need this variable now, can we just remove it?

jcuquemelle (Contributor Author) commented Mar 22, 2018:

It is used in two places: one to validate arguments and the other to actually compute the target number of executors. If I remove this variable, I will need to either store spark.executor.cores and spark.task.cpus instead, or fetch them each time we validate or compute the target number of executors.

Contributor Author:

@jiangxb1987, do you agree with my comment, or do you still want me to remove the variable?

Contributor:

I was originally thinking we might avoid introducing the concept of tasksPerExecutorForFullParallelism and rather only have executorCores and taskCPUs, but I don't have a strong opinion on that.

Contributor Author:

This is not exposed; it is merely a more precise description of the actual computation. I just wanted to state more clearly that the existing default behavior maximizes parallelism.

Contributor:

ok

tgravescs (Contributor):

+1

jiangxb1987 (Contributor):

cc @rxin

tgravescs (Contributor):

I'll leave this a bit longer, but then I'm going to merge it later today.

rxin (Contributor) commented Mar 28, 2018 via email

tgravescs (Contributor):

Yes, we can wait another day or so if you are looking at it, though this discussion has been going on for a long time now; if you have a better name suggestion, let us know. No other configs have a "divisor" suffix.

jcuquemelle (Contributor Author):

@rxin, can we merge this PR?

rxin (Contributor) commented Apr 6, 2018

Maybe instead of "divisor", we could just have a "rate" or "factor" that can be a floating-point value, and use multiplication rather than division? This way people can also make it even more aggressive.

jcuquemelle (Contributor Author) commented Apr 6, 2018

@rxin: making it more aggressive must be forbidden, because a setting of 1.0 already gives enough executors that, if executor provisioning were perfect (e.g. all executors were available at the same time) and the mapping of tasks to executors were optimal, each executor core (or taskSlot, as in the original naming) would process exactly one task. If you ask for more executors, you can be sure they will be wasted.
This is why a divisor semantics felt natural: it implies the parameter can only be used to reduce parallelism.
How about fullParallelismThrottlingRate?
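
A worked example of the argument (hypothetical numbers):

    val tasks = 1000
    val executorCores = 4 // spark.executor.cores
    val taskCpus = 1      // spark.task.cpus
    val slotsPerExecutor = executorCores / taskCpus // 4 task slots per executor

    // Full parallelism: one task per slot under perfect provisioning and scheduling.
    val fullParallelism = math.ceil(tasks.toDouble / slotsPerExecutor).toInt // 250

    // Requesting more than 250 executors leaves slots that can never all be
    // filled, so only reducing (dividing) the target can ever be useful.
    val halved = math.ceil(fullParallelism / 2.0).toInt // 125 with divisor 2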

conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)

private val fullExecutorAllocationDivisor =
conf.getDouble("spark.dynamicAllocation.fullExecutorAllocationDivisor", 1.0)
Contributor:

Forgot about this earlier, but this should really be a config constant similar to DYN_ALLOCATION_MIN_EXECUTORS.
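
A sketch of what that would look like in the org.apache.spark.internal.config package object, next to DYN_ALLOCATION_MIN_EXECUTORS (the constant name is assumed, using the then-current config key):

    private[spark] val DYN_ALLOCATION_FULL_EXECUTOR_ALLOCATION_DIVISOR =
      ConfigBuilder("spark.dynamicAllocation.fullExecutorAllocationDivisor")
        .doubleConf
        .createWithDefault(1.0)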

tgravescs (Contributor):

@rxin I assume you are just trying to avoid "divisor" since it's not used anywhere else? As @jcuquemelle stated, I don't see a use case for making this more aggressive; if you have one, please let us know, but otherwise it just wastes resources.

Personally I still like divisor because that is what you are doing. I don't think the fact that it's not in any other configs is a good reason not to use it. Looking around, I don't see any public configs with "factor" in the name either. I am not fond of "rate" because it's not a rate (i.e. how quickly or slowly you are allocating); it's a limit on the max number of executors.

I also think it's more natural for people to think of this as a divisor rather than a multiplier: if I want 1/2 of the executors, I divide by 2. I think we should name it based on what is most likely to be understood by the end user.

rxin (Contributor) commented Apr 9, 2018

SGTM on divisor.

Do we need "full" there in the config?

tgravescs (Contributor):

No, we don't strictly need it in the name; the reasoning behind it was to indicate that this is a divisor relative to having fully allocated executors for all the tasks, i.e. running at full parallelism.
Are you suggesting we just use spark.dynamicAllocation.executorAllocationDivisor? Other names thrown around were like maxExecutorAllocationDivisor. One thing we were trying to avoid is confusing it with the maxExecutors config. Opinions?

rxin (Contributor) commented Apr 10, 2018

I thought about this more, and I actually think something like this makes more sense: executorAllocationRatio. Basically it is just a ratio that determines how aggressively we want Spark to request executors. A ratio of 1.0 means fill up everything; a ratio of 0.5 means request only half of the executors.

What do you think?
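
As a usage sketch under the name just proposed, a user who wants only half of the executors that full parallelism implies could set:

    import org.apache.spark.SparkConf

    // Request half the executors full parallelism would imply,
    // still bounded by minExecutors/maxExecutors:
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.executorAllocationRatio", "0.5")
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "1000")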

tgravescs (Contributor):

I'm fine with that

jcuquemelle (Contributor Author):

OK, I will quickly make the change.
Thanks for the proposals.

rxin (Contributor) commented Apr 17, 2018

Thanks @jcuquemelle

SparkQA commented Apr 17, 2018

Test build #89467 has finished for PR 19881 at commit 15732ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jcuquemelle jcuquemelle changed the title [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor parameter to throttle the parallelism of the dynamic allocation [SPARK-22683][CORE] Add a executorAllocationRatio parameter to throttle the parallelism of the dynamic allocation Apr 18, 2018
@@ -26,7 +26,10 @@ import scala.util.control.{ControlThrowable, NonFatal}
import com.codahale.metrics.{Gauge, MetricRegistry}

import org.apache.spark.internal.Logging
import org.apache.spark.internal.config.{DYN_ALLOCATION_MAX_EXECUTORS, DYN_ALLOCATION_MIN_EXECUTORS}
import org.apache.spark.internal.config.{
Contributor:

I would just make this import org.apache.spark.internal.config._

Contributor Author:

Done

@@ -1751,6 +1751,7 @@ Apart from these, the following properties are also available, and may be useful
<code>spark.dynamicAllocation.minExecutors</code>,
<code>spark.dynamicAllocation.maxExecutors</code>, and
<code>spark.dynamicAllocation.initialExecutors</code>
<code>spark.dynamicAllocation.fullExecutorAllocationDivisor</code>
Contributor:

needs to be changed to executorAllocationRatio

Contributor Author:

Done, missed that one, sorry :-)

…executors

Let's say an executor has spark.executor.cores / spark.task.cpus taskSlots.

The current dynamic allocation policy allocates enough executors
to have each taskSlot execute a single task, which wastes resources when
tasks are small relative to the executor allocation overhead. By adding
tasksPerExecutorSlot, it is made possible to specify how many tasks
a single slot should ideally execute, to mitigate the overhead of executor
allocation.
This allows for a different semantic, which yields a simpler explanation
and allows treating this parameter as a double for finer control.

Unit tests have been updated to actually test the number of executors and
have been refactored.
SparkQA commented Apr 23, 2018

Test build #89712 has finished for PR 19881 at commit 3b1dddc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

tgravescs (Contributor):

+1

asfgit closed this in 55c4ca8 on Apr 24, 2018