[SPARK-23285][K8S] Allow fractional values for spark.executor.cores #20460
Conversation
K8s treats CPU as a "compressible resource" and can actually assign millicpus to individual containers, e.g., 0.1 or 100m. In https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala#L94, we already parse `spark.executor.cores` as a double value. This PR simply bypasses the check for integral values for the property in Kubernetes mode.
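For context on how that double ends up in the pod spec, here is a hedged sketch of the CPU request construction in ExecutorPodFactory using the fabric8 Kubernetes client (paraphrased from memory, not a verbatim copy; `sparkConf` is assumed to be in scope):

```scala
import io.fabric8.kubernetes.api.model.QuantityBuilder

// Sketch only: Kubernetes accepts fractional CPU quantities, so a value of
// 0.1 here becomes a request for 100 millicpus on the executor container.
val executorCores = sparkConf.getDouble("spark.executor.cores", 1.0)
val executorCpuQuantity = new QuantityBuilder(false) // validation disabled
  .withAmount(executorCores.toString)
  .build()
```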
// the Kubernetes mode.
if (!master.startsWith("k8s")
    && executorCores != null
    && Try(executorCores.toInt).getOrElse(-1) <= 0) {
Should these be nested if() statements? That's more readable IMO, but I'm not sure what's considered more idiomatic here.
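For illustration, a sketch of the nested form being suggested, assuming the names (`master`, `executorCores`, `SparkSubmit.printErrorAndExit`) and the error message from the surrounding SparkSubmitArguments code; behavior is unchanged from the combined condition:

```scala
import scala.util.Try

// Nested form of the same check: skip validation entirely in Kubernetes
// mode, otherwise require a positive integer core count.
if (!master.startsWith("k8s")) {
  if (executorCores != null && Try(executorCores.toInt).getOrElse(-1) <= 0) {
    SparkSubmit.printErrorAndExit("Executor cores must be a positive number")
  }
}
```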
If the intent here is just to check that it's positive, then just try to get it as a double and check that. This doesn't require making the logic dependent on a particular resource manager. Of course, it relaxes the condition from what it is now, which would reject input like "1.5". But it would reject it not with a graceful exit but an exception, so I kind of think that's not what this code intends to catch right now anyway. (It would cause an error pretty quickly in YARN et al. anyway.)
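To make the trade-off concrete, a small illustrative snippet (not from the PR) of how the two parses treat a fractional string:

```scala
import scala.util.Try

// With the integer parse, "1.5" fails to parse, Try falls back to -1, and
// the submit-time check rejects it via the <= 0 validation.
Try("1.5".toInt).getOrElse(-1)      // -1, caught by the validation

// With the double parse, "1.5" passes validation here; a backend that still
// parses the value as an integer would then fail later with an exception.
Try("1.5".toDouble).getOrElse(-1.0) // 1.5, passes validation
```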
Makes sense, done.
lgtm modulo one question
Kubernetes integration test starting
Kubernetes integration test status success
docs/configuration.md
Outdated
standalone and Mesos coarse-grained modes.
</td>
<td>
The number of cores to use on each executor.

In standalone and Mesos coarse-grained modes, for more detail, see
- <a href="spark-standalone.html#Executors Scheduling">this description</a>.
+ <a href="spark-standalone.html#Executors Scheduling">this description</a>. In Kubernetes mode,
+ a fractional value can be used, e.g., 0.1 or 100m.
100m isn't fractional though?
Oh, right. Fixed.
Test build #86894 has finished for PR 20460 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #86890 has finished for PR 20460 at commit
Test build #86895 has finished for PR 20460 at commit
Jenkins, retest this please
Kubernetes integration test starting
Kubernetes integration test status success
Test build #86915 has finished for PR 20460 at commit
@@ -267,7 +267,7 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
      && Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
    SparkSubmit.printErrorAndExit("Executor Memory cores must be a positive number")
  }
- if (executorCores != null && Try(executorCores.toInt).getOrElse(-1) <= 0) {
+ if (executorCores != null && Try(executorCores.toDouble).getOrElse(-1d) <= 0d) {
Although I think "0.0" is more readable than "0d", don't bother changing it here. Just an aside.
Test build #4088 has finished for PR 20460 at commit
Do we also want to update the comment of
Actually I think this may fail some check (though it may not throw exceptions), for instance this one: spark/core/src/main/scala/org/apache/spark/SparkConf.scala, lines 553 to 562 in 9678941
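For reference, the check being pointed at is roughly the divisibility warning in SparkConf.validateSettings. A paraphrased sketch (from memory, not the exact source; `contains`, `getInt`, and `logWarning` are SparkConf/Logging members) of why a fractional value would fail:

```scala
// Paraphrased, not verbatim: getInt would throw a NumberFormatException on
// a fractional spark.executor.cores before the warning is even reached.
if (contains("spark.cores.max") && contains("spark.executor.cores")) {
  val totalCores = getInt("spark.cores.max", 1)
  val executorCores = getInt("spark.executor.cores", 1)
  if (totalCores % executorCores != 0) {
    logWarning("Total executor cores is not divisible by cores per executor")
  }
}
```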
@jiangxb1987 it seems
Yeah, we should either move the check elsewhere or modify it, to avoid a potential failure.
@jiangxb1987 fixed the check in 44e489e.
Kubernetes integration test starting
Kubernetes integration test status success
I think here (
Besides, I think "spark.task.cpus" should also be fixed, right? BTW, it looks like dynamic allocation cannot be well supported if cores are fractional.
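On the dynamic-allocation point: a paraphrased sketch (from memory, not verbatim; `conf` assumed in scope) of how ExecutorAllocationManager derived tasks-per-executor at the time, which shows why fractional cores break the integer arithmetic:

```scala
// Paraphrased sketch: both settings are read as integers, so a fractional
// string like "0.5" would throw a NumberFormatException right here.
val tasksPerExecutor =
  conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)

// The executor target is then derived with integer ceiling division, which
// also assumes tasksPerExecutor is a positive integer.
def maxNumExecutorsNeeded(numRunningOrPendingTasks: Int): Int =
  (numRunningOrPendingTasks + tasksPerExecutor - 1) / tasksPerExecutor
```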
Test build #86949 has finished for PR 20460 at commit
@felixcheung is it too risky to target 2.3? This is a fundamental behavior change. We should make sure K8s can make good use of fractional cores, and in the meantime we should also guard the other cluster managers against using them. I think we need more tests on it.
This may require more changes than it appears to; agreed that we may need more tests to ensure it doesn't break anything, so +1 on targeting 2.4 instead of 2.3.
Agreed. It seems much more complicated than expected, considering the interaction between
ah, sounds like this is more impactful than we thought
@jiangxb1987 @felixcheung @jerryshao @srowen this PR has been updated to allow both
Kubernetes integration test starting
Test build #86992 has finished for PR 20460 at commit
Kubernetes integration test status success
Test build #86993 has finished for PR 20460 at commit
Test build #86995 has finished for PR 20460 at commit
Test build #86998 has finished for PR 20460 at commit
Just realized that the update I did still won't work when dynamic resource allocation is enabled, so please ignore the update. This is definitely much more impactful than I thought.
I would suggest bringing up a discussion, or even a design, on the dev mailing list before making such a fundamental change. This may affect not only dynamic allocation but also the scheduler. It is better to collect all the feedback (especially from those who work on the scheduler side).
Agreed. This is a fundamental change to the way Spark handles task scheduling, task parallelism, dynamic resource allocation, etc., and it impacts every scheduler backend. I'm closing this PR for now. Thanks for reviewing and giving feedback!
I suggest introducing a new property
What changes were proposed in this pull request?

K8s treats CPU as a "compressible resource" and can actually assign millicpus to individual containers, e.g., 0.1 (100 millicpus). In https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala#L94, we already parse `spark.executor.cores` as a double value. This PR enables the use of fractional values for `spark.executor.cores`. It also allows fractional values for `spark.task.cpus` correspondingly. The various places where `spark.executor.cores` and `spark.task.cpus` are used are updated to do a conversion from double to integer values. The conversion is needed to avoid introducing a significant behavioral change to the way the other backends deal with executor cores. The maximum number of tasks per executor is now calculated as `spark.executor.cores` / `spark.task.cpus`, rounded to an integer using the `FLOOR` mode.

How was this patch tested?

Manual tests.

@foxish @vanzin
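A minimal sketch of the proposed tasks-per-executor calculation under the updated semantics (illustrative values, not the PR's actual code):

```scala
// floor(spark.executor.cores / spark.task.cpus) as an integer slot count.
val executorCores = 1.5 // fractional cores, e.g., in Kubernetes mode
val taskCpus = 0.5
val maxTasksPerExecutor = math.floor(executorCores / taskCpus).toInt // 3
```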