[SPARK-6707] [CORE][MESOS]: Mesos Scheduler should allow the user to specify constraints based on slave attributes #5563
Conversation
Can one of the admins verify this patch?
import scala.collection.JavaConversions._

package object mesos {
I don't ever see a package object used in Spark; I'm not sure we'd like to set a precedent here.
@andrewor14 is more familiar with the style, so I'll let him comment on this, but I'd recommend not doing it.
We do have package objects :)
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/package.scala
:) I see. Do you recommend using it like this? Every package.scala seems to just define an object.
Yeah, that's a good point. I think the convention elsewhere is that we define a XUtils object and do XUtils.methodName() for common methods (see Utils, JettyUtils, AkkaUtils, etc.). It might make more sense to do the same here.
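As a rough illustration of the convention described here (a sketch only; MesosUtils and its method are hypothetical placeholders, not actual Spark API):

```scala
package org.apache.spark.scheduler.cluster.mesos

// Shared helpers live in a plain object rather than a package object,
// mirroring Utils / JettyUtils / AkkaUtils elsewhere in Spark.
private[mesos] object MesosUtils {
  // Hypothetical helper, shown only to illustrate the call pattern.
  def shortSlaveId(slaveId: String): String = slaveId.takeRight(8)
}

// Call sites then read MesosUtils.shortSlaveId(offer.getSlaveId.getValue).
```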
Added MesosUtils. I am not a big fan of this name, so if you have a better one, please let me know.
Btw, why only apply constraints in fine-grained mode? Why not coarse-grained?
Jenkins, this is ok to test, but we will need to rebase this to master to resolve the merge conflicts.
Test build #30693 has started for PR 5563 at commit
Thanks all, I'll make the changes and rebase the pull request. This is my first foray into the world of any "real" Scala, so I really appreciate the feedback.
Test FAILed.
@tnachen - I added support to the coarse-grained scheduler too. I had missed that one.
Test FAILed.
Test build #30703 has started for PR 5563 at commit
Test build #30703 has finished for PR 5563 at commit
Test FAILed.
Test FAILed.
Test build #30693 has finished for PR 5563 at commit
Test PASSed.
Test build #30944 has started for PR 5563 at commit
Test build #30944 has finished for PR 5563 at commit
Test FAILed.
Test build #30945 has started for PR 5563 at commit
Can one of the admins verify this patch?
import org.apache.spark.util.{AkkaUtils, Utils}
import org.apache.spark.{SparkContext, SparkEnv, SparkException, TaskState}

import scala.collection.mutable.{HashMap, HashSet}
Scala imports need to come after Java imports according to the Spark style guide; please move them in between the java and org.apache imports.
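For reference, the ordering the style guide asks for looks roughly like this (the specific imports are illustrative):

```scala
// 1. java / javax imports first
import java.util.concurrent.CountDownLatch

// 2. scala.* imports next
import scala.collection.mutable.{HashMap, HashSet}

// 3. third-party and org.apache.* imports last
import org.apache.spark.util.{AkkaUtils, Utils}
import org.apache.spark.{SparkContext, SparkEnv, SparkException, TaskState}
```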
(!slaveIdsWithExecutors.contains(slaveId) &&
  // need at least 1 for executor, 1 for task
  cpus >= (mesosExecutorCores + scheduler.CPUS_PER_TASK)) ||
(slaveIdsWithExecutors.contains(slaveId) &&
  cpus >= scheduler.CPUS_PER_TASK)
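For readers skimming the diff, here is the same usability check pulled out into a standalone helper (a sketch under assumed names; meetsCpuRequirements, mesosExecutorCores, and cpusPerTask are illustrative, not the PR's actual API):

```scala
// Does this offer carry enough CPUs to be useful for this slave?
def meetsCpuRequirements(
    slaveId: String,
    cpus: Double,
    slaveIdsWithExecutors: Set[String],
    mesosExecutorCores: Double,
    cpusPerTask: Int): Boolean = {
  if (slaveIdsWithExecutors.contains(slaveId)) {
    // An executor already runs there; a task's worth of CPUs suffices.
    cpus >= cpusPerTask
  } else {
    // Need enough for a new executor plus at least one task.
    cpus >= mesosExecutorCores + cpusPerTask
  }
}
```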
Thanks for rewriting this!! The old code is unbelievably dense.
@ankurcha thanks for spending the time on this feature. This patch is very well documented and refactors the Mesos integration code in a way that makes sense. LGTM from the Spark side. Unfortunately I'm not as well-versed in Mesos as @tnachen and @dragos are. Any other comments from your side? Have we done more testing after the latest changes? Should we add a TODO comment somewhere for more complex operators?
@andrewor14 - I have addressed your comments in d83801c
To me this looks good; I just didn't have the time to run it on our Mesos cluster again. I'll try to do so ASAP, but in the meantime maybe @tnachen or @deanwampler can give it a go.
One more thing after looking at the Mesos code more closely (I haven't really looked at or touched attributes at all while working on Mesos): we basically support ranges, scalar, or text. Set is not supported for attributes; not sure why, but attributes haven't been touched since 2012.
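To make the three attribute types concrete, here is a minimal matching sketch (illustrative only, not the PR's code; it assumes a single exact-value constraint and that the constraint string parses as a number for the SCALAR and RANGES cases):

```scala
import scala.collection.JavaConverters._

import org.apache.mesos.Protos.{Attribute, Value}

// Match one required attribute value against an offer's attribute,
// covering the three types Mesos supports for attributes: TEXT,
// SCALAR, and RANGES (SET is not among them).
def attributeMatches(required: String, offered: Attribute): Boolean =
  offered.getType match {
    case Value.Type.TEXT => offered.getText.getValue == required
    case Value.Type.SCALAR => offered.getScalar.getValue == required.toDouble
    case Value.Type.RANGES =>
      val v = required.toLong
      offered.getRanges.getRangeList.asScala
        .exists(r => r.getBegin <= v && v <= r.getEnd)
    case _ => false
  }
```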
retest this please
Merged build triggered.
Merged build started.
Test build #36518 has started for PR 5563 at commit
Test build #36518 has finished for PR 5563 at commit
Merged build finished. Test PASSed.
@tnachen - Yes, I did test this on my cluster (3x masters + 3x slaves) with the calculate-Pi example mentioned above.
I tried this on a 2-node Mesos cluster. I confirm that I could use numeric values, and it worked as expected; my two slaves had a matching attribute.
This looks good to me!
We've been running this patch in production for a few weeks now. Apart from the (now fixed) bug where it would not properly decline unused offers, we've not had any issues.
LGTM2. I am merging this into master. Sorry to all other mesos patches that this one conflicts with!
LGTM as well
(There's a problem with the infra that prevents me from merging this. I'll try again in a few hours)
Currently, the Mesos scheduler only looks at the 'cpu' and 'mem' resources when trying to determine the usability of a resource offer from a Mesos slave node. It may be preferable for the user to be able to ensure that Spark jobs are only started on a certain set of nodes (based on attributes).
For example, if the user sets the property spark.mesos.constraints to tachyon=true;us-east-1=false, then each resource offer will be checked against both of these constraints, and only offers that satisfy them will be accepted to start new executors.
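A minimal driver-side sketch of how the property would be set (the master URL and app name are placeholders; the constraint string follows the example above):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Only accept offers from slaves whose attributes satisfy both constraints.
val conf = new SparkConf()
  .setMaster("mesos://zk://host:2181/mesos") // placeholder master URL
  .setAppName("constrained-job")             // placeholder app name
  .set("spark.mesos.constraints", "tachyon=true;us-east-1=false")
val sc = new SparkContext(conf)
```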