[SPARK-6707] [CORE][MESOS]: Mesos Scheduler should allow the user to specify constraints based on slave attributes #5563

ankurcha · 2015-04-17T21:10:07Z

Currently, the Mesos scheduler only looks at the 'cpu' and 'mem' resources when determining the usability of a resource offer from a Mesos slave node. It may be preferable for the user to be able to ensure that Spark jobs are only started on a certain set of nodes (based on attributes).

For example, if the user sets a property, say spark.mesos.constraints, to tachyon=true;us-east-1=false, then each resource offer will be checked against both constraints and accepted to start new executors only if it meets both.
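To make the proposed check concrete, here is a minimal sketch of how a `key=value;key=value` string could be parsed and matched against an offer's attributes. It follows the syntax used in the description above, not necessarily the syntax or code that was finally merged. `ConstraintSketch`, `parse`, and `satisfies` are hypothetical names, and offer attributes are modelled as a plain `Map[String, String]` rather than Mesos protobuf objects.

```scala
// Hypothetical helper, not part of Spark or Mesos.
object ConstraintSketch {
  // Parse "k1=v1;k2=v2" into a map of required attribute values.
  def parse(spec: String): Map[String, String] =
    spec.split(';').map(_.trim).filter(_.nonEmpty).map { pair =>
      pair.split("=", 2) match {
        case Array(k, v) => k.trim -> v.trim
        case _ => throw new IllegalArgumentException(s"Bad constraint: $pair")
      }
    }.toMap

  // An offer is usable only if every constraint is met by its attributes.
  def satisfies(constraints: Map[String, String],
                offerAttributes: Map[String, String]): Boolean =
    constraints.forall { case (k, v) => offerAttributes.get(k).exists(_ == v) }
}
```

With this sketch, an offer carrying `tachyon=true` and `us-east-1=false` would be accepted, while an offer that lacks either attribute would be declined.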

AmplabJenkins · 2015-04-17T21:12:11Z

Can one of the admins verify this patch?

tnachen · 2015-04-21T19:58:44Z

core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/package.scala

+import scala.collection.JavaConversions._
+
+
+package object mesos {


I don't ever see usage of a package object in Spark, not sure we'd like to set a precedent here.
@andrewor14 is more familiar with the style I'll let him comment on this, but I'll recommend not doing this.

we do have package objects :)
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/package.scala

:) i see, do you recommend using it like this? every package.scala seems to be just defining an object

Yeah that's a good point. I think the convention elsewhere is that we define a XUtils object and do XUtils.methodName() for common methods (see Utils, JettyUtils, AkkaUtils etc.). It might make more sense to do the same here.

Added MesosUtils. I am not a big fan of this name so if you have a better one, please let me know.

tnachen · 2015-04-21T20:04:01Z

Btw, why apply constraints only in fine-grained mode? Why not coarse-grained as well?

andrewor14 · 2015-04-21T20:44:16Z

Jenkins, this is ok to test, but we will need to rebase this to master to resolve the merge conflicts.

SparkQA · 2015-04-21T20:47:53Z

Test build #30693 has started for PR 5563 at commit 8895eab.

ankurcha · 2015-04-21T20:56:39Z

Thanks all, I'll make the changes and rebase the pull request. This is my first foray into the world of any "real" scala so really appreciate the feedback.

AmplabJenkins · 2015-04-21T21:52:21Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30698/
Test FAILed.

ankurcha · 2015-04-21T21:53:48Z

@tnachen - I added support to the coarse scheduler too. I had missed that one.

AmplabJenkins · 2015-04-21T21:57:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30699/
Test FAILed.

SparkQA · 2015-04-21T21:58:42Z

Test build #30703 has started for PR 5563 at commit 24e4793.

SparkQA · 2015-04-21T21:58:48Z

Test build #30703 has finished for PR 5563 at commit 24e4793.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

AmplabJenkins · 2015-04-21T21:58:49Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30703/
Test FAILed.

AmplabJenkins · 2015-04-21T22:02:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30701/
Test FAILed.

SparkQA · 2015-04-21T23:02:10Z

Test build #30693 has finished for PR 5563 at commit 8895eab.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.
This patch adds the following new dependencies:
- commons-math3-3.1.1.jar
- snappy-java-1.1.1.6.jar
This patch removes the following dependencies:
- commons-math3-3.4.1.jar
- snappy-java-1.1.1.7.jar

AmplabJenkins · 2015-04-21T23:02:14Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30693/
Test PASSed.

SparkQA · 2015-04-24T17:08:38Z

Test build #30944 has started for PR 5563 at commit a039c08.

SparkQA · 2015-04-24T17:08:44Z

Test build #30944 has finished for PR 5563 at commit a039c08.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- final class IDF extends Estimator[IDFModel] with IDFBase
This patch does not change any dependencies.

AmplabJenkins · 2015-04-24T17:08:45Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30944/
Test FAILed.

SparkQA · 2015-04-24T17:23:41Z

Test build #30945 has started for PR 5563 at commit b344392.

AmplabJenkins · 2015-04-27T18:18:28Z

Can one of the admins verify this patch?

tnachen · 2015-05-15T07:59:06Z

core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala

 import org.apache.spark.util.{AkkaUtils, Utils}
 import org.apache.spark.{SparkContext, SparkEnv, SparkException, TaskState}

+import scala.collection.mutable.{HashMap, HashSet}


Scala imports need to come after Java imports according to the Spark style guide; please move this in between the java and org.apache imports.

andrewor14 · 2015-07-01T23:43:51Z

core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala

-          // need at least 1 for executor, 1 for task
-          cpus >= (mesosExecutorCores + scheduler.CPUS_PER_TASK)) ||
-          (slaveIdsWithExecutors.contains(slaveId) &&
-            cpus >= scheduler.CPUS_PER_TASK)


thanks for rewriting this!! The old code is unbelievably dense.
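For readers following along, the CPU half of the removed condition can be restated with a named intermediate value. This is only a sketch under assumptions: the first part of the old expression is not shown in the diff excerpt above and is left out here, `OfferCheckSketch` and `hasEnoughCpus` are hypothetical names, and the equivalence holds only when the executor core count is non-negative.

```scala
// Hypothetical restatement of the CPU check from the removed lines.
object OfferCheckSketch {
  def hasEnoughCpus(cpus: Double,
                    hasExecutor: Boolean,
                    executorCores: Double,
                    cpusPerTask: Int): Boolean = {
    // A slave without an executor needs cores for the executor and one task;
    // a slave that already runs an executor only needs cores for the task.
    val required =
      if (hasExecutor) cpusPerTask.toDouble else executorCores + cpusPerTask
    cpus >= required
  }
}
```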

andrewor14 · 2015-07-02T00:09:08Z

@ankurcha thanks for spending the time on this feature. This patch is very well documented and refactors the mesos integration code in a way that makes sense. LGTM from the Spark side.

Unfortunately I'm not as well-versed in Mesos as @tnachen and @dragos are. Any other comments from your side? Have we done more testing after the latest changes? Should we add a TODO comment somewhere for more complex operators?

ankurcha · 2015-07-02T01:12:13Z

@andrewor14 - I have addressed your comments in d83801c

dragos · 2015-07-03T15:19:32Z

To me this looks good, I just didn't have the time to run it on our Mesos cluster again. I'll try to do so ASAP, but in the meantime maybe @tnachen or @deanwampler can give it a go.

tnachen · 2015-07-03T21:13:33Z

One more thing, after looking at the Mesos code more closely (I hadn't really looked at or touched attributes at all while working on Mesos): we basically support ranges, scalar, or text. Set is not supported for attributes; I'm not sure why, but attributes haven't been touched since 2012.
I'll be updating the documentation in Mesos. A TODO for complex operators, or even for supporting more than text, sounds fine to me. I haven't tried out the PR yet; I can report back when I get to it. @ankurcha I assume you tried this with a real Mesos cluster as well, right?

andrewor14 · 2015-07-04T01:25:22Z

retest this please

AmplabJenkins · 2015-07-04T01:28:10Z

Merged build triggered.

AmplabJenkins · 2015-07-04T01:28:19Z

Merged build started.

SparkQA · 2015-07-04T01:31:33Z

Test build #36518 has started for PR 5563 at commit 902535b.

SparkQA · 2015-07-04T03:38:08Z

Test build #36518 has finished for PR 5563 at commit 902535b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-07-04T03:38:54Z

Merged build finished. Test PASSed.

ankurcha · 2015-07-06T02:06:56Z

@tnachen - Yes, I did test this on my cluster (3x master + 3x slaves) with the calculate Pi example mentioned above.

dragos · 2015-07-06T10:15:20Z

I tried this on a 2-node Mesos cluster. I confirm that I could use numeric values, and it worked as expected.

spark.mesos.constraints     color:2,3

and my two slaves had the attributes color:2 and color:3 respectively. It correctly picked up both of them, and correctly picked up only one when I changed the constraint to a single number.
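The behaviour observed in this test can be sketched roughly as follows, assuming the `attribute:value1,value2` syntax shown above with `;` separating multiple constraints. `ValueSetConstraintSketch` and its methods are hypothetical names, attributes are modelled as plain strings rather than Mesos protobuf values, and treating a constraint with no values as "attribute must be present" is an assumption of this sketch.

```scala
// Hypothetical helper, not the code in this PR.
object ValueSetConstraintSketch {
  // Parse "color:2,3;zone:a" into attribute name -> set of accepted values.
  def parse(spec: String): Map[String, Set[String]] =
    spec.split(';').map(_.trim).filter(_.nonEmpty).map { entry =>
      entry.split(":", 2) match {
        case Array(name, values) =>
          name.trim -> values.split(',').map(_.trim).filter(_.nonEmpty).toSet
        case Array(name) =>
          name.trim -> Set.empty[String] // assumption: presence-only constraint
      }
    }.toMap

  // Every constrained attribute must exist on the slave, and its value must
  // be one of the accepted values (or anything, if no values were listed).
  def matches(constraints: Map[String, Set[String]],
              slaveAttributes: Map[String, String]): Boolean =
    constraints.forall { case (name, accepted) =>
      slaveAttributes.get(name).exists(v => accepted.isEmpty || accepted.contains(v))
    }
}
```

Under these assumptions, `color:2,3` matches both slaves, and `color:2` matches only the slave whose attribute is color:2.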

dragos · 2015-07-06T10:22:00Z

This looks good to me!

nollbit · 2015-07-06T10:25:33Z

We've been running this patch in production for a few weeks now. Apart from the (now fixed) bug where it would not properly decline unused offers, we've not had any issues.

andrewor14 · 2015-07-06T18:45:07Z

LGTM2. I am merging this into master. Sorry to all other mesos patches that this one conflicts with!

tnachen · 2015-07-06T18:50:22Z

LGTM as well

andrewor14 · 2015-07-06T20:18:12Z

(There's a problem with the infra that prevents me from merging this. I'll try again in a few hours)

ankurcha changed the title ~~[SPARK-6707] - Mesos Scheduler should allow the user to specify constraints based on slave attributes~~ [CORE] SPARK-6707: Mesos Scheduler should allow the user to specify constraints based on slave attributes Apr 17, 2015

ankurcha changed the title ~~[CORE] SPARK-6707: Mesos Scheduler should allow the user to specify constraints based on slave attributes~~ [SPARK-6707] [core]: Mesos Scheduler should allow the user to specify constraints based on slave attributes Apr 17, 2015

tnachen reviewed Apr 21, 2015

ankurcha force-pushed the mesos_attribs branch from 3d38ff7 to 2a100a8 Compare April 21, 2015 21:42

tnachen reviewed May 15, 2015

andrewor14 reviewed Jul 1, 2015

Update code as per code review comments

d83801c

Fix line length

902535b

asfgit closed this in 1165b17 Jul 6, 2015

atongen mentioned this pull request Jan 27, 2016

[SPARK-12832][MESOS] mesos scheduler respect agent attributes #10949

Closed
