Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6707] [CORE][MESOS]: Mesos Scheduler should allow the user to specify constraints based on slave attributes #5563

Closed
wants to merge 24 commits into from

Conversation

ankurcha
Copy link
Contributor

Currently, the mesos scheduler only looks at the 'cpu' and 'mem' resources when trying to determine the usablility of a resource offer from a mesos slave node. It may be preferable for the user to be able to ensure that the spark jobs are only started on a certain set of nodes (based on attributes).

For example, If the user sets a property, let's say spark.mesos.constraints is set to tachyon=true;us-east-1=false, then the resource offers will be checked to see if they meet both these constraints and only then will be accepted to start new executors.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@ankurcha ankurcha changed the title [SPARK-6707] - Mesos Scheduler should allow the user to specify constraints based on slave attributes [CORE] SPARK-6707: Mesos Scheduler should allow the user to specify constraints based on slave attributes Apr 17, 2015
@ankurcha ankurcha changed the title [CORE] SPARK-6707: Mesos Scheduler should allow the user to specify constraints based on slave attributes [SPARK-6707] [core]: Mesos Scheduler should allow the user to specify constraints based on slave attributes Apr 17, 2015
import scala.collection.JavaConversions._


package object mesos {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't ever see usage of a package object in Spark, not sure we'd like to set a precedent here.
@andrewor14 is more familiar with the style I'll let him comment on this, but I'll recommend not doing this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) i see, do you recommend using it like this? every package.scala seems to be just defining an object

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a good point. I think the convention elsewhere is that we define a XUtils object and do XUtils.methodName() for common methods (see Utils, JettyUtils, AkkaUtils etc.). It might make more sense to do the same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added MesosUtils. I am not a big fan of this name so if you have a better one, please let me know.

@tnachen
Copy link
Contributor

tnachen commented Apr 21, 2015

Btw why only apply constraints on fine grain mode? Why not coarse grain?

@andrewor14
Copy link
Contributor

Jenkins, this is ok to test, but we will need to rebase this to master to resolve the merge conflicts.

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30693 has started for PR 5563 at commit 8895eab.

@ankurcha
Copy link
Contributor Author

Thanks all, I'll make the changes and rebase the pull request. This is my first foray into the world of any "real" scala so really appreciate the feedback.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30698/
Test FAILed.

@ankurcha
Copy link
Contributor Author

@tnachen - I added support to the coarse scheduler too. I had missed that one.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30699/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30703 has started for PR 5563 at commit 24e4793.

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30703 has finished for PR 5563 at commit 24e4793.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30703/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30701/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30693 has finished for PR 5563 at commit 8895eab.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • commons-math3-3.1.1.jar
    • snappy-java-1.1.1.6.jar
  • This patch removes the following dependencies:
    • commons-math3-3.4.1.jar
    • snappy-java-1.1.1.7.jar

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30693/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Apr 24, 2015

Test build #30944 has started for PR 5563 at commit a039c08.

@SparkQA
Copy link

SparkQA commented Apr 24, 2015

Test build #30944 has finished for PR 5563 at commit a039c08.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class IDF extends Estimator[IDFModel] with IDFBase
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30944/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Apr 24, 2015

Test build #30945 has started for PR 5563 at commit b344392.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

import org.apache.spark.util.{AkkaUtils, Utils}
import org.apache.spark.{SparkContext, SparkEnv, SparkException, TaskState}

import scala.collection.mutable.{HashMap, HashSet}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala imports needs to be after java imports according to spark style guide, please move in between java and org.apache imports.

// need at least 1 for executor, 1 for task
cpus >= (mesosExecutorCores + scheduler.CPUS_PER_TASK)) ||
(slaveIdsWithExecutors.contains(slaveId) &&
cpus >= scheduler.CPUS_PER_TASK)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for rewriting this!! The old code is unbelievably dense.

@andrewor14
Copy link
Contributor

@ankurcha thanks for spending the time on this feature. This patch is very well documented and refactors the mesos integration code in a way that makes sense. LGTM from the Spark side.

Unfortunately I'm not as well-versed in Mesos as @tnachen @dragos are. Any other comments from your side? Have we done more testing after the latest changes? Should we add a TODO comment somewhere for more complex operators?

@ankurcha
Copy link
Contributor Author

ankurcha commented Jul 2, 2015

@andrewor14 - I have addressed your comments in d83801c

@dragos
Copy link
Contributor

dragos commented Jul 3, 2015

To me this looks good, I just didn't have the time to run it on our Mesos cluster again. I'll try to do so ASAP, but in the meantime maybe @tnachen or @deanwampler give it a go.

@tnachen
Copy link
Contributor

tnachen commented Jul 3, 2015

One more thing after looking at the mesos code more closely (haven't really looked and touched attributes at all while working on mesos), we basically support ranges, scalar or text. Set is not supported for attributes, not sure why but attributes hasn't been touched since 2012.
I'll be updating the documentation in mesos. TODO for complex operators or even supporting more than text sounds fine with me. I haven't try out the PR yet, I can report back when I get to. @ankurcha I assume you tried this with a real mesos cluster as well right?

@andrewor14
Copy link
Contributor

retest this please

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented Jul 4, 2015

Test build #36518 has started for PR 5563 at commit 902535b.

@SparkQA
Copy link

SparkQA commented Jul 4, 2015

Test build #36518 has finished for PR 5563 at commit 902535b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@ankurcha
Copy link
Contributor Author

ankurcha commented Jul 6, 2015

@tnachen - Yes i did test this on my cluster (3x master + 3x slaves ) for the calculate Pi example mentioned above.

@dragos
Copy link
Contributor

dragos commented Jul 6, 2015

I tried this on a 2-node Mesos cluster. I confirm that I could use numeric values, and it worked as expected.

spark.mesos.constraints     color:2,3

and my two slaves had an attribute color:2 and 3 respectively. It correctly picked up both of them, and correctly picked up only one when I changed to constraint to be only one number.

@dragos
Copy link
Contributor

dragos commented Jul 6, 2015

This looks good to me!

@nollbit
Copy link

nollbit commented Jul 6, 2015

We've been running this patch in production for a few weeks now. Apart from the (now fixed) bug where it would not properly decline unused offers, we've not had any issues.

@andrewor14
Copy link
Contributor

LGTM2. I am merging this into master. Sorry to all other mesos patches that this one conflicts with!

@tnachen
Copy link
Contributor

tnachen commented Jul 6, 2015

LGTM as well

@andrewor14
Copy link
Contributor

(There's a problem with the infra that prevents me from merging this. I'll try again in a few hours)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants