[SPARK-10749][MESOS] Support multiple roles with mesos cluster mode. #8872
Conversation
@dragos @andrewor14 PTAL
Test build #42862 has finished for PR 8872 at commit
@@ -189,7 +189,7 @@ private[mesos] trait MesosSchedulerUtils extends Logging {
    val filteredResources =
      remainingResources.filter(r => r.getType != Value.Type.SCALAR || r.getScalar.getValue > 0.0)

-    (filteredResources.toList, requestedResources.toList)
+    (filteredResources.toList.asJava, requestedResources.toList.asJava)
You don't need toList.asJava; asJava alone is enough, and saves a round of copying.
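To illustrate the reviewer's point: Scala's JavaConverters can wrap a Scala collection in a java.util.List view directly, whereas toList.asJava first copies the elements into a new Scala List and only then wraps it. A minimal sketch (the object and method names here are illustrative, not part of the Spark code under review):

```scala
import scala.collection.JavaConverters._

object AsJavaDemo {
  // asJava wraps the Scala Seq in a java.util.List view without copying;
  // resources.toList.asJava would first copy into a new Scala List.
  def toJavaList(resources: Seq[String]): java.util.List[String] =
    resources.asJava

  def main(args: Array[String]): Unit = {
    val jlist = toJavaList(Seq("cpus", "mem", "disk"))
    println(jlist.size)  // prints 3
  }
}
```

Both forms produce a valid java.util.List, so the suggestion is purely about avoiding the extra copy.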
The code looks good, but unfortunately I'm travelling and not able to run this on a Mesos cluster for end-to-end testing. Could you add a unit test?
@tnachen this looks ok, but we need a better description. It's not clear what exactly the problem is, so I'm not sure how to manually test it. Also, please add a unit test.
Force-pushed 8d61153 to 4e53922
@dragos added a unit test and fixed the description and comments now.
Force-pushed 4e53922 to 44b2ad4
Test build #43695 has finished for PR 8872 at commit
new BlackHoleMesosClusterPersistenceEngineFactory, conf) {
  override def start(): Unit = { ready = true }
}
scheduler.start()
The lines up to this one are boilerplate and repeated in each test. Do you think you could refactor this in a way that makes it less repetitive?
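One way to act on this suggestion is to pull the repeated scheduler setup into a single helper that each test calls. This is a hypothetical sketch of the idea, not the actual Spark test code; TestScheduler and makeStartedScheduler are illustrative stand-ins:

```scala
object SchedulerTestHelper {
  // Stand-in for the scheduler under test; `ready` mirrors the
  // overridden start() in the real test fixture.
  final class TestScheduler {
    var ready = false
    def start(): Unit = { ready = true }
  }

  // Shared setup previously repeated at the top of every test:
  // construct the scheduler and start it, returning it ready for use.
  def makeStartedScheduler(): TestScheduler = {
    val scheduler = new TestScheduler
    scheduler.start()
    scheduler
  }

  def main(args: Array[String]): Unit = {
    val s = makeStartedScheduler()
    println(s.ready)  // prints true
  }
}
```

Each test body then shrinks to one call plus its assertions, which is what the reviewer is asking for.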
@tnachen thanks very much, I will merge this PR into our private Spark repo, and report issues ASAP if it does not work well.
@tnachen can you address the comments? I would like to get this merged. Also it's still failing style tests.
Force-pushed 44b2ad4 to ee6ad85
@andrewor14 @dragos all comments should be addressed now
Test build #44794 has finished for PR 8872 at commit
Force-pushed ee6ad85 to cb876cf
Test build #44796 has finished for PR 8872 at commit
Force-pushed cb876cf to 7a16052
Test build #45958 has finished for PR 8872 at commit
@andrewor14 PTAL, it should be ready.
Test build #50292 has finished for PR 8872 at commit
Test build #50293 has finished for PR 8872 at commit
Force-pushed 0998f41 to 7a7daea
Test build #50326 has finished for PR 8872 at commit
Looks like there are more Scala style rules now; it's finally passing! @andrewor14 PTAL
-      currentOffers: List[ResourceOffer],
-      tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): Unit = {
+      currentOffers: JList[ResourceOffer],
+      tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): JList[ResourceOffer] = {
What does this return? You need to document this in the javadoc.
@tnachen echoing my comment from another PR: it seems that this feature is already supported in client mode but not in cluster mode. Is there something we can do about this divergence in behavior? Is there some duplicate code to clean up so we don't run into something like this in the future?
For the record, SPARK-10444
We already have shared logic for using resources from multiple roles; it just wasn't plugged in when I wrote the cluster scheduler.
Force-pushed 7a7daea to 8630442
Test build #51312 has finished for PR 8872 at commit
@andrewor14 PTAL
@@ -452,38 +456,42 @@ private[spark] class MesosClusterScheduler(
   * This method takes all the possible candidates and attempt to schedule them with Mesos offers.
   * Every time a new task is scheduled, the afterLaunchCallback is called to perform post scheduled
   * logic on each task.
+  *
+  * @return tasks Remaining resources after scheduling tasks
You should proof-read your own comments. This shouldn't say "@return tasks".
Actually, you don't read the return value anywhere. You can keep this as Unit.
@tnachen I don't think this patch is quite ready to be merged yet; the code quality can still improve in a few places I pointed out in my comments. In the future it would be good if you can do a self-review first; I bet you'll be able to catch most of the things I pointed out in my review yourself.
Force-pushed 8630442 to d8a2f71
Force-pushed d8a2f71 to ebadaf3
retest this please
Test build #51582 has finished for PR 8872 at commit
@andrewor14 Sorry for the mix-up; I kept thinking the code was ready and just needed a rebase and comment fixes. The rebase and comments did cause some style problems in the end; I'll be more careful next time. I took another pass and don't see anything more. PTAL when you can.
No worries, thanks for addressing them. I'm merging this into master.
Currently the Mesos cluster dispatcher does not use offers from multiple roles correctly: it simply aggregates all the offers' resource values into one, but doesn't apply them correctly before launching the driver, since Mesos needs each resource taken from an offer to specify which role it originally belongs to. Multiple roles are already supported by the fine/coarse-grained schedulers, so this ports that logic to the cluster scheduler. https://issues.apache.org/jira/browse/SPARK-10749 Author: Timothy Chen <tnachen@gmail.com> Closes apache#8872 from tnachen/cluster_multi_roles.
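The core idea of the fix can be sketched independently of Spark's code: instead of summing resource amounts across roles into one untagged number, consume a requested amount entry by entry so each remainder stays tagged with its original role. This is a simplified illustration under assumed names (Resource, consume); the real implementation works on Mesos protobuf Resource messages:

```scala
object RoleResources {
  // One scalar resource entry from a Mesos offer, e.g. ("cpus", "role1", 2.0).
  final case class Resource(name: String, role: String, amount: Double)

  // Consume `needed` units of resource `name` across entries from multiple
  // roles, in order, keeping each remainder tagged with its original role
  // rather than collapsing everything into one aggregated value.
  def consume(offers: List[Resource], name: String, needed: Double): List[Resource] = {
    var remaining = needed
    offers.map { r =>
      if (r.name != name || remaining <= 0) r
      else {
        val used = math.min(r.amount, remaining)
        remaining -= used
        r.copy(amount = r.amount - used)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val offers = List(Resource("cpus", "role1", 1.0), Resource("cpus", "*", 2.0))
    // Requesting 2 cpus drains role1 fully and takes 1.0 from the "*" role.
    println(consume(offers, "cpus", 2.0))
  }
}
```

Because the per-role tagging survives, the tasks launched from these offers can declare each resource with the role it actually came from, which is what Mesos requires.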