
[SPARK-10749][MESOS] Support multiple roles with mesos cluster mode. #8872

Closed
wants to merge 2 commits into apache:master from tnachen:cluster_multi_roles

Conversation

tnachen
Contributor

@tnachen tnachen commented Sep 22, 2015

Currently the Mesos cluster dispatcher does not use offers from multiple roles correctly: it simply aggregates all of the offers' resource values into one total, but it never attributes the resources back to their original roles before launching the driver, even though Mesos needs each resource in a launch to specify the role it was offered under. Multiple roles are already supported by the fine-grained and coarse-grained schedulers, so this change ports that logic to the cluster scheduler.

https://issues.apache.org/jira/browse/SPARK-10749
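
For context, here is a minimal sketch of the idea behind the fix; the helper name `consumeScalar` and its shape are assumptions for illustration, not the PR's actual code. When a requested scalar amount such as `cpus` is taken from the offers, each consumed piece must keep the role of the offer it came from rather than being folded into a single role-less total.

```scala
import org.apache.mesos.Protos.{Resource, Value}

// Sketch only: consume `amount` of the scalar resource `name` (e.g. "cpus") from
// offered resources that may carry different roles, returning
// (leftover resources, consumed resources).
def consumeScalar(
    offered: Seq[Resource],
    name: String,
    amount: Double): (Seq[Resource], Seq[Resource]) = {
  var remainingToUse = amount
  val used = Seq.newBuilder[Resource]
  val leftover = offered.map { r =>
    if (r.getName != name || remainingToUse <= 0) {
      r
    } else {
      val take = math.min(r.getScalar.getValue, remainingToUse)
      remainingToUse -= take
      // The consumed piece is built from the original resource, so it keeps the
      // role of the offer it came from.
      used += r.toBuilder.setScalar(Value.Scalar.newBuilder.setValue(take).build()).build()
      r.toBuilder
        .setScalar(Value.Scalar.newBuilder.setValue(r.getScalar.getValue - take).build())
        .build()
    }
  }
  // Drop scalar resources that were fully consumed, mirroring the filter in the diff below.
  (leftover.filter(r => r.getType != Value.Type.SCALAR || r.getScalar.getValue > 0.0),
    used.result())
}
```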

@tnachen
Contributor Author

tnachen commented Sep 22, 2015

@dragos @andrewor14 PTAL

@SparkQA

SparkQA commented Sep 23, 2015

Test build #42862 has finished for PR 8872 at commit 211a3d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class WriterThread(

@@ -189,7 +189,7 @@ private[mesos] trait MesosSchedulerUtils extends Logging {
    val filteredResources =
      remainingResources.filter(r => r.getType != Value.Type.SCALAR || r.getScalar.getValue > 0.0)

-   (filteredResources.toList, requestedResources.toList)
+   (filteredResources.toList.asJava, requestedResources.toList.asJava)

You don't need toList.asJava. asJava is enough, and saves a round of copying.
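
A quick sketch of the difference the reviewer is pointing at (hypothetical buffer; this is not the PR's code): `toList.asJava` first copies the collection into an immutable `List` and then wraps the copy, while `asJava` alone wraps the existing collection as a `java.util.List` view with no intermediate copy.

```scala
import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer

object AsJavaExample {
  // Hypothetical buffer standing in for the filtered resources.
  val remaining = ArrayBuffer("cpus", "mem")

  // Copies the buffer into an immutable List, then wraps the copy for Java callers.
  val copiedThenWrapped: java.util.List[String] = remaining.toList.asJava

  // Wraps the buffer directly as a java.util.List view; no intermediate copy is made.
  val wrappedView: java.util.List[String] = remaining.asJava
}
```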

@dragos
Contributor

dragos commented Sep 30, 2015

The code looks good, but unfortunately I'm travelling and am not able to run this on a Mesos cluster for end-to-end testing.

Could you add a unit test?

@dragos
Contributor

dragos commented Oct 6, 2015

@tnachen this looks OK, but we need a better description. It's not clear exactly what the problem is, so I'm not sure how to test it manually.

Also, please add a unit test.

@tnachen tnachen force-pushed the cluster_multi_roles branch 2 times, most recently from 8d61153 to 4e53922 on October 14, 2015 01:18
@tnachen
Contributor Author

tnachen commented Oct 14, 2015

@dragos I've added a unit test and fixed the description and the comments.

@SparkQA

SparkQA commented Oct 14, 2015

Test build #43695 has finished for PR 8872 at commit 44b2ad4.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • def getPath = path.getOrElse(sys.error("Constructors must start at a class type"))
    • case class ExprId(id: Long, jvmId: UUID)
    • case class WrapOption(optionType: DataType, child: Expression)
    • class GenericArrayData(val array: Array[Any]) extends ArrayData
    • trait QueryExecutionListener
    • class ExecutionListenerManager extends Logging

new BlackHoleMesosClusterPersistenceEngineFactory, conf) {
override def start(): Unit = { ready = true }
}
scheduler.start()
The lines up to this one are boilerplate and repeated in each test. Do you think you could refactor this in a way that makes it less repetitive?
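
One way to cut the repetition, assuming the suite's existing imports (`SparkConf`, `MesosClusterScheduler`, `BlackHoleMesosClusterPersistenceEngineFactory`) are in scope; the helper name and the master/app-name values below are illustrative, not the PR's code:

```scala
// Sketch of hoisting the repeated setup into a suite-level helper.
private def makeScheduler(extraConf: Map[String, String] = Map.empty): MesosClusterScheduler = {
  val conf = new SparkConf()
    .setMaster("mesos://localhost:5050")       // illustrative master URL
    .setAppName("MesosClusterSchedulerSuite")  // illustrative app name
  extraConf.foreach { case (k, v) => conf.set(k, v) }
  val scheduler = new MesosClusterScheduler(
      new BlackHoleMesosClusterPersistenceEngineFactory, conf) {
    // Same trick as in the diff above: mark the scheduler ready without
    // registering against a real Mesos master.
    override def start(): Unit = { ready = true }
  }
  scheduler.start()
  scheduler
}
```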

@BrickXu

BrickXu commented Oct 29, 2015

@tnachen thanks very much. I will merge this PR into our private Spark repo and report issues ASAP if it does not work well.
Thanks again!

@andrewor14
Contributor

@tnachen can you address the comments? I would like to get this merged. Also, it's still failing the style tests.

@tnachen
Contributor Author

tnachen commented Nov 2, 2015

@andrewor14 @dragos all comments should be addressed now

@SparkQA

SparkQA commented Nov 2, 2015

Test build #44794 has finished for PR 8872 at commit ee6ad85.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Corr(
    • case class Corr(left: Expression, right: Expression)
    • case class RepartitionByExpression(

@SparkQA

SparkQA commented Nov 2, 2015

Test build #44796 has finished for PR 8872 at commit cb876cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Corr(
    • case class Corr(left: Expression, right: Expression)
    • case class RepartitionByExpression(

@SparkQA

SparkQA commented Nov 15, 2015

Test build #45958 has finished for PR 8872 at commit 7a16052.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen
Contributor Author

tnachen commented Nov 27, 2015

@andrewor14 PTAL, it should be ready.

@SparkQA

SparkQA commented Jan 28, 2016

Test build #50292 has finished for PR 8872 at commit 0998f41.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 28, 2016

Test build #50293 has finished for PR 8872 at commit 0998f41.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 29, 2016

Test build #50326 has finished for PR 8872 at commit 7a7daea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen
Contributor Author

tnachen commented Jan 29, 2016

Looks like there are more Scala style rules now; it's finally passing! @andrewor14 PTAL

-     currentOffers: List[ResourceOffer],
-     tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): Unit = {
+     currentOffers: JList[ResourceOffer],
+     tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): JList[ResourceOffer] = {
What does this return? You need to document this in the javadoc.
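
A hedged sketch of what the requested documentation could look like; the method name, parameter wording, and stand-in types below are assumptions for illustration, not the PR's actual code:

```scala
import java.util.{List => JList}
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

object DocExample {
  // Stand-ins for the real Mesos/Spark types so the sketch is self-contained.
  type ResourceOffer = AnyRef
  type OfferID = AnyRef
  type TaskInfo = AnyRef

  /**
   * Schedules the candidate drivers against the current Mesos offers.
   *
   * @param currentOffers offers available in this scheduling round
   * @param tasks map from offer ID to the tasks chosen to launch on it, filled in as a side effect
   * @return the offers with their remaining (unconsumed) resources after scheduling
   */
  def scheduleTasks(
      currentOffers: JList[ResourceOffer],
      tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): JList[ResourceOffer] = {
    currentOffers  // placeholder body; the real method consumes resources per role
  }
}
```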

@andrewor14
Contributor

@tnachen echoing my comment from another PR: it seems that this feature is already supported in client mode but not in cluster mode. Is there something we can do about this divergence in behavior? Is there some duplicate code to clean up so we don't run into something like this in the future?

@dragos
Contributor

dragos commented Feb 2, 2016

For the record, SPARK-10444

@tnachen
Contributor Author

tnachen commented Feb 14, 2016

We already have shared logic for using resources from multiple roles; it just wasn't plugged in when I wrote the cluster scheduler.
Now that we have a refactored coarse-grained scheduler, I think we can take a step back and refactor both so a scheduler has pluggable pieces for launching tasks, placing resources, etc. For now, this PR is still valid: it just plugs the existing multiple-role logic into the cluster scheduler.

@SparkQA

SparkQA commented Feb 15, 2016

Test build #51312 has finished for PR 8872 at commit 8630442.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen
Contributor Author

tnachen commented Feb 16, 2016

@andrewor14 PTAL

@@ -452,38 +456,42 @@ private[spark] class MesosClusterScheduler(
* This method takes all the possible candidates and attempt to schedule them with Mesos offers.
* Every time a new task is scheduled, the afterLaunchCallback is called to perform post scheduled
* logic on each task.
*
* @return tasks Remaining resources after scheduling tasks
You should proofread your own comments. This shouldn't say @return tasks...

actually you don't read the return value anywhere. You can keep this as Unit.

@andrewor14
Contributor

@tnachen I don't think this patch is quite ready to be merged yet; the code quality can still improve in a few places, as I pointed out in my comments. In the future it would be good if you could do a self-review first; I bet you'd be able to catch most of the things I pointed out in my review yourself.

@tnachen
Contributor Author

tnachen commented Feb 19, 2016

retest this please

@SparkQA

SparkQA commented Feb 20, 2016

Test build #51582 has finished for PR 8872 at commit ebadaf3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen
Contributor Author

tnachen commented Feb 20, 2016

@andrewor14 Sorry for the mess-up; I kept thinking the code was ready and just needed a rebase and the comments addressed. The rebase and comment changes did cause some style problems in the end; I'll be more careful next time. I took another pass and don't see anything else. PTAL when you can.

@andrewor14
Contributor

No worries, thanks for addressing them. I'm merging this into master.

@asfgit asfgit closed this in 00461bb Feb 22, 2016
sttts pushed a commit to mesosphere/spark that referenced this pull request Mar 15, 2016
Currently the Mesos cluster dispatcher does not use offers from multiple roles correctly: it simply aggregates all of the offers' resource values into one total, but it never attributes the resources back to their original roles before launching the driver, even though Mesos needs each resource in a launch to specify the role it was offered under. Multiple roles are already supported by the fine-grained and coarse-grained schedulers, so this change ports that logic to the cluster scheduler.

https://issues.apache.org/jira/browse/SPARK-10749

Author: Timothy Chen <tnachen@gmail.com>

Closes apache#8872 from tnachen/cluster_multi_roles.
mgummelt pushed a commit to mesosphere/spark that referenced this pull request Aug 2, 2016
mgummelt pushed a commit to mesosphere/spark that referenced this pull request Mar 7, 2017