[SPARK-10749][MESOS] Support multiple roles with mesos cluster mode. #8872
Conversation
@dragos @andrewor14 PTAL
Test build #42862 has finished for PR 8872 at commit
@@ -189,7 +189,7 @@ private[mesos] trait MesosSchedulerUtils extends Logging {
    val filteredResources =
      remainingResources.filter(r => r.getType != Value.Type.SCALAR || r.getScalar.getValue > 0.0)

-    (filteredResources.toList, requestedResources.toList)
+    (filteredResources.toList.asJava, requestedResources.toList.asJava)
You don't need toList.asJava; asJava alone is enough, and saves a round of copying.
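To illustrate the reviewer's point: Scala's JavaConverters can wrap a Scala collection in a java.util.List view directly, whereas toList.asJava first copies the elements into a new Scala List and only then wraps it. A minimal sketch (the object and method names here are illustrative, not part of the Spark code under review):

```scala
import scala.collection.JavaConverters._

object AsJavaDemo {
  // asJava wraps the Scala Seq in a java.util.List view without copying;
  // resources.toList.asJava would first copy into a new Scala List.
  def toJavaList(resources: Seq[String]): java.util.List[String] =
    resources.asJava

  def main(args: Array[String]): Unit = {
    val jlist = toJavaList(Seq("cpus", "mem", "disk"))
    println(jlist.size)  // prints 3
  }
}
```

Both forms produce a valid java.util.List, so the suggestion is purely about avoiding the extra copy.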
The code looks good, but unfortunately I'm travelling and not able to run this on a Mesos cluster for end-to-end testing. Could you add a unit test?
@tnachen this looks ok, but we need a better description. It's not clear what exactly the problem is, so I'm not sure how to manually test it. Also, please add a unit test.
Force-pushed 8d61153 to 4e53922
@dragos added a unit test and fixed the description and comments now.
Force-pushed 4e53922 to 44b2ad4
Test build #43695 has finished for PR 8872 at commit
new BlackHoleMesosClusterPersistenceEngineFactory, conf) {
  override def start(): Unit = { ready = true }
}
scheduler.start()
The lines up to this one are boilerplate and repeated in each test. Do you think you could refactor this in a way that makes it less repetitive?
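One way to act on this suggestion is to pull the repeated scheduler setup into a single helper that each test calls. This is a hypothetical sketch of the idea, not the actual Spark test code; TestScheduler and makeStartedScheduler are illustrative stand-ins:

```scala
object SchedulerTestHelper {
  // Stand-in for the scheduler under test; `ready` mirrors the
  // overridden start() in the real test fixture.
  final class TestScheduler {
    var ready = false
    def start(): Unit = { ready = true }
  }

  // Shared setup previously repeated at the top of every test:
  // construct the scheduler and start it, returning it ready for use.
  def makeStartedScheduler(): TestScheduler = {
    val scheduler = new TestScheduler
    scheduler.start()
    scheduler
  }

  def main(args: Array[String]): Unit = {
    val s = makeStartedScheduler()
    println(s.ready)  // prints true
  }
}
```

Each test body then shrinks to one call plus its assertions, which is what the reviewer is asking for.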
@tnachen thanks very much, I will merge this PR into our private Spark repo, and report issues ASAP if it does not work well.
@tnachen can you address the comments? I would like to get this merged. Also it's still failing style tests.
Force-pushed 44b2ad4 to ee6ad85
@andrewor14 @dragos all comments should be addressed now
Test build #44794 has finished for PR 8872 at commit
Force-pushed ee6ad85 to cb876cf
Test build #44796 has finished for PR 8872 at commit
Force-pushed cb876cf to 7a16052
Test build #45958 has finished for PR 8872 at commit
@andrewor14 PTAL, it should be ready.
Test build #50292 has finished for PR 8872 at commit
Test build #50293 has finished for PR 8872 at commit
Force-pushed 0998f41 to 7a7daea
Test build #50326 has finished for PR 8872 at commit
Looks like there are more Scala style rules now; it's finally passing! @andrewor14 PTAL
-      currentOffers: List[ResourceOffer],
-      tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): Unit = {
+      currentOffers: JList[ResourceOffer],
+      tasks: mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]): JList[ResourceOffer] = {
What does this return? You need to document this in the javadoc.
@tnachen echoing my comment from another PR: it seems that this feature is already supported in client mode but not in cluster mode. Is there something we can do about this divergence in behavior? Is there some duplicate code to clean up so we don't run into something like this in the future?
For the record, SPARK-10444
We already have shared logic for using resources from multiple roles; it just wasn't plugged in when I wrote the cluster scheduler.
Force-pushed 7a7daea to 8630442
Test build #51312 has finished for PR 8872 at commit
@andrewor14 PTAL
@@ -452,38 +456,42 @@ private[spark] class MesosClusterScheduler(
   * This method takes all the possible candidates and attempt to schedule them with Mesos offers.
   * Every time a new task is scheduled, the afterLaunchCallback is called to perform post scheduled
   * logic on each task.
+  *
+  * @return tasks Remaining resources after scheduling tasks
You should proof-read your own comments. This shouldn't say "@return tasks".
Actually, you don't read the return value anywhere. You can keep this as Unit.
@tnachen I don't think this patch is quite ready to be merged yet; the code quality can still improve in a few places I pointed out in my comments. In the future it would be good if you can do a self-review first; I bet you'll be able to catch most of the things I pointed out in my review yourself.
Force-pushed 8630442 to d8a2f71
Force-pushed d8a2f71 to ebadaf3
retest this please
Test build #51582 has finished for PR 8872 at commit
@andrewor14 Sorry for the mix-up; I kept thinking the code was ready and just needed a rebase and comment fixes. The rebase and comments did cause some style problems in the end; I'll be more careful next time. I took another pass and don't see anything more. PTAL when you can.
No worries, thanks for addressing them. I'm merging this into master.
Currently the Mesos cluster dispatcher does not use offers from multiple roles correctly: it simply aggregates all the offers' resource values into one, but doesn't apply them correctly before launching the driver, since Mesos needs each resource taken from an offer to specify which role it originally belongs to. Multiple roles are already supported by the fine/coarse-grained schedulers, so this ports that logic to the cluster scheduler. https://issues.apache.org/jira/browse/SPARK-10749 Author: Timothy Chen <tnachen@gmail.com> Closes apache#8872 from tnachen/cluster_multi_roles.
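The core idea of the fix can be sketched independently of Spark's code: instead of summing resource amounts across roles into one untagged number, consume a requested amount entry by entry so each remainder stays tagged with its original role. This is a simplified illustration under assumed names (Resource, consume); the real implementation works on Mesos protobuf Resource messages:

```scala
object RoleResources {
  // One scalar resource entry from a Mesos offer, e.g. ("cpus", "role1", 2.0).
  final case class Resource(name: String, role: String, amount: Double)

  // Consume `needed` units of resource `name` across entries from multiple
  // roles, in order, keeping each remainder tagged with its original role
  // rather than collapsing everything into one aggregated value.
  def consume(offers: List[Resource], name: String, needed: Double): List[Resource] = {
    var remaining = needed
    offers.map { r =>
      if (r.name != name || remaining <= 0) r
      else {
        val used = math.min(r.amount, remaining)
        remaining -= used
        r.copy(amount = r.amount - used)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val offers = List(Resource("cpus", "role1", 1.0), Resource("cpus", "*", 2.0))
    // Requesting 2 cpus drains role1 fully and takes 1.0 from the "*" role.
    println(consume(offers, "cpus", 2.0))
  }
}
```

Because the per-role tagging survives, the tasks launched from these offers can declare each resource with the role it actually came from, which is what Mesos requires.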