[SPARK-6284][MESOS] Add mesos role, principal and secret #4960
LGTM, from a Mesos perspective. 👍
(JIRA please?)
Test build #28424 has finished for PR 4960 at commit
Have a look at the other PRs -- write
Test build #28425 has finished for PR 4960 at commit
@srowen Sorry, I missed your notification; I've opened a JIRA now.
How have you tested this? I get ExecutorLostFailure, and tasks are not launched on a Mesos slave configured with slave resources: cpus(prod):2; cpus(dev):2; mem(*):29655; disk(*):745874; ports(*):[31000-32000] and trying to run for e.g.
Also, if the slave resources all have the default role, i.e. *, the framework should still be able to use those resources even with spark.mesos.role != *.
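To make the role semantics above concrete, here is a minimal Scala sketch that parses a slave resource string of the form quoted above (name(role):amount entries separated by semicolons) and computes which scalar resources a framework registered with a given role could use: resources reserved for that role plus unreserved ones with role "*". The `Res` type and the `SlaveResources` object are hypothetical illustrations, not the Mesos or Spark API, and range resources such as ports are simply skipped.

```scala
// Hypothetical model of one scalar Mesos resource entry; this is an
// illustration only, not the Mesos protobuf Resource type.
final case class Res(name: String, role: String, amount: Double)

object SlaveResources {
  // Matches entries such as "cpus(prod):2" or "mem(*):29655".
  private val Entry = """(\w+)\(([^)]*)\):([0-9.]+)""".r

  // Splits a ";"-separated resource string into scalar entries.
  // Range entries like "ports(*):[31000-32000]" do not match and are skipped.
  def parse(spec: String): List[Res] =
    spec.split(";").toList.map(_.trim).flatMap {
      case Entry(name, role, amount) => List(Res(name, role, amount.toDouble))
      case _                         => Nil
    }

  // A framework registered with `role` may use resources reserved for that
  // role plus unreserved resources, whose role is "*". Amounts are summed
  // per resource name.
  def usable(resources: List[Res], role: String): Map[String, Double] =
    resources
      .filter(r => r.role == role || r.role == "*")
      .groupBy(_.name)
      .map { case (name, rs) => name -> rs.map(_.amount).sum }
}
```

With a string like "cpus(prod):2; cpus(dev):2; mem(*):29655", a framework with spark.mesos.role=prod could use 2 CPUs plus the unreserved memory, while one left at the default role "*" would see no CPUs at all.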
@realoptimal you did indeed find a problem with roles. I only tested it by checking that the framework registered with the right role and that tasks launched, but I didn't try your case, where multiple roles with different resources and no wildcard resources are available. The scheduler currently just uses * as the role everywhere. I have a fix and will push it, plus tests, to this PR. Thanks again!
@realoptimal Can you try again? I've updated the PR to include tests, and I've tested it myself in both fine-grained and coarse-grained mode; it correctly launches the tasks.
Test build #28715 has finished for PR 4960 at commit
@tnachen You are much faster than I am; I was working on modifying your previous code to dole out reserved offers first and then use unreserved offers (those with role="*"), but you beat me to it. I tested your code with my configuration, and it works when spark.mesos.role is specified in the spark-defaults.conf file. I'm not sure why it isn't picked up via --conf on the command line, but I think that is an issue with SparkSubmit, not with your code changes.
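The allocation order described in this comment (reserved offers first, then unreserved ones with role="*") can be sketched as a greedy pass over a single offer. This is a minimal illustration under assumed types: `Scalar`, `RoleAwareAllocation` and `take` are hypothetical names, not the code in this PR, and real Mesos offers carry typed resources rather than bare (role, amount) pairs.

```scala
// Hypothetical (role, amount) pair for one scalar resource in an offer;
// not the Mesos protobuf type.
final case class Scalar(role: String, amount: Double)

object RoleAwareAllocation {
  // Greedily covers `demand` from one offer, using resources reserved for
  // `role` before unreserved ("*") resources. Returns the (role, amount)
  // pieces taken, or None if the usable resources cannot cover the demand.
  def take(offer: List[Scalar], role: String, demand: Double): Option[List[(String, Double)]] = {
    val reserved = offer.filter(_.role == role)
    // When the framework's own role is "*", `reserved` already holds them.
    val unreserved = if (role == "*") Nil else offer.filter(_.role == "*")

    val (taken, remaining) =
      (reserved ++ unreserved).foldLeft((Vector.empty[(String, Double)], demand)) {
        case ((acc, left), s) if left > 0 =>
          val used = math.min(s.amount, left)
          (acc :+ (s.role -> used), left - used)
        case (state, _) => state
      }

    if (remaining > 0) None else Some(taken.toList)
  }
}
```

For example, given an offer with 2 CPUs reserved for "prod" and 4 unreserved CPUs, a "prod" framework asking for 3 CPUs would take the 2 reserved CPUs and 1 unreserved CPU, leaving more of the shared pool for other frameworks.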
@realoptimal thanks for testing this. We can file another issue for the --conf problem, and I'm sure there are more rough edges to smooth out. @andrewor14 can you take a look at this PR?
Test build #28958 timed out for PR 4960 at commit
retest this please. This needs to be rebased onto master.
<td>Role for the Spark framework</td>
<td>
  Set the role of this Spark framework for Mesos. Roles are used in Mesos for reservations
  and resource weigth sharing.
weight
retest this please
Test build #14 has finished for PR 4960 at commit
Test build #37220 has finished for PR 4960 at commit
@andrewor14 I think I figured out what's going on with the test. It's hard to figure out from the logs since it was just hitting a System.exit(1). Hopefully this goes through!
Test build #37431 has finished for PR 4960 at commit
@@ -183,6 +186,18 @@ private[spark] class MesosSchedulerBackend(

override def reregistered(d: SchedulerDriver, masterInfo: MasterInfo) {}

def getTasksSummary(tasks: JArrayList[MesosTaskInfo]): String = {
I'll make this private when I merge. This is only used in 1 place.
LGTM, I'm merging this into master. Thanks @tnachen!
@tnachen @andrewor14 thanks for getting this done! Much needed for Mesos deployments.
Thanks! 🎉 🍰 ✌️
Just discovering this. Great job, guys!
…or.cores
This is a regression introduced in #4960; this commit fixes it and adds a test. tnachen andrewor14 please review, this should be an easy one.
Author: Iulian Dragos <jaguarul@gmail.com>
Closes #8653 from dragos/issue/mesos/fine-grained-maxExecutorCores.
(cherry picked from commit f0562e8)
Signed-off-by: Andrew Or <andrew@databricks.com>
@tnachen
Oleh, did you specify
On Mon, Sep 21, 2015 at 10:11 AM, Oleh Halenok notifications@github.com
Yes, sure, all roles are configured on the masters as
Also, it looks like the dispatcher does not receive resource offers for any role other than "*":
Hi @ohal, you'll need to set spark.mesos.role when you launch the
On Mon, Sep 21, 2015 at 10:44 AM, Oleh Halenok notifications@github.com
Actually, that was already done:
When the job starts, I get this output:
Those are the Spark properties for the job, but they might not apply to the dispatcher. The easiest way to check is to go to the Mesos UI and look at the
Tim
On Mon, Sep 21, 2015 at 11:19 AM, Oleh Halenok notifications@github.com
It looks like the driver starts with some defaults (cpu, mem); below is from the Spark UI:
Ah, this is indeed a bug: the multiple-roles logic in the coarse-grained and fine-grained schedulers needs to be ported to the cluster scheduler. Will fix this ASAP.
@tnachen Hi Tim,
Hi @AndriiOmelianenko, I have a PR out to fix that: #8872
Mesos lets each framework set its own role and authentication information. The role identifies the framework's role, which determines its weight when resources are shared during allocation, and the optional authentication information (principal and secret) allows the framework to authenticate when it connects to the master.
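As a rough illustration of this description, the sketch below maps the three settings added by this PR (spark.mesos.role, spark.mesos.principal, spark.mesos.secret) onto per-framework registration data: a role that defaults to the Mesos default role "*", and an optional principal/secret credential for authenticating with the master. The case classes and `MesosRegistration.fromConf` are hypothetical stand-ins, not the Mesos protobuf builders or Spark's actual code, and rejecting a secret without a principal is an assumption of this sketch.

```scala
// Hypothetical stand-ins for the registration data; not the Mesos
// protobuf FrameworkInfo / Credential classes.
final case class Credential(principal: String, secret: Option[String])
final case class FrameworkRegistration(
    name: String,
    role: String,                   // "*" is the Mesos default role
    principal: Option[String],
    credential: Option[Credential]) // present only when a principal is set

object MesosRegistration {
  // Builds registration data from a plain key/value view of the Spark conf.
  def fromConf(appName: String, conf: Map[String, String]): FrameworkRegistration = {
    val role      = conf.getOrElse("spark.mesos.role", "*")
    val principal = conf.get("spark.mesos.principal")
    val secret    = conf.get("spark.mesos.secret")
    // Assumption of this sketch: a secret without a principal is rejected.
    require(secret.isEmpty || principal.nonEmpty,
      "spark.mesos.principal must be set when spark.mesos.secret is set")
    FrameworkRegistration(
      name = appName,
      role = role,
      principal = principal,
      credential = principal.map(p => Credential(p, secret)))
  }
}
```

Leaving all three settings unset yields role "*" and no credential, i.e. an unauthenticated framework with the default role.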