-
Notifications
You must be signed in to change notification settings - Fork 29k
[YARN] SPARK-6470. Add support for YARN node labels. #5242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #29349 has started for PR 5242 at commit
|
|
Test build #29349 has finished for PR 5242 at commit
|
|
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, import ordering here.
This looks like a pretty straightforward change, yeah. Too bad this requires reflection, but it's clean. I look forward to being able to assume later versions of YARN / Hadoop to undo some of this. (Although requiring 2.6 probably won't happen too soon.)
The key use case this enables is, for example, running a Spark app on a subset of a cluster's machines that are labeled as having a GPU. I like this given the relatively low impact simple change and the upside to what it enables.
|
Test build #29408 has started for PR 5242 at commit |
|
Test build #29408 has finished for PR 5242 at commit
|
|
Test PASSed. |
|
mostly looks good. Seems like we may want to add this to the ApplicationSubmissionContext also to control where the AM can go. And the question there is do we want that to be able to go to different label then executors. If so we need two configs. |
|
public static ApplicationSubmissionContext newInstance( I think we could potentially use the appLabelExpression for the executors also without having to set it on each ResourceRequest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sryza, I think the constructor parameter here is not correct, it should be either
public static ApplicationSubmissionContext newInstance(
ApplicationId applicationId, String applicationName, String queue,
Priority priority, ContainerLaunchContext amContainer,
boolean isUnmanagedAM, boolean cancelTokensWhenComplete,
int maxAppAttempts, Resource resource, String applicationType,
boolean keepContainers, String appLabelExpression,
String amContainerLabelExpression)
Or
public static ApplicationSubmissionContext newInstance(
ApplicationId applicationId, String applicationName, String queue,
ContainerLaunchContext amContainer, boolean isUnmanagedAM,
boolean cancelTokensWhenComplete, int maxAppAttempts,
String applicationType, boolean keepContainers,
String appLabelExpression, ResourceRequest resourceRequest)
In 2.6.0 release or after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks correct to me. From the hadoop code:
public ContainerRequest(Resource capability, String[] nodes,
String[] racks, Priority priority, boolean relaxLocality,
String nodeLabelsExpression)
|
@sryza could you update to do application master as well? |
|
@sryza are you still looking at this one? it's a good feature. This guy needs a rebase and Thomas has a decent suggestion here too. |
|
@srowen @tgravescs sorry for the delay here. Have been working on a new patch that adds support or AM label expressions, but ran into some YARN API issues that makes it difficult. Will try to post something soon. |
|
Updated patch still only supports executor node labels, but renames the property so that adding driver/am node labels in the future won't look weird. I also switched to the API that sets the label expression upon app submission. |
|
Test build #30252 has started for PR 5242 at commit |
|
Test build #30252 has finished for PR 5242 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should error out more catastrophically if the user is trying to use a feature that doesn't exist.
|
It turns out that setting the label expression when creating the |
|
Test build #30284 has started for PR 5242 at commit |
|
Test build #30284 has finished for PR 5242 at commit
|
|
Test PASSed. |
|
Test build #30323 has started for PR 5242 at commit |
|
Test build #30323 has finished for PR 5242 at commit
|
|
Test FAILed. |
|
retest this please |
|
@tgravescs the first will apply to the AM as well if there's not one for the AM specifically set: @vanzin yes, came across that as well. My concern was mainly about corner cases with how these get instantiated - do we need to check getAMContainerResourceRequest for null or does it get lazily instantiated. Not big deals at all, but my bandwidth is low right now, and I'm not convinced that we should add AM node label expressions until we hear of a use case for them, so wanted to put that off for a separate JIRA. |
|
@sryza so to clarify this pr is only going to do this for executor? is there a jira for doing it for the application master? Based on above is this ready for review? |
|
That's correct. Only for executor, but I updated the config naming after your earlier comment so that we have room to add it for the AM in the future. I just filed SPARK-7173 for this. This is ready for review. |
|
Test build #31037 has started for PR 5242 at commit |
|
@sryza I think the only remaining comment is to change logInfo to logWarn, otherwise looks good. |
|
Jenkins, test this please |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32395 has started for PR 5242 at commit |
|
Test build #32395 has finished for PR 5242 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Sorry, missed that. Just updated the patch. |
|
Merged build triggered. |
|
Merged build started. |
|
LGTM pending tests. You can tap it in when you're ready. |
|
Test build #32408 has started for PR 5242 at commit |
|
Test build #32408 has finished for PR 5242 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Checked this in. |
This is difficult to write a test for because it relies on the latest version of YARN, but I verified manually that the patch does pass along the label expression on this version and containers are successfully launched. Author: Sandy Ryza <sandy@cloudera.com> Closes apache#5242 from sryza/sandy-spark-6470 and squashes the following commits: 6af87b9 [Sandy Ryza] Change info to warning 6e22d99 [Sandy Ryza] [YARN] SPARK-6470. Add support for YARN node labels.
This is difficult to write a test for because it relies on the latest version of YARN, but I verified manually that the patch does pass along the label expression on this version and containers are successfully launched. Author: Sandy Ryza <sandy@cloudera.com> Closes apache#5242 from sryza/sandy-spark-6470 and squashes the following commits: 6af87b9 [Sandy Ryza] Change info to warning 6e22d99 [Sandy Ryza] [YARN] SPARK-6470. Add support for YARN node labels.
This is difficult to write a test for because it relies on the latest version of YARN, but I verified manually that the patch does pass along the label expression on this version and containers are successfully launched. Author: Sandy Ryza <sandy@cloudera.com> Closes apache#5242 from sryza/sandy-spark-6470 and squashes the following commits: 6af87b9 [Sandy Ryza] Change info to warning 6e22d99 [Sandy Ryza] [YARN] SPARK-6470. Add support for YARN node labels.
This is difficult to write a test for because it relies on the latest version of YARN, but I verified manually that the patch does pass along the label expression on this version and containers are successfully launched.