Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37461][YARN] YARN-CLIENT mode client.appId is always null #34710

Closed
wants to merge 1 commit into from

Conversation

AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Nov 25, 2021

What changes were proposed in this pull request?

In yarn-client mode, Client.appId variable is not assigned, it is always null, in cluster mode, this variable will be assigned to the true value. In this patch, we assign true application id to appId too.

For client mode, it directly call submitApplication and return appId to YarnClientSchedulerBackend.buildtoYarn(). So for client mode, we only can assign ApplicationId to appId in submitApplication. Then since this value is assigned. so We don't need to add a new variable appId in createContainerLaunchContext(). and don need assign this.appId in run().

Why are the changes needed?

  1. Refactor the code to avoid define appId three tines in each function, we can just use this private variable.
  2. In client mode, user can use this value to get the application id in their internal code.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manuel tested.

We have a internal proxy server to replace yarn tracking url, here use appId, with this patch it's not null.

21/11/26 12:38:44 INFO Client:
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: user_queue
	 start time: 1637901520956
	 final status: UNDEFINED
	 tracking URL: http://internal-proxy-server/proxy?applicationId=application_1635856758535_4209064
	 user: user_name

@github-actions github-actions bot added the YARN label Nov 25, 2021
@AngersZhuuuu
Copy link
Contributor Author

ping @tgravescs @mridulm

@SparkQA
Copy link

SparkQA commented Nov 25, 2021

Test build #145628 has finished for PR 34710 at commit d4a48d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50099/

@SparkQA
Copy link

SparkQA commented Nov 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50099/

@HyukjinKwon
Copy link
Member

@AngersZhuuuu can you fill "How was this patch tested?"?

@@ -181,7 +180,7 @@ private[spark] class Client(
// Get a new application from our RM
val newApp = yarnClient.createApplication()
val newAppResponse = newApp.getNewApplicationResponse()
appId = newAppResponse.getApplicationId()
this.appId = newAppResponse.getApplicationId()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea that setting it here means its new value is visible in the catch clauses below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea that setting it here means its new value is visible in the catch clauses below?

There are two reason to do this:

  1. It's a refactor that we don't need to define each appId since we can use this one this.appId. In current code it define tree appId, not necessary.
  2. We can just use this appId for both yarn-client, yarn-cluster mode. Since when I build some internal plugin, when yarn-client mode here the value is null but cluster mode it have the right value.

@AngersZhuuuu
Copy link
Contributor Author

ping @tgravescs @mridulm

@AngersZhuuuu can you fill "How was this patch tested?"?

Updated

@sarutak
Copy link
Member

sarutak commented Nov 26, 2021

@AngersZhuuuu Could you describe what you propose in What changes were proposed in this pull request??
YARN-CLIENT mode client.appId is always null doesn't describe what you propose, but what the problem is (maybe, Why are the changes needed? needs to be modified together).

@AngersZhuuuu
Copy link
Contributor Author

@AngersZhuuuu Could you describe what you proposes in What changes were proposed in this pull request?? YARN-CLIENT mode client.appId is always null doesn't describe what you proposes, but what the problem is (maybe, Why are the changes needed? needs to be modified together).

How about current?

@AngersZhuuuu
Copy link
Contributor Author

gentle ping @tgravescs @srowen @HyukjinKwon @mridulm

@srowen srowen closed this in db9a982 Nov 28, 2021
@srowen
Copy link
Member

srowen commented Nov 28, 2021

Merged to master

@tgravescs
Copy link
Contributor

I don't follow this issue - how is the user using Client.appId? Client is an private spark class and appId is currently private member. The tracking url in client mode comes out the same as in cluster mode. So how is the user seeing this?

The Jira issue has no description either, please fix that.

The function submitApplication returns the appId and now that isn't used in this case which seems a bit odd. All we did here was move the assignment to be a little bit sooner and there is no explanation as to why and nothing here keeping someone from changing it back accidentally.

@AngersZhuuuu
Copy link
Contributor Author

The function submitApplication returns the appId and now that isn't used in this case which seems a bit odd. All we did here was move the assignment to be a little bit sooner and there is no explanation as to why and nothing here keeping someone from changing it back accidentally.

For client mode, it directly call submitApplication and return appId to YarnClientSchedulerBackend.buildtoYarn(). So for client mode, we only can assign ApplicationId to appId in submitApplication. Then since this value is assigned. so We don't need to add a new variable appId in createContainerLaunchContext. and don need assign this.appId in run().

The intention of this PR is not to assign values in advance.

@tgravescs
Copy link
Contributor

tgravescs commented Nov 30, 2021

The above doesn't answer my main question as to why this change is needed? what is broken that needs this change?
The description mentions:

In client mode, user can use this value to get the application id.

How is the user getting the application Id from a spark private variable?

I assume you mean YarnClientSchedulerBackend.bindToYarn(). yes we return appId to this and it uses it, and I can see where this perhaps cleans up the code by setting this.appId directly, but it only changes it very slightly to set it a bit earlier. From just a brief look I don't see why it needs to be set earlier, so a description as to why is needed.

@AngersZhuuuu
Copy link
Contributor Author

AngersZhuuuu commented Nov 30, 2021

The above doesn't answer my main question as to why this change is needed? what is broken that needs this change? The description mentions:

In client mode, user can use this value to get the application id.

How is the user getting the application Id from a spark private variable?

Some internal code, we add some plugin code in internal code and need this value.

I assume you mean YarnClientSchedulerBackend.bindToYarn(). yes we return appId to this and it uses it, and I can see where this perhaps cleans up the code by setting this.appId directly, but it only changes it very slightly to set it a bit earlier. From just a brief look I don't see why it needs to be set earlier, so a description as to why is needed.

I thinks it's also a code refactor so I raise this. In current code it defines appId for three times and seems unnecessary. Also we don't left it as null value.

@tgravescs
Copy link
Contributor

Some internal code, we add some plugin code in internal code and need this value.

This is not a public api, it has been brought up before about allowing this to be public but did not go anywhere. My concern here is making changes that make it partially supported but not really and it would be easily broken as devs don't know that use case. That variable is only used in functions for cluster mode so its not obvious at all it needs to be set for client mode.

That Client class is unfortunately kind of a mix of things, where many of the functions are more like utility functions used in multiple places. This submitApplication I see as similar, where its called from multiple places and returns the appId that can be used however caller wants. There are other functions in there that take an appId, which doesn't really match, why not just use the appId in the class, its again kind of more like utility function setup, or just changes overtime that made it this way.

While I don't agree with supporting this as a public api like it seems you are being used, I'm fine with doing some cleanup here but would like to see it done fully.
I don't think submitApplication needs to return the appId, but we add method to get it. Then the YarnClientSchedulerBackend calls submitApplication() and then after calling that, it calls client.getApplicationId to pass into the bindToYarn function. There should also be a comment on the Client.appId saying setting to ensure works for both cluster and client mode. I also think monitorApplication and getApplicationReport should be changed to not take the appId but use the one in Client class. If they are truly utility functions that take the appId they should be moved to the Client object, but I think both of those functions use other internal Client variables or functions.
The createContainerLaunchContext change made here, one could argue we should be looking at the app response and make sure it matches the appId the Client thinks it has rather then blindly assuming its from this Client. I think it was safer before then it is now.

@srowen
Copy link
Member

srowen commented Nov 30, 2021

Yeah it's not a public API - using it that way isn't supported or a reasonable motivation. But as a simple refactor it seemed fine enough, and that seemed like OK justification

@tgravescs
Copy link
Contributor

@AngersZhuuuu are you willing to do the refactor mentioned, otherwise I will revert this?

@srowen
Copy link
Member

srowen commented Nov 30, 2021

That just sounds like more refactor, which is fine, but why revert this? it just doesn't do much either way

@AngersZhuuuu
Copy link
Contributor Author

This is not a public api, it has been brought up before about allowing this to be public but did not go anywhere. My concern here is making changes that make it partially supported but not really and it would be easily broken as devs don't know that use case. That variable is only used in functions for cluster mode so its not obvious at all it needs to be set for client mode.

That Client class is unfortunately kind of a mix of things, where many of the functions are more like utility functions used in multiple places. This submitApplication I see as similar, where its called from multiple places and returns the appId that can be used however caller wants. There are other functions in there that take an appId, which doesn't really match, why not just use the appId in the class, its again kind of more like utility function setup, or just changes overtime that made it this way.

While I don't agree with supporting this as a public api like it seems you are being used, I'm fine with doing some cleanup here but would like to see it done fully. I don't think submitApplication needs to return the appId, but we add method to get it. Then the YarnClientSchedulerBackend calls submitApplication() and then after calling that, it calls client.getApplicationId to pass into the bindToYarn function. There should also be a comment on the Client.appId saying setting to ensure works for both cluster and client mode. I also think monitorApplication and getApplicationReport should be changed to not take the appId but use the one in Client class. If they are truly utility functions that take the appId they should be moved to the Client object, but I think both of those functions use other internal Client variables or functions. The createContainerLaunchContext change made here, one could argue we should be looking at the app response and make sure it matches the appId the Client thinks it has rather then blindly assuming its from this Client. I think it was safer before then it is now.

Got your concern. Thanks for your explain. I will follow up this issue and recheck the whole API to make it more reasonable.

@AngersZhuuuu
Copy link
Contributor Author

AngersZhuuuu commented Nov 30, 2021

@AngersZhuuuu are you willing to do the refactor mentioned, otherwise I will revert this?

Sure will do the refactor.

@AngersZhuuuu
Copy link
Contributor Author

That just sounds like more refactor, which is fine, but why revert this? it just doesn't do much either way

As @tgravescs mentioned, current refactor is not complete yet. Need to do more refactor.

@tgravescs
Copy link
Contributor

@srowen Happy to discuss more, though I don't think it matters to much but my opinion is the patch as is adds no technical benefit other then to support something we don't support (ie user calling private function).
I would actually argue it also isn't cleanup as this made the code less consistent in the api and harder to maintain (obscure "use case" that could easily be broken).

@tgravescs
Copy link
Contributor

Note even with a refactor, we won't officially support using that, but I think those changes make more sense from an api point of view and are much less likely to be broken.

asfgit pushed a commit that referenced this pull request Dec 2, 2021
…unnecessary parameter of `appId`

### What changes were proposed in this pull request?
In #34710, we assign ApplicationId to `appId` in client mode too. After this change we can refactor more code:

1. We add a method `getApplicationId` to get `appId` from `Client`, and avoid it can be changed outside of `Client`.
2. `submitApplication()` don't return `appId` now. we need to call `getApplicationId` instead.
3. Remove `appId` argument from `monitorApplication()` and `getApplicationReport()`.

### Why are the changes needed?
Refactor code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existed UT

Closes #34767 from AngersZhuuuu/SPARK-37461-FOLLOWUP.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants