[SPARK-37461][YARN] YARN-CLIENT mode client.appId is always null #34710
ping @tgravescs @mridulm
Test build #145628 has finished for PR 34710 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
@AngersZhuuuu can you fill "How was this patch tested?"?
@@ -181,7 +180,7 @@ private[spark] class Client(
       // Get a new application from our RM
       val newApp = yarnClient.createApplication()
       val newAppResponse = newApp.getNewApplicationResponse()
-      appId = newAppResponse.getApplicationId()
+      this.appId = newAppResponse.getApplicationId()
Is the idea that setting it here means its new value is visible in the catch clauses below?
There are two reasons to do this:
- It's a refactor: we don't need to define a separate `appId` in each function, since we can use the single `this.appId`. The current code defines `appId` three times, which is not necessary.
- We can use this `appId` in both `yarn-client` and `yarn-cluster` mode. When I built an internal plugin, the value was null in yarn-client mode, while in cluster mode it had the right value.
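The difference between the two assignments can be illustrated with a minimal sketch. This is a simplified model, not Spark's actual `Client` class: the class names `BeforeClient` and `AfterClient`, the `currentAppId` accessor, and the use of a plain `String` in place of YARN's `ApplicationId` are all assumptions made for illustration.

```scala
// Simplified stand-ins for the YARN Client. Only the names appId,
// submitApplication and run come from the discussion; everything else
// is hypothetical.
class BeforeClient {
  private var appId: String = null
  def currentAppId: Option[String] = Option(appId)

  def submitApplication(): String = {
    var appId: String = null       // local variable shadows the field
    appId = "application_0001"     // only the local variable is set
    appId
  }

  // Cluster-mode style path: the field is set from the return value.
  def run(): Unit = { this.appId = submitApplication() }
}

class AfterClient {
  private var appId: String = null
  def currentAppId: Option[String] = Option(appId)

  def submitApplication(): String = {
    this.appId = "application_0001" // the field is set for every caller
    this.appId
  }

  def run(): Unit = { submitApplication() }
}

object AppIdSketch {
  def main(args: Array[String]): Unit = {
    val before = new BeforeClient
    before.submitApplication()      // client-mode style call, bypasses run()
    println(before.currentAppId)    // None

    val after = new AfterClient
    after.submitApplication()
    println(after.currentAppId)     // Some(application_0001)
  }
}
```

In this model a caller that invokes `submitApplication()` directly, as the client-mode path is described as doing, only sees the field populated in the second variant.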
ping @tgravescs @mridulm
Updated
@AngersZhuuuu Could you describe what you propose in
How about current?
gentle ping @tgravescs @srowen @HyukjinKwon @mridulm
Merged to master
I don't follow this issue. How is the user using Client.appId? Client is a private Spark class and appId is currently a private member. The tracking URL in client mode comes out the same as in cluster mode, so how is the user seeing this? The Jira issue has no description either; please fix that. The function submitApplication returns the appId, and now that return value isn't used in this case, which seems a bit odd. All we did here was move the assignment to be a little bit sooner, with no explanation as to why and nothing here keeping someone from changing it back accidentally.
For client mode, it directly call The intention of this PR is not to assign values in advance. |
The above doesn't answer my main question: why is this change needed? What is broken that needs this change?
How is the user getting the application ID from a Spark private variable? I assume you mean
It's internal code: we added some plugin code internally, and it needs this value.
I think it's also a code refactor, so I raised this. In current code it defines
This is not a public API. Making it public has been brought up before but did not go anywhere. My concern here is making changes that leave it partially supported but not really, so that it would be easily broken because devs don't know about that use case. That variable is only used in functions for cluster mode, so it's not obvious at all that it needs to be set for client mode. The Client class is unfortunately kind of a mix of things, where many of the functions are more like utility functions used in multiple places. I see submitApplication as similar: it's called from multiple places and returns the appId, which the caller can use however it wants. There are other functions in there that take an appId, which doesn't really match. Why not just use the appId in the class? Again, it's more of a utility-function setup, or just changes over time that made it this way. While I don't agree with supporting this as a public API, which is how it seems you are using it, I'm fine with doing some cleanup here but would like to see it done fully.
Yeah, it's not a public API. Using it that way isn't supported or a reasonable motivation. But as a simple refactor it seemed fine enough, and that seemed like OK justification.
@AngersZhuuuu are you willing to do the refactor mentioned? Otherwise I will revert this.
That just sounds like more refactoring, which is fine, but why revert this? It just doesn't do much either way.
Got your concern. Thanks for the explanation. I will follow up on this issue and recheck the whole API to make it more reasonable.
Sure, will do the refactor.
As @tgravescs mentioned, the current refactor is not complete yet. More refactoring is needed.
@srowen Happy to discuss more. I don't think it matters too much, but my opinion is that the patch as is adds no technical benefit other than to support something we don't support (i.e. a user calling a private function).
Note that even with a refactor, we won't officially support using that, but I think those changes make more sense from an API point of view and are much less likely to be broken.
…unnecessary parameter of `appId`

### What changes were proposed in this pull request?
In #34710, we assign ApplicationId to `appId` in client mode too. After this change we can refactor more code:
1. We add a method `getApplicationId` to get `appId` from `Client`, and avoid it can be changed outside of `Client`.
2. `submitApplication()` don't return `appId` now. we need to call `getApplicationId` instead.
3. Remove `appId` argument from `monitorApplication()` and `getApplicationReport()`.

### Why are the changes needed?
Refactor code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existed UT

Closes #34767 from AngersZhuuuu/SPARK-37461-FOLLOWUP.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
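The three refactoring steps listed in the follow-up commit could be sketched roughly as follows. This is a hedged illustration only, not the merged Spark code: the class name `RefactoredClient`, the `String` application ID, and the string-valued return types of `getApplicationReport()` and `monitorApplication()` are assumptions standing in for the real YARN types.

```scala
// Hypothetical model of the follow-up refactor. Method names come from the
// commit message; bodies and return types are placeholders.
class RefactoredClient {
  // Step 1: the field stays private and is exposed read-only.
  private var appId: String = null
  def getApplicationId(): String = appId

  // Step 2: submitApplication() no longer returns the ID.
  def submitApplication(): Unit = {
    this.appId = "application_0001"
  }

  // Step 3: no appId parameter; the methods read the field instead.
  def getApplicationReport(): String = s"report for $appId"
  def monitorApplication(): String = getApplicationReport()
}

object RefactorSketch {
  def main(args: Array[String]): Unit = {
    val client = new RefactoredClient
    client.submitApplication()
    println(client.getApplicationId())   // application_0001
    println(client.monitorApplication()) // report for application_0001
  }
}
```

With this shape, callers can no longer pass a mismatched ID into the monitoring methods, which is the kind of consistency the review comments asked for.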
What changes were proposed in this pull request?
In yarn-client mode, the `Client.appId` variable is never assigned, so it is always `null`. In cluster mode, this variable is assigned the real value. In this patch, we assign the real application ID to `appId` in client mode too.

Client mode calls `submitApplication` directly and passes the returned appId to `YarnClientSchedulerBackend.bindToYarn()`. So for client mode, the only place we can assign the ApplicationId to `appId` is inside `submitApplication`. Since the value is assigned there, we don't need to declare a new variable `appId` in `createContainerLaunchContext()`, and we don't need to assign `this.appId` in `run()`.

Why are the changes needed?
There is no need to define `appId` three times, once in each function; we can just use this private variable.

Does this PR introduce any user-facing change?
No

How was this patch tested?
Manually tested.
We have an internal proxy server that replaces the YARN tracking URL and uses `appId`. With this patch the value is no longer null.