YARN-11566. Yarn app kill command can not kill the application in sec… #6068
There are three ways to solve this problem:
Solution 1:
Solution 2:
I chose solution 3. The UAM is managed by the NM, so this solution is the most reasonable. This PR implements solution 3.
@zhengchenyu Thanks for your contribution! Can we describe what
```diff
   */
   @Override
-  public void shutdown() {
+  public void shutdown(boolean stop) {
```
YARN-6848 plans to unify the Interceptor interface. Could we avoid changing the shutdown parameters? Is there another way to solve the issue?
```diff
   if (response == null) {
-    throw new YarnException(
-        "Failed Force-killing UAM id " + uamId + " for application " + appId);
+    LOG.error("Failed Force-killing UAM id " + uamId + " for application " + appId);
```
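Replacing the throw with a log call presumably matters because the PR force-kills UAMs one after another: a throw aborts the pass at the first failed sub-cluster, while logging lets the remaining UAMs still be attempted. A minimal sketch of that difference, assuming a hypothetical `KillClient` interface that returns `null` on failure (a stand-in for the real force-kill call, not a YARN API):

```java
import java.util.ArrayList;
import java.util.List;

public class ForceKillSketch {
  // Hypothetical stand-in for the force-kill RPC; returns null on failure,
  // mirroring the `response == null` check in the diff.
  interface KillClient {
    String forceKill(String uamId);
  }

  /** Attempts every UAM and returns the ids that failed, instead of aborting on the first failure. */
  static List<String> forceKillAll(List<String> uamIds, String appId, KillClient client) {
    List<String> failed = new ArrayList<>();
    for (String uamId : uamIds) {
      String response = client.forceKill(uamId);
      if (response == null) {
        // Log and continue rather than throw, so the remaining UAMs are still attempted.
        System.err.printf("Failed Force-killing UAM id %s for application %s%n", uamId, appId);
        failed.add(uamId);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> killed = new ArrayList<>();
    KillClient client = id -> {
      if (id.equals("SC-2")) {
        return null; // simulate an unreachable secondary sub-cluster
      }
      killed.add(id);
      return "ok";
    };
    List<String> failed = forceKillAll(List.of("SC-1", "SC-2", "SC-3"), "application_1_0001", client);
    System.out.println("killed=" + killed + " failed=" + failed);
  }
}
```

Running `main` prints `killed=[SC-1, SC-3] failed=[SC-2]` to stdout (plus the error line on stderr).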
```java
LOG.error("Failed Force-killing UAM id {} for application {}.", uamId, appId);
```
```java
Assert.assertEquals(2, unmanagedAppMasterMap.size());

// threadpool may be shutdown before finishApplicationThread is executed.
threadpool.shutdownNow();
```
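For context on why a kill path should not depend on a pool that may already be torn down: a pool created by `Executors.newFixedThreadPool` is a `ThreadPoolExecutor` with the default `AbortPolicy`, so a task submitted after `shutdownNow()` is rejected with `RejectedExecutionException` instead of running. A small standalone sketch (the class and method names are illustrative only):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class ShutdownPoolSketch {
  /** Returns true if the pool rejected the task because it had already been shut down. */
  static boolean submitAfterShutdownIsRejected() {
    ExecutorService threadpool = Executors.newFixedThreadPool(2);
    // Simulate the pool being torn down before the force-finish task is submitted.
    threadpool.shutdownNow();
    try {
      threadpool.submit(() -> System.out.println("force-finish UAM"));
      return false;
    } catch (RejectedExecutionException e) {
      // The default AbortPolicy rejects any work submitted after shutdown.
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("rejected=" + submitAfterShutdownIsRejected());
  }
}
```

Running `main` prints `rejected=true`; the task body never executes.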
In practice, what would cause the thread pool to be terminated?
```java
Assert.assertEquals(2, interceptor.getUnmanagedAMPoolSize());

// Allocate the second batch of containers, with sc1 and sc3 active
deRegisterSubCluster(SubClusterId.newInstance("SC-2"));
```
In the unit test, what is taking SC-2 offline with deRegisterSubCluster meant to verify?
I did not describe this clearly. It happens when `stop` is true in shutdown, which can only occur after this PR. First I will find a better way to shut down when killing an application; then I will resolve the other problem.
💔 -1 overall
This message was automatically generated.
…ondary sub cluster.
Force-pushed from 6325294 to b995fa8.
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
Quick review: LGTM. I will read this PR carefully later.
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Description of PR
When AMRMProxy is enabled, an application may allocate containers across multiple sub-clusters. In each secondary sub-cluster, the application is registered as an unmanaged application. When we run `yarn app -kill {appid}`, the unmanaged application in the secondary sub-cluster is not killed.
The unmanaged application is not removed until its application attempt expires, after 15 minutes.
How was this patch tested?
Unit tests, plus testing on a real cluster.
For code changes:
interceptor.shutdown. As a result, entries in UnmanagedAMPoolManager::unmanagedAppMasterMap may already have been removed by interceptor.shutdown, and the threadpool may already have been shut down by it. So I iterate over a copy of unmanagedAppMasterMap and do not use the threadpool. Because ForceFinishApplicationThread runs asynchronously, it will not block serviceStop; to avoid allocating new resources, I simply force-kill the applications sequentially.
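A minimal sketch of the copy-then-iterate approach, assuming the live map may be modified while the kill loop runs (simulated here by clearing it inside the loop). The class and method names are hypothetical and a plain string stands in for each UAM; the real code would issue the force-kill where the comment indicates:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotKillSketch {
  /**
   * Visits every UAM that was registered when the kill started. The loop runs over a
   * snapshot copy, so removals from the live map during the loop cannot skip entries.
   */
  static List<String> killFromSnapshot(Map<String, String> liveMap) {
    // Take the copy up front; later changes to the live map do not affect it.
    Map<String, String> snapshot = new HashMap<>(liveMap);
    List<String> attempted = new ArrayList<>();
    for (String uamId : snapshot.keySet()) {
      // Simulate interceptor.shutdown() removing entries from the live map mid-loop.
      liveMap.clear();
      // A real implementation would force-kill this UAM here, sequentially (no thread pool).
      attempted.add(uamId);
    }
    return attempted;
  }

  public static void main(String[] args) {
    Map<String, String> live = new ConcurrentHashMap<>();
    live.put("SC-2", "uam-2");
    live.put("SC-3", "uam-3");
    List<String> attempted = killFromSnapshot(live);
    System.out.println("attempted " + attempted.size() + " UAMs; live map now has " + live.size());
  }
}
```

Running `main` prints `attempted 2 UAMs; live map now has 0`: both UAMs are visited even though the live map is emptied during the first iteration.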