-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YARN-6667. Handle containerId duplicate without failing the heartbeat in Federation Interceptor. #4810
Conversation
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@goiri Can you help review the code? Thank you very much! |
LOG.info("Use Previous Allocated Container's subCluster. " + | ||
"ContainerId: {} ApplicationId: {} From RM: {}.", this.attemptId, | ||
container.getId(), existingSubClusterId); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a continue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the end of the for loop, there is a piece of code like this
this.containerIdToSubClusterIdMap.put(container.getId(), subClusterId);
If a container uses the previous subcluster, it does not need to execute this line, so use continue, jump out of the loop, and let the for loop continue to execute the next round.
This code may not be easy to read, I will refactor this part of the code.
|
||
// If the previous RM which allocated Container is normal, | ||
// the previous RM will be used first | ||
if ((existAllocatedScHealth && !newAllocatedScHealth) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this condition repetitive? Wouldn't it just be if (existAllocatedScHealth)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion, your suggestion is correct, I will modify the code.
boolean newAllocatedScHealth = true; | ||
|
||
// Previous SubCluster Time Out Or Unusable, Can't Continue to use. | ||
if (timeOutScs.contains(existingSubClusterId) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could unify these two ifs a little.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for helping to review the code, I will modify the code.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@goiri Please help to review the code again, Thank you very much! |
Set<SubClusterId> timeOutScs = getTimedOutSCs(true); | ||
SubClusterInfo subClusterInfo = federationFacade.getSubCluster(subClusterId); | ||
if (timeOutScs.contains(subClusterId) || | ||
subClusterInfo == null || subClusterInfo.getState().isUnusable()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it.
SubClusterInfo subClusterInfo = federationFacade.getSubCluster(subClusterId); | ||
if (timeOutScs.contains(subClusterId) || | ||
subClusterInfo == null || subClusterInfo.getState().isUnusable()) { | ||
isSCHealth = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just do return true and return false at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion, I will modify the code.
🎊 +1 overall
This message was automatically generated. |
@goiri Please help to review the code again, Thank you very much! |
🎊 +1 overall
This message was automatically generated. |
… in Federation Interceptor. (apache#4810)
JIRA: YARN-6667. Handle containerId duplicate without failing the heartbeat in Federation Interceptor.
From the actual situation, the probability of this happening is very low.
It can only be caused by the master-slave fail-hover of YARN and the wrong Epoch parameter configuration.
We will try to be compatible with this situation and let the Application run as much as possible, using the following measures: