New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YARN-8900. [Follow Up] Fix FederationInterceptorREST#invokeConcurrent Inaccurate Order of Subclusters. #5260
Conversation
…s transparently to multiple RMs.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@goiri Can you help review this PR? Thank you very much! |
@@ -2512,36 +2513,41 @@ private <R> Map<SubClusterInfo, R> invokeConcurrent(Collection<SubClusterInfo> c | |||
getMethod(request.getMethodName(), request.getTypes()); | |||
Object retObj = method.invoke(interceptor, request.getParams()); | |||
R ret = clazz.cast(retObj); | |||
return Pair.of(ret, null); | |||
return Triple.of(info, ret, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This starts to get too obfuscated.
Let's create some temp class like SubClusterResult
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for your suggestion, I will modify the code.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@goiri Can you help review this PR again? Thank you very much! |
@@ -2496,7 +2497,8 @@ private <R> Map<SubClusterInfo, R> invokeConcurrent(Collection<SubClusterInfo> c | |||
// If there is a sub-cluster access error, | |||
// we should choose whether to throw exception information according to user configuration. | |||
// Send the requests in parallel. | |||
CompletionService<Pair<R, Exception>> compSvc = new ExecutorCompletionService<>(threadpool); | |||
CompletionService<SubClusterResult<R>> compSvc = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for helping to review the code, I will modify the code.
@@ -45,7 +45,7 @@ | |||
|
|||
import org.apache.commons.lang3.NotImplementedException; | |||
import org.apache.commons.lang3.StringUtils; | |||
import org.apache.commons.lang3.tuple.Pair; | |||
import org.apache.commons.lang3.tuple.Triple; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FederationInterceptorREST.java:48:import org.apache.commons.lang3.tuple.Triple;:8: Unused import - org.apache.commons.lang3.tuple.Triple. [UnusedImports]
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@goiri Can you help to merge this pr into the trunk branch? Thank you very much! |
@goiri Thank you very much for helping to review the code! |
JIRA: YARN-8900. [Router] Federation: routing getContainers REST invocations transparently to multiple RMs.
BackGround
In
FederationInterceptorREST#invokeConcurrent
, we define theinvokeConcurrent
method, which is used to call multiple sub-clusters concurrently to improve performance.ExecutorCompletionService
can help us execute Tasks concurrently, but it cannot guarantee the return order of results, and the return order of results may be inconsistent.Examples are as follows:
We have 4
subClusters
, usinginvokeConcurrent
to call concurrently, our call sequence issubCluster1
,subCluster2
,subCluster3
,subCluster4
. But because theexecution time
of interfaces of subclusters is different, we can get the following order.subCluster2
,subCluster1
,subCluster4
,subCluster3
.This part of the code may get inaccurate results and sub-cluster mapping.
Influences
This problem will not affect the accuracy of the overall returned results of the interface, because we will integrate the returned results for the interfaces involved in
FederationInterceptorREST
. The known impact is that when a certain sub-cluster is abnormal, it may cause the printed error log is inaccurate.Examples are as follows:
When we use the results, we ignore the subcluster information.
FederationClientInterceptor#invokeConcurrent
AndFederationRMAdminInterceptor#invokeConcurrent
will not have this problem, because the result contains subClusterId.Examples are as follows: