HDDS-5050. Add retry policy for ratis requests in SCM HA.#2116
bshashikant merged 5 commits into apache:master
one space before extends
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAUtils.java
...rk/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
...rc/main/java/org/apache/hadoop/hdds/scm/proxy/SCMContainerLocationFailoverProxyProvider.java
          "Unknown command type: " + request.getCmdType());
    }
  } catch (IOException e) {
    if (SCMHAUtils.isRetriableWithNoFailoverException(e)) {
Do we need this RetriableWithNoFailoverException?
For example, isRetriableWithNoFailoverException should return true for a ServiceException that contains a ReconfigurationInProgressException.
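To illustrate the suggestion, here is a minimal sketch of a check that looks through a wrapping exception to the real cause. The classes below are local stand-ins that only mimic the shape of the exceptions named in this thread; they are not the Ratis or protobuf types, and the actual SCMHAUtils implementation may differ.

```java
import java.io.IOException;

// Hypothetical stand-ins: these local classes only mimic the exceptions
// discussed in this thread; they are not the real Ratis or protobuf types.
final class CauseChainSketch {

  /** Stand-in for an RPC-layer wrapper that carries the real server error. */
  static class ServiceException extends Exception {
    ServiceException(Throwable cause) {
      super(cause);
    }
  }

  /** Stand-in for an error that is worth retrying on the same server. */
  static class ReconfigurationInProgressException extends IOException {
    ReconfigurationInProgressException(String message) {
      super(message);
    }
  }

  /** Walks the cause chain and reports whether a retry without failover applies. */
  static boolean isRetriableWithNoFailover(Throwable e) {
    int depth = 0;
    // The depth bound guards against a cyclic cause chain.
    for (Throwable t = e; t != null && depth < 16; t = t.getCause(), depth++) {
      if (t instanceof ReconfigurationInProgressException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Exception wrapped = new ServiceException(
        new ReconfigurationInProgressException("membership change in progress"));
    System.out.println(isRetriableWithNoFailover(wrapped));
    System.out.println(isRetriableWithNoFailover(new IOException("other")));
  }
}
```

With a check of this shape, the client decides from the unwrapped cause, so the server does not need to map errors to a dedicated exception type first.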
Given client side has already done the check, the server side can just return a ServiceException containing the real exception.
Do we want to address the suggested leader in a new Jira?
Yes, it will be addressed in a separate jira.
Given client side has already done the check, the server side can just return a ServiceException containing the real exception.
The server does not send a ServiceException back for every exception, only for certain ones. Only the StorageContainerLocation server seems to send a ServiceException back to the client for each failure. I have removed the check in StorageContainerLocationServer.
bharatviswa504 left a comment
Do we want to address the suggested leader in a new Jira?
Yes.
+1. Thanks for the clarification.
    if (failovers >= maxRetryCount) {
  public RetryPolicy.RetryAction getRetryAction(int failovers, int retry,
      Exception e) {
    if (SCMHAUtils.isRetriableWithNoFailoverException(e)) {
By the way, can this method be replaced with SCMHAUtils#getRetryAction?
Committing this. Will take care of other suggested changes in https://issues.apache.org/jira/browse/HDDS-5051.
What changes were proposed in this pull request?
Added failover and retry logic based on exception received from server.
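As a rough illustration of exception-driven failover and retry, the sketch below picks one of three outcomes from the exception type and the attempt counters. The enum, the marker exception, and the use of a single limit for both counters are assumptions made for this example; the thread does not show the limits or exception classification the patch actually uses.

```java
import java.io.IOException;

// Hypothetical sketch: names and limits are illustrative, not the patch's code.
final class RetryDecisionSketch {

  /** The three outcomes a failover-aware retry policy chooses among. */
  enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

  /** Hypothetical marker for errors the current server can recover from. */
  static class SameServerRetriableException extends IOException {
    SameServerRetriableException(String message) {
      super(message);
    }
  }

  private final int maxRetryCount;

  RetryDecisionSketch(int maxRetryCount) {
    this.maxRetryCount = maxRetryCount;
  }

  /** Retries in place for same-server errors, otherwise fails over to another server. */
  Decision getRetryAction(int failovers, int retries, Exception e) {
    if (e instanceof SameServerRetriableException) {
      // Assumption: same-server retries are bounded by the same limit.
      return retries >= maxRetryCount ? Decision.FAIL : Decision.RETRY;
    }
    return failovers >= maxRetryCount
        ? Decision.FAIL
        : Decision.FAILOVER_AND_RETRY;
  }

  public static void main(String[] args) {
    RetryDecisionSketch policy = new RetryDecisionSketch(2);
    System.out.println(policy.getRetryAction(
        0, 0, new SameServerRetriableException("server busy")));
    System.out.println(policy.getRetryAction(
        1, 0, new IOException("not the leader")));
    System.out.println(policy.getRetryAction(
        2, 0, new IOException("not the leader")));
  }
}
```

Keeping the classification in one method means each proxy provider can delegate to it instead of repeating the exception checks.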
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-5050
How was this patch tested?
Existing unit tests.