Skip to content

HDDS-5050. Add retry policy for ratis requests in SCM HA.#2116

Merged
bshashikant merged 5 commits intoapache:masterfrom
bshashikant:HDDS-5050
Apr 9, 2021
Merged

HDDS-5050. Add retry policy for ratis requests in SCM HA.#2116
bshashikant merged 5 commits intoapache:masterfrom
bshashikant:HDDS-5050

Conversation

@bshashikant
Copy link
Contributor

@bshashikant bshashikant commented Apr 6, 2021

What changes were proposed in this pull request?

Added failover and retry logic based on exception received from server.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5050

How was this patch tested?

Existing UT.

Copy link
Contributor

@GlenGeng-awx GlenGeng-awx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one space before extends

"Unknown command type: " + request.getCmdType());
}
} catch (IOException e) {
if (SCMHAUtils.isRetriableWithNoFailoverException(e)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this RetriableWithNoFailoverException ?
Say a ServiceException containing ReconfigurationInProgressException should return true by isRetriableWithNoFailoverException.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given client side has already done the check, the server side can just return a ServiceException containing the real exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to address the suggested leader in a new Jira?

Yes, it will be addressed in a separate jira.

Copy link
Contributor Author

@bshashikant bshashikant Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given client side has already done the check, the server side can just return a ServiceException containing the real exception.

The Service exception is not thrown for all exceptions back from server. Only for certain exceptions, it throws service exception. Only StorageContainerLocation server seems to be sending back service exception back to client for each failure. I have removed the check in StorageContainerLocationServer.

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to address the suggested leader in a new Jira?

@bshashikant
Copy link
Contributor Author

Do we want to address the suggested leader in a new Jira?

Yes.

@GlenGeng-awx
Copy link
Contributor

+1. Thanks for the clarification.

if (failovers >= maxRetryCount) {
public RetryPolicy.RetryAction getRetryAction(int failovers, int retry,
Exception e) {
if (SCMHAUtils.isRetriableWithNoFailoverException(e)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way, can this method replaced with SCMHAUtils#getRetryAction ?

@bshashikant
Copy link
Contributor Author

Committing this. Will take care of other suggested changes in https://issues.apache.org/jira/browse/HDDS-5051.

@bshashikant bshashikant merged commit 8ea5be9 into apache:master Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments