HDDS-3291. Write operation when both OM followers are shutdown. #733

bharatviswa504 · 2020-03-27T23:02:29Z

What changes were proposed in this pull request?

Add an IPC client timeout so that the client fails with a socket timeout exception when 2 OM nodes are down, instead of hanging.
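As a rough illustration of the mechanism (this is not the Hadoop IPC code path, only plain `java.net` sockets), the sketch below shows how a read timeout turns an indefinite wait into a `SocketTimeoutException`. The class name `ReadTimeoutSketch`, the method `readTimesOut`, and the 200 ms value are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch: a client read against a peer that never answers.
public class ReadTimeoutSketch {

    /** Returns true if the read gave up with a SocketTimeoutException. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // The server socket is bound but never writes a response, which stands
        // in for a node that has accepted the request but cannot make progress.
        try (ServerSocket silentServer = new ServerSocket(0);
             Socket client = new Socket(InetAddress.getLoopbackAddress(),
                                        silentServer.getLocalPort())) {
            // Without this call, read() below would block indefinitely.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // data or end-of-stream arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true;  // the read was abandoned after timeoutMillis
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

With the timeout removed, the same read would never return, which is the hanging behaviour this change is meant to avoid.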

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-3291

How was this patch tested?

Tested in a docker-compose cluster with a 1-minute timeout; the request eventually failed instead of hanging.

To reproduce this, we also need to set leader.election.time.out to a large value: the request has to be submitted to Ratis, and the issue only shows up while the Ratis server keeps retrying.

2020-03-27 16:25:27,625 [main] INFO RetryInvocationHandler:411 - com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call From c5263a1df1ad/172.22.0.3 to om2:9862 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.22.0.3:56460 remote=om2/172.22.0.4:9862]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout, while invoking $Proxy19.submitRequest over nodeId=om2,nodeAddress=om2:9862 after 15 failover attempts. Trying to failover immediately.
2020-03-27 16:25:27,626 [main] ERROR OMFailoverProxyProvider:285 - Failed to connect to OMs: [nodeId=om1,nodeAddress=om1:9862, nodeId=om3,nodeAddress=om3:9862, nodeId=om2,nodeAddress=om2:9862]. Attempted 15 failovers.
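The log above shows the client giving up after 15 failover attempts once each call times out. A minimal sketch of that bounded-failover pattern follows; it is not the `OMFailoverProxyProvider` or `RetryInvocationHandler` implementation, and the names `BoundedFailoverSketch`, `Call`, and `callWithFailover` are hypothetical.

```java
import java.net.SocketTimeoutException;
import java.util.List;

// Hypothetical sketch: round-robin failover with an upper bound on attempts.
public class BoundedFailoverSketch {

    /** A call against one node; assumed to throw when the node does not answer in time. */
    interface Call {
        String invoke(String nodeId) throws SocketTimeoutException;
    }

    /**
     * Tries the nodes in round-robin order. One initial attempt plus at most
     * maxFailovers further attempts are made before the last timeout is rethrown.
     */
    static String callWithFailover(List<String> nodeIds, int maxFailovers, Call call)
            throws SocketTimeoutException {
        if (nodeIds.isEmpty() || maxFailovers < 0) {
            throw new IllegalArgumentException("need at least one node and maxFailovers >= 0");
        }
        SocketTimeoutException last = null;
        for (int attempt = 0; attempt <= maxFailovers; attempt++) {
            String node = nodeIds.get(attempt % nodeIds.size());
            try {
                return call.invoke(node);
            } catch (SocketTimeoutException e) {
                last = e; // remember the failure and move on to the next node
            }
        }
        throw last; // every attempt timed out
    }

    public static void main(String[] args) {
        try {
            callWithFailover(List.of("om1", "om2", "om3"), 15, node -> {
                throw new SocketTimeoutException("timeout on " + node);
            });
        } catch (SocketTimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

The bound only helps if each individual call can fail: when a call blocks forever, the loop never reaches the next attempt, which is why the timeout is needed in the first place.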

For now, the changed config affects all RPC clients in the system, because it reuses the existing IPC client config. If we later want a separate RPC client timeout, we can add a new config.

arp7

+1

bharatviswa504 · 2020-03-30T20:55:54Z

Thank you, @arp7, for the review.

bharatviswa504 requested review from arp7 and hanishakoneru March 27, 2020 23:02

bharatviswa504 added 2 commits March 30, 2020 10:27

HDDS-3291. Write operation when both OM followers are shutdown.

9a6b1db

fix test

024ed51

bharatviswa504 force-pushed the HDDS-3291 branch from da6fd56 to 024ed51 March 30, 2020 18:17

arp7 approved these changes Mar 30, 2020

bharatviswa504 merged commit 5e23b25 into apache:master Mar 30, 2020

mukul1987 added a commit that referenced this pull request Apr 9, 2020

Revert "HDDS-3291. Write operation when both OM followers are shutdow…

cbf650f

…n. (#733)" This reverts commit 5e23b25.

mukul1987 mentioned this pull request Apr 9, 2020

Revert "HDDS-3291. Write operation when both OM followers are shutdown." #803

Merged

bharatviswa504 pushed a commit that referenced this pull request Apr 9, 2020

Revert "HDDS-3291. Write operation when both OM followers are shutdow…

e31a4ce

…n. (#733)" (#803) This reverts commit 5e23b25.
