HDDS-3291. Write operation when both OM followers are shutdown. #733

Merged (2 commits) on Mar 30, 2020

bharatviswa504 (Contributor) commented on Mar 27, 2020

What changes were proposed in this pull request?

Add an IPC client timeout so that, when two OM nodes have failed, the client fails with a SocketTimeoutException instead of hanging.
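
The failure mode here is a client stuck on a read that never completes. Below is a minimal sketch of the mechanism using plain `java.net` sockets, not Hadoop's IPC `Client`. It assumes, loosely, that the IPC timeout behaves like a bounded socket read, which is consistent with the "millis timeout while waiting for channel to be ready for read" message in the log further down. A peer that completes the TCP handshake but never replies stands in for an OM that cannot answer a write. `setSoTimeout` turns the indefinite block into a `SocketTimeoutException`. `ReadTimeoutDemo` and `readTimesOut` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to a local server that completes the TCP handshake but never
     * accepts or writes anything, then blocks on a read. Returns true if the
     * read was cut short by a SocketTimeoutException. timeoutMillis must be
     * positive: 0 means "wait forever", so the read would never return.
     */
    public static boolean readTimesOut(int timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 picks a free port; backlog 1 lets the kernel queue the connection
        // even though accept() is never called, modelling an unresponsive peer.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // bound every blocking read
            InputStream in = client.getInputStream();
            try {
                in.read(); // waits for a reply that never comes
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(500);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

With `setSoTimeout(0)` the same read blocks indefinitely. That is the hang this change is meant to avoid.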

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3291

How was this patch tested?

Tested in a docker-compose cluster with the IPC client timeout set to 1 minute: the write eventually failed instead of hanging.

To reproduce this, leader.election.time.out must also be set to a large value. The request has to be submitted to Ratis, and the issue only shows up while the Ratis server keeps retrying it.

2020-03-27 16:25:27,625 [main] INFO RetryInvocationHandler:411 - com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call From c5263a1df1ad/172.22.0.3 to om2:9862 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.22.0.3:56460 remote=om2/172.22.0.4:9862]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout, while invoking $Proxy19.submitRequest over nodeId=om2,nodeAddress=om2:9862 after 15 failover attempts. Trying to failover immediately.
2020-03-27 16:25:27,626 [main] ERROR OMFailoverProxyProvider:285 - Failed to connect to OMs: [nodeId=om1,nodeAddress=om1:9862, nodeId=om3,nodeAddress=om3:9862, nodeId=om2,nodeAddress=om2:9862]. Attempted 15 failovers.
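
The log shows the client giving up on all three OMs after 15 failovers. Below is a simplified, hypothetical model of such a bounded failover loop. It is not Ozone's `OMFailoverProxyProvider`; `NodeCall` and `callWithFailover` are illustrative names, and the limit of 15 is taken only from the log. The model shows why the timeout matters: the loop advances to the next node only when a call returns an error. A call that blocks forever on one node therefore stalls the whole loop.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.List;

public class BoundedFailoverDemo {

    /** Hypothetical single-node call; a real client would issue an RPC with a timeout. */
    public interface NodeCall {
        String invoke(String nodeId) throws IOException;
    }

    /**
     * Calls nodes round-robin, failing over to the next node after each error,
     * and rethrows once maxFailovers failovers have been used up.
     * A call that never returns (no timeout) would stall this loop forever.
     */
    public static String callWithFailover(List<String> nodes, NodeCall call, int maxFailovers)
            throws IOException {
        int failovers = 0;
        int index = 0;
        while (true) {
            String nodeId = nodes.get(index);
            try {
                return call.invoke(nodeId);
            } catch (IOException e) {
                if (failovers >= maxFailovers) {
                    throw new IOException("Gave up on " + nodes + " after "
                            + failovers + " failovers", e);
                }
                failovers++;
                index = (index + 1) % nodes.size(); // move on to the next node
            }
        }
    }

    public static void main(String[] args) {
        List<String> oms = List.of("om1", "om2", "om3");
        int[] invocations = {0};
        NodeCall allUnavailable = nodeId -> {
            invocations[0]++;
            if (nodeId.equals("om2")) {
                // Reachable but never answers: only a client-side timeout makes this an error.
                throw new SocketTimeoutException("read timed out from " + nodeId);
            }
            throw new ConnectException("connection refused by " + nodeId);
        };
        try {
            callWithFailover(oms, allUnavailable, 15);
        } catch (IOException e) {
            System.out.println(e.getMessage() + " (" + invocations[0] + " calls)");
        }
    }
}
```

In this model, `maxFailovers` failovers mean `maxFailovers + 1` calls in total before the client gives up.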

For now, this reuses the existing IPC client timeout config, so the change affects all RPC clients in the system. If a dedicated RPC client timeout is needed in the future, a new config can be added for it.

arp7 (Contributor) left a comment

+1

bharatviswa504 (Contributor, Author)

Thank you @arp7 for the review.

bharatviswa504 merged commit 5e23b25 into apache:master on Mar 30, 2020
mukul1987 added a commit that referenced this pull request Apr 9, 2020
bharatviswa504 pushed a commit that referenced this pull request Apr 9, 2020