RATIS-1656. Leftover usage of ForkJoinPool.commonPool() in RaftServerImpl #702

Merged
merged 4 commits into from
Aug 7, 2022

adoroszlai
Contributor

What changes were proposed in this pull request?

RaftServerImpl#appendEntriesAsync is still using the common pool here:

return JavaUtils.allOf(futures).whenCompleteAsync(
(r, t) -> followerState.ifPresent(fs -> fs.updateLastRpcTime(FollowerState.UpdateType.APPEND_COMPLETE))

Preceding steps in appendEntriesAsync are already running on serverExecutor, so I think there is no need to further defer this step.
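
For context: `whenCompleteAsync` called without an executor argument runs its action on `CompletableFuture`'s default asynchronous executor, which the javadoc documents as `ForkJoinPool.commonPool()` (or a new thread per task when the common pool does not support a parallelism of at least two). A minimal sketch that reports where such an action runs; the class and method names are hypothetical and not part of Ratis:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

final class DefaultAsyncExecutorDemo {
  /** Returns the name of the thread that ran a whenCompleteAsync action given no executor. */
  static String defaultAsyncThreadName() {
    AtomicReference<String> name = new AtomicReference<>();
    // The future is already complete, yet the *Async variant still hands the
    // action to the default executor instead of running it on the caller.
    CompletableFuture.completedFuture("ok")
        .whenCompleteAsync((r, t) -> name.set(Thread.currentThread().getName()))
        .join();
    return name.get();
  }

  public static void main(String[] args) {
    System.out.println("caller thread: " + Thread.currentThread().getName());
    // Typically a ForkJoinPool.commonPool worker; the exact name depends on the JVM setup.
    System.out.println("action thread: " + defaultAsyncThreadName());
  }
}
```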

https://issues.apache.org/jira/browse/RATIS-1656

How was this patch tested?

Regular CI:
https://github.com/adoroszlai/incubator-ratis/actions/runs/2789205160

@adoroszlai adoroszlai self-assigned this Aug 3, 2022
Contributor

@szetszwo szetszwo left a comment

@adoroszlai , thanks a lot for catching this! Please see the comment inlined.

@@ -1413,7 +1413,7 @@ leaderId, getMemberId(), currentTerm, followerCommit, state.getNextIndex(), NOT_
getRaftServer().getPeer());
}
}
- return JavaUtils.allOf(futures).whenCompleteAsync(
+ return JavaUtils.allOf(futures).whenComplete(

Actions supplied for dependent completions of non-async methods may be performed by the thread that completes the current CompletableFuture, or by any other caller of a completion method.

According to the javadoc quoted above ( https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html ), the action runs on the thread that completes the futures passed to allOf(..), which is SegmentedRaftLogWorker in this case, not on the thread running appendEntriesAsync. Therefore, we should pass serverExecutor as below:

    return JavaUtils.allOf(futures).whenCompleteAsync(
        (r, t) -> followerState.ifPresent(fs -> fs.updateLastRpcTime(FollowerState.UpdateType.APPEND_COMPLETE)),
        serverExecutor
    ).thenApply(v -> {
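
The threading difference described above can be observed with a small standalone sketch. It assumes plain `CompletableFuture.allOf` in place of `JavaUtils.allOf`, a thread named `log-worker` standing in for SegmentedRaftLogWorker, and a single-thread pool named `server-executor` standing in for serverExecutor; all of these names are hypothetical and none of this is Ratis code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

final class CallbackThreadDemo {
  /** Returns {thread that ran whenComplete, thread that ran whenCompleteAsync(.., executor)}. */
  static String[] observeCallbackThreads() {
    // Hypothetical stand-in for serverExecutor.
    ExecutorService serverExecutor =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "server-executor"));
    try {
      CompletableFuture<Void> logWrite = new CompletableFuture<>();
      AtomicReference<String> syncThread = new AtomicReference<>();
      AtomicReference<String> asyncThread = new AtomicReference<>();

      // Register both callbacks before the source future is completed.
      CompletableFuture<Void> all = CompletableFuture.allOf(logWrite);
      CompletableFuture<Void> sync = all.whenComplete(
          (r, t) -> syncThread.set(Thread.currentThread().getName()));
      CompletableFuture<Void> async = all.whenCompleteAsync(
          (r, t) -> asyncThread.set(Thread.currentThread().getName()), serverExecutor);

      // Hypothetical stand-in for the log worker thread that completes the future.
      new Thread(() -> logWrite.complete(null), "log-worker").start();

      // Wait on the dependent futures so both actions have finished.
      sync.join();
      async.join();
      return new String[] {syncThread.get(), asyncThread.get()};
    } finally {
      serverExecutor.shutdown();
    }
  }

  public static void main(String[] args) {
    String[] threads = observeCallbackThreads();
    System.out.println("whenComplete ran on: " + threads[0]);
    System.out.println("whenCompleteAsync(.., executor) ran on: " + threads[1]);
  }
}
```

In this sketch the non-async action runs on the completing thread, while the executor variant keeps it off that thread, which is the trade-off discussed in this review comment.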

Contributor Author

@adoroszlai adoroszlai Aug 3, 2022

Thanks @szetszwo for the review. Passing serverExecutor was my first attempt, but it caused a timeout in TestInstallSnapshotNotificationWithGrpc.

That may indicate that the test or some other part of Ratis needs further tweaking. I'll take another look.

Contributor Author

TestInstallSnapshotNotificationWithGrpc may be flaky after all, though I haven't found recent commits with the same failure.

Contributor

@szetszwo szetszwo left a comment

+1 the change looks good.

@adoroszlai
Contributor Author

Tested TestInstallSnapshotNotificationWithGrpc repeatedly both without and with this patch.

  1. without the patch
    • timeout: 1%
    • IllegalArgumentException: ...-SegmentedRaftLog is expected to be opened but it is CLOSED: 10%
  2. with the patch
    • timeout: 8%
    • IllegalArgumentException: ...-SegmentedRaftLog is expected to be opened but it is CLOSED: none

The timeout happens while the cluster is shutting down:

TestTimedOutException: test timed out after 100 seconds
	at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.toString(RaftServerProxy.java:159)
	at java.lang.String.valueOf(String.java:2994)
	at java.lang.StringBuilder.append(StringBuilder.java:136)
	at org.apache.ratis.server.impl.RaftServerProxy.toString(RaftServerProxy.java:637)
	...
	at org.apache.ratis.server.impl.MiniRaftCluster.printServers(MiniRaftCluster.java:534)
	at org.apache.ratis.server.impl.MiniRaftCluster.shutdown(MiniRaftCluster.java:832)
	at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:144)
	at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:118)
	at org.apache.ratis.InstallSnapshotNotificationTests.testInstallSnapshotDuringBootstrap(InstallSnapshotNotificationTests.java:501)

Looking into that, I found a parallelStream in RaftServerProxy: adoroszlai@374396d
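
A parallelStream matters for this issue because parallel streams, by default, execute on the calling thread together with `ForkJoinPool.commonPool()` workers. The sketch below only illustrates that general behavior and does not reproduce the linked commit; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class StreamThreadsDemo {
  /** Returns the names of all threads that processed elements of the stream. */
  static Set<String> threadsUsed(boolean parallel) {
    List<Integer> items = IntStream.range(0, 1_000).boxed().collect(Collectors.toList());
    Set<String> names = ConcurrentHashMap.newKeySet();
    (parallel ? items.parallelStream() : items.stream())
        .forEach(i -> names.add(Thread.currentThread().getName()));
    return names;
  }

  public static void main(String[] args) {
    // A sequential stream stays on the calling thread.
    System.out.println("stream():         " + threadsUsed(false));
    // A parallel stream may also borrow common pool workers; the set varies per run.
    System.out.println("parallelStream(): " + threadsUsed(true));
  }
}
```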

With that additional change:

  • testInstallSnapshotDuringBootstrap timed out 1%
  • testInstallSnapshotInstalledEvent timed out 1%
  • testInstallSnapshotInstalledEvent failed 1%
  • testRestartFollower failed with IllegalArgumentException 1%

TestTimedOutException: test timed out after 100 seconds
	...
	at org.apache.ratis.grpc.client.GrpcClientProtocolClient.setConfiguration(GrpcClientProtocolClient.java:200)
	at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:102)
	at org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:134)
	at org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:99)
	at org.apache.ratis.client.impl.AdminImpl.setConfiguration(AdminImpl.java:46)
	at org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:51)
	at org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:45)
	at org.apache.ratis.server.impl.MiniRaftCluster.setConfiguration(MiniRaftCluster.java:816)
	at org.apache.ratis.InstallSnapshotNotificationTests.testInstallSnapshotInstalledEvent(InstallSnapshotNotificationTests.java:463)

@adoroszlai
Contributor Author

TestInstallSnapshotNotificationWithGrpc passed in 100/100 runs with f642b14:
https://github.com/adoroszlai/incubator-ratis/actions/runs/2804703023

@szetszwo
Contributor

szetszwo commented Aug 6, 2022

@adoroszlai , thanks a lot for working hard on this! How about we merge the current change?

@adoroszlai adoroszlai merged commit 9bbb440 into apache:master Aug 7, 2022
@adoroszlai adoroszlai deleted the RATIS-1656 branch August 7, 2022 06:28
@adoroszlai
Contributor Author

Thanks @szetszwo for the review.

JoeCqupt pushed a commit to JoeCqupt/ratis that referenced this pull request Aug 7, 2022
codings-dan pushed a commit that referenced this pull request Aug 16, 2022
symious pushed a commit to symious/ratis that referenced this pull request Mar 6, 2024