Skip to content

RATIS-1895. Replace increaseNextIndex with updateNextIndex in GrpcLogAppender#926

Merged
szetszwo merged 1 commit intoapache:masterfrom
SzyWilliam:jira1895
Sep 25, 2023
Merged

RATIS-1895. Replace increaseNextIndex with updateNextIndex in GrpcLogAppender#926
szetszwo merged 1 commit intoapache:masterfrom
SzyWilliam:jira1895

Conversation

@SzyWilliam
Copy link
Member

When testing for 3.0.0 release, we found unexpected error messages introduced by a recent PR #914.

27951 [grpc-default-executor-6] ERROR o.a.r.grpc.server.GrpcLogAppender - Failed onNext request=AppendEntriesRequest:cid=13,entriesCount=3,lastEntry=(t:2, i:340), reply=1<-2#13:OK-t2,SUCCESS,nextIndex=341,followerCommit=339,matchIndex=340 
java.lang.IllegalStateException: Failed to updateIncreasingly for nextIndex: 343 -> 341
	at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:73)
	at org.apache.ratis.server.raftlog.RaftLogIndex.updateIncreasingly(RaftLogIndex.java:68)
	at org.apache.ratis.server.impl.FollowerInfoImpl.increaseNextIndex(FollowerInfoImpl.java:96)
	at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNextImpl(GrpcLogAppender.java:423)
	at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:402)
	at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:377)
	at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:474)
	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:662)
	at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
	at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

see https://issues.apache.org/jira/browse/RATIS-1895.

@SzyWilliam SzyWilliam requested a review from szetszwo September 25, 2023 02:19
@OneSizeFitsQuorum
Copy link
Contributor

OneSizeFitsQuorum commented Sep 25, 2023

I'm pondering why we should update nextIndex when we receive a successful response. At this point, the GrpcLogAppender thread's streaming send may have already sent it to a larger index. If we reduce nextIndex in response to a smaller matchIndex + 1, even we fix this exception in this pr, it would still lead to the retransmission of many logs that have already been sent, which may affect write performance seriously when pipeline is full.

I checked the example in the issue that introduced this problem, and I feel that the 7th step in the middle might be a typo; it should probably indicate that it was log [5].
image

As far as I recall, gRPC's streaming interface can guarantee sequential semantics, so there shouldn't be a situation where responses sent later arrive earlier. Therefore, could it be that the example in that issue doesn't exist actually?

If the above statement is correct, maybe we can still lower nextIndex as before, only when the snapshot is transferred or some inconsistency is found, and we can leave nextIndex unchanged when we receive a correct response.

@SzyWilliam
Copy link
Member Author

@OneSizeFitsQuorum thanks a lot for the reviews!

I checked the example in the issue that introduced this problem, and I feel that the 7th step in the middle might be a typo; it should probably indicate that it was log [5].

It's a bit tricky. The log[4] follower received in 7th step is sent by step 3, not by step 6.

If we reduce nextIndex in response to a smaller matchIndex + 1, even we fix this exception in this pr, it would still lead to the retransmission of many logs that have already been sent, which may affect write performance seriously when pipeline is full.

I agree that we should not reduce nextIndex, and that's why original pr uses increaseNextIndex. updateNextIndex also will not reduce nextIndex. However, the difference between increaseNextIndex and updateNextIndex is that,

  • increaseNextIndex requires caller to make sure the updated value is larger, or an IllegalStateException will be thrown.
  • updateNextIndex will only update when the provided value is larger than original one.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

Thanks for testing 3.0.0 and catching this bug!

@szetszwo szetszwo merged commit d461a01 into apache:master Sep 25, 2023
@OneSizeFitsQuorum
Copy link
Contributor

OneSizeFitsQuorum commented Sep 26, 2023

The log[4] follower received in 7th step is sent by step 3, not by step 6.

@SzyWilliam Thanks a lot for the detailed explanation! Now I understand the process completely. LGTM~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants