Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RATIS-982. Fix RaftServerImpl illegal transition from RUNNING to RUNNING #134

Merged
merged 5 commits into from Jun 24, 2020
Merged

RATIS-982. Fix RaftServerImpl illegal transition from RUNNING to RUNNING #134

merged 5 commits into from Jun 24, 2020

Conversation

runzhiwang
Copy link
Contributor

What changes were proposed in this pull request?

What's the problem ?
This happens in test, but it maybe also happen in production.

For example, leader is s3 and follower is s4.

  1. kill s4, and restart s4.
2020-06-19 07:03:18,095 [Thread-6194] INFO  ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4
2020-06-19 07:03:18,095 [Thread-6194] INFO  ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
  1. s4 start and set configuration from storage at setRaftConf(raftConf.getLogEntryIndex(), raftConf) and s4 will change to RUNNING at lifeCycle.transition(RUNNING)
2020-06-19 07:03:18,127 [pool-16-thread-1] INFO  impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0
2020-06-19 07:03:18,153 [Thread-6194] INFO  impl.RaftServerImpl (RaftServerImpl.java:start(185)) - s4@group-5BD7E8A01610: start as a follower, conf=0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null
2020-06-19 07:03:18,153 [Thread-6194] INFO  impl.RaftServerImpl (RaftServerImpl.java:setRole(174)) - s4@group-5BD7E8A01610: changes role from      null to FOLLOWER at term 1 for startAsFollower
  1. s3 send append entry request to s4, and s4 change to RUNNING at lifeCycle.compareAndTransition(STARTING, RUNNING)

2020-06-19 07:03:18,162 [nioEventLoopGroup-59-1] DEBUG impl.RaftServerImpl (RaftServerImpl.java:logAppendEntries(918)) - s4@group-5BD7E8A01610: receive appendEntries(s3, 1, (t:1, i:0), 0, false, commits[s3:c0, s4:c0, s0:c0, s1:c0, s2:c0], entries: (t:1, i:1), STATEMACHINELOGENTRY, client-9414EC4E73DA, cid=3000

  1. If change to RUNNING in step3 happens before step2, then step2 will throw exception.
2020-06-19 07:03:18,169 [Thread-6194] INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - s4: start FollowerState
2020-06-19 07:03:18,174 [Thread-6194] ERROR netty.TestRaftWithNetty (ExitUtils.java:terminate(133)) - Terminating with exit status -1: Failed to kill/restart server: s4
2020-06-19T07:03:18.1918474Z java.lang.IllegalStateException: ILLEGAL TRANSITION: In s4, RUNNING -> RUNNING
2020-06-19T07:03:18.1918899Z 	at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63)
2020-06-19T07:03:18.1919240Z 	at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115)
2020-06-19T07:03:18.1919558Z 	at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155)
2020-06-19T07:03:18.1919878Z 	at org.apache.ratis.server.impl.RaftServerImpl.startAsFollower(RaftServerImpl.java:214)
2020-06-19T07:03:18.1920206Z 	at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:186)
2020-06-19T07:03:18.1920520Z 	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
2020-06-19T07:03:18.1920839Z 	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
2020-06-19T07:03:18.1921330Z 	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
2020-06-19T07:03:18.1921639Z 	at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
2020-06-19T07:03:18.1921951Z 	at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
2020-06-19T07:03:18.1922261Z 	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
2020-06-19T07:03:18.1922575Z 	at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
2020-06-19T07:03:18.1922885Z 	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
2020-06-19T07:03:18.1925464Z 	at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
2020-06-19T07:03:18.1940816Z 	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
2020-06-19T07:03:18.1953283Z 	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
2020-06-19T07:03:18.1967610Z 	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
2020-06-19T07:03:18.1980549Z 	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
2020-06-19T07:03:18.1991620Z 	at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:301)
2020-06-19T07:03:18.1991958Z 	at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:312)
2020-06-19T07:03:18.1992275Z 	at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:304)
2020-06-19T07:03:18.1992609Z 	at org.apache.ratis.RaftBasicTests.lambda$killAndRestartServer$2(RaftBasicTests.java:100)
2020-06-19T07:03:18.1992920Z 	at java.lang.Thread.run(Thread.java:748)

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-982

How was this patch tested?

Existed tests.

@runzhiwang runzhiwang changed the title RATIS-982. Fix RaftServerImpl failed to change from RUNNING to RUNNING RATIS-982. Fix RaftServerImpl illegal transition from RUNNING to RUNNING Jun 19, 2020
@runzhiwang
Copy link
Contributor Author

@bshashikant Could you help review this patch ? Thank you very much.

Copy link
Contributor

@bshashikant bshashikant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @runzhiwang for working on this.

@bshashikant bshashikant merged commit b2d367f into apache:master Jun 24, 2020
symious pushed a commit to symious/ratis that referenced this pull request Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants