Node does not catch up after restart in RAFT cluster #978

Closed
Zelldon opened this issue Jan 17, 2019 · 2 comments · Fixed by #984
Labels: archived (Archived issues from the legacy Java implementation of Atomix), legacy (Issues from the legacy Java implementation of Atomix)

Comments


Zelldon commented Jan 17, 2019

Expected behavior

I expected the node with id member2 to catch up with the other nodes after a restart.

Actual behavior

Currently, the following exception is thrown:

11:21:17.887 [raft-server-raft-partition-2] ERROR io.atomix.utils.concurrent.SingleThreadContext - An uncaught exception occurred
io.atomix.storage.StorageException: java.nio.channels.ClosedChannelException
	at io.atomix.storage.journal.FileChannelJournalSegmentWriter.flush(FileChannelJournalSegmentWriter.java:328) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.FileChannelJournalSegmentWriter.close(FileChannelJournalSegmentWriter.java:334) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.MappableJournalSegmentWriter.close(MappableJournalSegmentWriter.java:160) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.JournalSegment.close(JournalSegment.java:255) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.SegmentedJournal.resetSegments(SegmentedJournal.java:314) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.SegmentedJournalWriter.reset(SegmentedJournalWriter.java:55) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.DelegatingJournalWriter.reset(DelegatingJournalWriter.java:62) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.protocols.raft.roles.PassiveRole.appendEntries(PassiveRole.java:197) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.roles.PassiveRole.handleAppend(PassiveRole.java:99) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.roles.ActiveRole.onAppend(ActiveRole.java:53) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.roles.FollowerRole.onAppend(FollowerRole.java:208) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.impl.RaftContext.lambda$null$24(RaftContext.java:692) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.impl.RaftContext.lambda$runOnContext$35(RaftContext.java:714) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.utils.concurrent.SingleThreadContext$1.lambda$execute$0(SingleThreadContext.java:53) ~[atomix-utils-3.1.2.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_192]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_192]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_192]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_192]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: java.nio.channels.ClosedChannelException
	at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) ~[?:1.8.0_192]
	at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379) ~[?:1.8.0_192]
	at io.atomix.storage.journal.FileChannelJournalSegmentWriter.flush(FileChannelJournalSegmentWriter.java:326) ~[atomix-storage-3.1.2.jar:?]
	... 20 more

Steps to reproduce

  • 3 nodes and one client
  • start the cluster with all members to initialize it
  • stop member2 (there is still a quorum)
  • run the client and produce a lot of events until snapshots are written (see the storage sketch after this list)
  • stop the client
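
For reference, here is a minimal sketch of the storage settings I use so that segments roll over and compaction kicks in quickly. This is my own wiring, not a prescribed setup: DATA_DIR, membershipService, and protocol are placeholders, and the builder options are the atomix-raft/atomix-storage 3.1.x ones as I understand them.

  RaftStorage storage = RaftStorage.builder()
      .withDirectory(new File(DATA_DIR, "raft"))   // per-member data directory
      .withStorageLevel(StorageLevel.DISK)
      .withMaxSegmentSize(1024 * 1024)             // small segments -> frequent rollover
      .withFlushOnCommit(true)
      .build();

  RaftServer server = RaftServer.builder(MemberId.from("member1"))
      .withMembershipService(membershipService)    // placeholder cluster wiring
      .withProtocol(protocol)                      // placeholder transport wiring
      .withStorage(storage)
      .build();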

Under high load (while member2 is stopped), the data directories look like this:

.
├── member1
│   ├── logstream
│   ├── raft
│   │   └── partitions
│   │       ├── 1
│   │       │   ├── raft-partition-1-1.log
│   │       │   ├── raft-partition-1.conf
│   │       │   └── raft-partition-1.meta
│   │       ├── 2
│   │       │   ├── raft-partition-2-145.snapshot
│   │       │   ├── raft-partition-2-1.log
│   │       │   ├── raft-partition-2-2.log
│   │       │   ├── raft-partition-2-3.log
│   │       │   ├── raft-partition-2.conf
│   │       │   └── raft-partition-2.meta
│   │       └── 3
│   │           ├── raft-partition-3-1.log
│   │           ├── raft-partition-3.conf
│   │           └── raft-partition-3.meta
│   └── system
│       └── partitions
│           └── 1
│               ├── system-partition-1-1.log
│               ├── system-partition-1.conf
│               └── system-partition-1.meta
├── member2
│   ├── logstream
│   ├── raft
│   │   └── partitions
│   │       ├── 1
│   │       │   ├── raft-partition-1-1.log
│   │       │   ├── raft-partition-1.conf
│   │       │   └── raft-partition-1.meta
│   │       ├── 2
│   │       │   ├── raft-partition-2-1.log
│   │       │   ├── raft-partition-2.conf
│   │       │   └── raft-partition-2.meta
│   │       └── 3
│   │           ├── raft-partition-3-1.log
│   │           ├── raft-partition-3.conf
│   │           └── raft-partition-3.meta
│   └── system
│       └── partitions
│           └── 1
│               ├── system-partition-1-1.log
│               ├── system-partition-1.conf
│               └── system-partition-1.meta
└── member3
    ├── logstream
    ├── raft
    │   └── partitions
    │       ├── 1
    │       │   ├── raft-partition-1-1.log
    │       │   ├── raft-partition-1.conf
    │       │   └── raft-partition-1.meta
    │       ├── 2
    │       │   ├── raft-partition-2-199.snapshot
    │       │   ├── raft-partition-2-1.log
    │       │   ├── raft-partition-2-2.log
    │       │   ├── raft-partition-2-3.log
    │       │   ├── raft-partition-2.conf
    │       │   └── raft-partition-2.meta
    │       └── 3
    │           ├── raft-partition-3-1.log
    │           ├── raft-partition-3.conf
    │           └── raft-partition-3.meta
    └── system
        └── partitions
            └── 1
                ├── system-partition-1-1.log
                ├── system-partition-1.conf
                └── system-partition-1.meta

27 directories, 45 files

After the client is stopped and the cluster is idle again:

.
├── member1
│   ├── logstream
│   ├── raft
│   │   └── partitions
│   │       ├── 1
│   │       │   ├── raft-partition-1-1.log
│   │       │   ├── raft-partition-1.conf
│   │       │   └── raft-partition-1.meta
│   │       ├── 2
│   │       │   ├── raft-partition-2-298.snapshot
│   │       │   ├── raft-partition-2-3.log
│   │       │   ├── raft-partition-2.conf
│   │       │   └── raft-partition-2.meta
│   │       └── 3
│   │           ├── raft-partition-3-1.log
│   │           ├── raft-partition-3.conf
│   │           └── raft-partition-3.meta
│   └── system
│       └── partitions
│           └── 1
│               ├── system-partition-1-1.log
│               ├── system-partition-1.conf
│               └── system-partition-1.meta
├── member2
│   ├── logstream
│   ├── raft
│   │   └── partitions
│   │       ├── 1
│   │       │   ├── raft-partition-1-1.log
│   │       │   ├── raft-partition-1.conf
│   │       │   └── raft-partition-1.meta
│   │       ├── 2
│   │       │   ├── raft-partition-2-1.log
│   │       │   ├── raft-partition-2.conf
│   │       │   └── raft-partition-2.meta
│   │       └── 3
│   │           ├── raft-partition-3-1.log
│   │           ├── raft-partition-3.conf
│   │           └── raft-partition-3.meta
│   └── system
│       └── partitions
│           └── 1
│               ├── system-partition-1-1.log
│               ├── system-partition-1.conf
│               └── system-partition-1.meta
└── member3
    ├── logstream
    ├── raft
    │   └── partitions
    │       ├── 1
    │       │   ├── raft-partition-1-1.log
    │       │   ├── raft-partition-1.conf
    │       │   └── raft-partition-1.meta
    │       ├── 2
    │       │   ├── raft-partition-2-298.snapshot
    │       │   ├── raft-partition-2-3.log
    │       │   ├── raft-partition-2.conf
    │       │   └── raft-partition-2.meta
    │       └── 3
    │           ├── raft-partition-3-1.log
    │           ├── raft-partition-3.conf
    │           └── raft-partition-3.meta
    └── system
        └── partitions
            └── 1
                ├── system-partition-1-1.log
                ├── system-partition-1.conf
                └── system-partition-1.meta

Compaction has run: the oldest logs were deleted, and matching new snapshots were written on the nodes.
I assume the leader sends the follower a snapshot, because on member3 (a follower for partition 2) this is printed:

11:19:02.785 [raft-server-raft-partition-2-state] INFO  de.zell.logstream.DefaultDistributedLogstreamService - #restore(BackupInput): current index 298

On the leader, only the backup log statement is printed.
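
For context, that log line comes from my service's snapshot hooks. A trimmed sketch of what they look like (my reconstruction, not the exact de.zell.logstream code), assuming the backup/restore contract of AbstractPrimitiveService and its getCurrentIndex() accessor:

  public class DefaultDistributedLogstreamService extends AbstractPrimitiveService {
    private static final Logger LOG =
        LoggerFactory.getLogger(DefaultDistributedLogstreamService.class);
    private long lastPosition;

    public DefaultDistributedLogstreamService(PrimitiveType type) {
      super(type);
    }

    @Override
    public void backup(BackupOutput output) {
      // leader side: invoked when a snapshot is taken
      LOG.info("#backup(BackupOutput): current index {}", getCurrentIndex());
      output.writeLong(lastPosition);
    }

    @Override
    public void restore(BackupInput input) {
      // follower side: invoked when the leader replicates a snapshot
      LOG.info("#restore(BackupInput): current index {}", getCurrentIndex());
      lastPosition = input.readLong();
    }
  }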

If member2 is now restarted, it gets a lot of exceptions:

11:21:17.887 [raft-server-raft-partition-2] ERROR io.atomix.utils.concurrent.SingleThreadContext - An uncaught exception occurred
io.atomix.storage.StorageException: java.nio.channels.ClosedChannelException
	at io.atomix.storage.journal.FileChannelJournalSegmentWriter.flush(FileChannelJournalSegmentWriter.java:328) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.FileChannelJournalSegmentWriter.close(FileChannelJournalSegmentWriter.java:334) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.MappableJournalSegmentWriter.close(MappableJournalSegmentWriter.java:160) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.JournalSegment.close(JournalSegment.java:255) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.SegmentedJournal.resetSegments(SegmentedJournal.java:314) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.SegmentedJournalWriter.reset(SegmentedJournalWriter.java:55) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.storage.journal.DelegatingJournalWriter.reset(DelegatingJournalWriter.java:62) ~[atomix-storage-3.1.2.jar:?]
	at io.atomix.protocols.raft.roles.PassiveRole.appendEntries(PassiveRole.java:197) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.roles.PassiveRole.handleAppend(PassiveRole.java:99) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.roles.ActiveRole.onAppend(ActiveRole.java:53) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.roles.FollowerRole.onAppend(FollowerRole.java:208) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.impl.RaftContext.lambda$null$24(RaftContext.java:692) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.protocols.raft.impl.RaftContext.lambda$runOnContext$35(RaftContext.java:714) ~[atomix-raft-3.1.2.jar:?]
	at io.atomix.utils.concurrent.SingleThreadContext$1.lambda$execute$0(SingleThreadContext.java:53) ~[atomix-utils-3.1.2.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_192]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_192]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_192]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_192]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: java.nio.channels.ClosedChannelException
	at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) ~[?:1.8.0_192]
	at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379) ~[?:1.8.0_192]
	at io.atomix.storage.journal.FileChannelJournalSegmentWriter.flush(FileChannelJournalSegmentWriter.java:326) ~[atomix-storage-3.1.2.jar:?]
	... 20 more

It seems the snapshot was replicated correctly, but appending to the log fails. The directory state after the restart looks like this:

.
├── member1
│   ├── logstream
│   ├── raft
│   │   └── partitions
│   │       ├── 1
│   │       │   ├── raft-partition-1-1.log
│   │       │   ├── raft-partition-1.conf
│   │       │   └── raft-partition-1.meta
│   │       ├── 2
│   │       │   ├── raft-partition-2-298.snapshot
│   │       │   ├── raft-partition-2-3.log
│   │       │   ├── raft-partition-2.conf
│   │       │   └── raft-partition-2.meta
│   │       └── 3
│   │           ├── raft-partition-3-1.log
│   │           ├── raft-partition-3.conf
│   │           └── raft-partition-3.meta
│   └── system
│       └── partitions
│           └── 1
│               ├── system-partition-1-1.log
│               ├── system-partition-1.conf
│               └── system-partition-1.meta
├── member2
│   ├── logstream
│   ├── raft
│   │   └── partitions
│   │       ├── 1
│   │       │   ├── raft-partition-1-1.log
│   │       │   ├── raft-partition-1.conf
│   │       │   └── raft-partition-1.meta
│   │       ├── 2
│   │       │   ├── raft-partition-2-1.log
│   │       │   ├── raft-partition-2-298.snapshot
│   │       │   ├── raft-partition-2.conf
│   │       │   └── raft-partition-2.meta
│   │       └── 3
│   │           ├── raft-partition-3-1.log
│   │           ├── raft-partition-3.conf
│   │           └── raft-partition-3.meta
│   └── system
│       └── partitions
│           └── 1
│               ├── system-partition-1-1.log
│               ├── system-partition-1.conf
│               └── system-partition-1.meta
└── member3
    ├── logstream
    ├── raft
    │   └── partitions
    │       ├── 1
    │       │   ├── raft-partition-1-1.log
    │       │   ├── raft-partition-1.conf
    │       │   └── raft-partition-1.meta
    │       ├── 2
    │       │   ├── raft-partition-2-298.snapshot
    │       │   ├── raft-partition-2-3.log
    │       │   ├── raft-partition-2.conf
    │       │   └── raft-partition-2.meta
    │       └── 3
    │           ├── raft-partition-3-1.log
    │           ├── raft-partition-3.conf
    │           └── raft-partition-3.meta
    └── system
        └── partitions
            └── 1
                ├── system-partition-1-1.log
                ├── system-partition-1.conf
                └── system-partition-1.meta
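
The ClosedChannelException itself is easy to model in isolation. A self-contained sketch of the failure mode I read out of the trace (my reconstruction, not Atomix code): resetSegments() closes the active segment's channel, and a writer that still holds that channel then calls force() on it.

  import java.io.File;
  import java.nio.channels.FileChannel;
  import java.nio.file.StandardOpenOption;

  public class ClosedChannelRepro {
    public static void main(String[] args) throws Exception {
      File segment = File.createTempFile("raft-partition-2-1", ".log");
      FileChannel channel = FileChannel.open(segment.toPath(), StandardOpenOption.WRITE);

      // SegmentedJournal.resetSegments() closes the existing segments...
      channel.close();

      // ...but a stale writer still flushes through its old channel reference,
      // which is exactly the FileChannelJournalSegmentWriter.flush failure above.
      channel.force(true); // throws java.nio.channels.ClosedChannelException
    }
  }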

Please check out the bug-catch-up branch on my repository to reproduce this.

Environment

  • Atomix: 3.1.5
  • OS:
    Linux zell-arch 4.20.1-arch1-1-ARCH #1 SMP PREEMPT Wed Jan 9 20:25:43 UTC 2019 x86_64 GNU/Linux
  • JVM:
openjdk version "1.8.0_192"
OpenJDK Runtime Environment (build 1.8.0_192-b26)
OpenJDK 64-Bit Server VM (build 25.192-b26, mixed mode)
Zelldon commented Jan 18, 2019

The problem is independent of the primitive used (I reproduced it with DistributedMap as well). The key point is that the two remaining nodes wrote a lot of events and compacted their logs while the third node was unavailable.

My assumption is that the third node tries to append the events received from the leader, but can't find the corresponding segment file (for example raft-partition-2-3.log) because it does not exist in its partition directory.
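
In other words, a simplified model of the follower path I suspect (stand-in types and method names, not the actual PassiveRole/SegmentedJournal code):

  // Hypothetical stand-in for the follower-side append handling.
  void onAppend(long prevLogIndex, long leaderSnapshotIndex, JournalWriter writer) {
    // member2's log ends far behind the leader, and the leader has already
    // compacted the entries member2 would need next...
    if (prevLogIndex > writer.getLastIndex()) {
      // ...so the follower has to throw away its log and restart from the
      // leader's snapshot index. Per the stack trace, this reset closes the
      // current segment, and the subsequent flush on the closed channel fails.
      writer.reset(leaderSnapshotIndex + 1);
    }
  }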

Zelldon commented Jan 23, 2019

I'm able to reproduce this behavior with the following unit test in RaftTest:

  @Test
  public void shouldCatchup() throws Throwable {
    // given: a three-node cluster where one server is shut down
    createServers(3);
    servers.get(0).shutdown();

    RaftClient client = createClient();
    TestPrimitive primitive = createPrimitive(client);

    // write enough entries that the remaining servers have something to compact
    final int entries = 10;
    final int entrySize = 1024;
    final String entry = RandomStringUtils.random(entrySize);
    for (int i = 0; i < entries; i++) {
      primitive.write(entry).get(1_000, TimeUnit.MILLISECONDS);
    }

    // when: the two live servers compact their logs
    CompletableFuture
        .allOf(servers.get(1).compact(),
               servers.get(2).compact())
        .get(15_000, TimeUnit.MILLISECONDS);

    // then: the stopped server should be able to rejoin and catch up
    final RaftServer server = createServer(members.get(0).memberId());
    List<MemberId> members =
        this.members
            .stream()
            .map(RaftMember::memberId)
            .collect(Collectors.toList());
    server.join(members).get(10_000, TimeUnit.MILLISECONDS);
  }

Zelldon added a commit to Zelldon/atomix that referenced this issue Jan 23, 2019
Zelldon added a commit to Zelldon/atomix that referenced this issue Jan 25, 2019
Zelldon added a commit to Zelldon/atomix that referenced this issue Jan 26, 2019
odl-github pushed a commit to opendaylight/releng-autorelease that referenced this issue Apr 6, 2020
* Update controller from branch 'stable/magnesium'
  to 31dd70488c7396e91abc51f9a9d594eaaac7f849
  - Bump atomix to 3.1.7
    
    No changelogs, but the following issue is fixed:
    atomix/atomix#978
    
    Change-Id: Ib9248d7b9cc5e50d1789778b18650f5b7804e802
    Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
    
odl-github pushed a commit to opendaylight/controller that referenced this issue Apr 6, 2020
@kuujo added the archived and legacy labels on Jan 13, 2023