Conversation

@JAkutenshi JAkutenshi commented Sep 30, 2024

JIRA Ticket: IGNITE-23019

The goal

The goal of this PR is to fix tests that fail due to the removal of the getLeader flag.

The reason

Currently, org.apache.ignite.internal.raft.RaftGroupServiceImpl#start consumes the getLeader flag, which leads to:

  1. Awaiting refreshLeader internally.
  2. CompletableFuture as a return type.

But this refreshLeader call is not necessary, because every public method of the raft client triggers refreshLeader internally anyway. Besides, when we start a cluster, numerous nodes and their replication groups start up and wait for raft leader election while the replication groups are being started. This waiting is not needed for startup, but it slows down the starting process.

Thus, the reasons for removing the flag are:

  1. Performance: reduced replication group startup time.
  2. Code simplification (see the sketch after this list):
    • less async chaining on the CompletableFuture result;
    • no need to collect and later handle the raft clients' start futures when stopping replication groups.
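
A hedged illustration of the second point (identifiers such as startClient, groupIds and the shutdown call are assumptions, not the actual code): without the eager refreshLeader there is no start future to collect, so a component can hold the clients directly and stop them with a plain loop.

    // Illustrative sketch only: bookkeeping before vs. after the flag removal.
    // Before, start() returned a CompletableFuture that had to be collected and
    // joined on stop; now the client can be stored and shut down directly.
    List<RaftGroupService> services = new ArrayList<>();

    for (ReplicationGroupId groupId : groupIds) {
        services.add(startClient(groupId)); // previously: CompletableFuture<RaftGroupService>
    }

    // Stopping replication groups no longer needs to chain on collected start futures.
    services.forEach(RaftGroupService::shutdown);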

The solution

The temporary solution, until IGNITE-23010 is done, is to set the getLeader flag to false inside org.apache.ignite.internal.raft.RaftGroupServiceImpl#start. With that change, several tests fail. Below is a list of the failed tests that are fixed in this PR.

ItLearnersTest#testAddLearners

Reasons

After the raft client's start we expect the leader to be the follower node, but instead we get a null value.

Solution

Instead of getting the leader directly, let's call refreshLeader first:

  assertThat(
          service1.thenCompose(service -> service.refreshLeader()
                  .thenApply(v -> service.leader())),
          willBe(follower.asPeer())
  );

ItLearnersTest#testLearnersOnTheSameNodeAsPeers

Reasons

Two raft clients are started, related to 2 nodes: one for the peer node and another for the learner node. The test checks that the peer's client sees the peer as the leader, not the learner; likewise, the learner's client must see the peer as the leader, not the learner. But the test fails because each check gets a null value as the leader.

Solution

As in the previous test case, we refresh the leader before each check (for both the peer and learner cases). The follow-up check then uses the already refreshed leader:

        assertThat(
                peerService.thenCompose(service -> service.refreshLeader()
                        .thenApply(v -> service.leader())),
                willBe(peer)
        );
        assertThat(
                // the leader is already refreshed
                peerService.thenApply(RaftGroupService::leader),
                willBe(not(learner))
        );

        assertThat(
                learnerService.thenCompose(service -> service.refreshLeader()
                        .thenApply(v -> service.leader())),
                willBe(peer)
        );
        assertThat(
                // the leader is already refreshed
                learnerService.thenApply(RaftGroupService::leader),
                willBe(not(learner))
        );

ItRaftGroupServiceTest#testTransferLeadership

Reasons

We get an NPE at the beginning of the test while trying to get the first node's leader. There is no leader yet, so null is written into the leader variable, and the NPE is triggered when consistentId is called on the null leader.

Solution

Let's refresh the leader before its usage:

  Peer leader = nodes.get(0).raftGroupService
          .thenCompose(service -> service.refreshLeader().thenApply(v -> service.leader()))
          .join();

ItLozaTest#testRaftServiceUsingSharedExecutor

Reasons

The raft client's start is not the primary scope of this test, but there is a check on the mock invocation counter for invoke inside sendWithRetry, and that method is no longer called because there are no refreshLeader calls triggering sendWithRetry anymore.

Solution

Let's add a refreshLeader call after the client's start, followed by a check that it completes successfully:

  RaftGroupService client = startClient(
          new TestReplicationGroupId(Integer.toString(i)),
          spyService.topologyService().localMember(),
          partitionsConfigurer
  );

  assertThat(client.refreshLeader(), willCompleteSuccessfully());

ItDisasterRecoverySystemViewTest

There are 2 failed tests with a common reason:

  • testGlobalPartitionStatesSystemView
  • testLocalPartitionStatesSystemView

Reasons

After leader acquisition and log replication, some partitions have the LocalPartitionStateEnum#INITIALIZING state instead of HEALTHY (see LocalPartitionStateEnumWithLogIndex#of), which leads to the UNAVAILABLE state in the system table for the corresponding partition instead of AVAILABLE (see DisasterRecoveryManager:734). That makes the check on the status column fail.

Solution

The observation itself is correct; the issue is in the test's expectations: the test wants AVAILABLE and would eventually observe it, but it doesn't wait for that now, relying instead on a quick refreshLeader completion. The solution is simply to refresh the leader on each partition and wait for successful completion. Then the test obtains the desired state:

private static void waitLeaderOnAllPartitions(String tableName, int partitionsCount) {
    IgniteImpl node = ((RestartProofIgnite) CLUSTER.node(0)).unwrap(IgniteImpl.class);

    TableImpl table = ((PublicApiThreadingTable) node.tables().table(tableName)).unwrap(TableImpl.class);

    int tableId = table.tableId();

    IntStream.range(0, partitionsCount).forEach(partId -> assertThat(
            node.replicaManager()
                    .replica(new TablePartitionId(tableId, partId))
                    .thenCompose(replica -> replica.raftClient().refreshLeader()),
            willCompleteSuccessfully()
    ));
}

ActiveActorTest#testChangeLeaderWhenActualLeft

Reasons

There are 2 issues that aren't related to getLeader directly:

  • For some reason the "localhost" check fails (I got various 192.168.*.* addresses instead, which weren't resolved as localhost), but there is no actual need to check hosts. Moreover, without the fix we end up blocked just because of the host issue, even though the leader has already changed.
  • Then, after the leader has changed, we can't get it from leaderRef -- it is no longer available there and won't be updated.

Solution

The first issue is fixed by checking the port only. The second is fixed by using leaderRefNoInitialNotify instead of leaderRef.
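
A minimal sketch of the adjusted check (expectedLeader and the surrounding assertions are assumptions; only leaderRefNoInitialNotify comes from the actual test):

    // Sketch: compare only the port, since the resolved host may be a concrete
    // interface address (192.168.*.*) rather than "localhost", and read the new
    // leader from leaderRefNoInitialNotify, which is still updated after the change.
    ClusterNode newLeader = leaderRefNoInitialNotify.get();

    assertNotNull(newLeader);
    assertEquals(expectedLeader.address().port(), newLeader.address().port());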

RaftGroupServiceTest

Reasons

These unit tests actively use the getLeader flag explicitly for different test invariants, so removing it breaks those invariants.

Solutions

The getLeader flag is removed from the raft client creation method, and a new method startRaftGroupServiceWithRefreshLeader is introduced. The method creates a raft client with a mock for leader requests and then calls refreshLeader. So, tests that check the leader use the new method, and the others keep using the original startRaftGroupService without the flag.
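
A hedged sketch of what the new helper might look like (parameters and mocking details are omitted; only the method names come from this PR):

    // Sketch: same client creation as before, plus an eager refreshLeader so that
    // tests asserting on leader() keep their original invariant.
    private RaftGroupService startRaftGroupServiceWithRefreshLeader() {
        RaftGroupService service = startRaftGroupService();

        assertThat(service.refreshLeader(), willCompleteSuccessfully());

        return service;
    }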

ItTableRaftSnapshotsTest

The most complicated test class, but in general there are 2 major issues. They showed up in the following tests, though potentially every test in the class could fail:

  • nodeCanInstallSnapshotsAfterSnapshotInstalledToIt
  • testChangeLeaderOnInstallSnapshotInMiddle
  • entriesKeepAppendedAfterSnapshotInstallation
  • snapshotInstallationRepeatsOnTimeout

AssertionError: Unexpected leadership change on group: 10_part_0

Reasons

This problem emerges in doSnapshotOn processing with the following scenario:

  1. We transfer leadership to a node, e.g. node 0.
  2. Then we want to take a snapshot on the leader node 0.
  3. Before snapshotting begins, we check that the leader hasn't changed, and the assertion fails.

It happens because primary replica acceptance is in progress in parallel. The situation is:

  1. We have nodes [0, 1, 2].
  2. Leadership is transferred to node 0.
  3. A LeaseGrantMessage goes to node 1 with the force flag (neither 0 nor 1 knows the actual RAFT leader yet, so we can't collocate the lease with the leader, which is why node 1 is forcibly chosen).
  4. Node 0 becomes the leader.
  5. The LeaseGrantMessage with force == true triggers a leadership transfer to the leaseholder -- node 1.
  6. The assert in doSnapshotOn fails because the leader was changed due to the forced lease acceptance on node 1.

Solution

The described scenario is legal for Ignite, but in the test scenario it interferes with the desired invariant checks. Thus, the simple and correct solution is to wait until the lease is accepted and only then transfer leadership, using the newly introduced waitForPrimaryReplica. It is placed after cluster initialization so that it covers each test case.
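
A hedged sketch of such a wait, assuming it goes through the placement driver's awaitPrimaryReplica; the method shape and the timeout value are illustrative:

    // Sketch: block until a lease is accepted for the group, so that a later
    // transferLeadership cannot be overridden by a forced LeaseGrantMessage.
    private static void waitForPrimaryReplica(IgniteImpl node, TablePartitionId groupId) {
        CompletableFuture<ReplicaMeta> primaryFut = node.placementDriver()
                .awaitPrimaryReplica(groupId, node.clock().now(), 30, TimeUnit.SECONDS);

        assertThat(primaryFut, willCompleteSuccessfully());
    }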

PrimaryReplicaMissException with currentEnlistmentConsistencyToken == 1

Reasons

In the failed tests (the fail rate for nodeCanInstallSnapshotsAfterSnapshotInstalledToIt is ~25%) we see the following picture:

  1. Nodes [0, 1, 2] are started.
  2. A lease is granted to node 1.
  3. The RAFT leader is transferred to node 0.
  4. Node 2 is shut down.
  5. A row is put into a table and a snapshot is made.
  6. Node 2 is restarted.
  7. Leadership is transferred to node 2.
  8. Node 0 is stopped.
  9. Trying to put a new row again fails on node 2 with the exception from the title.

The reason is that internally we have leaseStartTime, which serves as currentEnlistmentConsistencyToken in UpdateCommandResponse. This request-response pair is a raft command. When we try to put a value, the request goes to node 1 as the primary replica; it creates the command and sends it to the leader (after step 7 that is node 2). But the leader sees that internally it still has an uninitialized leaseStartTime == 1, which doesn't match the unix-time-like token from node 1's message. That's why the leader sends a response with the token-mismatch flag turned on and its currentEnlistmentConsistencyToken == 1. The remaining question is why it is uninitialized: because some meta information (such as leaseStartTime) that is stored in the raft log wasn't restored during snapshot installation when node 2 started. There is a corresponding issue: IGNITE-23379.

Solution

Since all these tests check that snapshots are installed (e.g. with snapshotInstalledLatch), they may all be affected until IGNITE-23379 is fixed. So this test class is disabled for a while.
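
Disabling the class might look as follows (a standard JUnit 5 annotation; the reason string is illustrative):

    // Sketch: the whole class is skipped until the snapshot metadata restore issue is resolved.
    @Disabled("https://issues.apache.org/jira/browse/IGNITE-23379")
    class ItTableRaftSnapshotsTest {
        // existing test methods stay unchanged
    }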

JVM crash on ItPlacementDriverReplicaSideTest

Reasons

The main reason is an unhandled exception from raftClient.refreshLeader().get(), which is actually a TimeoutException during the leader election process. We have retryTimeout=2000, but leader election may take a bit longer. Then an uncaught exception is thrown, the replicas are not stopped, and that leads to an access attempt on an already closed RocksDB instance, which crashes the JVM.

Solution

There are 2 actions:

  • Increase the timeout up to 4000 ms.
  • Move the node set initialization to the end of @BeforeEach and stopReplicationGroup(GROUP_ID, grpNodes) to the beginning of @AfterEach (see the sketch below).
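
A hedged sketch of the reordered teardown (only stopReplicationGroup, GROUP_ID and grpNodes come from the description; the rest of the method is illustrative):

    // Sketch: stop the replication group first, so that even if the test body failed
    // (e.g. with a TimeoutException from refreshLeader), replicas never touch an
    // already closed RocksDB instance and crash the JVM.
    @AfterEach
    void tearDown() throws Exception {
        stopReplicationGroup(GROUP_ID, grpNodes);

        // ... the rest of the existing cleanup (stopping nodes, executors, etc.) follows.
    }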

@JAkutenshi JAkutenshi marked this pull request as ready for review October 8, 2024 14:29
assertNotNull(raftClient);

raftClient.refreshLeader().get();
raftClient.refreshLeader().get(WAIT_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
Contributor

Why do we need an external timeout here? I mean that refreshLeader is internally time-bounded:

            if (currentTimeMillis() >= stopTime) {
                fut.completeExceptionally(
                        new TimeoutException(format("Send with retry timed out [retryCount = {}, groupId = {}].", retryCount, groupId)));

                return;
            }

?

Contributor Author

@JAkutenshi JAkutenshi Oct 11, 2024

I didn't notice the internal limitation; I was just concerned by the get() without a timeout. Removed.

Contributor

Please check other places, e.g. on line 237.

Contributor Author

Done

@sanpwc sanpwc merged commit 89094b9 into apache:main Oct 11, 2024
@sanpwc sanpwc deleted the ignite-23019 branch October 11, 2024 11:27