
Not seeing expected improvement in throughput of RaftCluster.ReplicateAsync method when cluster minority is inaccessible #233

Closed
LarsWithCA opened this issue Apr 19, 2024 · 20 comments
Labels: enhancement (New feature or request)

@LarsWithCA

This plot shows the timing (in ms) of RaftCluster.ReplicateAsync; at the vertical green line, 1 node is disconnected (out of a cluster of 6 nodes in total):
(Linux ARM + .NET6 + DotNext.Net.Cluster 4.14.1)

[Plot: RaftCluster.ReplicateAsync timing in ms; longer timings appear after the green line where one node is disconnected]

From the change log of DotNext.Net.Cluster 4.15.0:

Raft performance: improved throughput of IRaftCluster.ReplicateAsync method when cluster minority is not accessible (faulty node). Now the leader waits for replication from majority of nodes instead of all nodes

This made us hope that we would no longer see these kinds of longer timings when a cluster minority is inaccessible. However, we see a pretty similar plot; at the green line, 1 node is disconnected (out of a cluster of 6 nodes):
(Linux ARM + .NET8 + DotNext.Net.Cluster 5.3.0)

[Plot: RaftCluster.ReplicateAsync timing in ms on 5.3.0; a similar pattern of longer timings after the green line]

Did we have wrong expectations, or are we doing something wrong?

@sakno sakno self-assigned this Apr 19, 2024
@sakno sakno added the enhancement New feature or request label Apr 19, 2024
@sakno
Collaborator

sakno commented Apr 20, 2024

(in ms) of RaftCluster.ReplicateAsync

It is a measurement of latency, not throughput. By calling ForceReplicationAsync on every write, you choose better latency at the cost of lower throughput. With ForceReplicationAsync you don't need to wait for the next heartbeat round, which is why latency improves. However, if you want good throughput, you need to accumulate as many uncommitted log entries as possible and wait for replication. That can be done without an explicit call to ForceReplicationAsync. If you have just one writer, prefer latency over throughput.
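
For illustration, a minimal sketch of the two patterns, assuming a single writer running on the leader node; the exact ReplicateAsync/AppendAsync overloads are assumptions and may differ between DotNext.Net.Cluster versions:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using DotNext.Net.Cluster.Consensus.Raft;

static class WritePatterns
{
    // Latency-oriented (single writer): replicate each entry right away
    // instead of waiting for the next heartbeat round.
    public static async Task WriteAsync(IRaftCluster cluster, IRaftLogEntry entry, CancellationToken token)
        => await cluster.ReplicateAsync(entry, token);

    // Throughput-oriented: append a batch of uncommitted entries to the local
    // WAL and let the regular heartbeat round replicate and commit them,
    // without an explicit ForceReplicationAsync call.
    public static async Task WriteBatchAsync(IRaftCluster cluster, IReadOnlyList<IRaftLogEntry> batch, CancellationToken token)
    {
        foreach (var entry in batch)
            await cluster.AuditTrail.AppendAsync(entry, token);
    }
}
```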

@LarsWithCA
Author

Hi @sakno

I can see that we messed up the terms a bit 😊

From the wording in the changelog:

Now the leader waits for replication from majority of nodes instead of all nodes.

we were hoping to also see an improvement in latency (in the case of an inaccessible cluster minority). Are you saying that is not the case? Or are you saying we should just call ForceReplicationAsync instead?

We have a single writer in our system (and we do prefer latency over throughput).

@sakno
Collaborator

sakno commented Apr 23, 2024

we were hoping to also see an improvement in latency

Not exactly, it's an improvement in throughput from the client's perspective. However, the underlying state machine commits changes earlier. A small recap:

  1. ForceReplicationAsync resumes when all nodes have replicated (or some of them are detected as unavailable)
  2. AppendAsync from the WAL completes before ForceReplicationAsync because only a majority is needed to mark log entries as committed
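
A sketch of that distinction (the AppendAsync/WaitForCommitAsync overloads are assumptions and may vary by version): waiting for the commit of a known index resumes once the majority has replicated, while ForceReplicationAsync tracks the whole replication round.

```csharp
using System.Threading;
using System.Threading.Tasks;
using DotNext.Net.Cluster.Consensus.Raft;

static class CommitVsFullRound
{
    public static async Task AppendAndWaitForCommitAsync(IRaftCluster cluster, IRaftLogEntry entry, CancellationToken token)
    {
        // Append to the local WAL; the returned index identifies the entry.
        long index = await cluster.AuditTrail.AppendAsync(entry, token);

        // Resumes once the majority has replicated up to this index (commit),
        // potentially earlier than ForceReplicationAsync would resume.
        await cluster.AuditTrail.WaitForCommitAsync(index, token);
    }
}
```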

@sakno
Collaborator

sakno commented Apr 23, 2024

If you want to improve latency in the presence of unavailable nodes, you can use the following techniques:

  1. Reduce connect timeout
  2. Enable automatic failure detection so the leader can remove unhealthy nodes automatically from the list of cluster members
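
For the first point, a hedged sketch only: the configuration key names below ("connectTimeout", "requestTimeout") are assumptions based on the property names discussed in this thread and may differ depending on the hosting model; the automatic failure detection hook is omitted because its setup is specific to the host.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;

// Keep the connect timeout short so an unreachable member fails fast,
// while the request timeout bounds a full round-trip to a healthy member.
var raftSettings = new Dictionary<string, string?>
{
    ["connectTimeout"] = "00:00:00.010", // 10 ms
    ["requestTimeout"] = "00:00:01",     // 1 s
};

var host = Host.CreateDefaultBuilder()
    .ConfigureAppConfiguration(config => config.AddInMemoryCollection(raftSettings))
    // ... the rest of the cluster/web host setup goes here ...
    .Build();
```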

@LarsWithCA
Author

Reduce connect timeout

Our ConnectTimeout is only 10ms. Did you mean another timeout? We have 'RequestTimeout' set to 1 second.

@sakno
Collaborator

sakno commented Apr 24, 2024

The leader exposes the broadcast-time counter in milliseconds. You can measure it with dotnet-counters. What's the average value of this counter with and without unavailable nodes?
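
For example, something along these lines (process id is a placeholder; both Raft meter groups are listed so that broadcast-time shows up regardless of which group publishes it):

```sh
dotnet-counters monitor --process-id <pid> \
  --counters DotNext.Net.Cluster.Consensus.Raft.Server,DotNext.Net.Cluster.Consensus.Raft.Client
```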

@LarsWithCA
Author

Average of broadcast-time before/after disconnecting a node:

  • before: 5ms
  • after: 920ms

(Linux ARM + .NET8 + DotNext.Net.Cluster 5.4.0)

@sakno
Collaborator

sakno commented May 2, 2024

How many nodes are disconnected? 920 ms with 1 disconnected node?

@LarsWithCA
Author

A single disconnected node (out of a cluster of 6 in total).

@sakno
Collaborator

sakno commented May 2, 2024

Very suspicious: the jump from 5 ms to 920 ms doesn't match a 10 ms connection timeout.

@sakno
Collaborator

sakno commented May 2, 2024

One more way to investigate the issue: there is a response-time counter that shows the response time for each node individually. Could you share this value for each node? Every counter has a tag indicating the IP address of the node (you can replace the IP addresses with anything else; it's needed just to distinguish the values).

The metrics group is DotNext.Net.Cluster.Consensus.Raft.Client, not DotNext.Net.Cluster.Consensus.Raft.Server.
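
For example, a possible invocation to watch the per-node response-time percentiles (the dotnext.raft.client.address tag distinguishes the peers):

```sh
dotnet-counters monitor --process-id <pid> --counters DotNext.Net.Cluster.Consensus.Raft.Client
```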

@LarsWithCA
Author

LarsWithCA commented May 2, 2024

Example before disconnecting 202:

[dotnext.raft.client.address=232;dotnext.raft.client.message=AppendEntries;Percentile=50] | 6.375
[dotnext.raft.client.address=202;dotnext.raft.client.message=AppendEntries;Percentile=50] | 1.296875
[dotnext.raft.client.address=149;dotnext.raft.client.message=AppendEntries;Percentile=50] | 10.375
[dotnext.raft.client.address=154;dotnext.raft.client.message=AppendEntries;Percentile=50] | 7.671875
[dotnext.raft.client.address=31;dotnext.raft.client.message=AppendEntries;Percentile=50] | 6.765625

Example after disconnecting 202:

[dotnext.raft.client.address=232;dotnext.raft.client.message=AppendEntries;Percentile=50] | 1.41796875
[dotnext.raft.client.address=202;dotnext.raft.client.message=AppendEntries;Percentile=50] | 996
[dotnext.raft.client.address=149;dotnext.raft.client.message=AppendEntries;Percentile=50] | 1.990234375
[dotnext.raft.client.address=154;dotnext.raft.client.message=AppendEntries;Percentile=50] | 2.3828125
[dotnext.raft.client.address=31;dotnext.raft.client.message=AppendEntries;Percentile=50] | 7.375

And approx. 1.5 minutes later, the message changes to InstallSnapshot for 202:

[dotnext.raft.client.address=232;dotnext.raft.client.message=AppendEntries;Percentile=50] | 7.3125
[dotnext.raft.client.address=202;dotnext.raft.client.message=InstallSnapshot;Percentile=50] | 1000
[dotnext.raft.client.address=149;dotnext.raft.client.message=AppendEntries;Percentile=50] | 4.171875
[dotnext.raft.client.address=154;dotnext.raft.client.message=AppendEntries;Percentile=50] | 4.484375
[dotnext.raft.client.address=31;dotnext.raft.client.message=AppendEntries;Percentile=50] | 4.6328125

@sakno
Collaborator

sakno commented May 2, 2024

It seems like ConnectTimeout is not equal to 10 ms; it's equal to 1000 ms (the same as RequestTimeout). How do you set ConnectTimeout in the code?

Omg, I found the root cause. It's a trivial, one-line fix.

sakno added a commit that referenced this issue May 2, 2024
@LarsWithCA
Author

Awesome! :)

@sakno
Collaborator

sakno commented May 2, 2024

Could you check the develop branch? It contains both fixes: the incorrect usage of ConnectTimeout and the accidental snapshot installation.

@sakno
Collaborator

sakno commented May 2, 2024

Also, the upcoming release introduces a new WaitForLeadershipAsync method that waits for the local node to be elected as the cluster leader and returns a leadership token. It is very convenient if your code relies on the LeaderChanged event to determine which node allows writes.
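
A purely hypothetical usage sketch, since the exact signature isn't shown here; the single-CancellationToken overload and the CancellationToken-shaped leadership token are assumptions:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using DotNext.Net.Cluster.Consensus.Raft;

static class LeadershipLoop
{
    public static async Task RunAsLeaderAsync(IRaftCluster cluster, CancellationToken appToken)
    {
        // Wait until the local node is elected leader; the returned token is
        // assumed to be cancelled when leadership is lost.
        CancellationToken leadershipToken = await cluster.WaitForLeadershipAsync(appToken);

        while (!leadershipToken.IsCancellationRequested && !appToken.IsCancellationRequested)
        {
            // ... append/replicate entries here while this node is still the leader ...
            await Task.Delay(TimeSpan.FromMilliseconds(100), appToken);
        }
    }
}
```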

@LarsWithCA
Author

The issue seems to be fixed, i.e. the time of RaftCluster.ReplicateAsync immediately goes back to something very low (after disconnecting one node):
[Plot: RaftCluster.ReplicateAsync timing drops back to a few ms immediately after the node is disconnected]

This is great, thanks a lot!
I'll keep my cluster running and get back to you tomorrow regarding the accidental snapshot installation.

@LarsWithCA
Author

@sakno the "snapshot installation" messages are also gone 👍

@sakno
Collaborator

sakno commented May 3, 2024

I'll prepare a new release today

@sakno
Collaborator

sakno commented May 5, 2024

Release 5.5.0 has been published.
