Question about Quick Benchmark Results #205

Open
Steamgjk opened this issue May 5, 2021 · 6 comments

Comments

@Steamgjk

Steamgjk commented May 5, 2021

Hi, I am a bit curious about the latency results in https://github.com/eBay/NuRaft/blob/master/docs/bench_results.md

The network RTT is about 180 microseconds, and Raft needs two RTTs for one request to be committed (Client -> Leader -> Follower -> Leader -> Client). The median latency should therefore be much larger than 180 microseconds, so why is it almost the same (187 microseconds)?
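To spell out my arithmetic: two round trips at roughly 180 microseconds each should add up to about 2 * 180 = 360 microseconds of client-observed latency, nearly double the reported median of 187 microseconds.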

@greensky00
Contributor

greensky00 commented May 5, 2021

Hi @Steamgjk

As you can find in the benchmark program, the client and the leader run in the same process (more precisely, there is no separate client; the benchmark program itself acts as both client and server and invokes the Raft API directly) in order to measure pure Raft performance:

ptr<raft_result> ret =
    args->stuff_.raft_instance_->append_entries( {msg} );

Hence, there is no network cost between client and leader, and each replication can be done within a single RTT.
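For illustration only, a minimal sketch of that kind of in-process measurement, assuming NuRaft's blocking return mode so that append_entries returns only after the entry is committed (the real benchmark program is more elaborate; the function name below is made up for this sketch):

#include <chrono>
#include <libnuraft/nuraft.hxx>

using namespace nuraft;
using raft_result = cmd_result< ptr<buffer> >;  // result type returned by append_entries

// Measure the commit latency of a single 256-byte entry by calling the Raft
// API directly: the "client" is the same process as the leader, so no
// client<->leader network hop is included in the measured time.
uint64_t measure_one_commit_us(ptr<raft_server> raft_instance) {
    ptr<buffer> msg = buffer::alloc(256);   // zero-filled 256-byte payload

    auto start = std::chrono::steady_clock::now();
    // In blocking mode this call returns only after the entry has been
    // replicated to a quorum and committed, i.e. after one
    // leader<->follower round trip.
    ptr<raft_result> ret = raft_instance->append_entries( {msg} );
    auto end = std::chrono::steady_clock::now();

    if (!ret->get_accepted()) return 0;     // e.g. this node is not the leader
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}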

@Steamgjk
Author

Steamgjk commented May 10, 2021

Hi, @greensky00
I am wondering whether you have any benchmark results with more than 3 replicas, because according to my perf test:
with 3 replicas, the throughput can reach ~33K ops/second,
but with 9 replicas, the throughput drops to ~20K ops/second.
I am wondering whether this is a normal result or whether I misconfigured something.

@greensky00
Contributor

Hi @Steamgjk
We don't publish results with more replicas, but that is expected behavior, as the amount of data the leader has to transfer over the network is proportional to the number of followers.
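As a rough back-of-the-envelope illustration using your numbers (and assuming the 256-byte payload from the benchmark doc): with 3 replicas the leader sends each entry to 2 followers, roughly 256 bytes * 33K * 2 ~= 17 MB/s of outbound data, while with 9 replicas it sends to 8 followers, roughly 256 bytes * 20K * 8 ~= 40 MB/s, so the leader's per-entry network and serialization work about quadruples even though the client-visible throughput drops.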

@Steamgjk
Author

Hi, @greensky00. Do you think it is bounded by network bandwidth or something else? I am using n1-standard-32 VMs as replicas, and their bandwidth is 32 Gbps. That is quite large and should be sufficient, so I think it may be more reasonable to attribute the bottleneck to the CPU, because replicas need to serialize/deserialize/process more messages when there are more replicas. Which do you think is the more convincing explanation: CPU, bandwidth, or something else?
https://cloud.google.com/compute/docs/machine-types

@greensky00
Contributor

@Steamgjk
You can monitor the CPU usage of the leader during the test; if it reaches 3200% (all 32 cores fully busy), you can say it is CPU-bound, but I don't believe it uses that much CPU. Network bandwidth is also unlikely to be the cause, as the payload size (256 bytes) is small compared to the TCP window size (64 KB by default). With simple math, the total outbound throughput of the leader with 9 replicas is only 256 bytes * 20K ops/s * 8 followers ~= 40 MB/s.

Most likely the bottleneck comes from serialization. As I already mentioned in the other comment (#207 (comment)), this is not random, independent data broadcasting: replication must be strictly and globally ordered, which means log X can't be sent to a follower without first sending the logs up to X-1. Since there are more packets for "sending logs up to X-1" with more followers, those extra packets likely wait longer in the network queue, and consequently "sending log X" has to wait longer as well.
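To make that concrete, here is a hypothetical, simplified sketch (not NuRaft's actual replication code) of the per-follower batching, just to show why the serialization work on the leader grows with the number of followers and with how far each follower lags:

#include <cstdint>
#include <string>
#include <vector>

struct log_entry { uint64_t index; std::string payload; };
struct follower  { uint64_t next_index; };  // first log index this follower still needs

// Build one (conceptual) append_entries batch per follower. Because replication
// is strictly ordered, log X can only be shipped together with every not-yet-sent
// log up to X-1, so each batch starts at the follower's next_index.
std::vector< std::vector<log_entry> > build_batches(
        const std::vector<log_entry>& log,        // leader's log, 1-based indexes
        const std::vector<follower>& followers,
        uint64_t newest_index /* "log X" */) {
    std::vector< std::vector<log_entry> > batches;
    for (const follower& f : followers) {
        std::vector<log_entry> batch;
        for (uint64_t i = f.next_index; i <= newest_index; ++i) {
            batch.push_back(log[i - 1]);          // per-entry serialization cost
        }
        batches.push_back(std::move(batch));
    }
    // Total entries serialized grows with the number of followers and their lag.
    return batches;
}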

@Steamgjk
Author

Steamgjk commented May 10, 2021

Most likely the bottleneck comes from serialization. As I already mentioned in the other comment (#207 (comment)), this is not random, independent data broadcasting: replication must be strictly and globally ordered, which means log X can't be sent to a follower without first sending the logs up to X-1. Since there are more packets for "sending logs up to X-1" with more followers, those extra packets likely wait longer in the network queue, and consequently "sending log X" has to wait longer as well.

I agree. Serialization/deserialization is likely the bottleneck.
