Example raft-kv-memstore hangs after printing change-membership #550

Closed
vishwaspai opened this issue Sep 9, 2022 · 25 comments · Fixed by #615
Labels
C-bug Category: bug

Comments

@vishwaspai

Took the latest checkout of main (commit hash: 347aca1) and ran
raft-kv-memstore$ ./test-cluster.sh

But it hangs after reaching this point:

...
Changing membership from [1] to 3 nodes cluster: [1, 2, 3]

--- rpc(:21001/change-membership, [1, 2, 3])

I see that 3 processes are running; the first process is using 100% CPU and the other 2 are almost idle. I could not access the Discord channel (for some reason it does not open), hence reporting it here. If required I can share any specific logs.
P.S.: The example raft-kv-rocksdb works fine.

vishwaspai changed the title from "Example raft-kv-memstore hangs after pringing change-membership" to "Example raft-kv-memstore hangs after printing change-membership" on Sep 9, 2022
@vishwaspai
Author

Did some debugging. As soon as node-2 gets added (via add-learner), the leader (node-1) starts sending raft-append RPCs to node-2 aggressively. At this time the cluster has only 1 leader and 1 learner. I think this overwhelms either node-1 or node-2 and things do not work after that.

If I change the config in this example (see below) to be the same as rocksdb's, things work fine. I still see raft-append on node-2, but only a few messages per second.

let config = Arc::new(Config {
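    // Same heartbeat interval as the raft-kv-rocksdb example; openraft's default is 50 ms.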
    heartbeat_interval: 250,
    ..Default::default()
});

Why does it go wrong with the default heartbeat_interval config of 50?

@drmingdrmer
Member

The logs will be helpful.
After running ./test-cluster.sh, there should be 3 logs in the dir examples/raft-kv-memstore:
n1.log n2.log n3.log.

Please attach these logs and let me look at what's happening.

Thanks for letting me know about this issue :D

@vishwaspai
Author

Attached logs

raft-kv-memstore-logs.tar.gz

@drmingdrmer
Member

Well... I did not see anything wrong in the logs, except a lot of timeouts:

  • Node-1 keeps failing to send append-entries RPCs; you can find logs like handle_update_matched target=2 result=Err("deadline has elapsed") since 2022-09-10T13:50:19Z in n1.log (the leader).
  • Node-2 received append-entries requests and dealt with them correctly, but it failed to respond before the 50ms timeout. You can find logs like append-entries request vote=vote:1-1 prev_log_id=1-1-63 entries=1-1-64:blank ... 1-1-175:blank leader_committed=1-1-138 in n2.log (a learner).

It looks like just a simple timeout issue. Were you running this test on a slow computer? 🤔

And as far as I know, extending the heartbeat interval will solve this issue, as you mentioned. :)

@drmingdrmer
Member

Heartbeat interval affects the timeout: an append-entries RPC should not take longer than the interval between 2 heartbeats.
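
Concretely, a minimal sketch of that relationship (not openraft's own code; send_append_entries below is a hypothetical stand-in for the real RPC):

use std::time::Duration;
use tokio::time::timeout;

// The RPC deadline is the heartbeat interval itself (50 ms by default), so an
// append-entries call that is slower than one heartbeat is reported as failed.
async fn replicate_once(heartbeat_interval_ms: u64) {
    let deadline = Duration::from_millis(heartbeat_interval_ms);
    match timeout(deadline, send_append_entries()).await {
        Ok(_resp) => { /* replication progressed */ }
        Err(_elapsed) => eprintln!("deadline has elapsed"), // the error seen in n1.log
    }
}

async fn send_append_entries() { /* hypothetical stand-in for the real RPC */ }

#[tokio::main]
async fn main() {
    replicate_once(50).await;
}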

@vishwaspai
Copy link
Author

It looks like just a simple timeout issue. Were you running this test on a slow computer?

No. I'm running on an i7-12700H with 14 cores.

@drmingdrmer
Member

I have not yet found what is going on from the log.

I updated the main branch to let the examples output a more detailed log. Could you rerun the test with the latest main branch ac48309? Then let's see what's going on.

And what are your OS and rust-toolchain?

And could you attach the examples/raft-kv-memstore/Cargo.lock that is built when running test-cluster.sh? So that I can check the dependency crate versions.

@vishwaspai
Author

Added all the information in the attached tar.

raft-kv-memstore-logs-ac48309.tar.gz

@drmingdrmer drmingdrmer added the C-bug Category: bug label Sep 13, 2022
@drmingdrmer
Member

drmingdrmer commented Sep 13, 2022

Do you have a proxy installed on your machine? 🤔

2022-09-13T06:27:43.448747Z DEBUG ThreadId(01) new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:replication{id=1 target=2}:main{vote=vote:1-1 target=2 cluster=foo}:line_rate_loop:send_append_entries: start sending append_entries, timeout: 50 payload=vote=vote:1-1, prev_log_id=1-1-82, leader_commit=1-1-83, entries=1-1-83:blank

2022-09-13T06:27:43.482663Z DEBUG ThreadId(01) new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:replication{id=1 target=2}:main{vote=vote:1-1 target=2 cluster=foo}:line_rate_loop:send_append_entries: starting new connection: http://127.0.0.1:21002/    

The first log is printed by openraft at:

tracing::debug!(
    payload=%payload.summary(),
    "start sending append_entries, timeout: {:?}",
    self.config.heartbeat_interval
);
let the_timeout = Duration::from_millis(self.config.heartbeat_interval);
let res = timeout(the_timeout, self.network.send_append_entries(payload)).await;
let append_resp = match res {

The second log is printed by reqwest:
https://github.com/seanmonstar/reqwest/blob/8b37ae4b15f668e3e3f9b8cf43b547fc4f95e444/src/connect.rs#L560

You can see that every starting new connection... log in n1.log is delayed by about 35ms, using:

cat n1.log | grep 'starting new connection' -B1 | grep -v '^\-\-' | awk '{v=$1; gsub("2022-09-13T..:..:", "", v); gsub("Z", "", v); print (v-a)*1000 "ms "; a=v}'

Output:

...
1.318ms
35.363ms
1.182ms
34.744ms
1.313ms
35.149ms
1.329ms
35.211ms

reqwest::Connector tries every proxy. Can you help me check if there is a proxy on your machine that may be delaying every RPC message by 35ms?
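
For reference, proxy lookup can also be disabled when building the client. A sketch, assuming the example's reqwest client construction can be swapped for a custom-built one (a diagnostic, not a fix):

fn main() {
    // A client with proxy lookup disabled; substituting it where the example
    // builds its reqwest client would rule out proxy resolution as the source
    // of the ~35ms gap.
    let client = reqwest::Client::builder()
        .no_proxy()
        .build()
        .expect("failed to build reqwest client");
    let _ = client;
}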

@vishwaspai
Author

Do you have a proxy installed on your machine?

No.

@drmingdrmer
Member

It looks like it is not an issue with openraft, but an issue at the network layer.

I'm not an expert on reqwest and cannot figure out what's possibly wrong. :(

Can you do some single-step debugging of it on your machine? Running this test does almost the same thing as test-cluster.sh. You can get started with it:
https://github.com/datafuselabs/openraft/blob/main/examples/raft-kv-memstore/tests/cluster/test_cluster.rs

Another concern is whether a normal curl will be delayed by 35ms on your machine.
To confirm that, try the latest main branch; I added timing to every curl in test-cluster.sh:
59cc9e8

@vishwaspai
Author

For now, I ran curl with -w '\nConnect: %{time_connect}s\nXfer : %{time_starttransfer}s\nTotal : %{time_total}s\n' (instead of time), and the following is the result.

Based on the results I see that add-learner is taking time. The other curl requests are fast. At least, I don't see this as a networking issue. Anyway, I'll see if I can spend time on single-stepping.

Initialize server 1 as a single-node cluster

--- rpc(:21001/init, {})  => 3ms
Connect: 0.000067s
Xfer   : 0.003601s
Total  : 0.003638s


Server 1 is a leader now
Get metrics from the leader

--- rpc(:21001/metrics, ) => 0.9ms
Connect: 0.000067s
Xfer   : 0.000851s
Total  : 0.000866s


Adding node 2 and node 3 as learners, to receive log from leader node 1

--- rpc(:21001/add-learner, [2, "127.0.0.1:21002"]) =>  157ms
Connect: 0.000203s
Xfer   : 0.157091s
Total  : 0.157123s


Node 2 added as learner

--- rpc(:21001/add-learner, [3, "127.0.0.1:21003"]) => 540ms
Xfer   : 0.539554s
Total  : 0.539581s


Node 3 added as learner
Get metrics from the leader, after adding 2 learners

--- rpc(:21001/metrics, ) => 0.7 ms
Connect: 0.000045s
Xfer   : 0.000678s
Total  : 0.000693s

Changing membership from [1] to 3 nodes cluster: [1, 2, 3]

--- rpc(:21001/change-membership, [1, 2, 3])
...

@drmingdrmer
Member

Based on the results I see that add-learner is taking time

When adding a learner, the leader blocks until the new learner catches up with the logs, so it takes longer; this is expected behavior.

@vishwaspai
Author

Could not find much with additional debugging.
With a heartbeat above 100ms things work fine. Below that, I see that the raft-append RPC times out. I tried some tweaking (number of workers etc.) on actix_web, but I do not see much change in behavior.

@vishwaspai
Author

Can I do something else to help? Is this problem specific to my environment?

@drmingdrmer
Member

Can I do something else to help? Is this problem specific to my environment?

The problem happens after calling send_append_entries() and before calling reqwest: https://github.com/seanmonstar/reqwest/blob/8b37ae4b15f668e3e3f9b8cf43b547fc4f95e444/src/connect.rs#L560

As I mentioned in a previous comment:
#550 (comment)

I still cannot reproduce the 35ms delay on my laptop or in CI.
Maybe you can find out what's going on on your machine.

Thank you in advance!

@DamnWidget

I am experiencing the same issue: the test just hangs after showing the membership-change message. There is 0ms of delay, as I have no proxies or anything like that configured on this computer. The test_cluster.rs test also hangs at the same point.

Like @vishwaspai, if I increase the heartbeat interval to some value higher than 100 it works (the test fails because the last read from node 3 returns an empty string instead of bar, but it works fine in the shell script). I have no clue what is going on, but the 100-value barrier feels quite suspicious to me; I will try to test in other environments.

Attached are the logs of the three nodes plus my Cargo.lock, in case they can be of any help.

raft-kv-memstore-logs-and-lock.tar.gz

@DamnWidget

Something is definitely wrong with the environment. I ran it in another environment with zero issues; these are my details:

Legend:

  • Env1: Pop_OS! 22.04 LTS with Linux Kernel 6.0.6, AMD Ryzen 7 4800H CPU
  • Env2: Artix Linux with Linux Kernel 5.15.26-lts, Intel i7-6700K CPU

I will try on another computer, also Artix Linux with kernel 6.0.x and an AMD 3950X CPU, in a few minutes.

@DamnWidget

Third test environment:

  • Artix Linux with Kernel 6.0.6 (zen flavor) and AMD 3950X CPU

No issues. It's probably something with Ubuntu; I will try to run on this last env with a live distro or VM if I find the time.

@drmingdrmer
Member

Thank you so much @DamnWidget!

@drmingdrmer
Member

drmingdrmer commented Dec 9, 2022

It happened recently in a CI session:
https://github.com/datafuselabs/openraft/actions/runs/3649785637/jobs/6164911737

This issue affects the memstore example, but the rocksdb example works fine.
I think the problem is with actix_web, which is used by the memstore example, while rocksdb uses toy_rpc as its server.

Update:
My mistake. It's because the rocksdb example is configured with a greater heartbeat timeout.

@DamnWidget

DamnWidget commented Dec 9, 2022

In all of my tests it happened if the heartbeat interval was configured to any value lower than 101 milliseconds. It also only happened on Pop_OS! when I tried; unfortunately I didn't find the time to try other Ubuntu- or Debian-based distributions. It definitely wasn't a problem on Artix.

@drmingdrmer
Member

drmingdrmer commented Dec 9, 2022

This problem is caused by an issue with cargo that affects the openssl crate.

  1. The example examples/raft-kv-memstore uses reqwest to send
    RPCs to the raft service.
    And reqwest depends on native-tls, while native-tls depends on openssl.

  2. When calling openssl::SslConnector::builder():
    it will load certificate files if the env variable SSL_CERT_FILE is set:
    https://github.com/sfackler/rust-openssl/blob/eaee383429d156bd91c4a188ba57cf1747c2e440/openssl/src/ssl/mod.rs#L889-L896

    strace shows the time spent on loading certificates:

    1670588693.413004 (+     0.000084) openat(AT_FDCWD, "/usr/lib/ssl/certs/ca-certificates.crt", O_RDONLY) = 3
    1670588693.413054 (+     0.000049) newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=190243, ...}, AT_EMPTY_PATH) = 0
    1670588693.413093 (+     0.000038) read(3, "-----BEGIN CERTIFICATE-----\nMIIH"..., 4096) = 4096
    1670588693.413790 (+     0.000697) read(3, "8B****************************vJw"..., 4096) = 4096
    ...
    1670588693.474642 (+     0.001588) read(3, "D\n*****************************T"..., 4096) = 4096
    1670588693.474848 (+     0.000205) brk(0x55e121d5d000) = 0x55e121d5d000
    1670588693.474888 (+     0.000039) brk(0x55e121d5c000) = 0x55e121d5c000
    1670588693.478875 (+     0.001058) read(3, "", 4096) = 0
    
  3. And on Linux (at least on Ubuntu), the command cargo run pollutes environment
    variables: it leaks SSL_CERT_FILE and SSL_CERT_DIR to the testing subprocess it runs.

Finally, every time a reqwest::Client is created, it spends several dozen
milliseconds loading a bunch of certificates, which makes a raft RPC exceed its 50 ms timeout.

I created a mini demo showing that on Linux, cargo xxx will slow down an openssl-based
program:
https://github.com/drmingdrmer/test-native-tls-rs

On Linux:

cargo run

SSL_CERT_DIR: /usr/lib/ssl/certs
SSL_CERT_FILE: /usr/lib/ssl/certs/ca-certificates.crt
...
=== : 66.382982ms

On my M1 Mac:

cargo run

...
=== : 9.23432ms
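
For reference, a minimal sketch of that kind of timing test (an approximation of the linked demo, not its exact code; it assumes native_tls::TlsConnector::new() goes through openssl::SslConnector::builder() as described above):

use std::time::Instant;

fn main() {
    // Print the variables that `cargo run` leaks on Linux.
    for key in ["SSL_CERT_DIR", "SSL_CERT_FILE"] {
        if let Ok(val) = std::env::var(key) {
            println!("{key}: {val}");
        }
    }

    // Building a TLS connector loads the certificates named by those
    // variables; this is where the several dozen milliseconds go.
    let start = Instant::now();
    let _connector = native_tls::TlsConnector::new().expect("build TLS connector");
    println!("=== : {:?}", start.elapsed());
}

If the leaked variables are indeed the cause, running the script with them stripped (e.g. env -u SSL_CERT_FILE -u SSL_CERT_DIR ./test-cluster.sh) should presumably avoid the delay.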


drmingdrmer added a commit to drmingdrmer/openraft that referenced this issue Dec 9, 2022
On Linux: the command `cargo run` pollutes environment variables: it leaks
`SSL_CERT_FILE` and `SSL_CERT_DIR` to the testing subprocess it runs,
which causes `reqwest` to spend ~50 ms loading the certificates for every
RPC.

We just extend the RPC timeout to work around it.

- Fix: datafuselabs#550
@DamnWidget

Nice catch. It probably doesn't happen on Artix because it uses a more modern version of OpenSSL. Will try to check as soon as I can get to the computer.

drmingdrmer added commits to drmingdrmer/openraft that referenced this issue Dec 10, 2022