Example raft-kv-memstore hangs after printing change-membership #550

Closed
vishwaspai opened this issue Sep 9, 2022 · 25 comments · Fixed by #615
Labels
C-bug Category: bug

Comments

@vishwaspai

Took the latest checkout of main (commit hash: 347aca1) and ran
raft-kv-memstore$ ./test-cluster.sh

But it hangs after reaching this point:

...
Changing membership from [1] to 3 nodes cluster: [1, 2, 3]

--- rpc(:21001/change-membership, [1, 2, 3])

I see that 3 processes are running; the first process is using 100% CPU and the other 2 are almost idle. I could not access the Discord channel (for some reason it does not open), hence reporting it here. If required I can share any specific logs.
P.S.: The example raft-kv-rocksdb works fine.

vishwaspai changed the title from "Example raft-kv-memstore hangs after pringing change-membership" to "Example raft-kv-memstore hangs after printing change-membership" on Sep 9, 2022
@vishwaspai
Author

Did some debugging. As soon as node-2 gets added (via add-learner), the leader (node-1) starts sending raft-append RPCs to node-2 aggressively. At this time the cluster has only 1 leader and 1 learner. I think this overwhelms either node-1 or node-2 and things do not work after that.

If I change the config in this example (see below) to be the same as rocksdb's, things work fine. I still see raft-append on node-2, but only a few messages per second.

let config = Arc::new(Config {
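    // Same heartbeat interval as the raft-kv-rocksdb example; openraft's default is 50 ms.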
    heartbeat_interval: 250,
    ..Default::default()
});

Why does it go wrong with the default heartbeat_interval config of 50?

@drmingdrmer
Member

The logs will be helpful.
After running ./test-cluster.sh, there should be 3 logs in the dir examples/raft-kv-memstore:
n1.log n2.log n3.log.

Please attach these logs and let me look at what's happening.

Thanks for letting me know about this issue :D

@vishwaspai
Author

Attached logs

raft-kv-memstore-logs.tar.gz

@drmingdrmer
Member

Well... I did not see anything wrong in the logs, except a lot of timeouts:

  • Node-1 keeps failing to send append-entries RPCs; you can find logs like handle_update_matched target=2 result=Err("deadline has elapsed") since 2022-09-10T13:50:19Z in n1.log (the leader).
  • Node-2 received append-entries requests and dealt with them correctly, but it failed to respond before the 50ms timeout. You can find logs like append-entries request vote=vote:1-1 prev_log_id=1-1-63 entries=1-1-64:blank ... 1-1-175:blank leader_committed=1-1-138 in n2.log (a learner).

It looks like just a simple timeout issue. Were you running this test on a slow computer? 🤔

And as far as I know, extending the heartbeat interval will solve this issue, as you mentioned. :)

@drmingdrmer
Member

Heartbeat interval affects the timeout: an append-entries RPC should not take longer than the interval between 2 heartbeats.
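
Concretely, a minimal sketch of that relationship (not openraft's own code; send_append_entries below is a hypothetical stand-in for the real RPC):

use std::time::Duration;
use tokio::time::timeout;

// The RPC deadline is the heartbeat interval itself (50 ms by default), so an
// append-entries call that is slower than one heartbeat is reported as failed.
async fn replicate_once(heartbeat_interval_ms: u64) {
    let deadline = Duration::from_millis(heartbeat_interval_ms);
    match timeout(deadline, send_append_entries()).await {
        Ok(_resp) => { /* replication progressed */ }
        Err(_elapsed) => eprintln!("deadline has elapsed"), // the error seen in n1.log
    }
}

async fn send_append_entries() { /* hypothetical stand-in for the real RPC */ }

#[tokio::main]
async fn main() {
    replicate_once(50).await;
}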

@vishwaspai
Copy link
Author

It looks like just a simple timeout issue. Were you running this test on a slow computer?

No. I'm running on an i7-12700H with 14 cores.

@drmingdrmer
Member

I have not yet found what is going on from the log.

I updated the main branch to let the examples output a more detailed log. Could you rerun the test with the latest main branch ac48309? Then let's see what's going on.

And what are your OS and rust-toolchain?

And could you attach the examples/raft-kv-memstore/Cargo.lock that is built when running test-cluster.sh? So that I can check the dependency crate versions.

@vishwaspai
Author

Added all the information in the attached tar.

raft-kv-memstore-logs-ac48309.tar.gz

@drmingdrmer drmingdrmer added the C-bug Category: bug label Sep 13, 2022
@drmingdrmer
Member

drmingdrmer commented Sep 13, 2022

Do you have a proxy installed on your machine? 🤔

2022-09-13T06:27:43.448747Z DEBUG ThreadId(01) new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:replication{id=1 target=2}:main{vote=vote:1-1 target=2 cluster=foo}:line_rate_loop:send_append_entries: start sending append_entries, timeout: 50 payload=vote=vote:1-1, prev_log_id=1-1-82, leader_commit=1-1-83, entries=1-1-83:blank

2022-09-13T06:27:43.482663Z DEBUG ThreadId(01) new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:replication{id=1 target=2}:main{vote=vote:1-1 target=2 cluster=foo}:line_rate_loop:send_append_entries: starting new connection: http://127.0.0.1:21002/    

The first log is printed by openraft at:

tracing::debug!(
    payload=%payload.summary(),
    "start sending append_entries, timeout: {:?}",
    self.config.heartbeat_interval
);
let the_timeout = Duration::from_millis(self.config.heartbeat_interval);
let res = timeout(the_timeout, self.network.send_append_entries(payload)).await;
let append_resp = match res {

The second log is printed by reqwest:
https://github.com/seanmonstar/reqwest/blob/8b37ae4b15f668e3e3f9b8cf43b547fc4f95e444/src/connect.rs#L560

You can see that every starting new connection... log in n1.log is delayed by about 35ms, using:

cat n1.log | grep 'starting new connection' -B1 | grep -v '^\-\-' | awk '{v=$1; gsub("2022-09-13T..:..:", "", v); gsub("Z", "", v); print (v-a)*1000 "ms "; a=v}'

Output:

...
1.318ms
35.363ms
1.182ms
34.744ms
1.313ms
35.149ms
1.329ms
35.211ms

reqwest::Connector tries every proxy. Can you help me check if there is a proxy on your machine that may be delaying every RPC message by 35ms?
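
For reference, proxy lookup can also be disabled when building the client. A sketch, assuming the example's reqwest client construction can be swapped for a custom-built one (a diagnostic, not a fix):

fn main() {
    // A client with proxy lookup disabled; substituting it where the example
    // builds its reqwest client would rule out proxy resolution as the source
    // of the ~35ms gap.
    let client = reqwest::Client::builder()
        .no_proxy()
        .build()
        .expect("failed to build reqwest client");
    let _ = client;
}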

@vishwaspai
Author

Do you have a proxy installed on your machine?

No.

@drmingdrmer
Member

It looks like it is not an issue with openraft, but an issue at the network layer.

I'm not an expert on reqwest and cannot figure out what's possibly wrong. :(

Can you do some single-step debugging of it on your machine? Running this test does almost the same thing as test-cluster.sh. You can get started with it:
https://github.com/datafuselabs/openraft/blob/main/examples/raft-kv-memstore/tests/cluster/test_cluster.rs

Another concern is whether a normal curl will be delayed by 35ms on your machine.
To confirm that, try the latest main branch; I added timing to every curl in test-cluster.sh:
59cc9e8

@vishwaspai
Author

For now, I ran curl with -w '\nConnect: %{time_connect}s\nXfer : %{time_starttransfer}s\nTotal : %{time_total}s\n' (instead of time), and the following is the result.

Based on the results I see that add-learner is taking time. The other curl requests are fast. At least, I don't see this as a networking issue. Anyway, I'll see if I can spend time on single-stepping.

Initialize server 1 as a single-node cluster

--- rpc(:21001/init, {})  => 3ms
Connect: 0.000067s
Xfer   : 0.003601s
Total  : 0.003638s


Server 1 is a leader now
Get metrics from the leader

--- rpc(:21001/metrics, ) => 0.9ms
Connect: 0.000067s
Xfer   : 0.000851s
Total  : 0.000866s


Adding node 2 and node 3 as learners, to receive log from leader node 1

--- rpc(:21001/add-learner, [2, "127.0.0.1:21002"]) =>  157ms
Connect: 0.000203s
Xfer   : 0.157091s
Total  : 0.157123s


Node 2 added as learner

--- rpc(:21001/add-learner, [3, "127.0.0.1:21003"]) => 540ms
Xfer   : 0.539554s
Total  : 0.539581s


Node 3 added as learner
Get metrics from the leader, after adding 2 learners

--- rpc(:21001/metrics, ) => 0.7 ms
Connect: 0.000045s
Xfer   : 0.000678s
Total  : 0.000693s

Changing membership from [1] to 3 nodes cluster: [1, 2, 3]

--- rpc(:21001/change-membership, [1, 2, 3])
...

@drmingdrmer
Member

Based on the results I see that add-learner is taking time

When adding a learner, the leader blocks until the new learner catches up with the logs, so it takes longer; this is expected behavior.

@vishwaspai
Author

Could not find much with additional debugging.
With a heartbeat above 100ms things work fine. Below that, I see that the raft-append RPC times out. I tried some tweaking (number of workers etc.) on actix_web, but I do not see much change in behavior.

@vishwaspai
Author

Can I do something else to help? Is this problem specific to my environment?

@drmingdrmer
Member

Can I do something else to help? Is this problem specific to my environment?

The problem happens after calling send_append_entries() and before calling reqwest: https://github.com/seanmonstar/reqwest/blob/8b37ae4b15f668e3e3f9b8cf43b547fc4f95e444/src/connect.rs#L560

As I mentioned in a previous comment:
#550 (comment)

I still cannot reproduce the 35ms delay on my laptop or in CI.
Maybe you can find out what's going on on your machine.

Thank you in advance!

@DamnWidget

I am experiencing the same issue: the test just hangs after showing the membership-change message. There is 0ms of delay, as I have no proxies or anything like that configured on this computer. The test_cluster.rs test also hangs at the same point.

Like @vishwaspai, if I increase the heartbeat interval to some value higher than 100 it works (the test fails because the last read from node 3 returns an empty string instead of bar, but it works fine in the shell script). I have no clue what is going on, but the 100-value barrier feels quite suspicious to me; I will try to test in other environments.

Attached are the logs of the three nodes plus my Cargo.lock, in case they can be of any help.

raft-kv-memstore-logs-and-lock.tar.gz

@DamnWidget

Something is definitely wrong with the environment. I ran it in another environment with zero issues; these are my details:

Legend:

  • Env1: Pop_OS! 22.04 LTS with Linux Kernel 6.0.6, AMD Ryzen 7 4800H CPU
  • Env2: Artix Linux with Linux Kernel 5.15.26-lts, Intel i7-6700K CPU

I will try on another computer, also Artix Linux with kernel 6.0.x and an AMD 3950X CPU, in a few minutes.

@DamnWidget

Third test environment:

  • Artix Linux with Kernel 6.0.6 (zen flavor) and AMD 3950X CPU

No issues. It's probably something with Ubuntu; I will try to run on this last env with a live distro or VM if I find the time.

@drmingdrmer
Member

Thank you so much @DamnWidget!

@drmingdrmer
Member

drmingdrmer commented Dec 9, 2022

It happened recently in a CI session:
https://github.com/datafuselabs/openraft/actions/runs/3649785637/jobs/6164911737

This issue affects the memstore example, but the rocksdb example works fine.
I think the problem is with actix_web, which is used by the memstore example, while rocksdb uses toy_rpc as its server.

Update:
My mistake. It's because the rocksdb example is configured with a greater heartbeat timeout.

@DamnWidget

DamnWidget commented Dec 9, 2022

In all of my tests it happened if the heartbeat interval was configured to any value lower than 101 milliseconds. It also only happened on Pop_OS! when I tried; unfortunately I didn't find the time to try other Ubuntu- or Debian-based distributions. It definitely wasn't a problem on Artix.

@drmingdrmer
Member

drmingdrmer commented Dec 9, 2022

This problem is caused by an issue with cargo that affects the openssl crate.

  1. The example examples/raft-kv-memstore uses reqwest to send
    RPCs to the raft service.
    And reqwest depends on native-tls, while native-tls depends on openssl.

  2. When calling openssl::SslConnector::builder():
    it will load certificate files if the env variable SSL_CERT_FILE is set:
    https://github.com/sfackler/rust-openssl/blob/eaee383429d156bd91c4a188ba57cf1747c2e440/openssl/src/ssl/mod.rs#L889-L896

    strace shows the time spent on loading certificates:

    1670588693.413004 (+     0.000084) openat(AT_FDCWD, "/usr/lib/ssl/certs/ca-certificates.crt", O_RDONLY) = 3
    1670588693.413054 (+     0.000049) newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=190243, ...}, AT_EMPTY_PATH) = 0
    1670588693.413093 (+     0.000038) read(3, "-----BEGIN CERTIFICATE-----\nMIIH"..., 4096) = 4096
    1670588693.413790 (+     0.000697) read(3, "8B****************************vJw"..., 4096) = 4096
    ...
    1670588693.474642 (+     0.001588) read(3, "D\n*****************************T"..., 4096) = 4096
    1670588693.474848 (+     0.000205) brk(0x55e121d5d000) = 0x55e121d5d000
    1670588693.474888 (+     0.000039) brk(0x55e121d5c000) = 0x55e121d5c000
    1670588693.478875 (+     0.001058) read(3, "", 4096) = 0
    
  3. And on Linux (at least on Ubuntu), the command cargo run pollutes environment
    variables: it leaks SSL_CERT_FILE and SSL_CERT_DIR to the testing subprocess it runs.

Finally, every time a reqwest::Client is created, it spends several dozen
milliseconds loading a bunch of certificates, which makes a raft RPC exceed its 50 ms timeout.

I created a mini demo showing that on Linux, cargo xxx will slow down an openssl-based
program:
https://github.com/drmingdrmer/test-native-tls-rs

On Linux:

cargo run

SSL_CERT_DIR: /usr/lib/ssl/certs
SSL_CERT_FILE: /usr/lib/ssl/certs/ca-certificates.crt
...
=== : 66.382982ms

On my M1 Mac:

cargo run

...
=== : 9.23432ms
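
For reference, a minimal sketch of that kind of timing test (an approximation of the linked demo, not its exact code; it assumes native_tls::TlsConnector::new() goes through openssl::SslConnector::builder() as described above):

use std::time::Instant;

fn main() {
    // Print the variables that `cargo run` leaks on Linux.
    for key in ["SSL_CERT_DIR", "SSL_CERT_FILE"] {
        if let Ok(val) = std::env::var(key) {
            println!("{key}: {val}");
        }
    }

    // Building a TLS connector loads the certificates named by those
    // variables; this is where the several dozen milliseconds go.
    let start = Instant::now();
    let _connector = native_tls::TlsConnector::new().expect("build TLS connector");
    println!("=== : {:?}", start.elapsed());
}

If the leaked variables are indeed the cause, running the script with them stripped (e.g. env -u SSL_CERT_FILE -u SSL_CERT_DIR ./test-cluster.sh) should presumably avoid the delay.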


drmingdrmer added a commit to drmingdrmer/openraft that referenced this issue Dec 9, 2022
On Linux: the command `cargo run` pollutes environment variables: it leaks
`SSL_CERT_FILE` and `SSL_CERT_DIR` to the testing subprocess it runs,
which causes `reqwest` to spend ~50 ms loading the certificates for every
RPC.

We just extend the RPC timeout to work around it.

- Fix: datafuselabs#550
@DamnWidget

Nice catch. It probably doesn't happen on Artix because it uses a more modern version of OpenSSL. Will try to check as soon as I can get to the computer.

drmingdrmer added commits to drmingdrmer/openraft that referenced this issue Dec 10, 2022