uv: seed the PRNG #197

MathieuBordere · 2021-04-27T14:19:29Z

The PRNG of rand was not seeded, causing it to be seeded with 1 every time (see https://linux.die.net/man/3/rand) depending on the implementation of rand on the platform, this can result in the same random sequence obtained from rand on all nodes, causing the election timeout to fire at the same moment for all nodes and a cluster that cannot make progress. This would be especially problematic if all nodes start up at the same time.

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

codecov-commenter · 2021-04-27T14:23:14Z

Codecov Report

Merging #197 (0dda754) into master (269ac8b) will decrease coverage by 0.05%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
- Coverage   86.84%   86.79%   -0.06%     
==========================================
  Files          47       47              
  Lines        7642     7640       -2     
  Branches     2041     2042       +1     
==========================================
- Hits         6637     6631       -6     
- Misses        902      906       +4     
  Partials      103      103

Impacted Files	Coverage Δ
src/uv.c	`86.17% <0.00%> (-0.93%)`	⬇️
src/replication.c	`79.67% <0.00%> (-0.15%)`	⬇️
src/byte.h	`100.00% <0.00%> (ø)`
src/uv_fs.c	`75.29% <0.00%> (+0.18%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 269ac8b...0dda754. Read the comment docs.

freeekanayaka

Nice catch

freeekanayaka · 2021-04-27T16:44:20Z

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

IIRC the tick interval should equal the heartbeat interval, which in turn should be set to 1/10 of the election timeout (e.g. 3 seconds election timeout and 300ms heartbeat/tick). See the Raft paper for some argument about the 1/10 ration being "ideal". Isn't that the case right now in dqlite? That ratio should be set in dqlite_node_set_network_latency.

MathieuBordere · 2021-04-27T16:51:00Z

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

IIRC the tick interval should equal the heartbeat interval, which in turn should be set to 1/10 of the election timeout (e.g. 3 seconds election timeout and 300ms heartbeat/tick). See the Raft paper for some argument about the 1/10 ration being "ideal". Isn't that the case right now in dqlite? That ratio should be set in dqlite_node_set_network_latency.

Yes, if that function is called the ratio is set correctly, but most of the users just use the defaults as set in https://github.com/canonical/dqlite/blob/d30467a4da514b0f7d34880926f454b28aa63f42/src/server.c#L66 There the heartbeat timeout is 500ms and the election timeout 3000ms. The random value of the election timeout is chosen between 3000ms and 6000ms by raft in that case

raft/src/election.c

Line 41 in 269ac8b

unsigned timeout = (unsigned)r->io->random(r->io, (int)r->election_timeout,

freeekanayaka · 2021-04-27T16:51:56Z

For reference, https://github.com/canonical/raft/blob/master/src/start.c#L194 is where the periodic call to the tick function is triggered, every heartbeat_timeout milliseconds as mentioned. In other words, as long as the heartbeat timeout is granular enough wrt the election timeout (as it should, because of the recommended 1/10th ratio) everything should be fine, I think.

MathieuBordere · 2021-04-27T16:53:40Z

For reference, https://github.com/canonical/raft/blob/master/src/start.c#L194 is where the periodic call to the tick function is triggered, every heartbeat_timeout milliseconds as mentioned. In other words, as long as the heartbeat timeout is granular enough wrt the election timeout (as it should, because of the recommended 1/10th ratio) everything should be fine, I think.

Yeah indeed, checking every tick is probably sufficient, ~~in the case of the default the ratio is somewhere between 1/6 and 1/12.~~
edit: sorry this is not correct.

freeekanayaka · 2021-04-27T16:54:56Z

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

IIRC the tick interval should equal the heartbeat interval, which in turn should be set to 1/10 of the election timeout (e.g. 3 seconds election timeout and 300ms heartbeat/tick). See the Raft paper for some argument about the 1/10 ration being "ideal". Isn't that the case right now in dqlite? That ratio should be set in dqlite_node_set_network_latency.

Yes, if that function is called the ratio is set correctly, but most of the users just use the defaults as set in https://github.com/canonical/dqlite/blob/d30467a4da514b0f7d34880926f454b28aa63f42/src/server.c#L66 There the heartbeat timeout is 500ms and the election timeout 3000ms. The random value of the election timeout is chosen between 3000ms and 6000ms by raft in that case

raft/src/election.c

Line 41 in 269ac8b

unsigned timeout = (unsigned)r->io->random(r->io, (int)r->election_timeout,

Maybe it would suffices to change the default heartbeat timeout to 300ms then. Apologies if that was indeed what you meant :)

MathieuBordere · 2021-04-27T17:02:12Z

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

IIRC the tick interval should equal the heartbeat interval, which in turn should be set to 1/10 of the election timeout (e.g. 3 seconds election timeout and 300ms heartbeat/tick). See the Raft paper for some argument about the 1/10 ration being "ideal". Isn't that the case right now in dqlite? That ratio should be set in dqlite_node_set_network_latency.

Yes, if that function is called the ratio is set correctly, but most of the users just use the defaults as set in https://github.com/canonical/dqlite/blob/d30467a4da514b0f7d34880926f454b28aa63f42/src/server.c#L66 There the heartbeat timeout is 500ms and the election timeout 3000ms. The random value of the election timeout is chosen between 3000ms and 6000ms by raft in that case

raft/src/election.c

Line 41 in 269ac8b

unsigned timeout = (unsigned)r->io->random(r->io, (int)r->election_timeout,

Maybe it would suffices to change the default heartbeat timeout to 300ms then. Apologies if that was indeed what you meant :)

Lowering to 300ms looks like a good start indeed, it's not what I meant :-)

uv: seed the PRNG

0dda754

MathieuBordere requested review from freeekanayaka and stgraber April 27, 2021 14:20

freeekanayaka approved these changes Apr 27, 2021

View reviewed changes

MathieuBordere mentioned this pull request Apr 27, 2021

server: Lower default HeartBeat timeout. canonical/dqlite#298

Merged

MathieuBordere merged commit 37af7cd into canonical:master Apr 27, 2021

MathieuBordere deleted the election-timeout-randomness branch December 9, 2022 10:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

uv: seed the PRNG #197

uv: seed the PRNG #197

MathieuBordere commented Apr 27, 2021 •

edited

codecov-commenter commented Apr 27, 2021

freeekanayaka left a comment

freeekanayaka commented Apr 27, 2021

MathieuBordere commented Apr 27, 2021

freeekanayaka commented Apr 27, 2021

MathieuBordere commented Apr 27, 2021 •

edited

freeekanayaka commented Apr 27, 2021

MathieuBordere commented Apr 27, 2021

uv: seed the PRNG #197

uv: seed the PRNG #197

Conversation

MathieuBordere commented Apr 27, 2021 • edited

codecov-commenter commented Apr 27, 2021

Codecov Report

freeekanayaka left a comment

Choose a reason for hiding this comment

freeekanayaka commented Apr 27, 2021

MathieuBordere commented Apr 27, 2021

freeekanayaka commented Apr 27, 2021

MathieuBordere commented Apr 27, 2021 • edited

freeekanayaka commented Apr 27, 2021

MathieuBordere commented Apr 27, 2021

MathieuBordere commented Apr 27, 2021 •

edited

MathieuBordere commented Apr 27, 2021 •

edited