Skip to content
This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

uv: seed the PRNG #197

Merged

Conversation

MathieuBordere
Copy link
Contributor

@MathieuBordere MathieuBordere commented Apr 27, 2021

The PRNG of rand was not seeded, causing it to be seeded with 1 every time (see https://linux.die.net/man/3/rand) depending on the implementation of rand on the platform, this can result in the same random sequence obtained from rand on all nodes, causing the election timeout to fire at the same moment for all nodes and a cluster that cannot make progress. This would be especially problematic if all nodes start up at the same time.

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

@codecov-commenter
Copy link

Codecov Report

Merging #197 (0dda754) into master (269ac8b) will decrease coverage by 0.05%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
- Coverage   86.84%   86.79%   -0.06%     
==========================================
  Files          47       47              
  Lines        7642     7640       -2     
  Branches     2041     2042       +1     
==========================================
- Hits         6637     6631       -6     
- Misses        902      906       +4     
  Partials      103      103              
Impacted Files Coverage Δ
src/uv.c 86.17% <0.00%> (-0.93%) ⬇️
src/replication.c 79.67% <0.00%> (-0.15%) ⬇️
src/byte.h 100.00% <0.00%> (ø)
src/uv_fs.c 75.29% <0.00%> (+0.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 269ac8b...0dda754. Read the comment docs.

Copy link
Contributor

@freeekanayaka freeekanayaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

@freeekanayaka
Copy link
Contributor

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

IIRC the tick interval should equal the heartbeat interval, which in turn should be set to 1/10 of the election timeout (e.g. 3 seconds election timeout and 300ms heartbeat/tick). See the Raft paper for some argument about the 1/10 ration being "ideal". Isn't that the case right now in dqlite? That ratio should be set in dqlite_node_set_network_latency.

@MathieuBordere
Copy link
Contributor Author

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

IIRC the tick interval should equal the heartbeat interval, which in turn should be set to 1/10 of the election timeout (e.g. 3 seconds election timeout and 300ms heartbeat/tick). See the Raft paper for some argument about the 1/10 ration being "ideal". Isn't that the case right now in dqlite? That ratio should be set in dqlite_node_set_network_latency.

Yes, if that function is called the ratio is set correctly, but most of the users just use the defaults as set in https://github.com/canonical/dqlite/blob/d30467a4da514b0f7d34880926f454b28aa63f42/src/server.c#L66 There the heartbeat timeout is 500ms and the election timeout 3000ms. The random value of the election timeout is chosen between 3000ms and 6000ms by raft in that case

unsigned timeout = (unsigned)r->io->random(r->io, (int)r->election_timeout,

@freeekanayaka
Copy link
Contributor

For reference, https://github.com/canonical/raft/blob/master/src/start.c#L194 is where the periodic call to the tick function is triggered, every heartbeat_timeout milliseconds as mentioned. In other words, as long as the heartbeat timeout is granular enough wrt the election timeout (as it should, because of the recommended 1/10th ratio) everything should be fine, I think.

@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Apr 27, 2021

For reference, https://github.com/canonical/raft/blob/master/src/start.c#L194 is where the periodic call to the tick function is triggered, every heartbeat_timeout milliseconds as mentioned. In other words, as long as the heartbeat timeout is granular enough wrt the election timeout (as it should, because of the recommended 1/10th ratio) everything should be fine, I think.

Yeah indeed, checking every tick is probably sufficient, in the case of the default the ratio is somewhere between 1/6 and 1/12.
edit: sorry this is not correct.

@freeekanayaka
Copy link
Contributor

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

IIRC the tick interval should equal the heartbeat interval, which in turn should be set to 1/10 of the election timeout (e.g. 3 seconds election timeout and 300ms heartbeat/tick). See the Raft paper for some argument about the 1/10 ration being "ideal". Isn't that the case right now in dqlite? That ratio should be set in dqlite_node_set_network_latency.

Yes, if that function is called the ratio is set correctly, but most of the users just use the defaults as set in https://github.com/canonical/dqlite/blob/d30467a4da514b0f7d34880926f454b28aa63f42/src/server.c#L66 There the heartbeat timeout is 500ms and the election timeout 3000ms. The random value of the election timeout is chosen between 3000ms and 6000ms by raft in that case

unsigned timeout = (unsigned)r->io->random(r->io, (int)r->election_timeout,

Maybe it would suffices to change the default heartbeat timeout to 300ms then. Apologies if that was indeed what you meant :)

@MathieuBordere
Copy link
Contributor Author

In a future PR I also want to adapt the election timeout mechanism. At the moment, every tick we check if the random timeout has passed. But the value of the tick is quite large (in case of dqlite 500ms) and there are only so many ticks in an election timeout interval, meaning that the randomness of the election timeout is significantly reduced, potentially resulting in slow leader election and an unresponsive cluster.

IIRC the tick interval should equal the heartbeat interval, which in turn should be set to 1/10 of the election timeout (e.g. 3 seconds election timeout and 300ms heartbeat/tick). See the Raft paper for some argument about the 1/10 ration being "ideal". Isn't that the case right now in dqlite? That ratio should be set in dqlite_node_set_network_latency.

Yes, if that function is called the ratio is set correctly, but most of the users just use the defaults as set in https://github.com/canonical/dqlite/blob/d30467a4da514b0f7d34880926f454b28aa63f42/src/server.c#L66 There the heartbeat timeout is 500ms and the election timeout 3000ms. The random value of the election timeout is chosen between 3000ms and 6000ms by raft in that case

unsigned timeout = (unsigned)r->io->random(r->io, (int)r->election_timeout,

Maybe it would suffices to change the default heartbeat timeout to 300ms then. Apologies if that was indeed what you meant :)

Lowering to 300ms looks like a good start indeed, it's not what I meant :-)

@MathieuBordere MathieuBordere merged commit 37af7cd into canonical:master Apr 27, 2021
@MathieuBordere MathieuBordere deleted the election-timeout-randomness branch December 9, 2022 10:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants