Allows raft to perform a potentially long-running task asynchronously from the main loop. Users of this function should take the necessary thread-safety measures, as the work can run in parallel with the main thread.
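As an illustration of the thread-safety point, here is a minimal sketch using plain POSIX threads rather than raft's actual interface. Every name in it (`struct shared_state`, `long_running_work`, `run_async_example`) is hypothetical and chosen only for this example. It assumes the simplest possible measure, a mutex guarding whatever state both the worker and the main thread touch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state shared between the main thread and the worker. */
struct shared_state {
    pthread_mutex_t lock;
    long counter;  /* data both threads modify */
    int work_done; /* set by the worker when it finishes */
};

/* Hypothetical long-running task that runs off the main thread. */
static void *long_running_work(void *arg)
{
    struct shared_state *s = arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&s->lock);
        s->counter++;
        pthread_mutex_unlock(&s->lock);
    }
    pthread_mutex_lock(&s->lock);
    s->work_done = 1;
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Starts the worker, keeps modifying the shared counter concurrently,
 * then waits for the worker. Returns the final counter, or -1 on error. */
long run_async_example(void)
{
    struct shared_state s = { .counter = 0, .work_done = 0 };
    pthread_t worker;
    long result;

    if (pthread_mutex_init(&s.lock, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&worker, NULL, long_running_work, &s) != 0) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }
    /* Stand-in for main-loop activity that overlaps with the worker. */
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&s.lock);
        s.counter++;
        pthread_mutex_unlock(&s.lock);
    }
    pthread_join(worker, NULL);
    result = s.work_done ? s.counter : -1;
    pthread_mutex_destroy(&s.lock);
    return result;
}
```

Without the mutex, the two increment loops would race and the final count would be unpredictable. A real event-loop integration would report completion through a callback on the loop instead of blocking in `pthread_join`.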
Codecov Report
@@            Coverage Diff             @@
##           master     #268      +/-   ##
==========================================
+ Coverage   87.71%   87.90%   +0.18%
==========================================
  Files         108      110       +2
  Lines       15527    15777     +250
  Branches     2398     2412      +14
==========================================
+ Hits        13620    13869     +249
- Misses       1720     1722       +2
+ Partials      187      186       -1
|
I didn't go through all details yet, but here's a couple of questions:
- If you can disclose that, where did the need of adding this feature come from? (I assume some real world use case). I ask because I'm slightly surprised that synchronous snapshots can become a performance issue, at least with the assumption that the whole FSM is kept in memory.
- In version 2 of raft_fsm, is it intentional that the snapshot method still gets called even if snapshot_async is defined?
We have a (rudimentary) benchmark; a typical result of a run with synchronous snapshots on the GitHub Actions machines is something like
Notice the large maximum times a write or a read can take. A typical run with async snapshots is (on the dqlite branch where I'm implementing it)
The spikes are still there, but much lower. I'll try to do some more measurements tomorrow.
|
I see. It'd be interesting to run those measurements on a bare metal machine with as little disk activity as possible going on (other than raft/dqlite itself). I would expect the version with async snapshot support to virtually eliminate the write spikes. If not, perhaps there is something else also to look at, as a separate work from this PR.
Oh I see. So you'd copy the whole WAL synchronously (since it should typically be relatively small) so there are no race conditions, and then copy the database part asynchronously and since it won't be changed (checkpoints off) it is okay to do that in a separate thread. |
Yes, and now that checkpoints are off and we have a copy of the WAL, we no longer need to copy the database, we can just pass the page pointers to raft. |
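The approach described in the two comments above can be sketched roughly as follows. This is not dqlite's or raft's actual code: `struct snapshot_sketch`, `snapshot_sketch_take` and `snapshot_sketch_release` are hypothetical names, and the sketch assumes the database pages are not modified between the two calls (which is what turning checkpoints off is meant to guarantee).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot: an owned copy of the WAL plus borrowed
 * pointers to the database pages. */
struct snapshot_sketch {
    unsigned char *wal_copy;     /* owned: copied synchronously */
    size_t wal_len;
    const unsigned char **pages; /* borrowed: point into the live database */
    size_t n_pages;
};

/* Copies the (typically small) WAL and records pointers to the pages.
 * Assumes the pages stay unchanged until snapshot_sketch_release().
 * Returns 0 on success, -1 if memory allocation fails. */
int snapshot_sketch_take(struct snapshot_sketch *s,
                         const unsigned char *wal, size_t wal_len,
                         unsigned char **db_pages, size_t n_pages)
{
    /* Allocate at least one byte so an empty WAL is not an error. */
    s->wal_copy = malloc(wal_len > 0 ? wal_len : 1);
    if (s->wal_copy == NULL) {
        return -1;
    }
    s->pages = malloc((n_pages > 0 ? n_pages : 1) * sizeof *s->pages);
    if (s->pages == NULL) {
        free(s->wal_copy);
        s->wal_copy = NULL;
        return -1;
    }
    if (wal_len > 0) {
        memcpy(s->wal_copy, wal, wal_len);
    }
    s->wal_len = wal_len;
    for (size_t i = 0; i < n_pages; i++) {
        s->pages[i] = db_pages[i]; /* pointer only, no page copy */
    }
    s->n_pages = n_pages;
    return 0;
}

/* Post-snapshot cleanup: frees what the snapshot owns, never the pages. */
void snapshot_sketch_release(struct snapshot_sketch *s)
{
    free(s->wal_copy);
    free(s->pages);
    memset(s, 0, sizeof *s);
}
```

The release step is where, in this sketch, the application would be free to re-enable checkpoints, since nothing refers to the pages any more.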
Ah nice. |
Another question: since only page pointers are passed to raft, what would the |
Yeah indeed, I first made this while I still did the copy of the database in the worker thread, only afterwards I moved dqlite to passing pointers. I still think it's a bit cleaner to offload the work to a thread. |
In which sense do you think it's cleaner? I'd say it's cleaner if the goal is "add support for asynchronous snapshots". But if we switch the goal to "add support for post-snapshot cleanup", then simply introducing I'm just a bit perplexed because it feels there's additional complexity (new I know it might sound bad, but perhaps it'd be worth considering implementing just |
It was already merged last night. If you really object, I'll talk to @stgraber to see if it's worth reverting it. |
Assuming that the work is copying pointers it should really be fast and I wouldn't expect any regression. I'd personally go for the minimum required change (adding |
For example, we have a few spots that check if a snapshot is in progress, and they use https://github.com/canonical/raft/blob/master/src/replication.c#L1276 Do they still work correctly now that a snapshot might be in progress even if |
https://github.com/canonical/raft/blob/master/src/replication.c#L1276 checks if same for https://github.com/canonical/raft/blob/master/src/replication.c#L1418 https://github.com/canonical/raft/blob/master/src/replication.c#L1657 only tries to detect a snapshot install and is still okay. |
Don't get me wrong, I understand your concern, and I would probably have done it differently after your input. |
I did some measurements and the async work is indeed really fast. I would then probably revert this merge (@stgraber) and work on the simpler approach to remove some risk, keeping this one around for when it's really needed. |
Thanks, sounds like a wise choice to me. |
Thank you for the input. |
Can we have an async branch with its commits updated to master? I am working on a SQL-based solution with an in-memory database, and I need async snapshots so I can flush the database to disk. Without async snapshots, a big database will block the event loop for a long time. I also need async restore with a callback that lets raft know it is ready to continue (without blocking the loop, because other raft groups could be running in it). |
Hi, it will probably land this week but you can already find it here . There's no |
I have merged your branch into master and it works well. Please consider adding async_restore in v3, and making it optional like async_snapshot. With async_restore I would be able to add disk support with fast snapshots for large databases. Without async_restore the event loop will be blocked, which breaks a lot of other things (like a NATS sub client running in the event loop, or other raft groups). |
This PR introduces:
- A raft_io->async_work method that allows raft to run any workload asynchronously from the main loop.
- snapshot_async and snapshot_finalize methods. snapshot_async is run through the new async_work interface and allows a user of raft to run snapshots out of the main loop.

Updated version-info according to https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
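To make the ordering discussed in this thread concrete, here is a small sketch of how a versioned callback struct with optional hooks could be driven. It does not reproduce raft's real `raft_fsm` or `raft_io` signatures: `struct fsm_sketch` and `fsm_sketch_take_snapshot` are hypothetical, and the asynchronous phase is called inline for simplicity, where the PR would hand it to a worker through the async work interface. Running the finalize hook even when the async phase fails is an assumption of this sketch, not something the thread states.

```c
#include <stddef.h>

/* Hypothetical versioned callback table, loosely modelled on the idea
 * of a "version 2" FSM with optional extra hooks. */
struct fsm_sketch {
    int version;
    void *data;
    int (*snapshot)(struct fsm_sketch *f);          /* always required */
    int (*snapshot_async)(struct fsm_sketch *f);    /* version 2, optional */
    int (*snapshot_finalize)(struct fsm_sketch *f); /* version 2, optional */
};

/* Runs the snapshot phases in order: the synchronous snapshot first,
 * then the optional async phase, then the optional cleanup hook.
 * Returns 0 on success or the first non-zero error code. */
int fsm_sketch_take_snapshot(struct fsm_sketch *f)
{
    int rv = f->snapshot(f);
    if (rv != 0) {
        return rv;
    }
    if (f->version >= 2 && f->snapshot_async != NULL) {
        /* In the PR this phase would run on a worker, not inline. */
        rv = f->snapshot_async(f);
    }
    if (f->version >= 2 && f->snapshot_finalize != NULL) {
        /* Sketch assumption: cleanup runs even if the async phase failed. */
        int finalize_rv = f->snapshot_finalize(f);
        if (rv == 0) {
            rv = finalize_rv;
        }
    }
    return rv;
}
```

A version 1 struct simply never has its extra hooks consulted, so existing users keep the purely synchronous behaviour.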