uv: barrier before every snapshot #264
Edit: there's a flaky test, so I'll convert this to a draft for now and reopen it later.
Nice catch, the approach looks good to me. Strictly speaking, closing the open segment is probably only necessary for the very first snapshot that is taken (where typically there are no closed segments yet, leading to the bug you discovered); when there are already closed segments, the bug shouldn't occur. However, closing the open segment doesn't seem to cost much compared to the overall cost of taking a snapshot, so I guess it's fine.
Thanks. An alternative was to add the start index to the open segments, but I think that would have been trickier (dealing with multiple formats, and the segment logic is already quite complex imo).
The reason open segments don't have the start index is that you don't know it upfront (when creating the open segment file on disk), and renaming the open segment file each time you start using it would reduce throughput. It probably goes without saying, but FWIW I do agree that the segment logic is complex; that seems to be the price for its efficiency. Not sure if you know it, but just for reference, that logic was taken virtually unchanged from LogCabin, the very first Raft implementation, coded by the Raft author himself. IIRC etcd has a similar approach to a certain extent.
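To make the naming trade-off above concrete, here is a minimal Python sketch. The file-name patterns (`open-N` for open segments, zero-padded `first-last` for closed ones) and the helper names are assumptions chosen for illustration, not details confirmed in this thread. The point is only that the index range becomes known when the segment is closed, so a single rename at that moment is enough.

```python
import os
import tempfile


def create_open_segment(data_dir: str, counter: int) -> str:
    """Pre-allocate an open segment; its first index is not known yet."""
    path = os.path.join(data_dir, f"open-{counter}")  # hypothetical naming
    with open(path, "wb"):
        pass
    return path


def close_segment(path: str, first_index: int, last_index: int) -> str:
    """Rename once, at close time, when the index range is finally known."""
    closed = os.path.join(
        os.path.dirname(path), f"{first_index:016d}-{last_index:016d}"
    )
    os.rename(path, closed)
    return closed


def demo() -> list:
    with tempfile.TemporaryDirectory() as data_dir:
        path = create_open_segment(data_dir, 1)
        # Entries 101..150 would be appended here, with no rename per write.
        close_segment(path, first_index=101, last_index=150)
        return sorted(os.listdir(data_dir))
```

Calling `demo()` leaves a single file whose name encodes the range 101 to 150, while the open file never carried an index at all.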
This solves a bug that came up during Jepsen tests. It occurs when:

- a server is killed ungracefully (leaving a written open segment in the directory when the system restarts)
- a snapshot file exists
- no closed segments exist

Because open segments are not closed before a snapshot is written, raft, when starting up and reading the files in the data directory, erroneously assumes that all the entries in the open segment are newer than the entries in the snapshot. In reality the entries are already contained in the snapshot, which leads to a wrong state.

Closing the current open segments before writing the snapshot ensures that no old entries can be mistaken for new ones: all entries in the open segments will be newer than the snapshot.

To close the open segments before taking a snapshot, we perform a `barrier` request, distinguishing between a 'blocking' and a 'non-blocking' barrier to avoid a performance penalty. Both barriers close all current open segments before firing the barrier callback, but a 'non-blocking' barrier allows writes to go through to newly created open segments. The non-blocking barrier is used when raft creates a snapshot during regular operation; the blocking barrier is used when installing snapshots and truncating the log.
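The failure mode can be reproduced with a small Python model. This is a simplified sketch, not the actual loading code: `DataDir`, `load`, and `take_snapshot` are hypothetical names, and the model assumes the loader numbers open-segment entries starting right after the last closed segment or, if there is none, right after the snapshot.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataDir:
    snapshot_last_index: Optional[int] = None  # last index covered by the snapshot
    closed: List[Tuple[int, int]] = field(default_factory=list)  # explicit (first, last)
    open_entries: List[str] = field(default_factory=list)  # payloads, no start index


def load(d: DataDir) -> List[Tuple[int, str]]:
    """Guess the indexes of open-segment entries at startup (assumed rule)."""
    if d.closed:
        next_index = d.closed[-1][1] + 1
    elif d.snapshot_last_index is not None:
        next_index = d.snapshot_last_index + 1
    else:
        next_index = 1
    return [(next_index + i, e) for i, e in enumerate(d.open_entries)]


def take_snapshot(d: DataDir, last_index: int, first_open_index: int,
                  barrier: bool) -> None:
    """Write a snapshot, optionally closing the open segment first."""
    if barrier and d.open_entries:
        last_open = first_open_index + len(d.open_entries) - 1
        d.closed.append((first_open_index, last_open))  # range is now explicit
        d.open_entries = []
    d.snapshot_last_index = last_index


# Without a barrier: entries 1..3 sit in an open segment and the snapshot covers 1..3.
buggy = DataDir(open_entries=["a", "b", "c"])
take_snapshot(buggy, last_index=3, first_open_index=1, barrier=False)
print(load(buggy))   # [(4, 'a'), (5, 'b'), (6, 'c')]: old entries treated as new

# With a barrier: the segment is closed first, so nothing is misnumbered.
fixed = DataDir(open_entries=["a", "b", "c"])
take_snapshot(fixed, last_index=3, first_open_index=1, barrier=True)
print(load(fixed))   # []
```

In the same model, a directory that already holds a closed segment covering 1..2 numbers its open entry correctly as 3 even without the barrier, which matches the review comment that the bug needs the "no closed segments" condition.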