Skip to content
This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

uv: barrier before every snapshot #264

Merged
merged 1 commit into from
Feb 11, 2022

Conversation

MathieuBordere
Copy link
Contributor

This solves a bug that came up during jepsen tests, it occurs when:

  • a server is killed ungracefully (leading to a written open segment in
    the directory when the system restarts)
  • a snapshot file exists
  • no closed segments exist

Because open segments are not closed before writing a snapshot, raft,
when starting up and reading the files in the data directory,
erroneously assumes that all the entries in the open segment are newer
than the entries in the snapshot, while in reality the entries are
already contained in the snapshot, leading to a wrong state.

Closing the current open segments before writing the snapshot ensures
that no old entries can mistakenly be identified as new entries, all
entries in the open segments will be newer than the snapshot.

To close the open segments before making a snapshot, we perform a barrier
request, but we need to make a distinction between a 'blocking' and a 'non-blocking'
barrier in order to save performance.
Both barriers close all current open segments before firing the barrier
callback, but a 'non-blocking' barrier allows writes to go through to newly
created open-segments. This non-blocking barrier is used when raft is creating
a snapshot during regular operation, the blocking barrier is used when installing
snapshots and truncating the log.

@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Feb 11, 2022

edit: there's a flaky test, I'll convert to draft for now and reopen.

@MathieuBordere MathieuBordere marked this pull request as draft February 11, 2022 15:10
src/uv_snapshot.c Outdated Show resolved Hide resolved
@freeekanayaka
Copy link
Contributor

Nice catch, the approach looks good to me.

Strictly speaking perhaps requiring to close the open segment is probably necessary on for the very first snapshot that is taken (where typically there are no closed segments yet, leading to the bug you discovered), while in case there are already closed segments the bug shouldn't occur. However closing the open segment doesn't seem to cost much (compared to the overall cost of taking a snapshot), so I guess it's fine.

@MathieuBordere
Copy link
Contributor Author

Nice catch, the approach looks good to me.

Strictly speaking perhaps requiring to close the open segment is probably necessary on for the very first snapshot that is taken (where typically there are no closed segments yet, leading to the bug you discovered), while in case there are already closed segments the bug shouldn't occur. However closing the open segment doesn't seem to cost much (compared to the overall cost of taking a snapshot), so I guess it's fine.

Thanks, an alternative was to add the start index to the open segments, but that would have been trickier I think (dealing with multiple formats, and the segment logic is already quite complex imo).

@freeekanayaka
Copy link
Contributor

Nice catch, the approach looks good to me.
Strictly speaking perhaps requiring to close the open segment is probably necessary on for the very first snapshot that is taken (where typically there are no closed segments yet, leading to the bug you discovered), while in case there are already closed segments the bug shouldn't occur. However closing the open segment doesn't seem to cost much (compared to the overall cost of taking a snapshot), so I guess it's fine.

Thanks, an alternative was to add the start index to the open segments, but that would have been trickier I think (dealing with multiple formats, and the segment logic is already quite complex imo).

The reason open segments don't have the start index is that you don't know that upfront (when creating the open segment file on disk) and renaming the open segment file each time you start using it slows down the throughput.

It probably goes without saying, but FWIW I do agree that the segment logic is complex, and that seems the price for its efficiency: not sure you know it, but just for reference that logic was taken virtually unchanged from LogCabin, which is the very first Raft implementation, coded by the Raft author himself. IIRC etcd has a similar approach to a certain extent.

@MathieuBordere
Copy link
Contributor Author

MathieuBordere commented Feb 11, 2022

I have a bit of a lack of inspiration to show in a test that appends are not blocked by a non-blocking barrier, will return to this tonight or early next week.
good to go

This solves a bug that came up during jepsen tests, it occurs when:
- a server is killed ungracefully (leading to a written open segment in
the directory when the system restarts)
- a snapshot file exists
- no closed segments exist

Because open segments are not closed before writing a snapshot, raft,
when starting up and reading the files in the data directory,
erroneously assumes that all the entries in the open segment are newer
than the entries in the snapshot, while in reality the entries are
already contained in the snapshot, leading to a wrong state.

Closing the current open segments before writing the snapshot ensures
that no old entries can mistakenly be identified as new entries, all
entries in the open segments will be newer than the snapshot.

To close the open segments before making a snapshot, we perform a `barrier`
request, but we need to make a distinction between a 'blocking' and a 'non-blocking'
barrier in order to save performance.
Both barriers close all current open segments before firing the barrier
callback, but a 'non-blocking' barrier allows writes to go through to newly
created open-segments. This non-blocking barrier is used when raft is creating
a snapshot  during regular operation, the blocking barrier is used when installing
snapshots and truncating the log.
@MathieuBordere MathieuBordere marked this pull request as ready for review February 11, 2022 18:38
@stgraber stgraber merged commit ca8b466 into canonical:master Feb 11, 2022
@MathieuBordere MathieuBordere deleted the barrier-before-snapshot branch December 9, 2022 10:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants