This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

Remove blocking barrier distinctions #446

Closed

Conversation

MathieuBordere
Contributor

This is a draft PR that implements the approach discussed in #444 to remove the (non-)blocking barrier distinction.

I believe @cole-miller is not a huge fan of the approach. Alternatives would be:

  • a new segment format that encodes the start index explicitly (riskier: think about downgrading raft versions, and we would also still need to handle old-format segments); a rough sketch of such a format follows below this list
  • run a barrier on the first snapshot only; this carries a performance penalty, as new writes will be blocked during the snapshot.
  • ...
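
For the first alternative, here is a minimal sketch, assuming a hypothetical header layout (this is not raft's actual on-disk format, and the names are made up), of what an open-segment header with an explicit start index could look like; recording first_index would let the loader place a surviving open segment correctly even when no closed segments are left:

/* Hypothetical sketch only: a versioned open-segment header that records the
 * raft index of the segment's first entry explicitly. */
#include <stdint.h>

#define SEGMENT_FORMAT_V2 2 /* bumped so old-format segments stay recognizable */

struct segmentHeaderV2
{
    uint64_t format;      /* SEGMENT_FORMAT_V2 */
    uint64_t first_index; /* raft index of the first entry in this segment */
};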

@codecov

codecov bot commented Jun 27, 2023

Codecov Report

Merging #446 (17b8f17) into master (a40e849) will decrease coverage by 0.14%.
The diff coverage is 58.53%.

@@            Coverage Diff             @@
##           master     #446      +/-   ##
==========================================
- Coverage   76.73%   76.60%   -0.14%     
==========================================
  Files          51       51              
  Lines        9597     9616      +19     
  Branches     2443     2456      +13     
==========================================
+ Hits         7364     7366       +2     
- Misses       1159     1166       +7     
- Partials     1074     1084      +10     
Impacted Files Coverage Δ
src/client.c 60.78% <0.00%> (-0.24%) ⬇️
src/membership.c 72.84% <0.00%> (-1.99%) ⬇️
src/replication.c 69.14% <0.00%> (-0.25%) ⬇️
src/start.c 62.16% <0.00%> (-1.15%) ⬇️
src/uv_finalize.c 65.88% <ø> (-0.40%) ⬇️
src/uv_prepare.c 78.10% <ø> (-0.13%) ⬇️
src/uv_truncate.c 55.55% <ø> (-0.38%) ⬇️
src/uv_snapshot.c 55.21% <53.84%> (-0.37%) ⬇️
src/uv_append.c 85.12% <91.66%> (+0.36%) ⬆️
src/uv.c 76.59% <100.00%> (+0.25%) ⬆️

... and 1 file with indirect coverage changes

@cole-miller
Contributor

I believe @cole-miller is not a huge fan of the approach.

I actually don't mind it! Given the constraints it seems like a nice solution to me.

@MathieuBordere
Contributor Author

Jepsen doesn't seem extremely happy (https://github.com/canonical/jepsen.dqlite/actions/runs/5390663924/jobs/9786357309); taking a look before removing the draft status.

@MathieuBordere
Contributor Author

Investigating the failures, it looks like raft has trouble decoding a configuration found in the raft log when taking a snapshot. All of the occurrences follow a truncation of the log, though. I suspect that removing the barrier before taking the snapshot has some unanticipated interaction with the truncation, which in turn also requires a barrier.

/* The next entry in the first open segment will come after the snapshot
* entries. */
if (put->trailing == 0 && uv->prepare_next_counter == 0) {
    uv->prepare_next_counter = 1;
}
Contributor

It might be unrelated to the Jepsen failures, but after a very quick look at the PR's diff I noticed this change, and I'm not entirely sure it's enough. Could it be that there is a ready but not yet used open-0 open segment that will be the one used after the snapshot is installed?

Just throwing it out there in case it helps; I didn't really look much.

@freeekanayaka
Contributor

Investigating the failures, it looks like raft has trouble decoding a configuration found in the raft log when taking a snapshot. All of the occurrences follow a truncation of the log, though. I suspect that removing the barrier before taking the snapshot has some unanticipated interaction with the truncation, which in turn also requires a barrier.

Perhaps it'd be a good idea to attach to this PR the Jepsen artifacts containing the logs of one of the failing tests, for future reference (since they'll get deleted at some point).

@MathieuBordere
Contributor Author

MathieuBordere commented Jun 28, 2023

Perhaps it'd be a good idea to attach to this PR the Jepsen artifacts containing the logs of one of the failing tests, for future reference (since they'll get deleted at some point).

It might be a real bug in the current implementation, but maybe (probably) it's just a bug in this PR; attaching the artifacts anyway for completeness.

What is fishy

LIBRAFT   1687943975883340965 src/replication.c:1018 log mismatch -> truncate (4353)
LIBRAFT   1687943975883346565 src/membership.c:187 roll back membership
LIBRAFT   1687943975883353165 src/membership.c:69 fetch committed configuration at 4808

So the node decides to truncate to index 4353, but index 4808 is supposed to be committed; that should not be possible.
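
For context, a minimal sketch (hypothetical helper, not raft's actual code) of the invariant being violated here: the log must never be truncated at or below the commit index, since that would delete committed entries.

typedef unsigned long long raft_index;

/* Hypothetical sketch: truncating the log at index 4353 while index 4808 is
 * already committed would remove committed entries, which raft must never do. */
static int truncateIsSafe(raft_index truncate_index, raft_index commit_index)
{
    return truncate_index > commit_index;
}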

And later it is (probably) trying to decode a non-configuration entry as a configuration entry:

LIBRAFT   1687943987751192578 src/membership.c:69 fetch committed configuration at 4808                                                                       
LIBRAFT   1687943987751194978 src/membership.c:72 configuration found in log.                                                                                 
0x7f4759461600 configurationAdd                                                                                                                               
        src/configuration.c:131                                                                                                                               
0x7f4759461dea configurationDecode                                                                                                                            
        src/configuration.c:363                                                                                                                               
0x7f475946cdab takeSnapshot                                                                                                                                   
        src/replication.c:1587                                                                                                                                
0x7f475946cdab replicationApply                                                                                                                               
        src/replication.c:1679                                                                                                                                
0x7f475946e732 replicationUpdate                                                                                                                              
        src/replication.c:762                                                                                                                                 
0x7f475946954e recvAppendEntriesResult                                                                                                                        
        src/recv_append_entries_result.c:65                                                                                                                   
0x7f4759468644 recvMessage                                                                                                                                    
        src/recv.c:36                                                                                                                                         
0x7f4759468644 recvCb                                                                                                                                         
        src/recv.c:103                                                                                                                                        
0x7f475947cbcb uvFireRecvCb                                                                                                                                   
        src/uv_recv.c:190                                                                                                                                     
0x7f475947d117 uvServerReadCb                                                                                                                                 
        src/uv_recv.c:302                                                                                                                                     
0x7f47594f6ba0 ???                                                                                                                                            
        ???:0                                                                                                                                                 
0x7f47594f76d7 ???                                                                                                                                            
        ???:0                                                                                                                                                 
0x7f47594fcb7f ???                                                                                                                                            
        ???:0                                                                                                                                                 
0x7f47594ec84b ???                                                                                                                                            
        ???:0                                                                                                                                                 
0x7f47598467cf taskRun                                                                                                                                        
        src/server.c:732                                                                                                                                      
0x7f47598467cf taskStart                                                                                                                                      
        src/server.c:772                                                                                                                                      
0x7f47598b3608 ???                                                                                                                                            
        ???:0                                                                                                                                                 
0x7f475962c132 ???                                                                                                                                            
        ???:0
0xffffffffffffffff ???
        ???:0
app: src/configuration.c:131: configurationAdd: Assertion `id != 0' failed.

jepsen-data-bank-stop-failure.zip

@MathieuBordere
Contributor Author

MathieuBordere commented Jun 28, 2023

My gut feeling is that, due to the issue in #435 (where a second barrier callback doesn't fire when all active segments are involved in a barrier), the log truncation didn't happen while a snapshot barrier was active, covering up an issue that is being exposed now. But still investigating.

@freeekanayaka
Contributor

My gut feeling is that, due to the issue in #435 (where a second barrier callback doesn't fire when all active segments are involved in a barrier), the log truncation didn't happen while a snapshot barrier was active, covering up an issue that is being exposed now. But still investigating.

Could well be, but if that's the case, I'm wondering why such a bug was not surfaced by Jepsen even before the non-blocking barrier code was put in place. Or maybe our Jepsen suite is doing nastier tests now than it was doing before. Or we simply didn't examine some Jepsen failures before?

Anyway, logs should indeed shed some light on this.

@MathieuBordere
Contributor Author

MathieuBordere commented Jun 28, 2023

My gut feeling is that, due to the issue in #435 (where a second barrier callback doesn't fire when all active segments are involved in a barrier), the log truncation didn't happen while a snapshot barrier was active, covering up an issue that is being exposed now. But still investigating.

Could well be, but if that's the case, I'm wondering why such a bug was not surfaced by Jepsen even before the non-blocking barrier code was put in place. Or maybe our Jepsen suite is doing nastier tests now than it was doing before. Or we simply didn't examine some Jepsen failures before?

Anyway, logs should indeed shed some light on this.

Yeah, I'm reasoning a bit instinctively here; trying to gather proof now :-)

Mathieu Borderé added 6 commits June 29, 2023 11:34
@MathieuBordere
Contributor Author

MathieuBordere commented Jun 30, 2023

What seems to happen is the following:

Jepsen has a snapshot trailing of 1024 and snapshot threshold of 128.

Node takes snapshot at 2454

LIBRAFT   1688115651332445058 src/replication.c:1582 take snapshot at 2454

Data folder contains segment 0000000000000001-0000000000001408 and open-1 containing 1K+ entries.

After the snapshot completes, the log is truncated, but only the closed segment is removed (it contains entries older than the trailing cutoff of 1430, i.e. 2454 - 1024). The open segment, which was not closed because we no longer fire a barrier before the snapshot, is left as-is.

The next time we restart, raft sees the open segment, no closed segments, and the snapshot, and erroneously places all of the open segment's entries after the snapshot, leading to data corruption.
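
To make the failure mode concrete, here is a minimal sketch (hypothetical helper, not the actual raft loading code) of the guess a loader is forced to make when the data directory holds only a snapshot and an open segment with no explicit start index:

typedef unsigned long long raft_index;

/* Hypothetical sketch: with no closed segments and no start index recorded in
 * the open segment, the loader can only assume its entries come right after
 * the snapshot. Here the open segment actually starts at index 1409, well
 * before the snapshot's last index 2454, so every entry gets shifted forward. */
static raft_index guessOpenSegmentFirstIndex(raft_index snapshot_last_index,
                                             unsigned n_closed_segments,
                                             raft_index last_closed_end_index)
{
    if (n_closed_segments == 0) {
        return snapshot_last_index + 1; /* the wrong guess in this scenario */
    }
    return last_closed_end_index + 1; /* anchored on the closed segments */
}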

Will try to investigate what we can do; an explicit start index in an open segment sure sounds nice ... it's just too tricky without it. Leaving the barrier before taking a snapshot and living with the extra complexity is an easier, and already tested, option too...

edit: Or we could say "Always leave at least the last closed segment", but then what's the point of the trailing parameter? It gets a bit fuzzy then.
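
A minimal sketch of that option, assuming hypothetical names (not raft's actual truncation code): keep the newest closed segment unconditionally, so a restart can always anchor the open segment on it, at the cost of sometimes retaining more than the trailing number of entries.

#include <stdbool.h>

typedef unsigned long long raft_index;

/* Hypothetical sketch: a closed segment may be deleted after a snapshot only
 * if all of its entries fall before the trailing cutoff AND it is not the
 * newest closed segment on disk. */
static bool closedSegmentCanBeDeleted(raft_index segment_last_index,
                                      raft_index snapshot_last_index,
                                      raft_index trailing,
                                      bool is_newest_closed_segment)
{
    raft_index cutoff =
        snapshot_last_index > trailing ? snapshot_last_index - trailing : 0;
    return segment_last_index < cutoff && !is_newest_closed_segment;
}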

@MathieuBordere
Contributor Author

MathieuBordere commented Jun 30, 2023

edit: Or we could say "Always leave at least the last closed segment", but then what's the point of the trailing parameter? It gets a bit fuzzy then.

I could live with this, by the way; I'd rather have this than the blocking / non-blocking barrier stuff.

@cole-miller @freeekanayaka What do you think?

@MathieuBordere
Contributor Author

MathieuBordere commented Jun 30, 2023

Also, like @cole-miller mentioned, the introduction of open-0 will break a raft downgrade too, so we could just as well move to a new segment format with an explicit start index.

@MathieuBordere
Contributor Author

I'm going to revisit this with the insights from this exercise and live with the blocking / non-blocking distinction for now.
