release-23.1: kvserver: sync before removing sideloaded files #117299
Conversation
Thanks for opening a backport. Please check the backport criteria before merging.

If your backport adds new functionality, please ensure that the additional criteria for new functionality are also satisfied.

Also, please add a brief release justification to the body of your PR to justify this backport.
Force-pushed from 84ed82d to 50b733a
Hm, the test failed in Extended CI with an interesting error:
A Pebble file is missing after a crash+restart. I'll investigate; I wonder if this could be a Pebble bug.
Are these any of the sideloaded files, or just a "regular" LSM SST? Since we don't sync state application, I suppose it's plausible that we could lose a file that was recently flushed from the memtable, but the data is still there in the WAL (and maybe also in the memtable after restart, but not yet flushed). Have we seen any failures like these in the nightlies?
Ah, never mind, this is an actual Pebble error, not a failed test assertion. Yeah, I'd pull in storage for a look.
Hasn't been caught in nightlies, but has occurred twice on 23.1, on this PR.
The strict MemFS can be used to emulate crashes, and to test for the data durability loss that crashes can cause when fsync is not used correctly.

Epic: none
Release note: none
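For context, here is a minimal sketch of how a strict MemFS can be exercised, assuming the pebble vfs API of this era (vfs.NewStrictMem, ResetToSyncedState, and a single-argument Create); it is an illustration, not a test from the PR:

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/pebble/vfs"
)

func main() {
	// Strict MemFS tracks which writes were synced; anything unsynced is
	// dropped when we emulate a crash below.
	fs := vfs.NewStrictMem()

	write := func(name string, sync bool) {
		f, err := fs.Create(name)
		if err != nil {
			panic(err)
		}
		if _, err := f.Write([]byte("payload")); err != nil {
			panic(err)
		}
		if sync {
			if err := f.Sync(); err != nil {
				panic(err)
			}
		}
		_ = f.Close()
	}
	write("synced", true)
	write("unsynced", false)

	// Sync the directory too: in strict mode, unsynced directory entries
	// may also be lost on "crash".
	if dir, err := fs.OpenDir("/"); err == nil {
		_ = dir.Sync()
		_ = dir.Close()
	}

	// Emulate a crash+restart: roll the filesystem back to its last
	// synced state.
	fs.ResetToSyncedState()

	// Expect the synced file to keep its data, and the unsynced write to
	// be gone (an empty or missing file, depending on what was synced).
	for _, name := range []string{"synced", "unsynced"} {
		fi, err := fs.Stat(name)
		if err != nil {
			fmt.Printf("%s: gone (%v)\n", name, err)
			continue
		}
		fmt.Printf("%s: %d bytes survived\n", name, fi.Size())
	}
}
```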
The StickyVFSRegistry is currently only used with the in-memory vfs.MemFS. Return the concrete type from the Get method directly, to avoid unnecessary casts at call sites.

Epic: none
Release note: none
Epic: none
Release note: none
This commit adds a couple of interceptors into the raft proposal's lifecycle: one allows blocking the command application flow (without blocking the entry commit flow), and another allows intercepting commands after they have been applied to storage and their in-memory side effects have been applied. Both interceptors are used in the next commits, which reproduce a problem in the raft log's sideloaded storage.

Epic: none
Release note: none
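The interceptor shape described above can be sketched as a pair of optional test hooks. This is a hypothetical, simplified rendering; the actual kvserver testing knobs have different names and signatures:

```go
package main

import "fmt"

// applyKnobs is a hypothetical stand-in for kvserver testing knobs.
type applyKnobs struct {
	// beforeApply runs before a committed command is applied to the state
	// machine; a test that blocks in it stalls application without
	// blocking the entry commit (log append) flow.
	beforeApply func(cmdID string)
	// afterApply runs after a command's writes are applied to storage and
	// its in-memory side effects have run.
	afterApply func(cmdID string)
}

type replica struct{ knobs applyKnobs }

func (r *replica) applyCommand(cmdID string) {
	if fn := r.knobs.beforeApply; fn != nil {
		fn(cmdID)
	}
	// ... apply writes to storage, then in-memory side effects ...
	if fn := r.knobs.afterApply; fn != nil {
		fn(cmdID)
	}
}

func main() {
	r := &replica{knobs: applyKnobs{
		beforeApply: func(id string) { fmt.Println("about to apply", id) },
		afterApply:  func(id string) { fmt.Println("applied", id) },
	}}
	r.applyCommand("log-truncation")
}
```

Separating the two hooks lets a test hold a command in the window between commit and application, which is the kind of window the later commits use to reproduce the sideloaded-storage problem.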
This commit adds a knob that allows tests to disable the randomization which sometimes makes asynchronous log appends synchronous. This is necessary for tests that rely on the async log appends.

Epic: none
Release note: none
To decide whether a log truncation requires a sync, we need to check whether there are any sideloaded entries in the removed range of log entries. This commit adds the corresponding helper method to the sideloaded storage. The existing BytesIfTruncatedFromTo method could be used instead, but it is more expensive: it fetches stats for all the files it scans, whereas we only need an existence check and can terminate the scan early.

Epic: none
Release note: none
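As an illustration of the early-terminating existence check this commit describes, here is a hedged sketch. The helper name, its signature, and the directory-listing approach are assumptions for illustration; it assumes a file-per-entry layout named i&lt;index&gt;.t&lt;term&gt;, which matches the conventional sideload layout:

```go
package sideload

import (
	"fmt"

	"github.com/cockroachdb/errors/oserror"
	"github.com/cockroachdb/pebble/vfs"
)

// hasSideloadedEntry reports whether any sideloaded entry file exists in
// the log index span [from, to). Unlike a stats-collecting scan, it stops
// at the first match.
func hasSideloadedEntry(fs vfs.FS, dir string, from, to uint64) (bool, error) {
	names, err := fs.List(dir)
	if err != nil {
		if oserror.IsNotExist(err) {
			return false, nil // no sideloaded dir: no entries to find
		}
		return false, err
	}
	for _, name := range names {
		var index, term uint64
		// Assumed layout: one file per entry, named by index and term.
		if _, err := fmt.Sscanf(name, "i%d.t%d", &index, &term); err != nil {
			continue // not a sideloaded entry file
		}
		if index >= from && index < to {
			return true, nil // early exit: existence is all we need
		}
	}
	return false, nil
}
```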
A typical raft log truncation command removes log entries and updates the state machine in a single Pebble batch. This ensures that the state machine's TruncatedState is always up to date with the log. However, if some of the truncated entries are stored in the sideloaded storage (typically AddSST commands), the corresponding files are removed separately from the Pebble batch, when applying side effects. If the batch is not synced before the files are removed, a process crash can leave an inconsistent state: the TruncatedState in the state machine does not reflect the removal of the entries, but the entry files are already gone. After a restart, raft may try to load these entries in order to apply them or send them to other replicas; it will find them missing and crash loop.

This commit syncs the state machine upon applying truncation commands that remove at least one sideloaded entry, to prevent the issue above.

Epic: none

Release note (bug fix): Fixed a bug in raft log truncation that could lead to crash loops, and, in the unlikely worst case that all replicas enter the crash loop, to unrecoverable loss of quorum. The bug could manifest when a few things coincided: the cluster was running a bulk write workload (e.g. a schema change, import, or restore), a log truncation command was in flight, and the CRDB process crashed at an unfortunate moment (e.g. it was killed, or killed itself, for instance upon detecting a disk stall).
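To make the ordering concrete, here is a minimal sketch of the sync-before-remove discipline the fix enforces. It uses the real pebble batch-commit API, but the function, its parameters, and the surrounding structure are illustrative assumptions, not the actual kvserver code:

```go
package sideload

import (
	"github.com/cockroachdb/errors/oserror"
	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

// commitTruncation sketches the ordering the fix enforces. The batch
// carries the state machine update (including TruncatedState); the
// sideloaded entry files live outside Pebble and are removed separately.
func commitTruncation(batch *pebble.Batch, fs vfs.FS, sideloadedFiles []string) error {
	// Sync the batch iff files will be removed outside of it. Without the
	// sync, a crash could make the file removals durable while losing the
	// TruncatedState update, leaving raft referencing missing entries.
	opts := pebble.NoSync
	if len(sideloadedFiles) > 0 {
		opts = pebble.Sync
	}
	if err := batch.Commit(opts); err != nil {
		return err
	}
	// Only once the state update is durable is it safe to delete files.
	for _, name := range sideloadedFiles {
		if err := fs.Remove(name); err != nil && !oserror.IsNotExist(err) {
			return err
		}
	}
	return nil
}
```

Syncing only when sideloaded files are about to be removed keeps the common case, truncations with no sideloaded entries, free of the extra fsync.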
This commit adds a regression test for the issue fixed in the previous commit. Without that fix, the test panics because the sideloaded storage loses entries that the follower Replica tries to apply after a restart. For a detailed explanation of the scenario, see the previous commit and the comment in the test.

Epic: none
Release note: none
Force-pushed from 50b733a to 1b1da17
Fixed the flakiness (required backporting #117523 first).
Now there is a legitimate flake. Looks similar to #114191 (comment). @erikgrinaker Do you know: when we trip a circuit breaker, does it try to reconnect and untrip? I'd like to permanently trip it.
That's probably true; the circuit breakers use a different implementation in 23.2.
The test is flaky, possibly because the circuit breakers auto-recover after being tripped, whereas we need to permanently trip them.

Epic: none
Release note: none
Skipping the test under stress to land this fix. Filed #117785 to fix the flakiness (likely has to do with the circuit breaker behaviours).
Backport:
Please see individual PRs for details.
/cc @cockroachdb/release
Release justification: critical bug fix