Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.2: logstore: sync sideloaded storage directories #115709

Merged
merged 6 commits into from Dec 7, 2023

Commits on Nov 15, 2023

  1. logstore: rm test-only DiskSideloadStorage field

    This commit removes the dirCreated field of DiskSideloadStorage, because
    it is only used in tests, and is reduntant (directory existence check
    already does the job).
    
    Epic: none
    Release note: none
    pav-kv committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    7c3b91a View commit details
    Browse the repository at this point in the history
  2. logstore: inline dir creation in DiskSideloadStorage

    Epic: none
    Release note: none
    pav-kv committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    f207338 View commit details
    Browse the repository at this point in the history
  3. logstore: sync sideloaded directories creation

    The sideloaded log storage does not sync the hierarchy of directories it
    creates. This can potentially lead to full or partial loss of its
    sub-directories in case of a system crash or power off.
    
    After this commit, every time sideloaded storage creates a directory, it
    syncs its parent so that the reference is durable.
    
    Epic: none
    Release note: none
    pav-kv committed Nov 15, 2023
    Configuration menu
    Copy the full SHA
    05793cd View commit details
    Browse the repository at this point in the history

Commits on Nov 21, 2023

  1. logstore: add TODOs in sideloaded storage

    A couple of things to address in the future: sideloaded files removal
    should happen strictly after a state machine sync; sideloaded files and
    directories should be cleaned up on startup because their removal is not
    always durable.
    
    Epic: none
    Release note: none
    pav-kv committed Nov 21, 2023
    Configuration menu
    Copy the full SHA
    ce25a9f View commit details
    Browse the repository at this point in the history

Commits on Nov 29, 2023

  1. logstore: sync sideloaded files creation

    The sideloaded storage fsyncs the files that it creates. Even though
    this guarantees durability of the files content and metadata, this still
    does not guarantee that the references to these files are durable. For
    example, Linux man page for fsync [^1] says:
    
    ```
    Calling fsync() does not necessarily ensure that the entry in the
    directory containing the file has also reached disk.  For that an
    explicit fsync() on a file descriptor for the directory is also
    needed.
    ```
    
    It means that these files can be lost after a system crash of power off.
    This leads to issues:
    
    1. The storage syncs are not atomic with the sideloaded files syncs. It
       is thus possible that raft log metadata "references" a sideloaded
       file and gets synced, but the file is not yet. A power off/on at
       this point leads to an internal inconsistency, and can result in a
       crash loop when raft will try to load these entries to apply and/or
       send to other replicas.
    
    2. The durability of entry files is used as a pre-condition to sending
       raft messages that trigger committing these entries. A coordinated
       power off/on on a majority of replicas can thus lead to losing
       committed entries and unrecoverable loss-of-quorum.
    
    This commit fixes the above issues, by syncing the parent directory
    after writing sideloaded entry files. The natural point for this is
    MaybeSideloadEntries on the handleRaftReady path.
    
    [^1]: https://man7.org/linux/man-pages/man2/fsync.2.html
    
    Epic: none
    
    Release note (bug fix): this commit fixes a durability bug in raft log
    storage, caused by incorrect syncing of filesystem metadata. It was
    possible to lose writes of a particular kind (AddSSTable) that is e.g.
    used by RESTORE. This loss was possible only under power-off or OS crash
    conditions. As a result, CRDB could enter a crash loop on restart. In
    the worst case of a coordinated power-off/crash across multiple nodes
    this could lead to an unrecoverable loss of quorum.
    pav-kv committed Nov 29, 2023
    Configuration menu
    Copy the full SHA
    f8af9ff View commit details
    Browse the repository at this point in the history
  2. logstore: repro files loss in sideloaded storage

    This commit adds `TestSideloadStorageSync` which demonstrates that the
    sideloaded log storage can lose files and directories upon system crash.
    This is due to the fact that the whole directory hierarchy is not
    properly synced when the directories and files are created.
    
    A typical sideloaded storage file (entry 123 at term 44 for range r1234)
    looks like: `<data-dir>/auxiliary/sideloading/r1XXX/r1234/i123.t44`.
    
    Only existence of auxiliary directory is persisted upon its creation, by
    syncing the <data-dir> when Pebble initializes the store. All other
    directories (sideloading, r1XXX, r1234) are not persisted upon creation.
    
    Epic: none
    Release note: none
    pav-kv committed Nov 29, 2023
    Configuration menu
    Copy the full SHA
    daa35be View commit details
    Browse the repository at this point in the history