-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logstore: sync sideloaded storage directories #114616
Commits on Nov 15, 2023
-
logstore: rm test-only DiskSideloadStorage field
This commit removes the dirCreated field of DiskSideloadStorage, because it is only used in tests, and is reduntant (directory existence check already does the job). Epic: none Release note: none
Configuration menu - View commit details
-
Copy full SHA for 4d1f243 - Browse repository at this point
Copy the full SHA 4d1f243View commit details
Commits on Nov 16, 2023
-
logstore: inline dir creation in DiskSideloadStorage
Epic: none Release note: none
Configuration menu - View commit details
-
Copy full SHA for 4447f38 - Browse repository at this point
Copy the full SHA 4447f38View commit details
Commits on Nov 21, 2023
-
logstore: add TODOs in sideloaded storage
A couple of things to address in the future: sideloaded files removal should happen strictly after a state machine sync; sideloaded files and directories should be cleaned up on startup because their removal is not always durable. Epic: none Release note: none
Configuration menu - View commit details
-
Copy full SHA for ebbb64c - Browse repository at this point
Copy the full SHA ebbb64cView commit details
Commits on Nov 29, 2023
-
logstore: repro files loss in sideloaded storage
This commit adds `TestSideloadStorageSync` which demonstrates that the sideloaded log storage can lose files and directories upon system crash. This is due to the fact that the whole directory hierarchy is not properly synced when the directories and files are created. A typical sideloaded storage file (entry 123 at term 44 for range r1234) looks like: `<data-dir>/auxiliary/sideloading/r1XXX/r1234/i123.t44`. Only existence of auxiliary directory is persisted upon its creation, by syncing the <data-dir> when Pebble initializes the store. All other directories (sideloading, r1XXX, r1234) are not persisted upon creation. Epic: none Release note: none
Configuration menu - View commit details
-
Copy full SHA for 72995ed - Browse repository at this point
Copy the full SHA 72995edView commit details
Commits on Dec 6, 2023
-
logstore: sync sideloaded directories creation
The sideloaded log storage does not sync the hierarchy of directories it creates. This can potentially lead to full or partial loss of its sub-directories in case of a system crash or power off. After this commit, every time sideloaded storage creates a directory, it syncs its parent so that the reference is durable. Epic: none Release note: none
Configuration menu - View commit details
-
Copy full SHA for 715bd0d - Browse repository at this point
Copy the full SHA 715bd0dView commit details -
logstore: sync sideloaded files creation
The sideloaded storage fsyncs the files that it creates. Even though this guarantees durability of the files content and metadata, this still does not guarantee that the references to these files are durable. For example, Linux man page for fsync [^1] says: ``` Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. ``` It means that these files can be lost after a system crash of power off. This leads to issues: 1. The storage syncs are not atomic with the sideloaded files syncs. It is thus possible that raft log metadata "references" a sideloaded file and gets synced, but the file is not yet. A power off/on at this point leads to an internal inconsistency, and can result in a crash loop when raft will try to load these entries to apply and/or send to other replicas. 2. The durability of entry files is used as a pre-condition to sending raft messages that trigger committing these entries. A coordinated power off/on on a majority of replicas can thus lead to losing committed entries and unrecoverable loss-of-quorum. This commit fixes the above issues, by syncing the parent directory after writing sideloaded entry files. The natural point for this is MaybeSideloadEntries on the handleRaftReady path. [^1]: https://man7.org/linux/man-pages/man2/fsync.2.html Epic: none Release note (bug fix): this commit fixes a durability bug in raft log storage, caused by incorrect syncing of filesystem metadata. It was possible to lose writes of a particular kind (AddSSTable) that is e.g. used by RESTORE. This loss was possible only under power-off or OS crash conditions. As a result, CRDB could enter a crash loop on restart. In the worst case of a coordinated power-off/crash across multiple nodes this could lead to an unrecoverable loss of quorum.
Configuration menu - View commit details
-
Copy full SHA for 0591d3f - Browse repository at this point
Copy the full SHA 0591d3fView commit details