New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[3.12.x] LMDB recovery #3886
[3.12.x] LMDB recovery #3886
Conversation
This pull request introduces 3 alerts and fixes 14 when merging ae7739e into 7bce623 - view on LGTM.com new alerts:
fixed alerts:
|
Instead of relying on them being zeroes. Having random values in the 'start' and 'len' fields can result in issues. (cherry picked from commit 697f84c)
It's useful for all the cf-check commands that support options. (cherry picked from commit ddf7b6c)
All cf-check commands working with LMDB have to deal with its error codes. (cherry picked from commit 17075e9)
Try to backup all files and report how many failed to be backed up. (cherry picked from commit e636426)
Backing up an LMDB file can be done in two ways: 1) Copy the contents of the LMDB file to a new place. 2) Read the contents of the LMDB file and write them into a new LMDB file. While option 1) copies over all the potential inconsistencies in the LMDB file, option 2) only copies the readable data and the new copy should be in a consistent state. Changelog: --dump option was added to the 'cf-check backup' command (cherry picked from commit a701645)
'cf-check repair' should try to preserve the data that can be preserved. So if an LMDB file is corrupted, it should try to read as much as it can from it and write it into a new file and then replace the original file with the new copy. Hence `replicate_lmdb_file()` and then `rename(orig, new_copy)`. Also, add a new '--force' option to 'cf-check repair' to repair all the files no matter if they are reported as corrupted or not. Ticket: CFE-3127 Changelog: 'cf-check repair' now preserves readable data in corrupted LMDB files (cherry picked from commit 192e930)
'repair_file()' sounds quite generic and these functions are in pulled in to the global namespace of cf-check as well as some other binaries. (cherry picked from commit eade46d)
So that it can be used in cases where the code wants to wait for the lock or when it wants to acquire it and act differently when it cannot do so. (cherry picked from commit 3f45d6b)
To make it easy to check if the current process already holds a lock for the given file descriptor. (cherry picked from commit 1726f30)
Shared locks are a kind of an optimisation because multiple processes can hold them at the same time and thus not block each other. (cherry picked from commit 714faa9)
Knowing when a DB was opened last time can be helpful in corruption handling -- we should only repair the DB once, other processes using the corrupted DB should just restart/reopen the DB. Also add a function for getting a DBHandle based on a DB file name. Ticket: CFE-3127 Changelog: None (cherry picked from commit a6608f5)
'cf-agent' and some of our daemons check if LMDB files are OK when starting. However, this check is not good enough to detect all the problems (it's just trying to read the DB contents) and corruptions can also happen while the process is running. We should try to repair the corrupted LMDB files when the corruption is detected and reported by LMDB itself (on an assertion failure in LMDB). Ticket: CFE-3127 Changelog: Corrupted LMDB files are now automatically repaired (cherry picked from commit a0be2da) Conflicts: libcompat/clock_gettime.c libutils/encode.h tests/unit/Makefile.am
The `Seq **corrupt` argument of the diagnose_files() function is an out parameter, a place to store the sequence of detected corrupted files. It can be NULL if the caller only wants to get the number of such corrupted files, but otherwise it doesn't have to be initialized. In fact, it should not be initialized in most cases to not confuse the "used but not initialized" compiler check. Instead of `assert(corrupt == NULL | *corrupt == NULL)`, let's just decribe the meaning of parameters and return value in a doxygen comment. And make sure a new sequence is created for the out value. Also mark the first parameter (`Seq *files`) as `const`, because the sequence is not modified by the function. (cherry picked from commit cf25b56)
ae7739e
to
d4235e3
Compare
@cf-bottom jenkins with exotics, please |
Sure, I triggered a build: (with exotics) Jenkins: https://ci.cfengine.com/job/pr-pipeline/3000/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-3000/ |
This pull request introduces 3 alerts and fixes 14 when merging d4235e3 into 68dd46f - view on LGTM.com new alerts:
fixed alerts:
|
@cf-bottom jenkins, please. And this time with the enterprise PR, please. |
Sure, I triggered a build: Jenkins: https://ci.cfengine.com/job/pr-pipeline/3017/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-3017/ |
3.12.x version of #3873
Merge together:
#3886
https://github.com/cfengine/enterprise/pull/554