Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.12.x] LMDB recovery #3886

Merged
merged 13 commits into from Oct 20, 2019
Merged

Conversation

vpodzime
Copy link
Contributor

@vpodzime vpodzime commented Oct 9, 2019

3.12.x version of #3873

Merge together:
#3886
https://github.com/cfengine/enterprise/pull/554

@lgtm-com
Copy link

lgtm-com bot commented Oct 9, 2019

This pull request introduces 3 alerts and fixes 14 when merging ae7739e into 7bce623 - view on LGTM.com

new alerts:

  • 2 for Pointer argument is dereferenced without checking for NULL
  • 1 for Time-of-check time-of-use filesystem race condition

fixed alerts:

  • 8 for Pointer argument is dereferenced without checking for NULL
  • 6 for Unterminated variadic call

Instead of relying on them being zeroes. Having random values in
the 'start' and 'len' fields can result in issues.

(cherry picked from commit 697f84c)
It's useful for all the cf-check commands that support options.

(cherry picked from commit ddf7b6c)
All cf-check commands working with LMDB have to deal with its
error codes.

(cherry picked from commit 17075e9)
Try to backup all files and report how many failed to be backed
up.

(cherry picked from commit e636426)
Backing up an LMDB file can be done in two ways:

1) Copy the contents of the LMDB file to a new place.

2) Read the contents of the LMDB file and write them into a new
   LMDB file.

While option 1) copies over all the potential inconsistencies in
the LMDB file, option 2) only copies the readable data and the
new copy should be in a consistent state.

Changelog: --dump option was added to the 'cf-check backup' command
(cherry picked from commit a701645)
'cf-check repair' should try to preserve the data that can be
preserved. So if an LMDB file is corrupted, it should try to read
as much as it can from it and write it into a new file and then
replace the original file with the new copy. Hence
`replicate_lmdb_file()` and then `rename(orig, new_copy)`.

Also, add a new '--force' option to 'cf-check repair' to repair
all the files no matter if they are reported as corrupted or not.

Ticket: CFE-3127
Changelog: 'cf-check repair' now preserves readable data in corrupted LMDB files
(cherry picked from commit 192e930)
'repair_file()' sounds quite generic and these functions are in
pulled in to the global namespace of cf-check as well as some
other binaries.

(cherry picked from commit eade46d)
So that it can be used in cases where the code wants to wait for
the lock or when it wants to acquire it and act differently when
it cannot do so.

(cherry picked from commit 3f45d6b)
To make it easy to check if the current process already holds a
lock for the given file descriptor.

(cherry picked from commit 1726f30)
Shared locks are a kind of an optimisation because multiple
processes can hold them at the same time and thus not block each
other.

(cherry picked from commit 714faa9)
Knowing when a DB was opened last time can be helpful in
corruption handling -- we should only repair the DB once, other
processes using the corrupted DB should just restart/reopen the
DB.

Also add a function for getting a DBHandle based on a DB file
name.

Ticket: CFE-3127
Changelog: None
(cherry picked from commit a6608f5)
'cf-agent' and some of our daemons check if LMDB files are OK
when starting. However, this check is not good enough to detect
all the problems (it's just trying to read the DB contents) and
corruptions can also happen while the process is running.

We should try to repair the corrupted LMDB files when the
corruption is detected and reported by LMDB itself (on an
assertion failure in LMDB).

Ticket: CFE-3127
Changelog: Corrupted LMDB files are now automatically repaired
(cherry picked from commit a0be2da)

Conflicts:
  libcompat/clock_gettime.c
  libutils/encode.h
  tests/unit/Makefile.am
The `Seq **corrupt` argument of the diagnose_files() function is
an out parameter, a place to store the sequence of detected
corrupted files. It can be NULL if the caller only wants to get
the number of such corrupted files, but otherwise it doesn't have
to be initialized. In fact, it should not be initialized in most
cases to not confuse the "used but not initialized" compiler
check.

Instead of `assert(corrupt == NULL | *corrupt == NULL)`, let's
just decribe the meaning of parameters and return value in a
doxygen comment. And make sure a new sequence is created for the
out value.

Also mark the first parameter (`Seq *files`) as `const`, because
the sequence is not modified by the function.

(cherry picked from commit cf25b56)
@vpodzime
Copy link
Contributor Author

@cf-bottom jenkins with exotics, please

@cf-bottom
Copy link

@lgtm-com
Copy link

lgtm-com bot commented Oct 16, 2019

This pull request introduces 3 alerts and fixes 14 when merging d4235e3 into 68dd46f - view on LGTM.com

new alerts:

  • 2 for Pointer argument is dereferenced without checking for NULL
  • 1 for Time-of-check time-of-use filesystem race condition

fixed alerts:

  • 8 for Pointer argument is dereferenced without checking for NULL
  • 6 for Unterminated variadic call

@vpodzime
Copy link
Contributor Author

@cf-bottom jenkins, please. And this time with the enterprise PR, please.

@cf-bottom
Copy link

@vpodzime vpodzime merged commit 2e7cfaa into cfengine:3.12.x Oct 20, 2019
@vpodzime vpodzime deleted the 3.12.x-lmdb_recovery branch October 20, 2019 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants