Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vadp-dumper: fix multithreaded backup/restore issues #1593

Merged
merged 28 commits into from
Dec 13, 2023

Conversation

sebsura
Copy link
Contributor

@sebsura sebsura commented Nov 9, 2023

Thank you for contributing to the Bareos Project!

This pr fixes some bugs of the vadp_dumper, mostly related to its multithreaded mode. For example:

  • fixed multithreaded writing trying to read from stdout instead of stdin
  • fixed runtime value of sectors_per_call not being taken into account when allocating buffers
  • fixed multiple data races related to memory reuse when using multithreaded mode

Additionally this PR adds support for the QueryAllocatedBlocks function in the following way:
Previously we backed up every block that was reported by vmware to have changed since some point
in time (or at all in case of full backups). This list can contain blocks that were once changed but are now
not needed anymore which we previously backed up needlessly. As such we now ask vmware to also give us a list
of all allocated (i.e. needed) blocks and backup only sectors that are both part of a allocated block and part of a
changed block.

Please check

  • Short description and the purpose of this PR is present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please give a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Required backport PRs have been created
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR

@sebsura sebsura added the draft label Nov 9, 2023
@sebsura sebsura force-pushed the dev/ssura/master/fix-vadp-dumper branch 6 times, most recently from 3971a50 to a0afab7 Compare November 13, 2023 07:15
@sebsura sebsura marked this pull request as draft November 16, 2023 11:12
@sebsura sebsura force-pushed the dev/ssura/master/fix-vadp-dumper branch from a552473 to 2fecb27 Compare November 28, 2023 12:42
@sebsura sebsura removed the draft label Nov 28, 2023
@sebsura sebsura marked this pull request as ready for review November 28, 2023 12:50
@sebsura sebsura force-pushed the dev/ssura/master/fix-vadp-dumper branch from 2fecb27 to 0b5d769 Compare December 11, 2023 10:47
Neither write nor read guarantee that they will finish everything in
one call.  One has to repeatedly call them in a loop to ensure that
everything gets written/read.
This code path is only taken if multithreaded restore is selected.
Since the input size is a variable, we need to ensure that we hold
enough capacity to actually read the data into the buffer.  Otherwise,
if we get unlucky, we will run into the situation were the first few
cbts are very small, and the rest are really big, ensuring a buffer
overflow; possibly causing a crash.
We need to ensure that the writer does not write into the block we are
currently reading.  Currently we only distinguish between data thats
inside the queue (which can neither be written to nor read from) and
data thats outside the queue (which can _both_ be written to and read
from).
To fix this data race we split the data into three categories instead:
- The head of the queue (which can only be read from)
- The rest of the queue (which can neither be read from nor written
to)
- The data outside the queue (which can only be written to)
To accomplish this we split the normal dequeue() function into two
parts.  The normal dequeue() becomes peek-like, only returning the
head element, and the new function FinishDeq() does the actual
dequeuing.
The general gist before was this:

* copy thread:
- Wait for start
- Wait for dequeue
  - if data: handle data; loop
  - if no data: break inner (this only happens on flush)
- `signal` that queue was fushed and that the thread is waiting
  for work
- goto begin
* main thread
- do some setup
- start enqueueing data; signal start to thread if not started
  (this was done with an unsynchronized read!)
- flush when done
- During cleanup, cancel the thread while it is waiting on a
  start signal.
with some locks & unlocks sprinkled in.  One should note that the
cleanup copy thread callback always unlockes the lock regardless of
whether the copy thread locked the mutex or not!

This was slightly changed:
* Clearly there is no need to wait for "start" at all, since we wait on
  the dequeue anyways -- so this was removed.
* Instead of just canceling the thread, we set a flag that tells the
  Copythread to exit, which the thread checks after every flush.  As
  such we also make sure flush the queue on cleanup.
* We also now properly initialize all thread_context members!
@sebsura sebsura force-pushed the dev/ssura/master/fix-vadp-dumper branch 2 times, most recently from a6c996d to 1cd8ec5 Compare December 13, 2023 07:14
Copy link
Member

@sduehr sduehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, I'd like to suggest the following changes:

  1. When passing -s 0 to bareos_vadp_dumper it excessively consumes cpu and memory, so that value should be rejected.
  2. Output like the following should only appear when -v was passed:
Making slot 3 bigger (4194304 -> 8388608)
Making slot 15 bigger (6291456 -> 8388608)
Changed len: 3471835136, Saved len: 3471835136

@sebsura
Copy link
Contributor Author

sebsura commented Dec 13, 2023

The same will probably happen with -s 1, -s 2, ... (alltough -s 0 is the worst). What do you think a good minimum for the sectors_per_call value would be ? I imagine we could use the old value as minimum, i.e. 1024.

@sebsura sebsura force-pushed the dev/ssura/master/fix-vadp-dumper branch 2 times, most recently from d91ff72 to 043e3b2 Compare December 13, 2023 09:44
@sebsura
Copy link
Contributor Author

sebsura commented Dec 13, 2023

I made it reject all numbers <= 0 as well as incorrect inputs (so -s blabla should now also not work)
I also hid/removed the messages. The slot messages were just left over debug messages; whenever such a message was printed, vadp-dumper wouldve written to unallocated memory before.

@arogge
Copy link
Member

arogge commented Dec 13, 2023

pr-tool said the headline of commit "vadp-dumper: add support for VixDiskLib_QueryAllocatedBlocks()" is too long.

This function returns -- up to a certain amount of precision -- a list
of intervals of all sectors containing useful (i.e. allocated) data.

For full backups its recommended to backup only the allocated blocks;
for other kinds of backups (incremental/differential) it is only
necessary to backup those sectors that are both changed & allocated.

Since we currently cannot take advantage of information regarding
unallocated, changed blocks, we just ignore them.  In the future these
could be used for faster restores & consolidation.

The algorithm used for finding out the intersection is trivial as both
lists are sorted (and the intervals themselves are disjoint in each
set).  As such it is enough to just go through both arrays linearly at
the same time, sending the pairwise intersection if any, and finally
advancing  the pointer to the "smaller" interval.
Since we cannot meaningfully recover from a mutex error, we should
just assert that they do not happen.
@sebsura sebsura force-pushed the dev/ssura/master/fix-vadp-dumper branch from 043e3b2 to 923c25f Compare December 13, 2023 12:16
Copy link
Member

@sduehr sduehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, looks good now.

@BareosBot BareosBot merged commit 7fe4d5f into bareos:master Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants