qvm-backup-restore is slow #2986

Closed
woju opened this Issue Aug 7, 2017 · 32 comments


@woju
Member

woju commented Aug 7, 2017

Qubes OS version (e.g., R3.2): R4.0

qvm-backup-restore internally pipes the whole backup tar file at least three times (once each for the header, qubes.xml, and the data). This is painful on big backups, and for large (~1TB) backups it is frustrating when something goes wrong at a later stage. (cf. bliviet)

Ideally, reading should be streamlined to at most two passes: 1) header + HMAC check; 2) everything else. The tar file layout should explicitly support this, if it does not already.
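A minimal sketch of what pass 1 could look like, using Python's standard tarfile module in sequential stream mode (`'r|'`), so the archive is read strictly front to back and reading stops after the second member. It assumes that `backup-header` and `backup-header.hmac` are the first two members. The HMAC scheme (HMAC-SHA512 keyed by the passphrase, stored as a hex digest) is an illustrative assumption, not the actual Qubes backup format, and `verify_header_pass` is a hypothetical name.

```python
import hashlib
import hmac
import tarfile
from typing import BinaryIO


def verify_header_pass(stream: BinaryIO, passphrase: bytes) -> bytes:
    """Pass 1: read only the first two members and verify the header HMAC.

    Returns the verified header bytes; the rest of the stream is left unread.
    """
    # 'r|' = sequential (stream) mode: no seeking, members come strictly in order.
    with tarfile.open(fileobj=stream, mode='r|') as tar:
        ti = tar.next()
        if ti is None or ti.name != 'backup-header' or not ti.isfile():
            raise ValueError('backup-header is not the first member')
        header = tar.extractfile(ti).read()

        ti = tar.next()
        if ti is None or ti.name != 'backup-header.hmac' or not ti.isfile():
            raise ValueError('backup-header.hmac is not the second member')
        stored_mac = tar.extractfile(ti).read().strip()

    # ASSUMPTION: HMAC-SHA512 keyed by the passphrase, stored as a hex digest.
    expected_mac = hmac.new(passphrase, header, hashlib.sha512).hexdigest().encode('ascii')
    # Constant-time comparison, so timing does not reveal how many bytes matched.
    if not hmac.compare_digest(stored_mac, expected_mac):
        raise ValueError('backup header HMAC mismatch')
    return header
```

Because stream mode never seeks backwards, the same function would also work on a pipe, e.g. when the backup arrives from another VM.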

Cc: @marmarek

@marmarek

Member

marmarek commented Aug 7, 2017

It does this only when you connect the backup source directly to dom0, which you shouldn't do.

@marmarek

Member

marmarek commented Aug 7, 2017

And for the dom0 case, this is a limitation of tar. If you find a way to tell tar to stop reading the archive once all the files mentioned on the command line are found, that would greatly improve the situation. Otherwise, the alternative, hacky solution is to keep track of restored files in Python and kill the tar process when it gets all the requested files, something I'd like to avoid.

As for your idea: it isn't that simple, because you know which files to expect (including qubes.xml + qubes.xml.hmac vs. qubes.xml.enc) only after retrieving the backup header.
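To make that dependency concrete: the list of files to request next is computed from the header's contents. The sketch below assumes the header consists of plain `key=value` lines and that a `version` field decides between `qubes.xml` + `qubes.xml.hmac` and `qubes.xml.enc`. Both the format and the mapping are hypothetical simplifications of the two cases named above; the real rules may depend on other fields.

```python
from typing import Dict, List


def parse_backup_header(data: bytes) -> Dict[str, str]:
    """Parse ASSUMED 'key=value' header lines into a dict."""
    fields = {}
    for line in data.decode('ascii').splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        key, sep, value = line.partition('=')
        if not sep:
            raise ValueError(f'malformed header line: {line!r}')
        fields[key.strip()] = value.strip()
    return fields


def files_to_request(fields: Dict[str, str]) -> List[str]:
    """HYPOTHETICAL mapping from header fields to qubes.xml file names."""
    if int(fields.get('version', '0')) >= 4:
        # Assumed: the newer format ships a single encrypted file.
        return ['qubes.xml.enc']
    # Assumed: the older format ships qubes.xml with a detached HMAC.
    return ['qubes.xml', 'qubes.xml.hmac']
```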

@woju

Member

woju commented Aug 7, 2017

I did it from a separate VM, granting it the necessary Admin API permissions (unsuccessfully at first, that's how I know). It felt like it was doing three passes.

As for how to do that: I'd rewrite it using tarfile, and I'd forget about the part of the spec that says the file that counts is the last one in the archive.

I probably know nothing about corner cases, but this doesn't take 2.5 hours:

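import tarfile

# ignore_zeros=True: skip zero (end-of-archive) blocks and keep reading,
# which is needed for concatenated archives.
t = tarfile.open('qubes-backup', ignore_zeros=True)

# Expect the header as the very first member...
ti = t.next()
assert ti.name == 'backup-header'
backup_header = t.extractfile(ti).read()

# ...and its HMAC as the second one.
ti = t.next()
assert ti.name == 'backup-header.hmac'
backup_header_hmac = t.extractfile(ti).read()
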
@marmarek

Member

marmarek commented Aug 7, 2017

While this may work for the backup header, loading the actual backup data into RAM is a REALLY BAD IDEA (you've mentioned 1TB, no?). And having two different methods for extracting and verifying the backup header and the backup data is asking for a bug in one of them.

@woju

Member

woju commented Aug 7, 2017

Hmmm.

Process Process-2:2:
Traceback (most recent call last):
  ...
  File "/usr/lib/python3/dist-packages/qubesadmin/backup/restore.py", line 1698, in _handle_appmenus_list
    stdin=stream)
  ...
FileNotFoundError: [Errno 2] No such file or directory: 'qvm-appmenus'
qubesadmin.backup.extract: Error while processing ...

and the same for all VMs.

Looks like another 6 hours till I get back.

@woju

Member

woju commented Aug 7, 2017

It's not into RAM. t.extractfile gives a file-like object.
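The point is that `TarFile.extractfile()` returns a file-like object rather than the member's contents, so the data can be streamed in bounded chunks instead of being read whole. A minimal sketch; `copy_member` and the 1 MiB chunk size are illustrative choices, not anything from qvm-backup-restore.

```python
import shutil
import tarfile
from typing import BinaryIO


def copy_member(tar: tarfile.TarFile, member: tarfile.TarInfo,
                dst: BinaryIO, chunk_size: int = 1 << 20) -> None:
    """Stream one member's data into dst in chunk_size pieces."""
    src = tar.extractfile(member)  # file-like view; nothing is read yet
    if src is None:
        raise ValueError(f'{member.name} is not a regular file')
    # copyfileobj repeats read(chunk_size) -> dst.write() until EOF.
    shutil.copyfileobj(src, dst, chunk_size)
```

This keeps memory use bounded, but every byte still passes through the Python process.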

@marmarek

Member

marmarek commented Aug 7, 2017

Running qvm-backup-restore in a dedicated VM = paranoid restore mode. Not supporting appmenus there is a somewhat intentional missing feature.

@woju

Member

woju commented Aug 7, 2017

Does it even work at all?

@marmarek

Member

marmarek commented Aug 7, 2017

You tell me...

@woju

Member

woju commented Aug 7, 2017

I tell you it doesn't. Just don't tell Joanna...

@marmarek

Member

marmarek commented Aug 7, 2017

> It's not into RAM. t.extractfile gives a file-like object.

Ah, so you're seeking a solution for performance-related issues by passing the whole blob through Python?

@woju

Member

woju commented Aug 7, 2017

No. I'm seeking solutions for performance-related issues by intentionally forgetting about the relevant parts of the tar file specification. If it takes Python to misimplement it, so be it, if the alternative is doing it in C.

@marmarek

Member

marmarek commented Aug 7, 2017

In theory, tar --seek should also greatly improve performance (it should simply seek past all ignored files). But, also in theory, it should be enabled automatically...

@woju

Member

woju commented Aug 7, 2017

No, it won't. According to Python's tarfile module documentation, the file that counts under a given path is the last one in the archive (I think it is that way so that incremental backups to tape can be made by simply appending new files). So you have to read through the whole file anyway, especially with -i. GNU tar works under this assumption. I say we break that and hereby declare the header and its HMAC to be the first two files in the archive.
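The two rules can be contrasted with Python's tarfile: `TarFile.getmember()` is documented to return the last occurrence of a name, which requires scanning the whole archive, whereas a "first occurrence wins" lookup can stop as soon as it matches. `first_occurrence` and `last_occurrence` are hypothetical helpers for illustration, not part of tarfile.

```python
import tarfile
from typing import Optional


def first_occurrence(tar: tarfile.TarFile, name: str) -> Optional[tarfile.TarInfo]:
    """Return the first member called `name`, reading no further than needed."""
    for ti in tar:  # TarFile iterates lazily, loading one member at a time
        if ti.name == name:
            return ti  # stop here; later duplicates are never examined
    return None


def last_occurrence(tar: tarfile.TarFile, name: str) -> tarfile.TarInfo:
    """tarfile's own rule: the last copy of `name` wins (scans to the end)."""
    return tar.getmember(name)
```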

@marmarek

Member

marmarek commented Aug 7, 2017

But seeking through the whole archive (just reading file headers) should also be much faster, even for a 1TB archive.
I'm checking the tar --occurrence option right now, which should do exactly what we want.
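On a seekable, uncompressed archive, walking only the member headers avoids reading the member data. Python's tarfile, used here as a stand-in for GNU tar, behaves this way in random-access mode: it seeks past each member's data. A minimal sketch (`list_members` is a hypothetical helper); it does not help for compressed archives or for a backup piped in from another VM, where data cannot be skipped without reading it.

```python
import tarfile
from typing import BinaryIO, List, Tuple


def list_members(fileobj: BinaryIO) -> List[Tuple[str, int]]:
    """Return (name, size) for every member without reading member data."""
    # 'r:' = uncompressed, random-access mode; tarfile moves from header to
    # header with seek() instead of reading each member's contents.
    with tarfile.open(fileobj=fileobj, mode='r:') as tar:
        return [(ti.name, ti.size) for ti in tar.getmembers()]
```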

@marmarek

Member

marmarek commented Aug 7, 2017

For a simple archive, it helps. If you're brave enough, try adding --occurrence=1 to https://github.com/QubesOS/qubes-core-admin-client/blob/master/qubesadmin/backup/restore.py#L870
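For reference, a sketch of how such an invocation could be assembled from Python. It is not the code at the linked line; `tar_extract_cmd` and `extract_first_occurrences` are hypothetical helpers, and `--occurrence` is a GNU tar option that other tar implementations may not support. The `=1` is spelled out explicitly rather than relying on the option's default.

```python
import subprocess
from typing import List, Sequence


def tar_extract_cmd(archive: str, dest: str, names: Sequence[str]) -> List[str]:
    """Build a GNU tar command extracting only the first copy of each name."""
    return [
        'tar', '-x',
        '-f', archive,
        '-C', dest,
        # Process only the 1st occurrence of each listed name, so tar can
        # stop once all of them have been found instead of reading to the end.
        '--occurrence=1',
        *names,
    ]


def extract_first_occurrences(archive: str, dest: str, names: Sequence[str]) -> None:
    subprocess.run(tar_extract_cmd(archive, dest, names), check=True)
```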

@woju

Member

woju commented Aug 7, 2017

Me? Not brave enough? Just hold my beer...

@marmarek

Member

marmarek commented Aug 7, 2017

You should get the backup summary in seconds.

@woju

Member

woju commented Aug 7, 2017

It got better!

@woju

Member

woju commented Aug 7, 2017

Hmmm. The manpage says the default for that option is 1.

Maybe the truth is "available in the texinfo format".

@marmarek

Member

marmarek commented Aug 7, 2017

That was also my understanding, but now I think --occurrence[=N] means "The default N is 1, if you specify --occurrence without N".

@andrewdavidwong andrewdavidwong added this to the Release 4.0 milestone Aug 8, 2017

@woju woju self-assigned this Aug 8, 2017

@woju

Member

woju commented Aug 8, 2017

Guys, I'm going to fix the --occurrence issue, but that leaves the bug around qvm-appmenus, so I'm going to file another ticket for that.

marmarek added a commit to marmarek/qubes-core-admin-client that referenced this issue Aug 30, 2017

backup/restore: make backup header extraction faster
Abort tar process after extracting requested files - do not parse the
archive until the end (possibly tens of GB later).

Fixes QubesOS/qubes-issues#2986
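A minimal sketch of the "abort tar once the requested files are out" idea from this commit message, assuming the child lists one member name per line on stdout (as GNU tar does for `-xv` when extracting to disk). It is not the code from the commit, and `run_until_all_listed` is a hypothetical name. Caveat: if the child prints a name before that member's data is fully written, terminating right away could truncate the last file, so a real implementation would need to confirm completion first.

```python
import subprocess
from typing import Iterable, Sequence, Set


def run_until_all_listed(cmd: Sequence[str], wanted: Iterable[str]) -> Set[str]:
    """Run cmd and terminate it once every name in `wanted` has been printed.

    ASSUMPTION: the child prints one member name per line on stdout.
    Returns the set of names the child printed before it stopped.
    """
    pending = set(wanted)
    seen = set()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            name = line.rstrip('\n')
            seen.add(name)
            pending.discard(name)
            if not pending:
                # All requested names reported: stop the child instead of
                # letting it read the rest of the archive.
                proc.terminate()
                break
    finally:
        proc.stdout.close()
        proc.wait()
    return seen
```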

@marmarek marmarek referenced this issue in QubesOS/qubes-core-admin-client Aug 30, 2017

Merged

Two fixes for qvm-backup-restore #26

@qubesos-bot

qubesos-bot Sep 14, 2017

Automated announcement from builder-github

The package python2-qubesadmin-4.0.6-0.1.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot qubesos-bot referenced this issue in QubesOS/updates-status Sep 14, 2017

Closed

core-admin-client v4.0.6 (r4.0) #208

@qubesos-bot

qubesos-bot Sep 14, 2017

Automated announcement from builder-github

The package python2-qubesadmin-4.0.6-0.1.fc24 has been pushed to the r4.0 testing repository for the Fedora fc24 template.
To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r4.0-current-testing

Changes included in this update

@qubesos-bot

qubesos-bot Sep 14, 2017

Automated announcement from builder-github

The package qubes-core-admin-client_4.0.6-1+deb8u1 has been pushed to the r4.0 testing repository for the Debian jessie template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing jessie-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@qubesos-bot

qubesos-bot Sep 14, 2017

Automated announcement from builder-github

The package python2-qubesadmin-4.0.6-0.1.fc25 has been pushed to the r4.0 testing repository for the Fedora fc25 template.
To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r4.0-current-testing

Changes included in this update

@qubesos-bot

qubesos-bot Sep 14, 2017

Automated announcement from builder-github

The package qubes-core-admin-client_4.0.6-1+deb9u1 has been pushed to the r4.0 testing repository for the Debian stretch template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@qubesos-bot

qubesos-bot Oct 30, 2017

Automated announcement from builder-github

The package qubes-core-admin-client_4.0.9-1+deb8u1 has been pushed to the r4.0 stable repository for the Debian jessie template.
To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@qubesos-bot

qubesos-bot Oct 30, 2017

Automated announcement from builder-github

The package qubes-core-admin-client_4.0.9-1+deb9u1 has been pushed to the r4.0 stable repository for the Debian stretch template.
To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@qubesos-bot

qubesos-bot Oct 30, 2017

Automated announcement from builder-github

The package python2-qubesadmin-4.0.9-0.1.fc24 has been pushed to the r4.0 stable repository for the Fedora fc24 template.
To install this update, please use the standard update command:

sudo yum update

Changes included in this update

@qubesos-bot

qubesos-bot Oct 30, 2017

Automated announcement from builder-github

The package python2-qubesadmin-4.0.9-0.1.fc25 has been pushed to the r4.0 stable repository for the Fedora fc25 template.
To install this update, please use the standard update command:

sudo yum update

Changes included in this update

@qubesos-bot

qubesos-bot Oct 30, 2017

Automated announcement from builder-github

The package python2-qubesadmin-4.0.9-0.1.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update
