Backup: Invalid HMAC on verification #1471

Open
andrewdavidwong opened this Issue Nov 30, 2015 · 9 comments

Member

andrewdavidwong commented Nov 30, 2015

Since upgrading to R3.0, backup verification occasionally fails with the error message below, but rebooting the host immediately and retrying the verification on the same backup file then succeeds.

ERROR: ERROR: invalid hmac for file /var/tmp/restore_[...]:
[...]
Is the passphrase correct?
Partially restored files left in /var/tmp/restore_*, investigate them and/or clean them up

Member

andrewdavidwong commented Nov 30, 2015

This is possibly related to #1124.

marmarek added this to the Release 3.0 updates milestone Nov 30, 2015

Member

andrewdavidwong commented Dec 23, 2015

I think the problem occurs when there is insufficient disk space (on the disk on which /var/tmp/ resides). However, I would still consider this a bug, since the verification process does not indicate that (any) spare disk space will be needed, and the error messages do not indicate that the failure occurred due to insufficient space.

How much empty space is required, at minimum, for successful verification? (Just enough to hold the largest VM in the backup? More than that?)
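Until the required amount is known, free space can at least be checked by hand before starting a verification. A minimal sketch, not part of Qubes: the helper name `has_free_space` and the threshold passed to it are hypothetical, and the right threshold is exactly the open question above.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_bytes` of free space available."""
    # shutil.disk_usage reports total, used and free bytes for the
    # filesystem that contains `path`.
    return shutil.disk_usage(path).free >= required_bytes
```

For example, `has_free_space("/var/tmp", 50 * 1024**3)` would report whether roughly 50 GiB is free on the disk holding /var/tmp/.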

Member

andrewdavidwong commented Dec 23, 2015

OK, clearly a lot more space is being used than just the size of the largest VM. A large backup-restore test failed with:

  • ~43GB left over in /var/tmp/ spread across multiple VM dirs.
  • ~70GB total free space on disk at start of restore test.
  • Largest VM in the backup was ~47GB.

@marmarek, is there any way to specify a location for qvm-backup-restore to use as its temp dir for holding temporary restore files during verification (e.g., /mnt/large-secondary-disk/tmp/)?

If not, is there a recommended Linux way to "move" /var/tmp/ to a larger secondary disk that works nicely with Qubes? (I found out the hard way that using a symlink can create an endless "waiting" loop that prevents Qubes from booting. Access to /var/tmp/ was needed in order to mount and decrypt the secondary disk, but /var/tmp/ was itself on that secondary disk...)

Member

marmarek commented Dec 23, 2015

In general, it may need as much disk space as the whole backup... But in practice it should need much less (about 200-300MB). If not, it's a bug. It works this way:

  1. The backup stream is extracted from the source media (either a direct file, or some VM command/file).
  2. The external tar is extracted. Result: private.img.000, private.img.000.hmac, private.img.001, private.img.001.hmac, etc.
  3. In parallel, as soon as a file is extracted, its hmac is checked and the file is then passed to the internal tar extractor (which would extract the actual private.img in this case). As soon as a file part has been passed to that second tar, it is removed from the disk.
  4. In the case of backup verification, that second tar is set to only list the archive content, not extract it.

Parts are at most 100MB. So with that parallel extraction, the second tar should be quite fast (tar t ...) and only a few files should be on disk at any time. Maybe your backup storage is so fast that the first tar is much faster than the second one?

Currently there is no way to specify an alternative path. You can mount something there (just put it in /etc/fstab).
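To make the per-part flow concrete, here is a rough Python sketch of the check-then-consume-then-delete loop. It is an illustration, not the actual qvm-backup-restore code. It assumes, purely for the example, that each .hmac file holds a hex HMAC-SHA512 digest of its part, keyed with the passphrase. The real file format and digest may differ. The function name `verify_and_consume` and the `consume` callback (standing in for the second tar) are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Callable, Iterable


def verify_and_consume(
    chunks: Iterable[Path],
    key: bytes,
    consume: Callable[[bytes], None],
) -> None:
    """Check each part's HMAC, hand the part to `consume`, then delete it.

    Assumption for illustration only: `<part>.hmac` contains a hex
    HMAC-SHA512 digest of the part, keyed with `key`.
    """
    for chunk in chunks:
        hmac_file = chunk.with_name(chunk.name + ".hmac")
        data = chunk.read_bytes()  # parts are at most about 100MB
        expected = hmac_file.read_bytes().strip()
        actual = hmac.new(key, data, hashlib.sha512).hexdigest().encode()
        # Constant-time comparison of the two digests.
        if not hmac.compare_digest(actual, expected):
            # The failing part is left on disk for investigation.
            raise ValueError(f"invalid hmac for file {chunk}")
        consume(data)       # stands in for feeding the second tar
        chunk.unlink()      # the part is removed once it has been passed on
        hmac_file.unlink()
```

As long as `consume` keeps up with whatever produces the parts, only a few parts exist on disk at a time. If it falls behind, unprocessed parts pile up in the temp dir.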

Member

andrewdavidwong commented Dec 24, 2015

Thanks for that explanation, @marmarek. I did another test, and the results were inconsistent with what I previously reported (same test as before, just excluded a couple of normal-size VMs, and verification succeeded this time even though the same amount of free space was available as before). So, it may be that the issue doesn't actually have anything to do with insufficient disk space. I think it was failing before because the backup was somehow corrupted upon creation, even though I created the backup twice in a row without changing any data. (I remember you saying before that sometimes particularly large backups can have "holes" due to some tar bug...)

Member

marmarek commented Dec 24, 2015

Excluded VMs aren't extracted (during verification either), so it still may be about disk space.

Member

andrewdavidwong commented Dec 24, 2015

Sorry, I meant that I excluded them during backup creation (i.e., didn't back them up).

Member

andrewdavidwong commented Jan 4, 2016

I just experienced the opposite issue. Verification first succeeded, then failed (and continued failing after many attempts). This is much more serious. Reported as issue #1577.

Member

andrewdavidwong commented Nov 15, 2016

Still happens on R3.2, but observed only when creating a large (80+ GB) backup and using compression. After backing up the exact same VMs without compression, verification succeeds.
