Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qubes.xml corrupted due to disk full #3376

Closed
zander opened this issue Dec 8, 2017 · 14 comments
Closed

qubes.xml corrupted due to disk full #3376

zander opened this issue Dec 8, 2017 · 14 comments
Labels
C: core P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. r4.0-dom0-stable r4.1-dom0-stable T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@zander
Copy link

zander commented Dec 8, 2017

Qubes OS version:

4.0RC2

Steps to reproduce the behavior:

I personally did not see this behavior, it was reported on the user mailinglist. As I realize database corruption like this is a show-stopper bug, I'm reporting it here.

https://groups.google.com/d/msg/qubes-users/svrJiQ1d7S8/zHLoTWiQBQAJ

Expected behavior:

The xml file should never be left in a corrupted state.

@ghost
Copy link

ghost commented Dec 8, 2017

Ive had this too once (in qubes 3.2 admittely), but its a fairly easy fix, you just have to delete the qubes.xml and let it regenerate itself automatically. maybe we can make a script which does this by itself (removing and regenerating) once it detects a full disk?

@zander
Copy link
Author

zander commented Dec 8, 2017

My thinking was more in the direction that the component that generates the XML uses an atomic operation to replace the old one.
On Unix this means the xml is fully generated under a slightly different name and then (the atopic operation) renamed over the old file.

This way there is no risk of corruption.

@andrewdavidwong andrewdavidwong added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. C: core labels Dec 9, 2017
@andrewdavidwong andrewdavidwong added this to the Release 4.0 milestone Dec 9, 2017
@tasket
Copy link

tasket commented Dec 12, 2017

The file-saving code appears to be in qubes/app.py and it does already use the save/flush/rename method. This would mean the culprit for the error probably lies in the filesystem or thin-lvm or layers beneath.

@dimlev
Copy link

dimlev commented Dec 29, 2017

I tested changing some settings using the qubes-manager (on 3.2) while my disk was full and I did not manage to corrupt qubes.xml. I did manage to corrupt firewall.xml though.

qubes/app.py uses the save/flush/rename method but qubes/firewall.py does not. So if you happen to change a firewall setting while the disk is full you end up with a zero byte / corrupted firewall.xml file.

Should I open a new issue for this?

@dimlev
Copy link

dimlev commented Dec 29, 2017

I also tested changing vm settings on 4.0RC3 (running on virtualbox) while the disk was full. Same results with 3.2

  • qubes.xml was fine.
  • firewall.xml became corrupted.

Also: Patrick (in the google groups) does not mention changing any settings that would trigger saving the qubes.xml file. So if he didn't, then what are some other ways that this could happen?

@iamahuman
Copy link

iamahuman commented Feb 7, 2021

qubes.xml should ideally be replaced atomically.

  1. Create qubes.xml- or an O_TMPFILE.
  2. Serialize into the file.
  3. Flush and perform an fsync(). (if performance/disk lifespan concerns arise, several updates can be consolidated into one.)
  4. Hard link/rename the new file into qubes.xml.

Alternatively symlinking is also an option.

@rustybird
Copy link

rustybird commented Feb 7, 2021

qubes.xml should ideally be replaced atomically.

  1. Create qubes.xml- or an O_TMPFILE.
  2. Serialize into the file.
  3. Flush and perform an fsync(). [...]
  4. Hard link/rename the new file into qubes.xml.
  1. fsync() the directory containing qubes.xml, for durability.

qubes.app.save() is missing both fsync() calls. I'll submit a PR to move a slightly modified qubes.storage.reflink._replace_file() context manager to qubes.utils, and then call that from the qubes.xml and firewall.xml save routines.

Btw it's not yet possible to use O_TMPFILE for atomic file replacement, because the kernel flag to overwrite a linkat() target still hasn't landed after (checks Patchwork again) over 4 years, aaarghh

@andrewdavidwong andrewdavidwong added the P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. label Feb 8, 2021
rustybird added a commit to rustybird/qubes-core-admin that referenced this issue Feb 10, 2021
That takes care of the missing fsync() calls.

Fixes QubesOS/qubes-issues#3376
@qubesos-bot
Copy link

Automated announcement from builder-github

The package qubes-core-dom0-4.1.19-1.fc32 has been pushed to the r4.1 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot
Copy link

Automated announcement from builder-github

The package qubes-core-dom0-4.1.20-1.fc32 has been pushed to the r4.1 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@iamahuman
Copy link

Should this issue's milestone be updated to Release 4.1? After all, I'm not sure if this should be called a bug or an enhancement.

@DemiMarie
Copy link

Should this issue's milestone be updated to Release 4.1? After all, I'm not sure if this should be called a bug or an enhancement.

Anything that can corrupt qubes.xml is definitely a bug (and a bad one at that). Release 4.0 is correct. @marmarek should I go ahead and backport this?

@rustybird
Copy link

rustybird commented Nov 13, 2021

Another one for the "backport pending" label. Sorry for the hassle

@DemiMarie DemiMarie added the backport pending On closed issues, indicates fix released for newer version will be backported to older version. label Nov 13, 2021
marmarek pushed a commit to QubesOS/qubes-core-admin that referenced this issue Dec 3, 2021
That takes care of the missing fsync() calls.

Fixes QubesOS/qubes-issues#3376

(cherry picked from commit 12d117b)
@qubesos-bot
Copy link

Automated announcement from builder-github

The component core-admin (including package qubes-core-dom0-4.0.59-1.fc25) has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot
Copy link

Automated announcement from builder-github

The component core-admin (including package qubes-core-dom0-4.0.59-1.fc25) has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@DemiMarie DemiMarie removed the backport pending On closed issues, indicates fix released for newer version will be backported to older version. label Dec 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: core P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. r4.0-dom0-stable r4.1-dom0-stable T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
Projects
None yet
Development

No branches or pull requests

8 participants