dom0 root filesystem not mounted with discard on thin provisioning #3226

Closed

qubesuser opened this issue Oct 27, 2017 · 11 comments

Qubes OS version:

R4.0-rc2

Steps to reproduce the behavior:

  1. Install Qubes using the default settings, selecting LVM Thin Provisioning
  2. Create a large file in dom0 from /dev/zero and delete it (see the sketch below)
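For concreteness, a minimal repro sketch (the file path and size are arbitrary, and the LV name assumes the default qubes_dom0 volume group):

dd if=/dev/zero of=/home/user/bigfile bs=1M count=10240
sudo lvs qubes_dom0/root   # note Data% of the root thin LV
rm /home/user/bigfile
sudo lvs qubes_dom0/root   # without discard, Data% does not drop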

Expected behavior:

After the file is deleted, the space consumed by the dom0 root volume shrinks accordingly, and the / filesystem has the discard option enabled

Actual behavior:

The space consumed by the dom0 root volume still includes the deleted file, and the / filesystem does not have the discard option enabled

General notes:

If discard is not working on dom0 root, disk space may be inexplicably exhausted (because backup restores and similar operations temporarily use space in dom0 root), since the dom0 root filesystem is as large as the thin pool by default.

Workaround

Add the discard mount option to the / entry in /etc/fstab and run fstrim / (see the sketch below).
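A sketch of the workaround (the exact device name and existing mount options depend on the installation):

# In /etc/fstab, add "discard" to the options of the / entry, e.g.:
#   /dev/mapper/qubes_dom0-root  /  ext4  defaults,discard  1 1
# Then reclaim the space already freed by deleted files:
sudo fstrim -v /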


na-- commented Oct 29, 2017

Maybe this deserves a separate issue, but right now I'm not sure that the dom0 volume should be in the LVM thin pool at all - it's too fragile. I've somehow managed to turn the whole pool (including the dom0 root volume in it) read-only several times in the last few days. I think most of the time that was due to inadvertently filling up the free space, since LVM restricts writes when that happens (see "Data space exhaustion" in man lvmthin). The issue is that it's very difficult (if not impossible) to fix this condition when the dom0 root is read-only, and it usually results in a hard reset of the system.

I know this will make restoring large backups harder, but I would really prefer the default drive partitioning to be:

physical drive
`-LUKS
  `-LVM PV/VG
    `-dom0 root & home (10 GB?)
    `-swap (1-2GB)
    `-LVM thin pool (remaining space)
      `-vm-sys-usb
      `-vm-sys-net
      ...
`-boot (1GB)

I think that this should be a much more stable configuration - if something happens to the thin pool, dom0 is unaffected and can be used for repairs. Please tell me if this is stupid or if I should make it a separate issue.
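For illustration, roughly how such a layout could be created with standard LVM tools (the VG and LV names here are hypothetical; the installer would do this through anaconda):

# Inside the LUKS container, after pvcreate/vgcreate of VG "qubes_dom0":
lvcreate -L 10G -n root qubes_dom0            # static dom0 root
lvcreate -L 2G  -n swap qubes_dom0            # swap
lvcreate -l 100%FREE -T qubes_dom0/pool00     # thin pool with the remaining space
lvcreate -V 2G -T qubes_dom0/pool00 -n vm-sys-net-private   # thin volumes for VMs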


marmarek commented Oct 29, 2017

Generally I agree with @na-- . 10GB for root would be too small (you need to fit the whole root.img of the template during installation), but 20GB should be OK. The problem is that the anaconda code for handling partition layout is quite complex. I've already tried something simpler: having the root fs not use the whole pool (while the pool still fills the whole disk), but failed after initial tries.

If anyone knows anaconda and/or blivet and wants to help, that would be awesome.

qubesuser commented Oct 29, 2017

The issue with that is that there seems to be no way to shrink a thin pool, which means that if dom0 root is outside the thin pool it cannot be grown beyond the space initially assigned to it. That can be problematic if one wants to install lots of software in dom0 (e.g. to try out GNOME or KDE).

I think dm-thin may still allow writes that don't cause metadata changes (i.e. those that don't break CoW or increase size), so it may be possible to just zero out a smaller dom0 partition before formatting so it's preallocated and should continue working.

A possible solution for the Anaconda issue is to simply create a new thin LV on first Qubes boot, copy the whole root filesystem to it, and then replace the original dom0 root with it. This has the advantage that the size of the LV can be computed automatically depending on the used space on the original dom0 root (this is going to be more useful once the GUI domain is split, since then there will be much less reason to install lots of software in dom0).
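A rough sketch of that first-boot migration idea (names hypothetical; the bootloader and fstab updates needed to actually switch roots are omitted):

# Size the new thin LV from the space actually used on /, plus headroom
USED=$(df --output=used -BG / | tail -1 | tr -dc '0-9')
sudo lvcreate -V "$((USED + 5))G" -T qubes_dom0/pool00 -n root-thin
sudo mkfs.ext4 /dev/qubes_dom0/root-thin
sudo mount /dev/qubes_dom0/root-thin /mnt
sudo rsync -aAXH --one-file-system / /mnt   # copy the root filesystem
# ...then point fstab and the initramfs/bootloader at the new root LV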



tasket commented Mar 19, 2018

Although I've had my pool almost fill up (due largely to dom0 not discarding on /), I haven't yet experienced any problems like / going read-only since I started using 4.0rc in October. In fact, I'd say that using discard would have avoided those problems in the first place.

I think the best way forward is to very simply enable discard on dom0 root. Otherwise, a half-solution like an external fixed partition opens up a huge can of worms for system management tasks like restoring large VMs and handling templates and disk images. A 20GB root will create more boot-into-read-only-fs incidents (and many more complaints) than adding discard to the current config... not fewer.

Not to mention that having the unused space unavailable to domUs when those admin tasks aren't being performed -- added to the ridiculously large swap space that anaconda already allocates -- will irritate people and waste their resources.


marmarek commented Mar 19, 2018

Starting with qubes-core-dom0-linux 4.0.13, there is an fstrim.timer enabled in dom0, which performs fstrim -a once a week. Enabling discard on / may have negative performance consequences, especially when one also enables discard on the LUKS layer, and especially on cheap or older SSDs...

I agree that choosing the right size for a statically allocated dom0 root is a tricky task, but the current situation is also problematic: VMs can easily (depending on used disk space) DoS dom0. And while filling just the VMs' storage isn't that big a problem (you might need to remove some VM, or just a file inside one, and reboot the system afterwards in the worst case), filling up the space for the dom0 filesystem is much more problematic, because your VM management tools will stop working. Note that filling up the free space inside a filesystem on a static LV will just result in "no space left on device" errors for applications trying to write something; freeing up space will immediately fix the problem. Filling up space in a thin pool results in I/O errors, possibly forcing a read-only remount and, in the worst case, filesystem corruption (unlikely with current filesystems, but still).

Maybe something in the middle could be used: a static LV for the root filesystem plus a thinly allocated LV mounted somewhere in dom0 (/var/tmp?). On my system right now, / uses 19GB, which includes 3.7GB in /var/lib/qubes/vm-kernels.
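For reference, a quick way to confirm that timer is active, and to trim manually in the meantime (standard systemd/util-linux commands):

systemctl list-timers fstrim.timer   # shows the next scheduled run
sudo fstrim -av                      # trim all mounted filesystems now, verbosely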


tasket commented Mar 19, 2018

This really should be a PEBCAK issue; users generally know they shouldn't fill up their disk.

On Qubes 3.2 you can just use a DE widget to monitor disk space, the same as on regular Linux. The dom0 filesystem manages free space effectively, and the user is empowered to be responsible about disk usage without having to dodge curve balls.

Now on 4.0 we have a deallocation problem and people are flying blind all at once.

A simple 2-color meter + discard in fstab would restore the feedback, balance, and relative simplicity users have on R3.2, so that the system both manages and communicates disk space effectively.

OTOH, adding a "large inflexible admin volume" that is still too small for certain tasks demands substantially more understanding and effort from users. Then the issue becomes less PEBCAK and more of a design, maintenance, documentation, and cultural problem. You get lots of howtos and discussion about the care and feeding of anti-feature X under various use cases, another banal "tend to this!" techie meme that contributes to users wanting something else.


tasket commented Mar 19, 2018

FWIW, I didn't see your last response before submitting the prior post.

Adding /tmp to the thin pool may reintroduce similar space/performance problems to the admin tasks that are at issue.

Performance: Is this really an issue for dom0? I think it's much more critical for domUs, which are already using discard for everything.

With the older 2012-vintage SSD in my primary system, disk performance has been fine (with discard) for all my VMs, including dom0. But this is about logical deallocation of extents... right? We're not talking about (small) blocks, and not about hardware TRIM. I don't see the issue here.

DoS: The default domU size is only 2GB, it is user-controlled, and the user should be seeing per-VM disk allocation anyway, as they would with Qubes Manager on 3.2. Even so, this does raise a question about leaving a minimal amount of free space. For one, the normal DE warnings about low space should be enabled in dom0.


jharveyb commented Mar 19, 2018

Independent of the choice to enable discards by default, a DE widget to monitor disk usage for 4.0 is useful.

This script works with an Xfce Generic Monitor to present a single-color bar showing space used and a tooltip showing free space. Since it just uses qvm-pool, it won't include the unallocated space that vgs & pvs show, but that seems acceptable.

#!/bin/sh
# Pool size and usage in bytes, as reported by qvm-pool.
SIZE=$(qvm-pool -i lvm | awk '/^size/ {print $2}')
USAGE=$(qvm-pool -i lvm | awk '/^usage/ {print $2}')
FREE=$(($SIZE - $USAGE))
# Integer percentage used, rounded to the nearest whole percent.
USEDCENT=$((100*$USAGE/$SIZE + 200*$USAGE/$SIZE % 2))
# Crude bytes-to-GB formatting: assumes FREE is a 12-digit number,
# i.e. free space between 100 GB and 999 GB.
FREEGB=$(echo $FREE | cut -c 1-3)
FREEMB=$(echo $FREE | cut -c 4-5)
echo "<tool>$FREEGB.$FREEMB GB FREE</tool>"
echo "<bar>$USEDCENT</bar>"


marmarek commented Mar 20, 2018

Performance: Is this really an issue for dom0? I think its much more critical for domUs, which are already using discard for everything.

Ok, I feel convinced. Since domUs are outside of the dom0 filesystem, this change shouldn't affect them. And indeed, we already use discard in domUs and no one has complained so far (although 4.0 is still in the rc phase). Anyway, if it turns out to be problematic on some disks: a) one may leave TRIM/DISCARD disabled on the LUKS layer, b) one may disable it in the dom0 fstab (and/or templates).
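For reference, a sketch of where the LUKS-layer discard knob lives (device names are hypothetical; this is independent of the fstab option discussed above):

# /etc/crypttab entry with discard passthrough enabled (4th field):
#   luks-<uuid>  UUID=<uuid>  none  discard
# Or one-off, when opening the container manually:
sudo cryptsetup open --allow-discards /dev/sda2 luks-root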

marmarek added a commit to marmarek/qubes-installer-qubes-os that referenced this issue Mar 20, 2018

anaconda: enable discard option for dom0 filesystems by default
This may have a performance impact on some older SSDs, but on the other
hand, without this option it's pretty easy to fill the whole LVM thin
pool even if there is plenty of free space in dom0.
Note that this doesn't enable discard on the LUKS layer; that is still
disabled by default.

Fixes QubesOS/qubes-issues#3226
qubesos-bot commented Mar 28, 2018

Automated announcement from builder-github

The package pykickstart-2.32-4.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update


qubesos-bot commented Apr 18, 2018

Automated announcement from builder-github

The package pykickstart-2.32-4.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

