VMs don't boot on 4Kn drive #7828

Closed
SurFlurer opened this issue Oct 20, 2022 · 16 comments · Fixed by QubesOS/qubes-vmm-xen#140
Labels
affects-4.1 - This issue affects Qubes OS 4.1.
C: Xen
diagnosed - Technical diagnosis has been performed (see issue comments).
P: default - Priority: default. Default priority for new issues, to be replaced given sufficient information.
pr submitted - A pull request has been submitted for this issue.
r4.1-bookworm-stable, r4.1-bullseye-stable, r4.1-buster-stable, r4.1-centos-stream8-stable, r4.1-dom0-stable
r4.2-host-cur-test, r4.2-vm-bookworm-cur-test, r4.2-vm-bullseye-cur-test
T: bug - Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
updates testing - Issue regarding an update that is currently in testing. Triage before migrating update to stable.

Comments

@SurFlurer


Qubes OS release

R4.1.1 TESTING

Brief summary

I am using the default "LVM" installation layout. When I install Qubes with the "LVM" disk layout, the default pool is varlibqubes.
The VMs are all based on the stock templates shipped with the ISO; I didn't build 4Kn templates.
None of my VMs boot after I upgrade dom0 to testing-latest.
Without enabling testing, everything is fine.

Steps to reproduce

  1. Find an NVMe drive that uses an LBA sector size of 4096.
  2. Install Qubes using the "LVM" disk layout, and upgrade Qubes to latest (xen 4.14.5-7).
  3. Confirm that the VMs boot.
  4. Upgrade Qubes to testing-latest (without kernel-latest) (xen 4.14.5-9).
  5. Observe that no VM boots.

Expected behavior

VMs boot as usual.

Actual behavior

An error message appears: "Start failed: internal error: libxenlight failed to create new domain xxx"

Here are the logs in /var/log/libvirt/libxl/libxl-driver.log:

2022-10-20 12:23:19.476+0000: libxl: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block add [5543] exited with error status 1
2022-10-20 12:23:19.476+0000: libxl: libxl_device.c:1302:device_hotplug_child_death_cb: script: losetup --direct-io=on /dev/loop8 /var/lib/qubes/appvms/sys-net/private-dirty.img failed
2022-10-20 12:23:19.570+0000: libxl: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block add [5542] exited with error status 1
2022-10-20 12:23:19.570+0000: libxl: libxl_device.c:1302:device_hotplug_child_death_cb: script: losetup --direct-io=on /dev/loop9 /var/lib/qubes/appvms/sys-net/root-dirty.img failed
2022-10-20 12:23:19.666+0000: libxl: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block add [5548] exited with error status 1
2022-10-20 12:23:19.666+0000: libxl: libxl_device.c:1302:device_hotplug_child_death_cb: script: losetup --direct-io=on -r /dev/loop10 /var/lib/qubes/vm-kernels/5.15.74-1.fc32/modules.img failed
2022-10-20 12:23:19.775+0000: libxl: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block add [5545] exited with error status 1
2022-10-20 12:23:19.775+0000: libxl: libxl_device.c:1302:device_hotplug_child_death_cb: script: losetup --direct-io=on /dev/loop11 /var/lib/qubes/appvms/sys-net/volatile-dirty.img failed
2022-10-20 12:23:19.775+0000: libxl: libxl_create.c:1686:domcreate_launch_dm: Domain 3:unable to add disk devices
2022-10-20 12:23:19.828+0000: libxl: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [5805] exited with error status 1
2022-10-20 12:23:19.829+0000: libxl: libxl_device.c:1302:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
2022-10-20 12:23:19.839+0000: libxl: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [5804] exited with error status 1
2022-10-20 12:23:19.839+0000: libxl: libxl_device.c:1302:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
2022-10-20 12:23:19.849+0000: libxl: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [5807] exited with error status 1
2022-10-20 12:23:19.849+0000: libxl: libxl_device.c:1302:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
2022-10-20 12:23:19.858+0000: libxl: libxl_exec.c:117:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [5810] exited with error status 1
2022-10-20 12:23:19.859+0000: libxl: libxl_device.c:1302:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
SurFlurer added the "P: default" and "T: bug" labels on Oct 20, 2022
@marmarek
Member

Do you see any more details in /var/log/xen/xen-hotplug.log?

@SurFlurer
Author

Yes. From /var/log/xen/xen-hotplug.log:

losetup: /dev/loop0: set direct io failed: Invalid argument
losetup: /dev/loop1: set direct io failed: Invalid argument
losetup: /dev/loop2: set direct io failed: Invalid argument
losetup: /dev/loop3: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-1-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51760-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51744-node: No such file or directory
losetup: /dev/loop4: set direct io failed: Invalid argument
losetup: /dev/loop5: set direct io failed: Invalid argument
losetup: /dev/loop6: set direct io failed: Invalid argument
losetup: /dev/loop7: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-2-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51744-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51760-node: No such file or directory
losetup: /dev/loop8: set direct io failed: Invalid argument
losetup: /dev/loop9: set direct io failed: Invalid argument
losetup: /dev/loop10: set direct io failed: Invalid argument
losetup: /dev/loop11: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-3-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51744-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51760-node: No such file or directory
losetup: /dev/loop0: set direct io failed: Invalid argument
losetup: /dev/loop1: set direct io failed: Invalid argument
losetup: /dev/loop2: set direct io failed: Invalid argument
losetup: /dev/loop3: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-1-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51744-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51760-node: No such file or directory
losetup: /dev/loop4: set direct io failed: Invalid argument
losetup: /dev/loop5: set direct io failed: Invalid argument
losetup: /dev/loop6: set direct io failed: Invalid argument
losetup: /dev/loop7: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-2-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51744-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51760-node: No such file or directory
losetup: /dev/loop8: set direct io failed: Invalid argument
losetup: /dev/loop9: set direct io failed: Invalid argument
losetup: /dev/loop10: set direct io failed: Invalid argument
losetup: /dev/loop11: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-3-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51744-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51760-node: No such file or directory
losetup: /dev/loop0: set direct io failed: Invalid argument
losetup: /dev/loop1: set direct io failed: Invalid argument
losetup: /dev/loop2: set direct io failed: Invalid argument
losetup: /dev/loop3: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-1-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51744-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51760-node: No such file or directory
losetup: /dev/loop4: set direct io failed: Invalid argument
losetup: /dev/loop5: set direct io failed: Invalid argument
losetup: /dev/loop6: set direct io failed: Invalid argument
losetup: /dev/loop7: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-2-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51760-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51744-node: No such file or directory
losetup: /dev/loop0: set direct io failed: Invalid argument
losetup: /dev/loop1: set direct io failed: Invalid argument
losetup: /dev/loop2: set direct io failed: Invalid argument
losetup: /dev/loop3: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-1-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51760-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-1-51744-node: No such file or directory
losetup: /dev/loop4: set direct io failed: Invalid argument
losetup: /dev/loop5: set direct io failed: Invalid argument
losetup: /dev/loop6: set direct io failed: Invalid argument
losetup: /dev/loop7: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-2-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51744-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-2-51760-node: No such file or directory
losetup: /dev/loop8: set direct io failed: Invalid argument
losetup: /dev/loop9: set direct io failed: Invalid argument
losetup: /dev/loop10: set direct io failed: Invalid argument
losetup: /dev/loop11: set direct io failed: Invalid argument
cat: /run/xen-hotplug/backend-vbd-3-51760-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51728-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51712-node: No such file or directory
cat: /run/xen-hotplug/backend-vbd-3-51744-node: No such file or directory

@marmarek
Member

The fix will be in the repo soon. In the meantime, you can recover by editing /etc/xen/scripts/block and removing the --direct-io=on parameter from the losetup call.
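
For example, one way to apply that interim workaround in dom0 is the following sketch (hedged: the backup filename is arbitrary, and the exact shape of the losetup line inside the script may differ, so check the result before relying on it):

sudo cp /etc/xen/scripts/block /etc/xen/scripts/block.bak
sudo sed -i 's/ --direct-io=on//' /etc/xen/scripts/block

After the edit, starting a VM should no longer attempt to enable direct I/O on the loop devices.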

@SurFlurer
Author

Got it! Thanks very much!

@qubesos-bot

Automated announcement from builder-github

The package vmm-xen has been pushed to the r4.1 testing repository for the CentOS centos-stream8 template.
To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r4.1-current-testing

Changes included in this update

@DemiMarie

Whoops! I had not considered that at all. Glad this got caught in testing!

@qubesos-bot

Automated announcement from builder-github

The component vmm-xen (including package python3-xen-4.14.5-10.fc32) has been pushed to the r4.1 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package xen_4.14.5-10 has been pushed to the r4.1 testing repository for the Debian template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing buster-testing (or appropriate equivalent for your template version), then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

andrewdavidwong added this to the "Release 4.1 updates" milestone on Oct 20, 2022
andrewdavidwong added the "C: Xen", "diagnosed", "pr submitted", and "updates testing" labels on Oct 20, 2022
@rustybird

rustybird commented Oct 20, 2022

This is fixed in util-linux v2.38+, although we'd also have to pass an explicit --sector-size=512 argument to ensure that the incompatible case is automatically resolved in favor of turning direct I/O off (as opposed to changing the logical block size).

@SurFlurer did you by any chance manually set up a 4k dm-crypt sector size? The old cryptsetup in the R4.1.x installer defaults to 512-byte sectors even on drives with 4k logical sectors, so normally your LVM Thin pool should have arrived at 512-byte sectors as well and avoided this bug. See below.
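
For reference, a hedged sketch of what the adjusted invocation in the block script could look like once util-linux v2.38+ is in place (untested; the loop device and image path are copied from the log excerpt above):

losetup --sector-size=512 --direct-io=on /dev/loop8 /var/lib/qubes/appvms/sys-net/private-dirty.img

Per the comment above, the explicit --sector-size=512 is meant to make the incompatible case resolve by turning direct I/O off rather than by bumping the logical block size to 4096.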

@qubesos-bot

Automated announcement from builder-github

The package vmm-xen has been pushed to the r4.2 testing repository for the Debian template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing bullseye-testing (or appropriate equivalent for your template version), then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package vmm-xen has been pushed to the r4.2 testing repository for the Debian template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing bookworm-testing (or appropriate equivalent for your template version), then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@SurFlurer
Author

@rustybird

In fact, I'm using an LVM thick pool, because if I choose the LVM thin layout in the installer, I get an error message and no new partition is created.

Using default "LVM" installation layout.

Whoops, I forgot to explicitly emphasize that "LVM" is different from LVM-thin.

When I install Qubes with "LVM" disk layout, the default pool is varlibqubes.

However, if my memory is correct, the default pool in the LVM thin layout is “vm”.

@rustybird

rustybird commented Oct 22, 2022

Oh man, I always assumed that dm-crypt with sector_size 512 (the default on R4.1.x) would "shield" the upper storage layers from the 4Kn drive's logical block size. But it actually only works the other way around: sector_size 4096 bumps up a physical or logical block size of 512 on the underlying device to 4096 on the dm device.

That explains why this bug was able to affect your system.


General summary (not specific to this bug): People with 4Kn drives currently can't use lvm_thin pools (#4974), unless they're doing esoteric things like building custom template image files etc. And people who set up 4K dm-crypt on 512e drives have the same problem. But installation layouts compatible with the file-reflink driver are okay: "Btrfs", "Standard Partition", or "LVM" that's not "Thin". The latter two use XFS for the varlibqubes pool and are somewhat less tested, as you noticed 😉
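
To check where a particular system stands, the logical sector size can be inspected at each layer of the stack; a hedged sketch (the NVMe device name and the dm-crypt mapping name are placeholders, adjust them to your setup):

lsblk -o NAME,TYPE,FSTYPE,LOG-SEC,PHY-SEC /dev/nvme0n1
sudo cryptsetup status luks-xxxxxxxx | grep -i 'sector size'

If the block device directly under the varlibqubes filesystem reports a 4096-byte logical sector size, the direct I/O limitation described in this issue applies.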

@qubesos-bot

Automated announcement from builder-github

The component vmm-xen (including package python3-xen-4.14.5-10.fc32) has been pushed to the r4.1 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package vmm-xen has been pushed to the r4.1 stable repository for the CentOS centos-stream8 template.
To install this update, please use the standard update command:

sudo yum update

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package xen_4.14.5-10+deb10u1 has been pushed to the r4.1 stable repository for the Debian template.
To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

rustybird added a commit to rustybird/qubes-linux-utils that referenced this issue Apr 28, 2023
block_size=0 in loop_config usually results in a loop device with
512 byte logical blocks (which is required for compatibility with the
normal VM volume content), but not always:

XFS backed by a block device with 4096 byte logical blocks (due to a 4Kn
drive and/or dm-crypt with sector_size=4096) doesn't support the
combination of direct I/O *and* 512 byte logical blocks for loop
devices. With block_size=0 the kernel resolves the conflict by changing
the logical block size to 4096. Explicitly pass block_size=512 to turn
off direct I/O in this case instead.

Fixes QubesOS/qubes-issues#7828
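
A rough way to verify that behavior on a given pool from the shell (hedged sketch: the image path is hypothetical, and loopX stands for whatever device losetup prints):

truncate -s 1G /var/lib/qubes/test.img
sudo losetup --find --show --sector-size=512 /var/lib/qubes/test.img
cat /sys/block/loopX/queue/logical_block_size
cat /sys/block/loopX/loop/dio
sudo losetup -d /dev/loopX
rm /var/lib/qubes/test.img

The logical_block_size attribute should read 512 rather than 4096, and on an XFS pool backed by 4096-byte logical blocks the dio attribute should read 0, i.e. direct I/O stays off, which is the trade-off the commit message describes.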
andrewdavidwong added the affects-4.1 label on Aug 8, 2023