Linux stubdom - qemu crashes with intensive disk I/O #3651
andrewdavidwong added the "bug" and "C: xen" labels Mar 3, 2018
andrewdavidwong added this to the Release 4.0 milestone Mar 3, 2018
marmarek
Mar 3, 2018
Member
We have chosen this controller to ease Windows installation - the default one isn't supported by Windows installer out of the box (see #3068). Ideally we'd have it configurable, but unfortunately support for stubdomains (and specifically stubdomains with current qemu, instead of ancient qemu-traditional fork) is very limited in libxl.
Does updating qemu help (QubesOS/qubes-vmm-xen-stubdom-linux#14)? It doesn't look like the fix you've linked got any attention...
cc @HW42
alcreator
Mar 4, 2018
Unfortunately it still occurs with the latest qemu. I don't know if anyone has tested the upstream fix, since there still isn't official Q35 support in qemu for xen.
The only workaround that I can think of, if the root cause can't be found/fixed, would be to test if read-only disks still cause the crash. If they don't, then switch r/w disks back to ide/ahci (testing that these don't crash either), and have r/o disks on the mptsas1068.
alcreator commented Mar 3, 2018
Qubes OS version:
Qubes R4.0 (rc4)
Affected component(s):
HVM using linux stubdom (without pv drivers)
Steps to reproduce the behavior:
Perform an intensive I/O operation on the emulated disks, for example:
Using a linux liveCD (archlinux tested):
Boot under standalone HVM, and add xen_nopv to the kernel command line.
Create and mount standard partition and filesystem on an emulated disk (I tested with ntfs).
Enter the following command:
$ while true; do var=1; rm /path/to/mounted/filesystem/file;
while [ $var -lt 5 ]; do ((++var)); cat /dev/cdrom >> /path/to/mounted/filesystem/file; done;
diff /dev/cdrom /path/to/mounted/filesystem/file; done
/dev/urandom can also be used, although this takes longer to crash.
I have been able to trigger this issue with both Linux and Windows guest operating systems. Performing a Windows 10/Server 2016 installation seems to trigger it faster than the Linux test, but it doesn't occur on every installation attempt.
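The reproduction loop above can be wrapped in a small function so that the source, the target file, and the number of copies are explicit. This is only a sketch: the name stress_once and its parameters are hypothetical and not part of the original report. Unlike the original diff step, it only checks that the first copy read back from the target matches the source.

```shell
# Hypothetical helper (not from the original report): append SRC to DST
# COPIES times, then check that the first copy read back from DST matches SRC.
# The body runs in a subshell, so its variables do not leak into the caller.
stress_once() (
    src=$1
    dst=$2
    copies=${3:-4}

    rm -f -- "$dst"
    i=0
    while [ "$i" -lt "$copies" ]; do
        cat -- "$src" >> "$dst" || exit 1
        i=$((i + 1))
    done

    # Count the bytes in SRC by reading it, which also works for block devices.
    size=$(wc -c < "$src") || exit 1
    size=$((size))  # strip padding that some wc implementations emit

    # Compare the first SIZE bytes of DST against SRC; the exit status of cmp
    # becomes the status of the function.
    head -c "$size" "$dst" | cmp -s - "$src"
)
```

For example, `while stress_once /dev/cdrom /path/to/mounted/filesystem/file; do :; done` repeats the cycle until a mismatch or an I/O error is reported.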
Expected behavior:
Qemu running under linux stubdom does not crash
Actual behavior:
After some time (about 15 min on my hardware), qemu crashes with the error message "Bad ram offset 14787f000" in the device model console. The offset changes on each crash.
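Because the offset changes on each crash, it can help to collect the reported offsets from a saved copy of the device model console output. A minimal sketch, assuming the console text has been saved to a file: the function name bad_ram_offsets is hypothetical, and only the message text "Bad ram offset" comes from the report above.

```shell
# Hypothetical helper: print every offset reported in "Bad ram offset <hex>"
# messages found in the given log file, one per line.
bad_ram_offsets() {
    # grep -o prints only the matching part; sed then strips the fixed prefix.
    grep -oE 'Bad ram offset [0-9a-fA-F]+' "$1" | sed 's/^Bad ram offset //'
}
```

For example, `bad_ram_offsets dm-console.log | sort | uniq -c` shows whether any offset repeats across crashes (dm-console.log is a placeholder file name).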
General notes:
This appears to have been introduced with qubes-vmm-xen commit 6dd581aaaa4506a9dd34eb48559aabd23a2da361 "stubdom-linux: Use mptsas1068 scsi controller". I haven't reproduced the issue after a few hours testing with the commit reverted.
With the commit reverted, qemu defaults to the lsi53c895a SCSI controller. I have tested this commit with the megasas and megasas-gen2 SCSI controllers, and they also exhibit this issue.
From the observed triggers and symptoms, this bug may be related to the issue described in upstream bugfix "xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings" (discussion: https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02463.html)
The issue occurs significantly quicker with a full 32-bit stubdom build (less than 1 minute; build details at https://gist.github.com/alcreator/8c21502abc99c92fccf2a9903c9cb346), and also when the "performance" CPU scaling governor is used under dom0.
Related issues:
#3068