page fault when booting Centos and RHEL HVMs #1943
Comments
marmarek added the bug, C: xen, and P: major labels on May 3, 2016
marmarek added this to the Release 3.1 updates milestone on May 3, 2016
marmarek
May 3, 2016
Member
This looks to be somewhere in USB emulation code.
PS I've cut the log to only last VM run.
lorenzog
May 3, 2016
5 minutes after writing this I realised the problem was the kernel. Upgrading to a 4.x kernel solved the issue for RHEL. As always, one feels better after they've called the doctor....
Now investigating Centos7 before closing this
lorenzog
May 3, 2016
Update - RHEL booted successfully once, after that never booted again. Error seems to be identical (see report below).
The VM does not have any USB nor PCI devices attached to it. Could it be a video output problem?
[...]
vga s->lfb_addr = f0000000 s->lfb_end = f1000000
vga s->lfb_addr = f0000000 s->lfb_end = f1000000
vga s->lfb_addr = f0000000 s->lfb_end = f1000000
Page fault at linear address 4307008, rip 20759, regs 0x5ef5b8, sp 5ef660, our_sp 0x5ef580, code 2
Thread: main
RIP: e030:[<0000000000020759>]
RSP: e02b:00000000005ef660 EFLAGS: 00010202
RAX: 0000000004307008 RBX: 00000000ec000008 RCX: 0000000000000001
RDX: 0000000000000008 RSI: 0000002002c752fa RDI: 0000000004307008
RBP: 00000000005ef660 R08: 0000000000000004 R09: 00000000f2000000
R10: 0000000000000000 R11: 000000000000000c R12: 0000000000000008
R13: 0000000000000008 R14: 0000002002c752fa R15: 0000000000001000
base is 0x5ef660 caller is 0x20f28
base is 0x5ef6b0 caller is 0x50c22
base is 0x5ef950 caller is 0x50e9d
base is 0x5ef980 caller is 0x35b5
base is 0x5ef9a0 caller is 0x6a86
base is 0x5efa10 caller is 0x21f27
base is 0x5efa60 caller is 0x950d
base is 0x5efdf0 caller is 0xd7f27
base is 0x5effe0 caller is 0x3423
5ef650: 60 f6 5e 00 00 00 00 00 2b e0 00 00 00 00 00 00
5ef660: b0 f6 5e 00 00 00 00 00 28 0f 02 00 00 00 00 00
5ef670: 08 00 00 00 00 00 00 00 50 89 d8 02 01 00 00 00
5ef680: b0 f6 5e 00 00 00 00 00 b0 52 c7 02 20 00 00 00
5ef650: 60 f6 5e 00 00 00 00 00 2b e0 00 00 00 00 00 00
5ef660: b0 f6 5e 00 00 00 00 00 28 0f 02 00 00 00 00 00
5ef670: 08 00 00 00 00 00 00 00 50 89 d8 02 01 00 00 00
5ef680: b0 f6 5e 00 00 00 00 00 b0 52 c7 02 20 00 00 00
20740: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
20750: 55 48 89 e5 89 d1 c1 e9 03 f3 48 a5 f7 c2 04 00
20760: 00 00 74 01 a5 f7 c2 02 00 00 00 74 02 66 a5 f7
20770: c2 01 00 00 00 74 01 a4 5d c3 55 48 89 e5 5d c3
Pagetable walk from virt 4307008, base 571000:
L4 = 0000000030bb6067 (0x572000) [offset = 0]
L3 = 0000000030bb5067 (0x573000) [offset = 0]
L2 = 0000000022421067 (0x5af000) [offset = 21]
L1 = 0000000000000000 [offset = 107]
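The offsets in the pagetable walk above can be reproduced from the faulting address. The sketch below is a reader's aid and not part of the original log: it assumes standard x86-64 4-level paging with 4 KiB pages and reads the log's `offset` values as hexadecimal. The function names are made up for illustration.

```python
def pagetable_offsets(virt):
    """Split an x86-64 virtual address into its four 9-bit table indices."""
    return {
        "L4": (virt >> 39) & 0x1FF,
        "L3": (virt >> 30) & 0x1FF,
        "L2": (virt >> 21) & 0x1FF,
        "L1": (virt >> 12) & 0x1FF,
    }


def decode_fault_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),
        "write_access": bool(code & 0x2),
        "user_mode": bool(code & 0x4),
    }


# Values copied from the log: linear address 4307008 (hex) and code 2.
offsets = pagetable_offsets(0x4307008)
fault = decode_fault_code(2)

for level, index in offsets.items():
    print(f"{level} offset = {index:x}")
print(fault)
```

Under those assumptions the offsets come out as 0, 0, 21 and 107, matching the walk, and code 2 decodes to a write to a page that is not present, which is consistent with the empty L1 entry.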
lorenzog changed the title from "page fault when booting HVM with kernel 3.10.0 from Centos, RHEL" to "page fault when booting Centos and RHEL HVMs" on May 3, 2016
There is an emulated USB tablet device (see the qemu cmdline in that log).
lorenzog
May 3, 2016
The HVM config file contains this line:
<input type='tablet' bus='usb'/>
I've tried editing with virsh edit but changing 'tablet' to 'mouse' or 'keyboard' results in this error message:
error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domain.rng Extra element os in interleave Element domain failed to validate content
To be honest any change in that file results in this error. I'm a bit at a loss - how do I remove the emulated USB tablet?
marmarek
May 3, 2016
Member
virsh will not help - domain config is regenerated at each domain startup. If you want to try a manual change, dump that config to a file, edit it, then pass it to qvm-start --custom-config=....
lorenzog
May 4, 2016
Brilliant, I was able to achieve reliable boot. Thank you very much! Good hint :)
Solution:
Changing 'tablet' and 'usb' into 'mouse' and 'ps2' allowed reliable boot (albeit with the message BUG: soft lockup - CPU#0 stuck for 23s [...])
Question:
Is there some documentation on how to make those changes permanent to the .conf file? I suppose domain config is located somewhere else, but I can't find out exactly where.
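The edit described under "Solution" can also be scripted on a dumped copy of the domain config. This is a minimal sketch, assuming the dumped file is well-formed XML containing the `<input type='tablet' bus='usb'/>` element quoted earlier. The function name and the sample string are made up for illustration, and the script does not call any Qubes tool.

```python
import xml.etree.ElementTree as ET


def swap_tablet_for_ps2_mouse(xml_text):
    """Rewrite every USB tablet <input> element as a PS/2 mouse."""
    root = ET.fromstring(xml_text)
    for element in root.iter("input"):
        if element.get("type") == "tablet" and element.get("bus") == "usb":
            element.set("type", "mouse")
            element.set("bus", "ps2")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened config; a real dump has many more elements.
sample = (
    "<domain type='xen'><devices>"
    "<input type='tablet' bus='usb'/>"
    "</devices></domain>"
)
result = swap_tablet_for_ps2_mouse(sample)
print(result)
```

ElementTree drops XML comments and the XML declaration when re-serializing, so it may be worth diffing the result against the original before passing it to qvm-start with --custom-config.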
marmarek
May 4, 2016
Member
No, domain config is generated at each startup (unless --custom-config is used). If you really want, you can edit its template: /usr/share/qubes/vm-template-hvm.xml. But it will affect all the HVMs, and that file will be overwritten on update.
The right way to go is to fix the USB controller emulator so that it does not crash, even in the case of a buggy driver.
lorenzog
May 4, 2016
Fair enough. Thanks.
Can we close this as solved?
I'd leave it open for that qemu bug.
Ok, sounds good.
dlmetcalf
May 17, 2016
Thanks for ticketing priority of this as Major.
Given most users need to run RHEL or Ubuntu for their day job work, support for these mainstream OS's could have a significant impact on uptake. Increasing the number of people who can/do use Qubes in their day job, might increase odds of raising donations (or code contributions). I know several places I could 'pitch' Qubes to, but latest RHEL & Ubuntu LTS support as templates & HVMs (of which this ticket is a start) is definitely a prerequisite.
p.s. The volume of work such a small Qubes team is achieving is super impressive BTW!
JoeThielen
Jul 28, 2016
I am having this same issue with CentOS 7 (1511) Minimal ISO downloaded from CentOS.org (sha256sum: f90e4d28fa377669b2db16cbcb451fcb9a89d2460e3645993e30e137ac37d284).
As noted above, when the VM does boot, it will show the "BUG: soft lockup - CPU#0 stuck for" notice too. At one point in trying to diagnose this (before finding this thread), I tried varying the number of CPUs allotted to the VM, and I seemed to have more success with just one. If I tried more than one I seemed to have more issues. However, that was a while ago, before I found this thread, and I may be confusing the issue.
I will try the 'tablet'/'usb' to 'mouse'/'ps2' changes in /usr/share/qubes/vm-template-hvm.xml to see if that helps long-term. I just did a quick test with three different CentOS HVMs, booting them several times each, and now they seem to boot every time, after a minute or two of waiting for the CPU lockup thing...
Thanks to all who are reporting on this as well as working on it! I thought I was doing something noobish/wrong.
marmarek added the help wanted label on Jul 29, 2016
pedro7
Aug 4, 2016
Does [...]
JoeThielen
Aug 4, 2016
Pedro, that doesn't seem to do anything for me.
Here is what I did:
- Started CentOS 7 HVM
- I added that to my /etc/default/grub file in the GRUB_CMDLINE_LINUX parameter.
- Ran grub2-mkconfig --output=/boot/grub2/grub.cfg
- Restarted the HVM.
It still sat there for over a minute or more, finally giving me the "BUG: soft lockup - CPU#0 stuck for" message and then booted. I restarted again and at the grub prompt I hit "e" to make sure it showed up on the command line and it was indeed there.
Let me know if I should have done something different. And thanks for the suggestion.
JoeThielen
Sep 1, 2016
I can confirm this is still an issue on R3.2-rc2. Not the reliable booting issue, but the "BUG: soft lockup - CPU#0 stuck for" message/issue. I tried the intel_idle.max_cstate=7 thing as well as editing /usr/share/qubes/vm-template-hvm.xml. I still get the delay and message every time.
JoeThielen
Sep 2, 2016
I can also confirm this is still an issue on R3.2-rc3. I did witness a CentOS 7 Minimal HVM fail to boot before I edited /usr/share/qubes/vm-template-hvm.xml, so I can also confirm that is still an issue too. The /usr/share/qubes/vm-template-hvm.xml workaround works, but I still get the "BUG: soft lockup - CPU#0 stuck for" message/issue.
JoeThielen
Sep 23, 2016
I think I've figured out the "BUG: soft lockup - CPU#0 stuck for" issue that causes delays when the HVM is booting. Looks like, at least in my situation, it's related to the bochs_drm Linux module. Looks like it has something to do with being a frame buffer driver. If I disable that module on the Linux kernel command line, then no more error!
So basically:
- Start CentOS 7 HVM
- I added "modprobe.disable=bochs_drm" to my /etc/default/grub file in the GRUB_CMDLINE_LINUX parameter.
- I REMOVED "rhgb" from my /etc/default/grub file in the GRUB_CMDLINE_LINUX parameter.
- Ran grub2-mkconfig --output=/boot/grub2/grub.cfg
- Restarted the HVM.
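The /etc/default/grub edit in the steps above can be sketched as a small text transformation. This is an illustration only: it assumes GRUB_CMDLINE_LINUX is a single double-quoted line, the function name and sample file contents are made up, and the kernel parameter is copied verbatim from this comment. grub2-mkconfig still has to be run afterwards as described.

```python
import re


def edit_grub_cmdline(grub_text, add=(), remove=()):
    """Return grub_text with GRUB_CMDLINE_LINUX arguments added and removed."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX=)"(.*)"\s*$')
    out = []
    for line in grub_text.splitlines():
        match = pattern.match(line)
        if match:
            # Drop unwanted arguments, then append missing ones once.
            args = [a for a in match.group(2).split() if a not in remove]
            args += [a for a in add if a not in args]
            line = match.group(1) + '"' + " ".join(args) + '"'
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file contents; a real /etc/default/grub will differ.
sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"\n'
edited = edit_grub_cmdline(
    sample,
    add=["modprobe.disable=bochs_drm"],
    remove=["rhgb"],
)
print(edited, end="")
```

Because an argument is only appended when it is not already present, running the function on its own output leaves the file unchanged.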
It looks like this has nothing to do with the original issue in this thread. I latched onto it because the "BUG: soft lockup - CPU#0" issue was mentioned and I had assumed they were linked. But I tried a lot of different things in my attempts to fix this issue. I tried various edits to /usr/share/qubes/vm-template-hvm.xml, but when I changed it back to stock I noticed that this issue still occurred even when the original <input type='tablet' bus='usb'/> was in there. Then I did some googling and found a bunch of posts relating to clocksource=jiffies, so I tried that with Xen and the HVMs, and that didn't fix anything either. I forget what finally turned me on to the bochs_drm module.
Anyway, the final result is that, at least in my case, there were two separate issues:
- /usr/share/qubes/vm-template-hvm.xml needed to have the <input type='tablet' bus='usb'/> changed to <input type='mouse' bus='ps2'/>
  - This made HVM booting reliable
- The "BUG: soft lockup - CPU#0" issue documented above, which now seems good for me by disabling the bochs_drm module.
marmarek
Sep 23, 2016
Member
Thanks for the info!
Is mouse position reliable with this change? AFAIR it easily desynchronizes. Do you have any Windows installation (without Qubes Windows Tools installed) to check it there too?
The latter change can probably be added here:
https://www.qubes-os.org/doc/linux-hvm-tips/
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
JoeThielen
Sep 24, 2016
I've only been using CentOS 7 HVM for services, not anything GUI. However, during CentOS GUI install I can say that, yes, I did have problems with desynchronized mouse position.
JoeThielen referenced this issue in QubesOS/qubes-doc on Sep 24, 2016: Add fix for Linux HVM kernel error on bootup: BUG: soft lockup - CPU#0 stuck for 23s! [systemd-udevd:244] #193 (merged)
JoeThielen
Sep 24, 2016
@marmarek I've created a pull request in that doc page you mentioned with the steps to resolve the issue.
unman referenced this issue in QubesOS/qubes-doc on Apr 19, 2017: Explain how to boot with HVM kernel errors. #366 (merged)
unman
Apr 19, 2017
Member
@andrewdavidwong I've explained how to boot a HVM with this error in QubesOS/qubes-doc#366 and JoeThielen's original PR resolved the issue.
This may be closed
lorenzog commented May 3, 2016 (edited May 3, 2016)
Qubes OS version (e.g., R3.1): 3.1
Affected TemplateVMs (e.g., fedora-23, if applicable): None - affects HVMs
Expected behavior:
HVM boots regularly
Actual behavior:
HVM does not boot 9 times out of 10 - logs show a page fault error
Steps to reproduce the behavior:
General notes:
Installation from CDROM/DVD works fine - system boots without a problem. However once installed, the first reboot never gets anywhere but instead shows a blank screen with a non-blinking cursor. VM shutdown via Qubes VM manager doesn't stop it; I have to kill it instead.
This happens with the following VMs:
This does NOT happen on a Ubuntu server 16.04, kernel 4.4.0-21.
The -dm.log file says (error at the bottom):
Related issues:
Relevant labels: