Hardware reset during installation and boot of R4.2 on Ryzen 9 7950X #8322
Comments
I suspect this will need a hardware quirk in the installer.
A simpler workaround turns out to be leaving IOMMU disabled during installation (the above MB defaults to auto and does not know about Qubes). The installation then exits seconds before the point where the hardware reset would occur, with a missing IOMMU error when starting sys-firewall (presumably it was actually sys-net). Installation exits cleanly, and one can immediately log in and remove the USB controller from sys-usb. I have no idea what I am missing out on in the install, and this technically invalidates all further testing of R4.2. It does seem to work rather well, actually...
Quick check on rc3: the problem is still there. However, a clean install can be made by adding the "qubes.skip_autostart" option to the vmlinuz command line on the 2nd pass of installation. The installer does take notice, but oddly sys-usb is not started while sys-firewall and sys-net are, which is probably a bug. Just take the last USB controller out of sys-usb, start it, and proceed as normal. The only problem I am having with rc3 is USB devices being a bit flaky, which may be related to whatever this problem is.
How would one add the needed quirk to Anaconda?
Is there a phase during installation where the installer boots sys-usb after assigning all USB devices to it?
I don't think it's the right thing to do, at least with the current info here. It would potentially leave dom0 exposed to some USB devices, while the user would have the impression they are all isolated in sys-usb (since that was selected during install). The proper solution is of course to make it not crash. But as a workaround, the user can choose not to create sys-usb during install, then create it by hand later and remove the device from it. This way they will know some device is excluded, and there is no risk of leaving it in dom0 without the user's knowledge. Such instructions should also explain the risk. But if the device really should stay in dom0, not as a workaround for a crash but as really intended behavior, then we have a mechanism for that -
Has this been reported to Gigabyte? I wonder if SMM is getting an interrupt it did not expect to get and crashes as a result.
What if the device was attached to nothing? Don’t assign it to sys-usb, but don’t assign it to any other qube (including dom0) either. Assign it to Xen’s quarantine domain. That might avoid the crash without the security consequences. Alternatively, what if Linux is told to not reset the device? I wonder if Linux sees that a PM reset is available, but that PM reset winds up resetting the whole system.
That's highly unlikely. A much more likely cause is either a dom0 or Xen panic... And still, I don't want to waste time on elaborate workarounds (there are already a few simple ones in this thread) until we know for sure that a proper fix is not achievable.
Is “assign to quarantine domain” simple or elaborate?
This is reproducible on my 7950X with an Asus Strix X670E-F, so I don't think it's Gigabyte-specific. I also have a 7900XTX, which may not be helping things.
Also happens to me on 7950X with Asrock X670E Steel Legend. I have two USB controllers that cause a reboot -- 16:00.4 and 17:00.0 |
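For anyone comparing controller addresses across boards, here is a minimal sketch for listing the USB controllers the kernel sees, assuming Python 3 is available in dom0 and that PCI devices are exposed under /sys/bus/pci/devices. The function name `usb_controllers` is made up for illustration; the `0x0c03` prefix is the standard PCI class code for USB controllers.

```python
from pathlib import Path


def usb_controllers(sysfs_root="/sys/bus/pci/devices"):
    """Return sorted PCI addresses whose class code starts with 0x0c03 (USB controller)."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in root.iterdir():
        try:
            # The sysfs "class" attribute holds a value such as 0x0c0330.
            code = (dev / "class").read_text().strip().lower()
        except OSError:
            continue  # not a PCI device entry, or attribute unreadable
        if code.startswith("0x0c03"):
            found.append(dev.name)
    return sorted(found)


if __name__ == "__main__":
    for addr in usb_controllers():
        print(addr)
```

The addresses are printed with the PCI domain prefix (for example `0000:37:00.0`), so they can be compared directly with `lspci -D` output.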
I have the same issue with my Asus Strix X670E-F. However, I am not sure what this device really is. I tried every USB port on my setup and everything works without this "USB controller". (For people having this issue: are you missing any USB port or functionality without the "USB controller" that you cannot pass through?) Result of "sudo lspci -vvs 12:00.0":
The uncommon lines in this:
On mine, 17:00.0 is the motherboard LED controller. But since there is no problem when not using sys-usb, it should be a passthrough problem (i.e. IOMMU groups), right?
IOMMU groups or soft reset
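On the reset side of that question, a rough sketch for checking which reset methods the kernel advertises for a given controller. It assumes a kernel new enough to expose the `reset_method` sysfs attribute (Linux 5.15 or later) and that the attribute is present for the device in dom0, which has not been verified in this thread. The helper name `reset_methods` is hypothetical.

```python
from pathlib import Path


def reset_methods(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the reset methods listed for a PCI device, or None if the attribute is absent.

    bdf is the full address including the PCI domain, e.g. "0000:37:00.0".
    """
    attr = Path(sysfs_root) / bdf / "reset_method"
    try:
        # The attribute is a space-separated list such as "pm bus".
        return attr.read_text().split()
    except OSError:
        return None


if __name__ == "__main__":
    # 0000:37:00.0 is the address reported in the original issue; adjust per board.
    print(reset_methods("0000:37:00.0"))
```

If `pm` shows up in the list for the problematic controller, that would at least be consistent with the PM reset theory raised earlier, though it would not prove it.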
4.2-rc4 6.5.6: still there. Behavior is different: the install ran normally, I left the machine for the 2nd pass, and when I returned much later it was shut down. Bringing it up with qubes.skip_autostart, there were 3 USB controllers in sys-usb that were unknown, and all had to be removed for it to start. I am guessing not everything made it to disk before the reset. [ed] While writing up that issue I had a different event: an instant power off while typing here. I had been doing various testing on USB ports and had left a storage device plugged into one of the controllers on the 670 chipset. On trying to boot I got the same power off after entering the disk password. Suspecting sys-usb, I took a couple more devices out and could then get up and running. I then noticed the USB drive on the back panel, removed it, and could put those devices back in sys-usb and boot OK. So it looks like all it takes to cause a reset or power off on start is a device plugged into a port that is mapped to sys-usb. I did plug the mouse and keyboard into the only 2 ports that are USB 2.0/1.1, which are on a USB 2.0 hub directly on the CPU, hence my original suspicion.
4.2.0 6.6.2 did not have the above installation problem. I still get a power off when starting sys-usb if the last USB device is mapped. I am not getting the power off/reset if a storage device is plugged into another controller when sys-usb is started. However, sys-usb does go into a loop of "device available" and "device removed" notifications every second, which is cleared by removing the storage device.
I can see the same on a Supermicro M11SDV-4C-LN4F. Here's the serial log from an attempted boot that resulted in a hard restart: No panic, nothing unexpected in the last lines. I'm not sure why the first lines (5th and 6th) look as they do. I had an issue with another Supermicro board (X11-something) where the output was heavily modified by the BMC (lines printed out of order with heavy jumping with ANSI escape codes). I can start the OS with
Need this for Supermicro MBD-M11SDV-4C-LN4F which resets if sys-usb is in use. See QubesOS/qubes-issues#8322 (comment) Signed-off-by: Sergii Dmytruk <sergii.dmytruk@3mdeb.com>
Qubes OS release
R4.2.0-rc1 + Ryzen 9 7950X + Gigabyte X670E motherboard
Brief summary
Installation proceeds normally until just after "Configure networking", when the hardware resets.
Subsequent system boots reset just after the disk password is entered.
Steps to reproduce
Run a default installation of R4.2.0-rc1.
Expected behavior
No hardware resets.
Actual behavior
As noted.
The problem appears to be caused by a single USB controller being mapped into sys-usb.
There are 5 USB controllers on the CPU and 670 chipset; only one causes a problem.
It is the last one in the devices list, address 37:00.0.
The workaround is to add the qubes.skip_autostart option to the Linux kernel boot parameters at any boot after installation, then unmap this controller from sys-usb once the system is up.
I suspect that it is the on-CPU controller that is used for the mouse and keyboard, as others using different VM systems on the same CPU have had problems where mapping in-use USB devices causes a hardware reset.
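For the unmapping step of the workaround, a small sketch that turns a PCI address into the dom0 command to run once the system is up. It assumes the R4.2 device naming, where 37:00.0 becomes `dom0:37_00.0`, and the `qvm-pci detach` subcommand. The helper names `qubes_pci_ident` and `detach_command` are made up for illustration, and the script only prints the command rather than running it.

```python
import re


def qubes_pci_ident(bdf, backend="dom0"):
    """Convert a PCI address such as 37:00.0 (optionally with a 0000: domain) to dom0:37_00.0."""
    m = re.fullmatch(
        r"(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])", bdf
    )
    if m is None:
        raise ValueError(f"not a PCI address: {bdf!r}")
    bus, dev, fn = m.groups()
    return f"{backend}:{bus.lower()}_{dev.lower()}.{fn}"


def detach_command(bdf, qube="sys-usb"):
    """Build the argument list for detaching the device from the given qube (assumed R4.2 syntax)."""
    return ["qvm-pci", "detach", qube, qubes_pci_ident(bdf)]


if __name__ == "__main__":
    # 37:00.0 is the controller address from this report; other boards will differ.
    print(" ".join(detach_command("37:00.0")))
```

Printing the command first makes it easy to double-check the address against `qvm-pci` output before removing anything from sys-usb.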