Windows 10 crashes while installing nvidia drivers #9003
I was trying to debug this issue for quite some time and ended up writing an article on my website: https://web.neowutran.ovh/qubes/articles/nvidia.pdf |
@neowutran Nice! Looks like the problem is that writes to the
@marmarek which of those solutions do you think is better? |
@neowutran awesome writeup :)
Neither. See below. First, let's take a closer look at what the write actually is.
BTW @neowutran it would be much more convenient to log those values in hex.

(EDIT: skip to the next chapter; the analysis here is correct, but doesn't lead to the real issue.)

I don't know what the value before was (I'm sure the driver read it just before writing; logging the reads might be useful too), but I guess it was just flipping bit 2 off and back on. That's bus mastering, aka DMA. If my hypothesis about turning it off and then back on is correct, then the end state is the same as the initial state, so it shouldn't break anything (regardless of whether it was passed through to the hardware or not). But since avoiding the write fixed the situation, that's probably not the case.

All the config space writes (and reads) should end up in QEMU, which decides which ones go to the hardware, which ones are emulated (so the VM sees the written value, but the hardware doesn't get it), and which ones are discarded. That decision is made for each bit separately. For the command register specifically, it's here: https://github.com/qemu/qemu/blob/master/hw/xen/xen_pt_config_init.c#L614-L623

{
.offset = PCI_COMMAND,
.size = 2,
.init_val = 0x0000,
.res_mask = 0xF880,
.emu_mask = 0x0743,
.init = xen_pt_common_reg_init,
.u.w.read = xen_pt_word_reg_read,
.u.w.write = xen_pt_cmd_reg_write,
},

What should happen is the

As a side note, you can get more info from pciback about the actual reads/writes to the config space that happen (after QEMU applies the masks) by enabling pciback debug logging in dom0:
The actual issue (I think):
I think the first case is much more likely. The behavior here indeed was changed (fixed) by the patch you identified. Before the patch, the

The status register is defined as:

The value written to it (after discarding the bits belonging to the command register) would be 0x10. It sets the single emulated bit, but clears all the others. When

The proper solution would be first to verify whether writing just to the command register was intended. If so, it should use a 16-bit write function. You can easily find where that write comes from by adding |
Ah, so this is an Nvidia driver bug that happens to go uncaught on bare hardware. |
@marmarek You are indeed correct, I just did a quick test:
|
Should this be closed with
"R: upstream issue"?
|
Can this be fixed in Qubes? It didn't crash in the older versions, and I can install the nvidia drivers under other hypervisors without Windows crashing. |
@neowutran can you log the values read there? and also, can you log both read and writes with stubdomain without that patch? |
I used your trace command (thanks a lot, I didn't know it existed).
Without the patch, the driver tries to initialize itself multiple times before giving up. It does tens of thousands of reads and writes to the PCI configuration space. Full logs: https://web.neowutran.ovh/marmarek_gpu_pciconfig_logs_not_patched Most interesting part is
With the patch: https://web.neowutran.ovh/marmarek_gpu_pciconfig_logs_patched
What are you looking for? |
Can you also check with the older stubdomain (that worked before) and with the not-patched driver? |
Old stubdom, driver not patched: https://web.neowutran.ovh/marmarek_gpu_pciconfig_logs_old-stubdom_driver-not-patched I have a hard time understanding the logs. Recent stubdom, driver not patched (crashing): extract of the logs around the first references to reads matching the value "10040" for offset 4
Same thing, but with an old stubdom (without the RO enforcement, and without MSI-X support):
In both cases, the write done to the "command" field succeeds; on the next read of offset 0x4, we can see that the value has been modified accordingly. Either I am confusing myself, or there is another kind of interaction between the driver and the hardware that I am not aware of. If these logs don't make sense to you, I can redo them later |
So, here it disables bus mastering. Is that the last write to this register before failing? Can you correlate whether it happens before or after the failure?
So, it reads 4 bytes (the 2-byte read is, I guess, from QEMU to get the RO bit values), then writes 2 bytes (I guess this was originally 4 bytes). But in the non-patched version I don't see a read of that value immediately after, only some time later. I guess we don't see the full picture here; there may also be some MMIO accesses in between that go through neither QEMU nor pciback.

But looking into it a bit deeper, I think xen-pciback ignores the higher bits of that write anyway: it calls the write variant of the appropriate size, and the command register is registered with size 2.

To be clear, I do all this analysis still to figure out whether the issue is only in the nvidia driver, or also in some of the Qubes code (the patched QEMU being the most likely candidate). |
The very last thing (read or write; dword/word/byte doesn't matter) before the first formal message indicating a driver crash ("fallen off the bus") is:
However, between '[ 6.017628]' and '[ 6.056162]' there is quite some time, so something interesting is probably happening in between, and it is not related to a call to pci_XX_config_XXX from the driver.

I see no way the driver could tell the difference between a dword write (command + status, but with status ignored) and a word write (one of the driver patches that works, writing to command only). The driver crashes before re-reading the value. For the hardware there is no difference between the RO enforcement and the patch forcing a word write instead of a dword: from the GPU's point of view, the 'status' field is never written. So yeah, some layer between the driver and the hardware is not happy about the attempted write to a RO field and trashes the GPU state.

I am unable to find any reference to offset 0x6A in the guest log. |
On the stubdom side, when the driver crashes I see a lot of these messages:
Definitely related to the issue, but I don't understand whether it is useful. |
I decided to get new log files.

To enable pciback debug logging in dom0:

echo file drivers/xen/xen-pciback/conf_space.c +p > /sys/kernel/debug/dynamic_debug/control

For the nvidia driver, I modified the os-pci.c file to print more logs. The modified functions are below:

NV_STATUS NV_API_CALL os_pci_read_byte(
void *handle,
NvU32 offset,
NvU8 *pReturnValue
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
{
*pReturnValue = 0xff;
return NV_ERR_NOT_SUPPORTED;
}
printk(KERN_ALERT "NEOWUTRAN os_pci_read_byte : try to read %u \n",offset);
pci_read_config_byte( (struct pci_dev *) handle, offset, pReturnValue);
return NV_OK;
}
NV_STATUS NV_API_CALL os_pci_read_word(
void *handle,
NvU32 offset,
NvU16 *pReturnValue
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
{
*pReturnValue = 0xffff;
return NV_ERR_NOT_SUPPORTED;
}
printk(KERN_ALERT "NEOWUTRAN os_pci_read_word : try to read %x \n",offset);
pci_read_config_word( (struct pci_dev *) handle, offset, pReturnValue);
return NV_OK;
}
NV_STATUS NV_API_CALL os_pci_read_dword(
void *handle,
NvU32 offset,
NvU32 *pReturnValue
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
{
*pReturnValue = 0xffffffff;
return NV_ERR_NOT_SUPPORTED;
}
printk(KERN_ALERT "NEOWUTRAN os_pci_read_dword : try to read %x \n",offset);
WARN_ON(offset == 4);
pci_read_config_dword( (struct pci_dev *) handle, offset, pReturnValue);
return NV_OK;
}
NV_STATUS NV_API_CALL os_pci_write_byte(
void *handle,
NvU32 offset,
NvU8 value
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
return NV_ERR_NOT_SUPPORTED;
printk(KERN_ALERT "NEOWUTRAN os_pci_write_byte : try to write %x %x \n",offset, value);
pci_write_config_byte( (struct pci_dev *) handle, offset, value);
return NV_OK;
}
NV_STATUS NV_API_CALL os_pci_write_word(
void *handle,
NvU32 offset,
NvU16 value
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
return NV_ERR_NOT_SUPPORTED;
printk(KERN_ALERT "NEOWUTRAN os_pci_write_word : try to write %x %x \n",offset, value);
pci_write_config_word( (struct pci_dev *) handle, offset, value);
return NV_OK;
}
NV_STATUS NV_API_CALL os_pci_write_dword(
void *handle,
NvU32 offset,
NvU32 value
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
return NV_ERR_NOT_SUPPORTED;
printk(KERN_ALERT "NEOWUTRAN os_pci_write_dword : try to write %x %x \n",offset, value);
WARN_ON(offset == 4);
pci_write_config_dword( (struct pci_dev *) handle, offset, value);
return NV_OK;
}

Before each of the following tests, in dom0 I ran

To gather the logs in dom0, I executed the command

First test: with the most recent release of qubes-vmm-xen-stubdom-linux.

Second test: with the 4.2.6 version of qubes-vmm-xen-stubdom-linux. |
While at it, it may also be useful to log the value that was read, as seen by the driver (the |
(There are a lot of things that I don't understand in this area, and it is hard to find good and up-to-date explanations.) |
The functions that I have modified:

NV_STATUS NV_API_CALL os_pci_read_byte(
void *handle,
NvU32 offset,
NvU8 *pReturnValue
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
{
*pReturnValue = 0xff;
return NV_ERR_NOT_SUPPORTED;
}
printk(KERN_ALERT "NEOWUTRAN os_pci_read_byte : try to read %u \n",offset);
pci_read_config_byte( (struct pci_dev *) handle, offset, pReturnValue);
printk(KERN_ALERT "NEOWUTRAN os_pci_read_byte : result : offset %x, value %x \n",offset, *pReturnValue);
return NV_OK;
}
NV_STATUS NV_API_CALL os_pci_read_word(
void *handle,
NvU32 offset,
NvU16 *pReturnValue
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
{
*pReturnValue = 0xffff;
return NV_ERR_NOT_SUPPORTED;
}
printk(KERN_ALERT "NEOWUTRAN os_pci_read_word : try to read %x \n",offset);
pci_read_config_word( (struct pci_dev *) handle, offset, pReturnValue);
printk(KERN_ALERT "NEOWUTRAN os_pci_read_word : result : offset %x, value %x \n",offset, *pReturnValue);
return NV_OK;
}
NV_STATUS NV_API_CALL os_pci_read_dword(
void *handle,
NvU32 offset,
NvU32 *pReturnValue
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
{
*pReturnValue = 0xffffffff;
return NV_ERR_NOT_SUPPORTED;
}
printk(KERN_ALERT "NEOWUTRAN os_pci_read_dword : try to read %x \n",offset);
WARN_ON(offset == 4);
pci_read_config_dword( (struct pci_dev *) handle, offset, pReturnValue);
printk(KERN_ALERT "NEOWUTRAN os_pci_read_dword : result : offset %x, value %x \n",offset, *pReturnValue);
return NV_OK;
}

First test: with the most recent release of qubes-vmm-xen-stubdom-linux.

Second test: with the 4.2.6 version of qubes-vmm-xen-stubdom-linux. (didn't check the logs yet, maybe tomorrow) |
I don't have any, but for config space it works more or less this way:
For MMIO accesses (all the BARs), the flow is much simpler, since that device memory is mapped directly into the guest address space, so there is no trapping and emulation involved for the most part (QEMU or Xen may decide to trap some of the pages anyway, like the page for MSI-X config, but I don't think that's relevant here). Similarly for DMA: thanks to the IOMMU, all the translation is done by the CPU, so no extra emulation is involved. Interrupts are a bit more complicated (there are different ways a device can report interrupts, and different ways they can be delivered to the guest), but I hope that isn't related to this issue either. |
I've checked the last set of logs and I'm confused. The values read and written, both from the driver's perspective and from dom0's perspective, look the same before the failure. So, it doesn't seem to be QEMU changing any of those values in unintended ways. That makes my hypothesis that some read-only bit was not enforced weaker: the values read back by the driver are the same in both cases.
This looks like enumerating capabilities. And if I'm not mistaken, the timestamp suggests it happens even before Linux starts, so it's likely the stubdomain itself. This is more likely related to either the qemu update or the stubdomain's Linux update, not to the problematic patch.

BTW, I've also checked the 0x6a register that is read/written directly after the write to the control register. It's MSI Message Control, and bit 0 is "MSI Enable". So, I guess the write of 0x80 there is already part of a teardown on error, especially since it looks to be done from Linux's PCI core code, not from nvidia directly (there is no log entry about that write on the guest side).

And while at it, looking at the capabilities enumeration, this device does not support MSI-X, so (theoretically at least) the MSI-X code in the stubdomain is irrelevant. FYI the capabilities here are:
Honestly, I'm running short on ideas. One remaining idea is that some internal qemu state changes differently due to the patch. Indeed, one of the reasons for the patch is that write-handling functions would see the previous value too. But I don't see any function using this feature right now. It could also be about enforcing read-only bits in some other register that is written but not read back afterwards (so the difference doesn't show up in the log), or maybe a register that is written from outside of the nvidia driver (so, again, it doesn't show up in the log, as you added logging in the nvidia module only). But both of those hypotheses are quite weak, as changing the write to the command register to be 16 bits instead of 32 bits fixes the issue.

Are you absolutely sure it is just this patch that breaks it? Have you tried stubdomains that differ only in whether the 0005-hw-xen-xen_pt-Save-back-data-only-for-declared-regis.patch patch is applied?

While the fix in the nvidia driver has been identified (at least in the Linux version), and in fact it's already fixed in the nvidia-open driver, I'm still confused about why this patch breaks it. I suspect there is some subtle side effect that may affect other devices too... So, I have some more ideas, a bit more crazy this time:
This may answer the question of whether there is a difference in the emulated bits of those registers... |
Maybe the enumeration of capabilities was added by the qemu update or the stubdomain Linux update.
Yes and yes, and that was my main lead for my writeup. But I will redo that in the following days to be absolutely sure; the logs are quite weird.
For the Windows driver, there is a working patch in the writeup.
I will test that in the following days, but I think we are both pretty confident that we likely won't find anything suspicious. If I confirm that the issue is created by "0005-hw-xen-xen_pt-Save-back-data-only-for-declared-regis.patch", maybe we need a patch in qemu to add more logs, like printing something every time the RO mask would actually block a write. I did another comparison; it's best to run "diff" on the data below.
latest stubdom, nvidia driver (crashing)
stubdom 4.2.6, nvidia driver
stubdom 4.2.6, nvidia-open driver
|
For lspci, include also

As for extra logging in the stubdomain, this may indeed make sense. Maybe even verbose logging. First of all, add |
We are probably closer to finding the root issue, but I am even more confused. Some information:

The new stubdom I am using for testing:

The new implementation of os_pci_write_dword:

NV_STATUS NV_API_CALL os_pci_write_dword(
void *handle,
NvU32 offset,
NvU32 value
)
{
if (offset >= NV_PCIE_CFG_MAX_OFFSET)
return NV_ERR_NOT_SUPPORTED;
if (offset == 4){
NvU16 command = 0;
NvU16 status = 0;
NvU32 command_status = 0;
pci_read_config_word( (struct pci_dev *) handle, 4, &command);
pci_read_config_word( (struct pci_dev *) handle, 6, &status);
pci_read_config_dword( (struct pci_dev *) handle, 4, &command_status);
printk(KERN_ALERT "NEOWUTRAN ; PRE_WRITE 4 ; command: %x ; status: %x ; command and status: %x ; %x \n",command, status, command_status, command_status == (((NvU32)status << 16) | command));
}
printk(KERN_ALERT "NEOWUTRAN os_pci_write_dword : try to write %x %x \n",offset, value);
pci_write_config_dword( (struct pci_dev *) handle, offset, value);
if (offset == 4){
NvU16 command = 0;
NvU16 status = 0;
NvU32 command_status = 0;
pci_read_config_word( (struct pci_dev *) handle, 4, &command);
pci_read_config_word( (struct pci_dev *) handle, 6, &status);
pci_read_config_dword( (struct pci_dev *) handle, 4, &command_status);
printk(KERN_ALERT "NEOWUTRAN ; POST_WRITE 4 ; command: %x ; status: %x ; command and status: %x ; %x \n",command, status, command_status, command_status == (((NvU32)status << 16) | command));
}
return NV_OK;
}
100% sure.
One log example for guest kernel logs:
Associated qemu log:
Associated xen logs:
I get a strong 'WTF' for this line:
|
Full logs here: https://web.neowutran.ovh/v4_dom0_withoutro -> Forgot to activate the logs, will rerun them all https://web.neowutran.ovh/v4_dom0_lspci_after_driver_with_ro |
Another interesting part (though after more thinking, #9003 (comment) is more interesting): I removed the nvidia driver and started the guest, both with the RO patch and without it. Beginning of the qemu logs with the RO patch:
Beginning of the qemu logs without the RO patch
|
I did some more tests with another custom version of stubdom https://github.com/neowutran/qubes-vmm-xen-stubdom-linux/tree/issue9003withROmodified Guest logs:
qemu logs
(Note: removed a section of this comment that was incorrect ) |
I identified an integer overflow bug here: https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/blob/main/qemu/patches/0005-hw-xen-xen_pt-Save-back-data-only-for-declared-regis.patch#L69

#include <stdint.h>
#include <stdio.h>
// gcc XXX.c ; ./a.out
int main(int argc, char *argv[]) {
int emul_len = 4;
uint32_t write_val = 0x100403;
// Integer overflow
uint32_t mask1 = ((1 << (emul_len * 8)) - 1);
printf("%x %x \n", mask1, write_val & mask1);
// The value here is probably calculated at compile time using int64 so the overflow doesn't occur ?
uint32_t mask2 = ((1 << (4 * 8)) - 1);
printf("%x %x \n", mask2, write_val & mask2);
// Fixed
uint32_t mask3 = (((uint64_t)1 << (emul_len * 8)) - 1);
printf("%x %x \n", mask3, write_val & mask3);
}

Fixing the integer overflow seems to solve the driver issue ( https://github.com/neowutran/qubes-vmm-xen-stubdom-linux/blob/issue9003tryfix3/qemu/patches/0005-hw-xen-xen_pt-Save-back-data-only-for-declared-regis.patch#L69 ). |
* origin/pr/65: Fix integer overflow in qemu patch "hw-xen-xen_pt-Save-back-data-only-for-declared-regis" Fixes QubesOS/qubes-issues#8631 Fixes QubesOS/qubes-issues#8783 Fixes QubesOS/qubes-issues#9003
Thanks for fixing the issue. Will this be available in Qubes OS to update immediately? |
It is available in the testing repository. |
@marmarek @neowutran You two are the heroes we need! 😁 I hit this today after spending two days in trial and error (a lot of it because I'm new to Qubes but not virtualization or qemu). Wanted to report that after running in dom0:
And restarting, my 4060 Ti is working in both a debian12 cuda focused VM (genai dev stuff) and a win10 VM with passthrough enabled, with NO changes to xen.xml, and 8 and 16GB of RAM (without crashing)! Many thanks!! |
Automated announcement from builder-github The package
|
Qubes OS release
R4.2
Brief summary
Qubes 4.1.2 doesn't crash while installing nvidia drivers, but Qubes 4.2 crashes with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (nvlddmkm.sys). Nvidia RTX 2070
Steps to reproduce
GPU passthrough, then install the nvidia drivers
Expected behavior
Windows not crashing
Actual behavior
Windows 10 crashes while installing nvidia drivers SYSTEM_THREAD_EXCEPTION_NOT_HANDLED(Nvlddmkm.sys)