-
-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos.tests.boot.biosUsb.x86_64-linux fails on hydra: "cpage out of range (5)" #170803
Comments
Might this be related to #15690 ? |
Possibly, the |
Even in the past weeks the job has been failing occasionally. I usually don't even look into the log anymore unless it failed twice. Locally I don't get an error on the same commit. Once Hydra's queue runner gets fixed, we'll see if it succeeds. |
Yes, I completely agree. Let's keep this ticket to track the instability, though.
Ok, I now think the
So it seems like there is something wonky in the USB communication between SeaBIOS and qemu. I had a bit of a look but the implementations look reasonable on both sides at first glance. I wonder if we could run this test with SeaBIOS debugging enabled, but using a different SeaBIOS was not as simple as adding |
Thanks for looking into this! I found this bug report https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3 |
Great find! Indeed with that ISO it seems easier to reproduce (while loading the kernel - if you pass '-nographics' you can hit 'enter' and select '2' to load from the console) - perhaps it's just bigger? I also found it sometimes just hangs instead of hitting the cpage error, which might be what causes #15690. Adding some additional diagnostics shows:
Indeed that's an invalid USB packet: you can fit 20480 bytes in there, but only if you start at offset 0, since there's only 4 pages and each page is 4096 bytes. Adding a bunch of logging to qemu, seeing the following pattern:
It seems that qemu(?) 'writes back' the new offset (785+31=816), and somehow that value can 'leak' into the qTD structure for the next request for transferring 20480 bytes (which should start at offset 0, not 816). I can confirm I can reproduce the problem with seabios b3fa8577 and so far not with 1.13.0, though tbh nothing jumps out of me looking at the commits in there... |
Would you be able to bisect between b3fa8577 and 1.13.0? |
I could, though it's rather slow work, since the problem doesn't occur every time. Also it's not entirely clear it would help much: I noticed that when I add some debugging statements to seabios in just the right/wrong places, I can no longer reproduce the problem either... so even if we find the first commit that reproduces the problem, that might not be the commit that actually introduced the bug. |
After a painful bisect I found the same commit as the person in the bug report: b3fa857 "kvm: add support for reading tsc frequency from kvmclock". I can say quite confidently that commenting out this line (from that commit) or this one makes the problem go away. Also, sometimes instead of the cpage error I get "non queue head request in async schedule". This all seems to point at a concurrency/timing problem. Command used to test the issue:
hit Enter and wait a few seconds after "Loading initrd...". |
aha interesting... so on my machine this updates |
Also it updates |
So before b3fa857 it would use the 3.5 MHz PM TIMER, and since then it uses the 2.4GHz TSC (Time-Stamp Counter) timer. When I comment out I don't see any code that obviously would be impacted by a different clock: where the clock is used in EHCI it's usually in timeouts, and AFAICS we're not hitting any timeouts in this scenario in either the 'happy' or the 'problematic' change. So preliminary it seems the different timer causes a timing difference that triggers the bug, but I don't see strong evidence yet that the timer itself is really 'wrong'. The qemu controller uses DMA calls ( So it looks like either seabios writes the 'new' 0 offset too early (before qemu writes the 'old' offset X) or too late (after qemu reads the new offset, which should be 0 but in the problematic case is X). I changed qemu to retry fetching the offset when the values don't make sense (https://gitlab.com/raboof/qemu/-/commit/3692a11ff3e2b96ec596d2260e921369e8ba4729), and with that change I can still reproduce the problem:
OK, not very scientific, but that kinda suggests qemu does the writeback after seabios writes the '0' offset. If that's true, then the question becomes: is seabios writing too early, or is qemu writing too late? I haven't figured out yet how EHCI is supposed to guard against such race conditions, but https://gitlab.com/raboof/qemu/-/blob/master/hw/usb/hcd-ehci.c#L1937 is making me slightly nervous ;) |
That's bone-chilling, but on a more practical note: should we just patch |
😆
That seems a bit heavy-handed, but I guess we could use a build with
I agree I don't think it's supposed to change the speed at which things run, it's just another way to keep the time |
Right. Actually we'd need to change the seabios shipped with qemu, not the one in nixpkgs. |
I think you can override the one shipped with qemy by passing a |
Opened #172059 |
This patch fixes a problem that caused the NixOS tests that tested booting from USB to fail periodically. Fixes NixOS#15690, fixes NixOS#104642, fixes NixOS#170803 Also submitted upstream at https://lists.nongnu.org/archive/html/qemu-devel/2022-05/msg01484.html
Neat! |
Thanks a lot for finding that post by Lin Ma and bouncing ideas here, without that I'd surely have given up much earlier ;) |
The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Git-commit: f471e8b References: bsc#1192115 The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Lin Ma <lma@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: f471e8b References: bsc#1192115 The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Lin Ma <lma@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: f471e8b References: bsc#1192115 The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Lin Ma <lma@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: f471e8b References: bsc#1192115 The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Lin Ma <lma@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: f471e8b References: bsc#1192115 The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Lin Ma <lma@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: f471e8b References: bsc#1192115 The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Lin Ma <lma@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: f471e8b References: bsc#1192115 The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Lin Ma <lma@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit f471e8b References: bsc#1192115 The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Lin Ma <lma@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
mainline inclusion commit f471e8b category: bugfix --------------------------------------------------------------- The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in NixOS/nixpkgs#170803 Signed-off-by: Arnout Engelen <arnout@bzzt.net> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: tangbinzy <tangbin_yewu@cmss.chinamobile.com>
https://hydra.nixos.org/build/174964149
Specifically:
For comparison, a successful run loads bzImage successfully:
The text was updated successfully, but these errors were encountered: