
Guest does not boot when resuming before Linux boot #3658

Closed
retrage opened this issue Jan 31, 2022 · 6 comments · Fixed by #3677

Comments

@retrage
Contributor

retrage commented Jan 31, 2022

This issue appears when booting from firmware.

  1. Start CH with --kernel=hypervisor-fw and --api-socket $API_SOCK
  2. Run ch-remote --api-socket $API_SOCK pause before Linux starts
  3. Run ch-remote --api-socket $API_SOCK resume

Both the pause and resume commands return success, and as far as I can see from the log the virtio devices resume, but the guest does not boot after the resume is sent.

CH version: cloud-hypervisor v21.0-67-g0dafd47a
Build command: cargo build

VM configuration:

$CH_BIN \                                                                       
  --kernel $FW_BIN \                                                            
  --disk path=$IMG \ 
  --cpus boot=1 \
  --memory size=1024M \
  --net "tap=,mac=,ip=,mask=" \
  --console off \
  --serial tty \
  --log-file $LOG \
  -vvvv \
  --api-socket $API_SOCK

Log:

cloud-hypervisor: 129.715515981s: <vmm> INFO:vmm/src/lib.rs:1340 -- API request event: VmPause(Sender { .. })
cloud-hypervisor: 129.715605731s: <vmm> INFO:virtio-devices/src/transport/pci_device.rs:697 -- _virtio-pci-_disk0: Device does not need activation
cloud-hypervisor: 129.715642006s: <vmm> INFO:virtio-devices/src/transport/pci_device.rs:697 -- _virtio-pci-_net1: Device does not need activation
cloud-hypervisor: 129.715673586s: <vmm> INFO:virtio-devices/src/transport/pci_device.rs:697 -- _virtio-pci-_rng: Device does not need activation
cloud-hypervisor: 129.716899947s: <vmm> INFO:virtio-devices/src/device.rs:342 -- Pausing virtio-block
cloud-hypervisor: 129.716982961s: <_disk0_q0> INFO:virtio-devices/src/epoll_helper.rs:139 -- PAUSE_EVENT received, pausing epoll loop
cloud-hypervisor: 129.717063684s: <vmm> INFO:virtio-devices/src/device.rs:342 -- Pausing virtio-rng
cloud-hypervisor: 129.717106452s: <vmm> INFO:virtio-devices/src/device.rs:342 -- Pausing virtio-net
cloud-hypervisor: 132.211447234s: <vmm> INFO:vmm/src/lib.rs:1340 -- API request event: VmResume(Sender { .. })
cloud-hypervisor: 132.211577872s: <vmm> INFO:virtio-devices/src/device.rs:364 -- Resuming virtio-block
cloud-hypervisor: 132.211618658s: <vmm> INFO:virtio-devices/src/device.rs:364 -- Resuming virtio-rng
cloud-hypervisor: 132.211652359s: <vmm> INFO:virtio-devices/src/device.rs:364 -- Resuming virtio-net
@rbradford
Member

@retrage Unfortunately I'm having trouble reproducing this. Can you:

  • Attach the full log
  • Attach gdb to the process after a failed resume and include the output of thread apply all bt

My hunch is that we're getting a vCPU process deadlock (like #3585) possibly around the same issue of pending virtio device barriers.

@retrage
Contributor Author

retrage commented Feb 2, 2022

Thank you for taking a look into this issue.

Here is the full log and GDB output:
ch-resuming-log.txt
ch-threads-bt-log.txt

@rbradford
Member

Can you combine the log file and the serial output (i.e. remove --log-file) so we can see the firmware output interleaved with the log, and therefore what the firmware is doing when the pause/resume happens?

@rbradford
Member

@retrage I was able to reproduce with the debug build of rust-hypervisor-firmware. It looks like we end up in this tight loop:

https://github.com/cloud-hypervisor/rust-hypervisor-firmware/blob/main/src/block.rs#L278-L281

        // Check for the completion of the request
        while unsafe { core::ptr::read_volatile(&state.used.idx) } != state.avail.idx {
            core::sync::atomic::fence(core::sync::atomic::Ordering::Acquire);
        }

I think what is happening is that the asynchronous MMIO write to an ioeventfd is getting lost, so the queue is never processed and used.idx is never updated.

rbradford added a commit to rbradford/cloud-hypervisor that referenced this issue Feb 2, 2022
When running with rust-hypervisor-firmware, which has a trivial
virtio-block implementation, it is necessary to try to process the
queue on resume, as the firmware does not use interrupts to know
whether the queue has been processed.

Fixes: cloud-hypervisor#3658

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
@rbradford
Member

@retrage With the patch in #3665 I was no longer able to reproduce, though it should be noted I didn't have a 100% reproduction rate without it. Could you please test?

@retrage
Contributor Author

retrage commented Feb 3, 2022

@rbradford I confirmed it is fixed by #3665. Thanks!

rbradford added a commit to rbradford/cloud-hypervisor that referenced this issue Feb 7, 2022
As per this kernel documentation:

      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
      operations are complete (and guest state is consistent) only after userspace
      has re-entered the kernel with KVM_RUN.  The kernel side will first finish
      incomplete operations and then check for pending signals.

      The pending state of the operation is not preserved in state which is
      visible to userspace, thus userspace should ensure that the operation is
      completed before performing a live migration.  Userspace can re-enter the
      guest with an unmasked signal pending or with the immediate_exit field set
      to complete pending operations without allowing any further instructions
      to be executed.

Since we capture the state as part of the pause and override it as part
of the resume we must ensure the state is consistent otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.

Fixes: cloud-hypervisor#3658

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
rbradford added a commit that referenced this issue Feb 7, 2022
@rbradford rbradford added this to Done in Release 22.0 Mar 3, 2022
rbradford added a commit to rbradford/cloud-hypervisor that referenced this issue Mar 10, 2022
(cherry picked from commit 5079123)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
rbradford added a commit that referenced this issue Mar 11, 2022
(cherry picked from commit 5079123)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>