
Guest does not boot when resuming before Linux boot #3658

Closed
retrage opened this issue Jan 31, 2022 · 6 comments · Fixed by #3677

Comments

@retrage
Contributor

retrage commented Jan 31, 2022

This issue appears when booting from firmware.

  1. Start CH with --kernel=hypervisor-fw and --api-socket $API_SOCK
  2. Run ch-remote --api-socket $API_SOCK pause before Linux starts
  3. Run ch-remote --api-socket $API_SOCK resume

Both the pause and resume commands return success, and as far as I can see from the log the virtio devices resume, but the guest does not boot after the resume is sent.

CH version: cloud-hypervisor v21.0-67-g0dafd47a
Build command: cargo build

VM configuration:

$CH_BIN \                                                                       
  --kernel $FW_BIN \                                                            
  --disk path=$IMG \ 
  --cpus boot=1 \
  --memory size=1024M \
  --net "tap=,mac=,ip=,mask=" \
  --console off \
  --serial tty \
  --log-file $LOG \
  -vvvv \
  --api-socket $API_SOCK

Log:

cloud-hypervisor: 129.715515981s: <vmm> INFO:vmm/src/lib.rs:1340 -- API request event: VmPause(Sender { .. })
cloud-hypervisor: 129.715605731s: <vmm> INFO:virtio-devices/src/transport/pci_device.rs:697 -- _virtio-pci-_disk0: Device does not need activation
cloud-hypervisor: 129.715642006s: <vmm> INFO:virtio-devices/src/transport/pci_device.rs:697 -- _virtio-pci-_net1: Device does not need activation
cloud-hypervisor: 129.715673586s: <vmm> INFO:virtio-devices/src/transport/pci_device.rs:697 -- _virtio-pci-_rng: Device does not need activation
cloud-hypervisor: 129.716899947s: <vmm> INFO:virtio-devices/src/device.rs:342 -- Pausing virtio-block
cloud-hypervisor: 129.716982961s: <_disk0_q0> INFO:virtio-devices/src/epoll_helper.rs:139 -- PAUSE_EVENT received, pausing epoll loop
cloud-hypervisor: 129.717063684s: <vmm> INFO:virtio-devices/src/device.rs:342 -- Pausing virtio-rng
cloud-hypervisor: 129.717106452s: <vmm> INFO:virtio-devices/src/device.rs:342 -- Pausing virtio-net
cloud-hypervisor: 132.211447234s: <vmm> INFO:vmm/src/lib.rs:1340 -- API request event: VmResume(Sender { .. })
cloud-hypervisor: 132.211577872s: <vmm> INFO:virtio-devices/src/device.rs:364 -- Resuming virtio-block
cloud-hypervisor: 132.211618658s: <vmm> INFO:virtio-devices/src/device.rs:364 -- Resuming virtio-rng
cloud-hypervisor: 132.211652359s: <vmm> INFO:virtio-devices/src/device.rs:364 -- Resuming virtio-net
@rbradford
Member

@retrage Unfortunately I'm having trouble reproducing this. Can you:

  • Attach the full log
  • Attach gdb to the process after a failed resume and include the output of thread apply all bt

My hunch is that we're getting a vCPU process deadlock (like #3585) possibly around the same issue of pending virtio device barriers.

@retrage
Contributor Author

retrage commented Feb 2, 2022

Thank you for taking a look into this issue.

Here is the full log and GDB output:
ch-resuming-log.txt
ch-threads-bt-log.txt

@rbradford
Member

Can you combine the log file and the serial output (i.e. remove --log-file) so we can see the firmware output interleaved with the log, and therefore what the firmware is doing when the pause/resume happens?

@rbradford
Member

@retrage I was able to reproduce with the debug build of rust-hypervisor-firmware. It looks like we end up in this tight loop:

https://github.com/cloud-hypervisor/rust-hypervisor-firmware/blob/main/src/block.rs#L278-L281

        // Check for the completion of the request
        while unsafe { core::ptr::read_volatile(&state.used.idx) } != state.avail.idx {
            core::sync::atomic::fence(core::sync::atomic::Ordering::Acquire);
        }

I think what is happening is that the asynchronous MMIO write to an ioeventfd is getting lost, so the queue is never processed and used.idx is never updated.

rbradford added a commit to rbradford/cloud-hypervisor that referenced this issue Feb 2, 2022
When running with rust-hypervisor-firmware, which has a trivial
virtio-block implementation, it is necessary to try to process the
queue on resume, as the firmware does not use interrupts to know
whether the queue has been processed.

Fixes: cloud-hypervisor#3658

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
@rbradford
Member

@retrage With the patch in #3665 I was no longer able to reproduce, though it should be noted I didn't have a 100% reproduction rate without it. Could you please test?

@retrage
Contributor Author

retrage commented Feb 3, 2022

@rbradford I confirmed it is fixed by #3665. Thanks!

rbradford added a commit to rbradford/cloud-hypervisor that referenced this issue Feb 7, 2022
As per this kernel documentation:

      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
      operations are complete (and guest state is consistent) only after userspace
      has re-entered the kernel with KVM_RUN.  The kernel side will first finish
      incomplete operations and then check for pending signals.

      The pending state of the operation is not preserved in state which is
      visible to userspace, thus userspace should ensure that the operation is
      completed before performing a live migration.  Userspace can re-enter the
      guest with an unmasked signal pending or with the immediate_exit field set
      to complete pending operations without allowing any further instructions
      to be executed.

Since we capture the state as part of the pause and override it as part
of the resume we must ensure the state is consistent otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.

Fixes: cloud-hypervisor#3658

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
rbradford added a commit that referenced this issue Feb 7, 2022
@rbradford rbradford added this to Done in Release 22.0 Mar 3, 2022
rbradford added a commit to rbradford/cloud-hypervisor that referenced this issue Mar 10, 2022
(cherry picked from commit 5079123)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
rbradford added a commit that referenced this issue Mar 11, 2022
(cherry picked from commit 5079123)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>