Disk hot-add after vm.restore causes I/O errors in guest #6265
Comments
What happens if you just hot-add the same disk to the same guest without the steps to save/restore it?
I reproduced this on Ampere Altra: /dev/vdc comes up when the same disk is added to the same guest without the save/restore step.
I am not sure if this is the same thing though. If you hot-add the same image twice, I can see why there would be inconsistency. I think Praveen created a new (different) image to be added as vdb.
I cannot reproduce this. I'm using the same focal image. The guest kernel version is the same. After the new disk is plugged, it doesn't show up in the guest.
/dev/vdb is the cloud-init disk on my VM, /dev/vdc is for test.img. My scripts to reproduce the bug:
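The poster's scripts are not reproduced above. A minimal sketch of the save/restore-plus-hot-add sequence might look like the following; the socket path, snapshot URL, and image path are illustrative, not taken from the report.

```shell
#!/usr/bin/env bash
# Hedged sketch of the reported sequence: pause, snapshot, restore into a
# fresh VMM, resume, then hot-add a brand-new disk.
set -euo pipefail

SOCK=/tmp/ch.sock            # hypothetical cloud-hypervisor API socket
SNAP_URL=file:///tmp/ch-snap # hypothetical snapshot destination
LOG=""

# Default to dry-run so the sequence can be inspected without a running VMM;
# set DRY_RUN=0 to execute the real ch-remote / cloud-hypervisor commands.
DRY_RUN="${DRY_RUN:-1}"
run() {
    LOG="$LOG$* ; "
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run ch-remote --api-socket "$SOCK" pause
run ch-remote --api-socket "$SOCK" snapshot "$SNAP_URL"
# (kill the old VMM here, then restore the snapshot into a fresh one)
run cloud-hypervisor --api-socket "$SOCK" --restore "source_url=$SNAP_URL"
run ch-remote --api-socket "$SOCK" resume
# Hot-add a new disk image after the restore completes; per the report,
# this is the step that hoses the guest.
run ch-remote --api-socket "$SOCK" add-disk path=/tmp/test.img
```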
This case is just a regular hot-add; it works fine without any errors.
So this only happens with a recent kernel. My original kernel is 5.4.0-43-generic. That kernel didn't even hot-plug the disk correctly. The recent 5.4.0-173-generic kernel can trigger this issue.
Heh, after the disk is hot-added, the old device threads quit. :-/ Before and after
Apparently they received KILL_EVENTs.
@liuw That's very strange. Is the guest still functional? AFAIK, the
No. The guest is basically dead since its OS disk is gone. The device manager is still there. It is just that all device threads other than the last hot-added disk thread are killed.
Is this the host kernel or the guest kernel?
Guest kernel. It is the kernel in the latest focal image as of yesterday.
I experimented a bit more last night.
The insight right now is that the bug is in the common code for virtio devices.
I put in a
Great observations. Would be interesting to see if it is newly introduced kernel behaviour - does it happen with direct kernel booting and our reference kernel?
Direct kernel boot with the reference 6.2 kernel has the same issue.
Good to know. And I just checked - adding a hotplug device is not something we do after restoring in our tests.
A quick test shows that the virtio devices are reset after the new device is plugged. I put a breakpoint at virtio-devices/src/transport/pci_device.rs:VirtioCommon::reset. If the guest is resumed before adding the device, the reset function is called. If the order is reversed, it is not. In both cases, the guest is hosed.
The reset is misdirection (that's just the kernel trying to recover). I bisected to this commit as the first bad one:
I'm trying to add support for snapshots in kata-containers and I stumbled upon the same issue. The problem is with the PCI configuration capability: its state isn't saved. This means that PciDevice::read_config_register returns incorrect values after restore for existing devices.
When restoring a VM, the VirtioPciCfgCapInfo struct is not properly initialized. All fields are 0, including the offset where the capability should start. Hence, when you read a PCI configuration register in the range [0..length(VirtioPciCfgCap)] you get the value 0 instead of the actual register contents. Linux rescans the whole PCI bus when adding a new device. It reads the values vendor_id and device_id for every device. Because these are stored at offset 0 in PCI configuration space, their value is 0 for existing devices. As such, Linux considers that the devices have been unplugged and it removes them from the system. Fixes: cloud-hypervisor#6265 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Describe the bug
Hot-adding a disk after `vm.restore` causes I/O errors in the guest.

To Reproduce
Steps to reproduce the behaviour:
cloud-hypervisor --version
cloud-hypervisor v38.0-73-gd245e624
Once a new disk is hot-added, I see the following failure messages in the guest's dmesg:
At that point, the guest is unusable.
Build Flags
Guest OS version details:
Host OS version details:
Logs
Output of `cloud-hypervisor -v` from either standard error or via `--log-file`: