Disk hot-add after vm.restore causes I/O errors in guest #6265
Comments
What happens if you just hot-add the same disk to the same guest without the steps to save/restore it?
I reproduced this on Ampere Altra: /dev/vdc comes up when the same disk is added to the same guest without the save/restore step.
I am not sure if this is the same thing though. If you hot-add the same image twice, I can see why there would be inconsistency. I think Praveen created a new (different) image to be added as vdb.
I cannot reproduce this. I'm using the same focal image. The guest kernel version is the same. After the new disk is plugged, it doesn't show up in the guest.
/dev/vdb is the cloud-init disk on my VM, /dev/vdc is for test.img. My scripts to reproduce the bug:
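The poster's scripts are not reproduced above. A minimal sketch of the save/restore-plus-hot-add sequence might look like the following; the socket path, snapshot URL, and image path are illustrative, not taken from the report.

```shell
#!/usr/bin/env bash
# Hedged sketch of the reported sequence: pause, snapshot, restore into a
# fresh VMM, resume, then hot-add a brand-new disk.
set -euo pipefail

SOCK=/tmp/ch.sock            # hypothetical cloud-hypervisor API socket
SNAP_URL=file:///tmp/ch-snap # hypothetical snapshot destination
LOG=""

# Default to dry-run so the sequence can be inspected without a running VMM;
# set DRY_RUN=0 to execute the real ch-remote / cloud-hypervisor commands.
DRY_RUN="${DRY_RUN:-1}"
run() {
    LOG="$LOG$* ; "
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run ch-remote --api-socket "$SOCK" pause
run ch-remote --api-socket "$SOCK" snapshot "$SNAP_URL"
# (kill the old VMM here, then restore the snapshot into a fresh one)
run cloud-hypervisor --api-socket "$SOCK" --restore "source_url=$SNAP_URL"
run ch-remote --api-socket "$SOCK" resume
# Hot-add a new disk image after the restore completes; per the report,
# this is the step that hoses the guest.
run ch-remote --api-socket "$SOCK" add-disk path=/tmp/test.img
```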
This case is just a regular hot-add; it works fine without any errors.
So this only happens with a recent kernel. My original kernel is 5.4.0-43-generic. That kernel didn't even hot-plug the disk correctly. The recent 5.4.0-173-generic kernel can trigger this issue.
Heh, after the disk is hot-added, the old device threads quit. :-/ Before and after
Apparently they received KILL_EVENTs.
@liuw That's very strange. Is the guest still functional? AFAIK, the
No. The guest is basically dead since its OS disk is gone. The device manager is still there. It is just that all device threads other than the last hot-added disk thread are killed.
Is this the host kernel or the guest kernel?
Guest kernel. It is the kernel in the latest focal image as of yesterday.
I experimented a bit more last night.
The insight right now is that the bug is in the common code for virtio devices.
I put in a
Great observations. Would be interesting to see if it is newly introduced kernel behaviour - does it happen with direct kernel booting and our reference kernel?
Direct kernel boot with the reference 6.2 kernel has the same issue.
Good to know. And I just checked - adding a hotplug device is not something we do after restoring in our tests.
A quick test shows that the virtio devices are reset after the new device is plugged. I put a breakpoint at virtio-devices/src/transport/pci_device.rs:VirtioCommon::reset. If the guest is resumed before adding the device, the reset function is called. If the order is reversed, it is not. In both cases, the guest is hosed.
The reset is misdirection (that's just the kernel trying to recover). I bisected to this commit as the first bad one:
I'm trying to add support for snapshots in kata-containers and I stumbled upon the same issue. The problem is with the PCI configuration capability: its state isn't saved. This means that PciDevice::read_config_register returns incorrect values after restore for existing devices.
When restoring a VM, the VirtioPciCfgCapInfo struct is not properly initialized. All fields are 0, including the offset where the capability should start. Hence, when you read a PCI configuration register in the range [0..length(VirtioPciCfgCap)] you get the value 0 instead of the actual register contents. Linux rescans the whole PCI bus when adding a new device. It reads the values vendor_id and device_id for every device. Because these are stored at offset 0 in PCI configuration space, their value is 0 for existing devices. As such, Linux considers that the devices have been unplugged and it removes them from the system. Fixes: cloud-hypervisor#6265 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Describe the bug
Hot-adding a disk after `vm.restore` causes I/O errors in the guest.

To Reproduce
Steps to reproduce the behaviour:
cloud-hypervisor --version
cloud-hypervisor v38.0-73-gd245e624
Once a new disk is hot-added, I see the following failure messages in the guest's dmesg:
At that point, the guest is unusable.
Build Flags
Guest OS version details:
Host OS version details:
Logs
Output of `cloud-hypervisor -v` from either standard error or via `--log-file`: