virtio-devices: save pci configuration capability state in snapshot #6326
When restoring a VM, the VirtioPciCfgCapInfo struct is not properly initialized. All fields are 0, including the offset where the capability starts. Hence, when you read a PCI configuration register in the range [0..length(VirtioPciCfgCap)] you get the value 0 instead of the actual register contents.

Linux rescans the whole PCI bus when adding a new device. It reads the values vendor_id and device_id for every device. Because these are stored at offset 0 in PCI configuration space, they read as 0 for existing devices. As a result, Linux considers that the devices have been unplugged and removes them from the system.

Fixes: cloud-hypervisor#6265
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
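The failure mode in this commit message can be illustrated with a minimal sketch. It is not the cloud-hypervisor code: `CfgCapInfo`, `SketchDevice`, `CAP_LEN`, and the 256-byte configuration array are hypothetical simplifications, and the capability length is chosen arbitrarily. The only real-world value is the virtio vendor ID `0x1af4`.

```rust
/// Hypothetical capability length, chosen only for this sketch.
const CAP_LEN: usize = 20;

/// Simplified stand-in for the capability bookkeeping described above.
#[derive(Default, Clone, Copy)]
struct CfgCapInfo {
    /// Offset of the capability inside PCI configuration space.
    offset: usize,
    /// Raw capability bytes.
    cap: [u8; CAP_LEN],
}

struct SketchDevice {
    /// Regular PCI configuration space contents.
    config: [u8; 256],
    cap_info: CfgCapInfo,
}

impl SketchDevice {
    /// Reads inside [offset, offset + CAP_LEN) are served from the capability
    /// data; everything else comes from the regular configuration space.
    /// `reg` must be below 256 in this sketch.
    fn read_config_byte(&self, reg: usize) -> u8 {
        let start = self.cap_info.offset;
        if reg >= start && reg < start + CAP_LEN {
            self.cap_info.cap[reg - start]
        } else {
            self.config[reg]
        }
    }

    /// vendor_id lives in the first two bytes of configuration space.
    fn vendor_id(&self) -> u16 {
        u16::from_le_bytes([self.read_config_byte(0), self.read_config_byte(1)])
    }
}

/// Builds a device whose configuration space holds the virtio vendor ID.
fn new_sketch_device(cap_info: CfgCapInfo) -> SketchDevice {
    let mut config = [0u8; 256];
    config[0..2].copy_from_slice(&0x1af4u16.to_le_bytes());
    SketchDevice { config, cap_info }
}
```

With a zeroed `CfgCapInfo` (the state after a restore without this fix), the capability window covers offset 0 and `vendor_id()` returns 0. With a non-zero offset such as `0x40`, the same read returns `0x1af4`.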
Fab - good work. I tested this with live migration, which is how I bisected the first bad commit in #6265.
I want to point out that this is a breaking change for live upgrade, and I don't think we can (easily) work around it (like what we did for xsave_state, etc). Also, I can see this is a very important bug fix, given that any migrated/restored VM won't be functional after a device hotplug without this fix. We will need to see how we want to backport it.
Is the concern here that the new CH will reject an old stream because the old stream does not have the PCI state field? In that case, can we generate a state structure that's all zeros to preserve the buggy behaviour?
The concern is that we won't be able to backport this fix without breaking live upgrade, particularly for point releases of LTS versions, where we intended to ensure they are live upgradable.
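As a rough sketch of the fallback suggested in this comment, a restore path could treat a missing capability field as an all-zero structure. The types and the function name below are hypothetical and do not reflect the actual snapshot format.

```rust
/// Hypothetical saved capability state; the length 20 is arbitrary.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct SavedCapState {
    offset: u64,
    cap: [u8; 20],
}

/// Hypothetical device state as decoded from a snapshot stream.
/// Old streams carry no capability state, modelled here as `None`.
struct DecodedDeviceState {
    pci_cfg_cap: Option<SavedCapState>,
}

/// Returns the saved capability state, or an all-zero one for old streams,
/// which preserves the pre-fix behaviour instead of rejecting the stream.
fn cap_state_or_zeroed(state: &DecodedDeviceState) -> SavedCapState {
    state.pci_cfg_cap.unwrap_or_default()
}
```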
I think we may be able to fix the issue in the backport in a different way - to me it looks like the PCI capability data for the virtio devices is static - afaict the only place the offset and capability data are set is here:
Perhaps we can just set up the capability and offset after the device migration.
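The idea in this comment might look roughly like the sketch below: the snapshot format stays unchanged, and the restore path rebuilds the capability info from static values. All names are hypothetical, and `ASSUMED_CAP_OFFSET` is a placeholder, not the offset cloud-hypervisor actually uses.

```rust
/// Hypothetical capability length and offset, chosen only for this sketch.
const STATIC_CAP_LEN: usize = 20;
const ASSUMED_CAP_OFFSET: usize = 0x40;

/// Capability info that is fully determined at device creation time.
struct StaticCapInfo {
    offset: usize,
    cap: [u8; STATIC_CAP_LEN],
}

/// Snapshot in the old format: it carries no capability info at all.
struct OldFormatSnapshot {
    queue_select: u16,
}

struct RestoredDevice {
    queue_select: u16,
    cap_info: StaticCapInfo,
}

/// Rebuilds the capability info the same way device creation would.
fn static_cap_info() -> StaticCapInfo {
    StaticCapInfo {
        offset: ASSUMED_CAP_OFFSET,
        cap: [0u8; STATIC_CAP_LEN],
    }
}

/// Restores dynamic state from the snapshot and re-derives the static
/// capability info, so the stream does not need a new field.
fn restore_from_old_snapshot(snapshot: &OldFormatSnapshot) -> RestoredDevice {
    RestoredDevice {
        queue_select: snapshot.queue_select,
        cap_info: static_cap_info(),
    }
}
```

This only works if the capability data really is static, as the comment assumes. Any field the guest can modify at runtime would still be lost across the restore.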
When restoring a VM, the VirtioPciCfgCapInfo struct is not properly initialized. All fields are 0, including the offset where the capability should start. Hence, when you read a PCI configuration register in the range [0..length(VirtioPciCfgCap)] you get the value 0 instead of the actual register contents.
Linux rescans the whole PCI bus when adding a new device. It reads the values vendor_id and device_id for every device. Because these are stored at offset 0 in PCI configuration space, they read as 0 for existing devices. As a result, Linux considers that the devices have been unplugged and removes them from the system.
Fixes: #6265