[Backport] virtio-devices: signal activated queue eventfds on resume #144

Merged: Coffeeri merged 5 commits into cyberus-technology:gardenlinux from Coffeeri:backport-virtio-kickdown on Apr 14, 2026
@Coffeeri

We revert #138 and backport the rework that landed upstream in cloud-hypervisor#8004.

Coffeeri and others added 5 commits April 14, 2026 15:37
This reverts commit fb29aa2.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>

This reverts commit ff466c2.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>

A restored virtqueue can already contain pending descriptors when the VM
resumes. Before this change, the worker thread was unparked and then
waited for a fresh queue eventfd signal. That is normally fine, but not
when the queue was already non-empty at snapshot time. The virtqueue
state lives in guest memory and is restored, but the original host-side
queue eventfd signal is not persistent snapshot state. If the guest
already notified the queue before the snapshot, it may not notify it
again after resume.

That can leave the worker idle while the guest is still waiting for the
pending request to complete. In one observed case, this stalled a
virtio-blk flush during early boot after snapshot/restore.

We mitigate this in the shared `VirtioCommon` resume path.
`VirtioCommon` retains cloned queue eventfds for activated virtqueues
and signals each of them once on resume after unparking the worker
threads.

Keep virtio-net on its existing special-case path: it resumes worker
threads without signaling queue eventfds so the `driver_awake`
workaround remains intact until the guest performs a real notify.
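
The stall and the mitigation described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code from the backported commits: `QueueEvent` is a hypothetical stand-in for a host eventfd built from a mutex and a condition variable, `Common` is a hypothetical stand-in for the shared resume path, and the timeout models a guest that never notifies the queue again after resume.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a queue eventfd: a counter that a waiter drains.
#[derive(Clone, Default)]
struct QueueEvent(Arc<(Mutex<u64>, Condvar)>);

impl QueueEvent {
    /// Comparable to writing 1 to an eventfd.
    fn signal(&self) {
        let (count, cv) = &*self.0;
        *count.lock().unwrap() += 1;
        cv.notify_one();
    }

    /// Waits up to `timeout`; returns true if a signal was consumed.
    fn wait_timeout(&self, timeout: Duration) -> bool {
        let (count, cv) = &*self.0;
        let guard = count.lock().unwrap();
        let (mut guard, _) = cv
            .wait_timeout_while(guard, timeout, |c| *c == 0)
            .unwrap();
        if *guard > 0 {
            *guard = 0;
            true
        } else {
            false
        }
    }
}

/// Hypothetical model of the shared resume path: it retains clones of the
/// events of activated queues and can signal each of them once on resume.
struct Common {
    activated: Vec<QueueEvent>,
}

impl Common {
    fn resume(&self, kick: bool) {
        // Unparking the worker threads is omitted in this model.
        if kick {
            for event in &self.activated {
                event.signal();
            }
        }
    }
}

/// Restores a queue that already holds `pending` descriptors, resumes it,
/// and returns how many descriptors the worker completed.
fn restore_and_resume(pending: usize, kick: bool) -> usize {
    let event = QueueEvent::default();
    let common = Common { activated: vec![event.clone()] };

    // The worker only inspects the queue after its event fires. The timeout
    // models a guest that already notified before the snapshot and will not
    // notify again.
    let worker = thread::spawn(move || {
        if event.wait_timeout(Duration::from_millis(200)) {
            pending
        } else {
            0
        }
    });

    common.resume(kick);
    worker.join().unwrap()
}
```

Calling `restore_and_resume(3, false)` leaves the pending descriptors untouched because the worker never wakes, while `restore_and_resume(3, true)` lets the worker drain them. Signaling once per activated queue is harmless when a queue is empty: the worker wakes, finds nothing to do, and goes back to waiting.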

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>

This will wake up the guest and avoid a livelock situation by ensuring
that it will process any pending queues on its side.

Signed-off-by: Rob Bradford <rbradford@meta.com>

Now on the generic restore path the worker thread is notified on the
events and also the guest is notified via the interrupt. This avoids the
same "livelock" situation that required this "driver_awake" workaround
when restoring the net device.

Signed-off-by: Rob Bradford <rbradford@meta.com>
@Coffeeri Coffeeri self-assigned this Apr 14, 2026
@Coffeeri Coffeeri merged commit 06092ab into cyberus-technology:gardenlinux Apr 14, 2026
11 checks passed
@Coffeeri Coffeeri deleted the backport-virtio-kickdown branch April 14, 2026 14:06
@phip1611
Member

fantastic!!

@arctic-alpaca

arctic-alpaca commented Apr 14, 2026

@Coffeeri Could you bump chv in libvirt to include this in the nightly runs?
