main, vmm: keep VMM alive on guest shutdown #104
Coffeeri merged 4 commits into cyberus-technology:gardenlinux
Very nice solution! If possible, could you explain the reasoning behind the whole change in one of the commit messages? This is not only a race with live migrations: any libvirt code may be running, and at any point in time the CHV process could vanish due to a guest-induced shutdown. Also, please schedule a libvirt-tests pipeline using the new flag :)
phip1611
left a comment
Generally LGTM! I think you should also upstream this soon to see what upstream thinks.
Introduce a dedicated guest_exit_evt and a matching epoll dispatch path for guest-triggered shutdowns.

This series is needed because libvirt may still need the Cloud Hypervisor process to stay alive after the guest has shut down. Today a guest-induced shutdown can make the VMM disappear immediately, which means libvirt can lose the monitor unexpectedly and end up with stale state from its point of view.

This must only apply to guest-triggered shutdowns. Fatal error paths and other internal exit paths must keep using the existing VMM exit handling.

For now GuestExit still calls vmm_shutdown(), so this commit only adds the separate plumbing and keeps the current behavior unchanged.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
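The split described in this commit message can be sketched as a small dispatch function. This is only an illustrative model: `DispatchEvent`, `Action`, and `dispatch` are hypothetical names and not the actual Cloud Hypervisor types, and the real code dispatches on epoll events rather than a plain enum. The point it shows is that the guest shutdown gets its own arm while still resolving to the same action as the shared exit path.

```rust
// Hypothetical model of the control-loop dispatch. All names are
// illustrative assumptions, not the actual Cloud Hypervisor types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchEvent {
    /// Fatal error or other internal VMM exit.
    Exit,
    /// Guest-requested reboot.
    Reset,
    /// Guest-triggered shutdown (the new, dedicated event).
    GuestExit,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    ShutdownVmm,
    RebootVm,
}

/// Map each event to the action the control loop takes.
pub fn dispatch(event: DispatchEvent) -> Action {
    match event {
        DispatchEvent::Exit => Action::ShutdownVmm,
        DispatchEvent::Reset => Action::RebootVm,
        // Separate arm, same action: behavior is unchanged at this stage,
        // only the plumbing is split so later changes can diverge here.
        DispatchEvent::GuestExit => Action::ShutdownVmm,
    }
}
```

Keeping the two arms separate even though they currently agree means a later change only has to touch the `GuestExit` arm.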
Plumb ACPI S5 shutdown through guest_exit_evt instead of the shared exit path.

This separates guest-triggered shutdown from fatal VMM exit handling. That distinction is required because management layers such as libvirt must not lose the Cloud Hypervisor process just because the guest asked to shut down.

Only the guest shutdown path is moved here. Reboot handling stays on reset_evt and non-guest exit paths are left unchanged.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Add a CLI-only --no-shutdown flag that keeps the VMM process alive after a guest-triggered shutdown.

This matches the libvirt use case where the guest may shut down while management code still needs the Cloud Hypervisor process to exist. Live migration is one example, but the problem is broader: at any point in time a guest-induced shutdown can otherwise make the VMM disappear from under libvirt.

The flag only affects the GuestExit path. Fatal exits and other existing VMM shutdown paths remain unchanged.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
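A minimal sketch of the flag's intended effect, under stated assumptions: `parse_no_shutdown` is a naive argument scan written for illustration and is not the project's real CLI parser, and `ExitSource`, `Outcome`, and `handle_exit` are hypothetical names. It only models the rule from the commit message, namely that the flag changes the outcome of a guest-triggered shutdown and nothing else.

```rust
// Illustrative model only. Every name here is an assumption made for
// this sketch, not part of the Cloud Hypervisor code base.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitSource {
    /// The guest asked to shut down (for example via ACPI S5).
    Guest,
    /// A fatal error or other internal VMM exit condition.
    Fatal,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// The VMM process terminates.
    VmmExits,
    /// The VM is shut down, but the VMM process keeps running.
    VmStoppedVmmAlive,
}

/// Naive scan for the flag; a stand-in for real argument parsing.
pub fn parse_no_shutdown(args: &[&str]) -> bool {
    args.iter().any(|arg| *arg == "--no-shutdown")
}

/// Decide what happens on an exit event, given the flag.
pub fn handle_exit(no_shutdown: bool, source: ExitSource) -> Outcome {
    match (source, no_shutdown) {
        // Only the guest path honors the flag.
        (ExitSource::Guest, true) => Outcome::VmStoppedVmmAlive,
        // Without the flag, and for every fatal exit, the VMM terminates.
        (ExitSource::Guest, false) | (ExitSource::Fatal, _) => Outcome::VmmExits,
    }
}
```

For example, `handle_exit(parse_no_shutdown(&["cloud-hypervisor", "--no-shutdown"]), ExitSource::Guest)` yields `Outcome::VmStoppedVmmAlive`, while the same call with `ExitSource::Fatal` still yields `Outcome::VmmExits`.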
Move the migration workaround from the shared Exit path to GuestExit and rename the postponed shutdown event to `VmShutdown`.

With --no-shutdown, guest-triggered shutdown must keep following the guest exit path even when it is delayed until after migration completion. This preserves the distinction between guest shutdown and real VMM exit conditions.

The existing fatal exit path stays unchanged.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
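The deferral described here can be modeled as a tiny state machine. This is a sketch under assumptions: `VmmModel` and its methods are hypothetical and do not mirror the real implementation. It shows a guest shutdown arriving during a migration being postponed and, once the migration completes, being replayed through the guest exit path so that --no-shutdown is still honored.

```rust
// Hypothetical state machine; all names are assumptions for illustration
// and do not correspond to actual Cloud Hypervisor code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// Shutdown postponed until the migration completes.
    Deferred,
    /// The VM is shut down, but the VMM process keeps running.
    VmStoppedVmmAlive,
    /// The VMM process terminates.
    VmmExits,
}

#[derive(Debug)]
pub struct VmmModel {
    no_shutdown: bool,
    migrating: bool,
    pending_vm_shutdown: bool,
}

impl VmmModel {
    pub fn new(no_shutdown: bool, migrating: bool) -> Self {
        Self { no_shutdown, migrating, pending_vm_shutdown: false }
    }

    /// Guest-triggered shutdown: postponed while a migration is running.
    pub fn on_guest_exit(&mut self) -> Outcome {
        if self.migrating {
            self.pending_vm_shutdown = true;
            return Outcome::Deferred;
        }
        self.finish_guest_exit()
    }

    /// Migration finished: replay a postponed shutdown, if there is one,
    /// through the guest exit path rather than the fatal exit path.
    pub fn on_migration_done(&mut self) -> Option<Outcome> {
        self.migrating = false;
        if std::mem::take(&mut self.pending_vm_shutdown) {
            Some(self.finish_guest_exit())
        } else {
            None
        }
    }

    /// Fatal exits ignore the flag entirely.
    pub fn on_fatal_exit(&self) -> Outcome {
        Outcome::VmmExits
    }

    fn finish_guest_exit(&self) -> Outcome {
        if self.no_shutdown {
            Outcome::VmStoppedVmmAlive
        } else {
            Outcome::VmmExits
        }
    }
}
```

Routing the replayed event through `finish_guest_exit` rather than `on_fatal_exit` is what keeps a delayed guest shutdown distinct from a real VMM exit condition.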
phip1611
left a comment
LGTM! However, I think you missed the CMOS and i8042 reboot paths?
We only consider shutdown paths; the reset paths don't kill the VMM.
Ah, thanks!
6416bbf
This fixes https://github.com/cobaltcore-dev/cobaltcore/issues/448.
A previous change in PR #87 postponed guest-induced shutdowns during live migration until the migration finishes.
That introduced a new race with libvirt: the destination VM can already be shut down while libvirt is still in finish3(). In that case, libvirt can no longer access the CHV monitor because the VMM has already exited. More generally, odd state side effects can occur when libvirt is not the one that kills the VMM after a shutdown.
To fix this, add a --no-shutdown flag, similar to QEMU. When Cloud Hypervisor is started with this flag, a guest-induced shutdown only shuts down the VM and keeps the VMM process alive.
Steps to un-draft