
main, vmm: keep VMM alive on guest shutdown#104

Merged
Coffeeri merged 4 commits into cyberus-technology:gardenlinux from Coffeeri:add-no-shutdown-cmd
Mar 10, 2026

Conversation

@Coffeeri commented Mar 9, 2026

This fixes https://github.com/cobaltcore-dev/cobaltcore/issues/448.

A previous change (PR #87) postponed guest-induced shutdowns that occur during a live migration until the migration finishes.

That introduced a new race with libvirt. The destination VM can already be shut down while libvirt is still in finish3(). In that case, libvirt can no longer access the CHV monitor because the VMM has already exited. More generally, unexpected state side effects can occur whenever the VMM exits on its own instead of being terminated by libvirt after a shutdown.

To fix this, add a --no-shutdown flag, similar to QEMU. When Cloud Hypervisor is started with this flag, a guest-induced shutdown only shuts down the VM and keeps the VMM process alive.
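The race can be illustrated with a small model, written in Rust since that is Cloud Hypervisor's implementation language. This is a hypothetical sketch, not code from this PR: `VmmProcess`, `guest_shutdown`, and `query_vm_state` are invented names, and the returned strings only stand in for what libvirt would observe over the monitor.

```rust
/// Hypothetical model of the VMM process as seen by a management layer.
/// Not Cloud Hypervisor's or libvirt's actual API.
#[derive(Debug)]
pub struct VmmProcess {
    no_shutdown: bool, // mirrors the --no-shutdown CLI flag
    vmm_alive: bool,   // is the VMM process (and its monitor) still there?
    vm_running: bool,  // is the guest still running?
}

impl VmmProcess {
    pub fn start(no_shutdown: bool) -> Self {
        Self { no_shutdown, vmm_alive: true, vm_running: true }
    }

    /// The guest powers itself off (for example via ACPI).
    pub fn guest_shutdown(&mut self) {
        self.vm_running = false;
        // Without --no-shutdown the VMM process exits together with the VM.
        if !self.no_shutdown {
            self.vmm_alive = false;
        }
    }

    /// A monitor query from the management layer, e.g. issued by libvirt
    /// while it is still in finish3().
    pub fn query_vm_state(&self) -> Result<&'static str, &'static str> {
        if !self.vmm_alive {
            return Err("monitor unreachable: VMM process has exited");
        }
        Ok(if self.vm_running { "running" } else { "shutdown" })
    }
}
```

Without the flag, a query issued after the guest powers off has no process left to talk to; with it, the VM is reported as shut down while the VMM keeps answering, and libvirt stays in charge of terminating it.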

Steps to un-draft

@Coffeeri requested review from phip1611 and tpressure March 9, 2026 16:10
@Coffeeri self-assigned this Mar 9, 2026
@Coffeeri force-pushed the add-no-shutdown-cmd branch from 049d759 to aa6851d March 9, 2026 16:11
@Coffeeri requested a review from hertrste March 9, 2026 16:15
@Coffeeri force-pushed the add-no-shutdown-cmd branch from aa6851d to 1822d76 March 9, 2026 16:19
@hertrste (Collaborator) commented Mar 9, 2026

Very nice solution!

If possible, could you explain the reasoning behind the whole change in one of the commit messages?

The race is not limited to live migration. Libvirt may be running any code path, and at any point in time the CHV process could vanish because of a guest-induced shutdown.

And please schedule a libvirt-tests pipeline using the new flag :)

@phip1611 (Member) left a comment

Generally LGTM! I think you should also upstream this soon to see what upstream thinks.

Introduce a dedicated guest_exit_evt and a matching epoll dispatch
path for guest-triggered shutdowns.

This series is needed because libvirt may still need the Cloud
Hypervisor process to stay alive after the guest has shut down. Today a
guest-induced shutdown can make the VMM disappear immediately, which
means libvirt can lose the monitor unexpectedly.
This may leave libvirt with stale state.

This must only apply to guest-triggered shutdowns. Fatal error paths
and other internal exit paths must keep using the existing VMM exit
handling.

For now GuestExit still calls vmm_shutdown(), so this commit only adds
the separate plumbing and keeps the current behavior unchanged.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
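A minimal sketch of the dispatch split this commit describes, assuming an epoll-style control loop that maps each ready eventfd to a token. The `EpollDispatch` variants, `Vmm`, and `control_loop` below are illustrative stand-ins, not the actual Cloud Hypervisor definitions; the point is that `GuestExit` gets its own match arm while, at this stage, still behaving exactly like `Exit`.

```rust
/// Hypothetical dispatch tokens, one per eventfd registered with epoll.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EpollDispatch {
    Exit,      // fatal or internal VMM exit
    Reset,     // guest-requested reboot
    GuestExit, // guest-requested shutdown (new, dedicated path)
}

#[derive(Debug, Default)]
pub struct Vmm {
    pub shutdown_calls: u32,
    pub resets: u32,
}

impl Vmm {
    fn vmm_shutdown(&mut self) {
        self.shutdown_calls += 1;
    }

    /// Handle one event; returns `true` if the control loop should end.
    pub fn dispatch(&mut self, event: EpollDispatch) -> bool {
        match event {
            EpollDispatch::Exit => {
                self.vmm_shutdown();
                true
            }
            EpollDispatch::Reset => {
                self.resets += 1;
                false
            }
            // Separate arm, but for now identical to `Exit`.
            EpollDispatch::GuestExit => {
                self.vmm_shutdown();
                true
            }
        }
    }

    /// Process events in order until one of them ends the loop.
    /// Returns how many events were handled.
    pub fn control_loop(&mut self, events: &[EpollDispatch]) -> usize {
        for (i, &event) in events.iter().enumerate() {
            if self.dispatch(event) {
                return i + 1;
            }
        }
        events.len()
    }
}
```

Keeping the arms identical makes this commit behavior-neutral; a later change only has to alter the `GuestExit` arm.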
Plumb ACPI S5 shutdown through guest_exit_evt instead of the shared
exit path.

This separates guest-triggered shutdown from fatal VMM exit handling.
That distinction is required because management layers such as libvirt
must not lose the Cloud Hypervisor process just because the guest asked
to shut down.

Only the guest shutdown path is moved here. Reboot handling stays on
reset_evt and non-guest exit paths are left unchanged.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
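The routing change can be sketched as follows, assuming the guest's ACPI register write has already been decoded into a request. Channels from `std::sync::mpsc` stand in for the `guest_exit_evt` and `reset_evt` eventfds, and `GuestPowerRequest`, `AcpiShutdownDevice::handle`, and `new_device` are hypothetical names used only for this illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical, already-decoded guest power request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuestPowerRequest {
    SleepS5, // ACPI soft-off
    Reset,   // reboot
}

/// Stand-in for the ACPI shutdown device; each channel plays an eventfd.
pub struct AcpiShutdownDevice {
    guest_exit_evt: Sender<()>,
    reset_evt: Sender<()>,
}

impl AcpiShutdownDevice {
    pub fn handle(&self, req: GuestPowerRequest) {
        let evt = match req {
            // S5 now signals the dedicated guest exit event...
            GuestPowerRequest::SleepS5 => &self.guest_exit_evt,
            // ...while reboot keeps using the reset event.
            GuestPowerRequest::Reset => &self.reset_evt,
        };
        // A closed receiver is ignored in this sketch.
        let _ = evt.send(());
    }
}

/// Returns the device plus the receiving ends the control loop would poll.
pub fn new_device() -> (AcpiShutdownDevice, Receiver<()>, Receiver<()>) {
    let (guest_exit_tx, guest_exit_rx) = channel();
    let (reset_tx, reset_rx) = channel();
    let dev = AcpiShutdownDevice { guest_exit_evt: guest_exit_tx, reset_evt: reset_tx };
    (dev, guest_exit_rx, reset_rx)
}
```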
Add a CLI-only --no-shutdown flag that keeps the VMM process alive
after a guest-triggered shutdown.

This matches the libvirt use case where the guest may shut down while
management code still needs the Cloud Hypervisor process to exist.
Live migration is one example, but the problem is broader: at any point
in time a guest-induced shutdown can otherwise make the VMM disappear
from under libvirt.

The flag only affects the GuestExit path. Fatal exits and other
existing VMM shutdown paths remain unchanged.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
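A sketch of how the flag could be scoped to the guest exit path only. The hand-rolled argument scan, `Config`, `ExitReason`, and `should_terminate_vmm` are assumptions for illustration; Cloud Hypervisor's real CLI parsing and exit handling are structured differently.

```rust
/// Hypothetical configuration carrying only the new flag.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Config {
    pub no_shutdown: bool,
}

/// Minimal argument scan; a real VMM would use a proper CLI parser.
pub fn parse_args<I, S>(args: I) -> Config
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut cfg = Config::default();
    for arg in args {
        if arg.as_ref() == "--no-shutdown" {
            cfg.no_shutdown = true;
        }
    }
    cfg
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitReason {
    GuestExit, // guest-triggered shutdown
    Fatal,     // error or other internal exit path
}

/// Returns `true` if the VMM process should terminate.
pub fn should_terminate_vmm(cfg: &Config, reason: ExitReason) -> bool {
    match reason {
        // The flag only affects guest-triggered shutdown.
        ExitReason::GuestExit => !cfg.no_shutdown,
        // Fatal exits always terminate the VMM, flag or not.
        ExitReason::Fatal => true,
    }
}
```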
@Coffeeri force-pushed the add-no-shutdown-cmd branch from 1822d76 to dc62781 March 10, 2026 08:56
Move the migration workaround from the shared Exit path to GuestExit
and rename the postponed shutdown event to `VmShutdown`.

With --no-shutdown, guest-triggered shutdown must keep following the
guest exit path even when it is delayed until after migration
completion. This preserves the distinction between guest shutdown and
real VMM exit conditions.

The existing fatal exit path stays unchanged.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
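A hedged sketch of the reworked migration workaround: a guest shutdown arriving mid-migration is recorded as a postponed `VmShutdown` and replayed through the guest exit path once migration completes, never through the fatal exit path. `MigrationGate`, `Action`, and the method names are hypothetical; only the `VmShutdown` name comes from the commit message.

```rust
/// Postponed event; the name follows the rename in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PostponedEvent {
    VmShutdown,
}

/// What the control loop should do next (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Noop,
    GuestExit,
}

#[derive(Debug, Default)]
pub struct MigrationGate {
    migrating: bool,
    postponed: Option<PostponedEvent>,
}

impl MigrationGate {
    pub fn start_migration(&mut self) {
        self.migrating = true;
    }

    /// Called when the guest requests a shutdown.
    pub fn on_guest_shutdown(&mut self) -> Action {
        if self.migrating {
            // Defer until migration completes.
            self.postponed = Some(PostponedEvent::VmShutdown);
            Action::Noop
        } else {
            Action::GuestExit
        }
    }

    /// Called once migration completes; replays a postponed shutdown
    /// through the guest exit path, so --no-shutdown still applies.
    pub fn finish_migration(&mut self) -> Action {
        self.migrating = false;
        match self.postponed.take() {
            Some(PostponedEvent::VmShutdown) => Action::GuestExit,
            None => Action::Noop,
        }
    }
}
```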
@Coffeeri force-pushed the add-no-shutdown-cmd branch from dc62781 to 8237b10 March 10, 2026 09:00
@Coffeeri marked this pull request as ready for review March 10, 2026 09:27
@phip1611 (Member) left a comment

LGTM! However, I think you missed the CMOS and i8042 reboot paths?

@Coffeeri (Author) commented Mar 10, 2026

> LGTM! However, I think you missed the CMOS and i8042 reboot paths?

We only consider shutdown paths; the reset paths don't kill the VMM.

@phip1611 (Member)

> > LGTM! However, I think you missed the CMOS and i8042 reboot paths?
>
> We only consider shutdown paths; the reset paths don't kill the VMM.

Ah thanks!

@Coffeeri added this pull request to the merge queue Mar 10, 2026
@Coffeeri removed this pull request from the merge queue due to a manual request Mar 10, 2026
@Coffeeri added this pull request to the merge queue Mar 10, 2026
@Coffeeri removed this pull request from the merge queue due to a manual request Mar 10, 2026
@Coffeeri added this pull request to the merge queue Mar 10, 2026
Merged via the queue into cyberus-technology:gardenlinux with commit 6416bbf Mar 10, 2026
11 checks passed
@Coffeeri deleted the add-no-shutdown-cmd branch March 10, 2026 11:02

Labels

None yet

Projects

None yet


4 participants