vmm: fix CpuManager deadlock between pause handling and MMIO #136
Merged
phip1611 merged 3 commits on Apr 10, 2026
phip1611 commented on Apr 1, 2026
Force-pushed 1c9b7eb to 9d3fc06
Force-pushed 9d3fc06 to c2b9ffc
Force-pushed c2b9ffc to 153f9d1
Force-pushed 199578e to 065e7e8
phip1611 (Member, Author): Ready for another review round!
amphi approved these changes on Apr 9, 2026
Force-pushed 065e7e8 to 4658b49
phip1611 (Member, Author): I made significant changes to the implementation since your review; please recheck.
Force-pushed 4658b49 to 7c2c1ce
phip1611 commented on Apr 9, 2026
Force-pushed 1bbec5e to bb9a084
Force-pushed bb9a084 to 688cf4e
arctic-alpaca approved these changes on Apr 10, 2026
This is a prerequisite for the next commit, where we need shared access.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Extract AcpiCpuHotplugController from CpuManager and move the BusDevice implementation to the new type. This separates VMM-internal vCPU management from the guest-visible ACPI CPU hotplug MMIO interface. Besides clarifying responsibilities and reducing technical debt, this fixes a rare deadlock involving pause handling and MMIO access.

New responsibilities:
- CpuManager manages VMM-internal vCPU lifecycle and coordination
- AcpiCpuHotplugController implements the guest-visible ACPI CPU hotplug MMIO interface

A vCPU thread may exit KVM_RUN to perform an MMIO access that was previously handled by CpuManager. If the VMM thread begins processing a `pause` event before that MMIO operation acquires access to CpuManager, CpuManager::pause() blocks waiting for the vCPU thread to ACK the pause. Meanwhile, the vCPU thread is blocked waiting to complete the MMIO operation through the same CpuManager, which it can never lock. The VMM is deadlocked.

This can occur during early boot or CPU hotplug when pause events race with MMIO accesses. The issue is rare and timing-dependent, but real. To reproduce it, run `ch-remote pause` and `ch-remote resume` alternately in a loop while booting a Linux VM (via direct kernel boot).

With the new design, these MMIO operations no longer depend on CpuManager, which removes the deadlock path entirely.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
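The lock cycle described above can be modeled in a few lines of Rust. This is a minimal sketch, not cloud-hypervisor code: `CpuManager` here is a hypothetical unit struct behind a `Mutex`, and `mmio_read`, `pause_races_mmio` and the channel-based pause/ACK handshake are illustrative assumptions. A `recv_timeout` stands in for "blocks forever" so the example terminates.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for CpuManager; only its lock matters here.
struct CpuManager;

impl CpuManager {
    /// Hypothetical stand-in for an ACPI CPU hotplug MMIO read served by CpuManager.
    fn mmio_read(&self) -> u8 {
        0
    }
}

/// Returns true if the vCPU failed to ACK the pause within `timeout`,
/// i.e. the pause/MMIO lock cycle was hit.
fn pause_races_mmio(timeout: Duration) -> bool {
    let cpu_manager = Arc::new(Mutex::new(CpuManager));
    let (pause_req_tx, pause_req_rx) = mpsc::channel::<()>();
    let (ack_tx, ack_rx) = mpsc::channel::<()>();

    // VMM thread wins the race: it locks CpuManager to handle the `pause` event.
    let guard = cpu_manager.lock().unwrap();

    // vCPU thread: exited KVM_RUN for an MMIO access that needs CpuManager.
    let vcpu_cpu_manager = Arc::clone(&cpu_manager);
    let vcpu = thread::spawn(move || {
        // Blocks here: the VMM thread holds the lock.
        let _value = vcpu_cpu_manager.lock().unwrap().mmio_read();
        // Only after the MMIO access completes can the vCPU see the pause and ACK it.
        let _ = pause_req_rx.recv();
        let _ = ack_tx.send(());
    });

    // Models CpuManager::pause(): request the pause, then wait for the ACK
    // while still holding the CpuManager lock.
    pause_req_tx.send(()).unwrap();
    let deadlocked = ack_rx.recv_timeout(timeout).is_err();

    // Unlike the real deadlock, break the cycle so the example terminates.
    drop(guard);
    vcpu.join().unwrap();
    deadlocked
}

fn main() {
    // Prints "deadlocked: true": the ACK cannot arrive while the lock is held.
    println!("deadlocked: {}", pause_races_mmio(Duration::from_millis(200)));
}
```

Because the VMM side takes the lock before the vCPU thread starts, the model hits the cycle deterministically; in the real VMM it depends on which thread reaches CpuManager first, which is why the bug is rare.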
This improves the documentation in various places that the next commits touch anyway.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Force-pushed 688cf4e to f067c39
Extract AcpiCpuHotplugController from CpuManager and move the BusDevice implementation to the new type. This separates VMM-internal vCPU management from the guest-visible ACPI CPU hotplug MMIO interface.
Besides clarifying responsibilities and reducing technical debt, this fixes a rare deadlock involving pause handling and MMIO access.
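New responsibilities:
- CpuManager manages VMM-internal vCPU lifecycle and coordination
- AcpiCpuHotplugController implements the guest-visible ACPI CPU hotplug MMIO interface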
Deadlock scenario
A vCPU thread may exit KVM_RUN to perform an MMIO access that was previously handled by CpuManager. If the VMM thread begins processing a `pause` event before that MMIO operation acquires access to CpuManager, CpuManager::pause() blocks waiting for the vCPU thread to ACK the pause. Meanwhile, the vCPU thread is blocked waiting to complete the MMIO operation through the same CpuManager, which it can never lock. The VMM is deadlocked.

This can occur during early boot or CPU hotplug when pause events race with MMIO accesses. The issue is rare and timing-dependent, but real. To reproduce it, run `ch-remote pause` and `ch-remote resume` alternately in a loop while booting a Linux VM (via direct kernel boot).

With the new design, these MMIO operations no longer depend on CpuManager, which removes the deadlock path entirely.
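A minimal sketch of why the split removes the deadlock path, under a simplified model: the MMIO state lives in its own `AcpiCpuHotplugController` (here a hypothetical struct with a single `selected_cpu` register behind its own `Mutex`), so the vCPU thread completes its access and ACKs the pause even while the VMM thread holds the `CpuManager` lock. The types, the `selected_cpu` field and the `pause_with_mmio_in_flight` helper are illustrative assumptions, not the actual cloud-hypervisor API; the real type implements `BusDevice`.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in: VMM-internal vCPU management only.
struct CpuManager;

/// Hypothetical stand-in for the guest-visible ACPI CPU hotplug MMIO interface.
/// Its state has its own lock, independent of the CpuManager lock.
struct AcpiCpuHotplugController {
    selected_cpu: Mutex<u8>, // hypothetical selector register
}

impl AcpiCpuHotplugController {
    fn mmio_write(&self, value: u8) {
        *self.selected_cpu.lock().unwrap() = value;
    }

    fn mmio_read(&self) -> u8 {
        *self.selected_cpu.lock().unwrap()
    }
}

/// Returns the value the vCPU read if it ACKed the pause within `timeout`.
fn pause_with_mmio_in_flight(timeout: Duration) -> Option<u8> {
    let cpu_manager = Arc::new(Mutex::new(CpuManager));
    let controller = Arc::new(AcpiCpuHotplugController {
        selected_cpu: Mutex::new(0),
    });
    let (pause_req_tx, pause_req_rx) = mpsc::channel::<()>();
    let (ack_tx, ack_rx) = mpsc::channel::<u8>();

    // VMM thread: still holds the CpuManager lock while pausing.
    let guard = cpu_manager.lock().unwrap();

    // vCPU thread: its MMIO access goes to the controller, not to CpuManager.
    let vcpu_controller = Arc::clone(&controller);
    let vcpu = thread::spawn(move || {
        vcpu_controller.mmio_write(3);
        let value = vcpu_controller.mmio_read();
        // The MMIO access completed, so the vCPU can now observe the pause and ACK it.
        let _ = pause_req_rx.recv();
        let _ = ack_tx.send(value);
    });

    pause_req_tx.send(()).unwrap();
    let acked = ack_rx.recv_timeout(timeout).ok();

    drop(guard);
    vcpu.join().unwrap();
    acked
}

fn main() {
    // Prints "acked: Some(3)": the pause completes despite the in-flight MMIO access.
    println!("acked: {:?}", pause_with_mmio_in_flight(Duration::from_secs(5)));
}
```

Note that the pause side is unchanged in this model: the VMM thread still holds the CpuManager lock while waiting. The deadlock disappears because the vCPU thread no longer needs that lock to finish its MMIO access.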
PS: We have the same problem with DeviceManager, but it is more complex to fix.

Hints for Reviewers
CI Pipeline: https://gitlab.cyberus-technology.de/cyberus/cloud/libvirt/-/merge_requests/165/pipelines