vmm: Avoid deadlock from waiting on paused device worker threads #6293

Merged

likebreath
Member

A deadlock can occur on the destination VM of a live upgrade or migration when it waits on paused device worker threads. For example, if a serialization error happens after the `DeviceManager` struct is restored (at which point all virtio device worker threads have been spawned but are still paused/parked), `DeviceManager::drop()` deadlocks, because it blocks waiting for the worker threads to join.

This patch ensures that we wake up all device (mostly virtio) worker threads before we block waiting for them to join.
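
The failure mode and the fix can be illustrated with a minimal, self-contained sketch. This is not the cloud-hypervisor code: `ToyManager`, its fields, and `new_paused()` are hypothetical names invented for illustration, and the workers here do no real work. The sketch only assumes that workers park themselves while a shared "paused" flag is set and that `drop()` joins them. Without the `resume()` call in `drop()`, the `join()` would block forever on the parked threads.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a manager that owns paused worker threads.
struct ToyManager {
    paused: Arc<AtomicBool>,
    exit: Arc<AtomicBool>,
    /// Counts workers that have left their loop (used only for observation).
    exited: Arc<AtomicUsize>,
    workers: Vec<JoinHandle<()>>,
}

impl ToyManager {
    /// Spawn `n` workers that start out paused, mimicking a restored
    /// manager whose workers exist but have not been resumed yet.
    fn new_paused(n: usize) -> Self {
        let paused = Arc::new(AtomicBool::new(true));
        let exit = Arc::new(AtomicBool::new(false));
        let exited = Arc::new(AtomicUsize::new(0));
        let workers = (0..n)
            .map(|_| {
                let (paused, exit, exited) =
                    (paused.clone(), exit.clone(), exited.clone());
                thread::spawn(move || loop {
                    // Stay parked for as long as the manager is paused.
                    while paused.load(Ordering::SeqCst) {
                        thread::park();
                    }
                    if exit.load(Ordering::SeqCst) {
                        exited.fetch_add(1, Ordering::SeqCst);
                        break;
                    }
                    thread::yield_now();
                })
            })
            .collect();
        Self { paused, exit, exited, workers }
    }

    /// Clear the paused flag and unpark every worker.
    fn resume(&self) -> Result<(), String> {
        self.paused.store(false, Ordering::SeqCst);
        for w in &self.workers {
            w.thread().unpark();
        }
        Ok(())
    }
}

impl Drop for ToyManager {
    fn drop(&mut self) {
        self.exit.store(true, Ordering::SeqCst);
        // Wake the workers first; joining parked threads would deadlock.
        // The error is logged rather than discarded.
        if let Err(e) = self.resume() {
            eprintln!("failed to resume workers on drop: {e}");
        }
        for w in self.workers.drain(..) {
            w.join().ok();
        }
    }
}
```

Dropping a `ToyManager` created with `new_paused(3)` returns promptly, and its `exited` counter reads 3 afterwards. Removing the `resume()` call from `drop()` makes the same drop hang.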

@likebreath likebreath requested a review from a team as a code owner March 13, 2024 22:26
@@ -4977,6 +4977,10 @@ impl BusDevice for DeviceManager {

impl Drop for DeviceManager {
    fn drop(&mut self) {
        // Wake up the DeviceManager threads (mainly virtio device workers),
        // to avoid deadlock on waiting for paused/parked worker threads.
        self.resume().ok();
Member

lgtm - but would it not be better to log the error here?

Member Author

Right. Good catch.

Signed-off-by: Bo Chen <chen.bo@intel.com>
@likebreath likebreath added this pull request to the merge queue Mar 14, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 14, 2024
@likebreath likebreath added this pull request to the merge queue Mar 14, 2024
Merged via the queue into cloud-hypervisor:main with commit 1363891 Mar 14, 2024
31 checks passed