Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vmm: cpu: Add pending removed vcpu check to avoid resize vcpu hang #5668

Merged
merged 1 commit into from
Aug 20, 2023

Conversation

up2wing
Copy link
Contributor

@up2wing up2wing commented Aug 10, 2023

Add pending removed vcpu check according to VcpuState.removing, which can avoid cloud hypervisor hangup during continual vcpu resize.

Fix #5419

@up2wing up2wing requested a review from a team as a code owner August 10, 2023 06:36
Copy link
Member

@rbradford rbradford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for diving in!

vmm/src/cpu.rs Outdated
@@ -518,12 +521,6 @@ impl BusDevice for CpuManager {
{
state.inserting = false;
}
// Ditto for removal
if (data[0] & (1 << CPU_REMOVING_FLAG) == 1 << CPU_REMOVING_FLAG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hhm, I think we might still want to keep the ACPI pending removal flag - my only worry is if the kernel gets confused that is has acknowledged the removal and then the removal bit doesn't get cleared when it checks again. Maybe we need to track another bit of state :-(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Let me think a better way out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to keep the existing removing bool. And add a Arc<AtomicBool>> called pending_removal. Set both on the request to remove, let the guest clear the removing as it sees fit but use the atomic one to check if a subsequent removal should be rejected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for ur suggestions, that sounds make sense and I will update the patch.

Add pending removed vcpu check according to VcpuState.removing, which
can avoid cloud hypervisor hangup during continual vcpu resize.

Fix cloud-hypervisor#5419

Signed-off-by: Yi Wang <foxywang@tencent.com>
@rbradford rbradford merged commit d46dd4b into cloud-hypervisor:main Aug 20, 2023
21 checks passed
@up2wing up2wing deleted the vcpu-resize branch August 22, 2023 12:36
@likebreath likebreath added the bug-fix Bug fix to include in release notes label Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix Bug fix to include in release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

VM hangs when stressing CPU hotplug
3 participants