Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic: runtime/cgo: misuse of an invalid Handle on RequestStop and waiting for vm stopped state #124

Closed
balajiv113 opened this issue Feb 27, 2023 · 18 comments · Fixed by #125 or #128
Labels
bug Something isn't working

Comments

@balajiv113
Copy link
Contributor

Describe the bug
Issue is similar to the #119. But now it happens mainly during RequestStop and waiting for vm state to change to stop

To Reproduce
Steps to reproduce the behavior:

  1. limactl start template://experimental/vz
  2. limactl stop vz
  3. Notice panic happened

Expected behavior
No panic occurs during stop

Environment that you use to compile (please complete the following information):

  • Xcode version: [Xcode 14.1 Build version 14B47b]
  • macOS Version: [ProductName: macOS, ProductVersion: 13.2, BuildVersion: 22D49]
  • mac architecture: [x86_64]
  • Go Version: [go version go1.20.1 darwin/amd64]

Additional context
lima-vm/lima#1381

@balajiv113 balajiv113 added the bug Something isn't working label Feb 27, 2023
@Code-Hex
Copy link
Owner

Code-Hex commented Mar 8, 2023

I can handle it next month... 🙏
sorry... I'm very busy now

@Code-Hex
Copy link
Owner

Code-Hex commented Mar 8, 2023

btw, is this using Kernel which is supported RequestStop?

@pmalek
Copy link

pmalek commented Mar 15, 2023

I also observe this on Mac M1 Max (Ventura 13.2.1) arm64 when running colima v0.5.4

{"level":"error","msg":"dhcp: unhandled message type: RELEASE","time":"2023-03-15T11:14:23+01:00"}
panic: runtime/cgo: misuse of an invalid Handle

goroutine 17 [running, locked to thread]:
runtime/cgo.Handle.Value(...)
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/runtime/cgo/handle.go:124
github.com/Code-Hex/vz/v3.changeStateOnObserver(0x0, 0x14000002680?)
        /Users/USER/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/!code-!hex/vz/v3@v3.0.4/virtualization.go:160 +0xe8

@ryancurrah
Copy link
Contributor

I am running Ventura 13.2.1 (22D68), CPU Apple M1 Max, from the Apple docs RequestStop seems to be supported since macOS 11.0+.

panic: runtime/cgo: misuse of an invalid Handle

goroutine 17 [running, locked to thread]:
runtime/cgo.Handle.Value(...)
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/runtime/cgo/handle.go:124
github.com/Code-Hex/vz/v3.changeStateOnObserver(0x0, 0x14000002680?)
        /Users/brew/Library/Caches/Homebrew/go_mod_cache/pkg/mod/github.com/!code-!hex/vz/v3@v3.0.4/virtualization.go:160 +0xe8

@Code-Hex
Copy link
Owner

@pmalek @ryancurrah I'm sorry I didn't notice!
I will check it on Saturday!

@ryancurrah I know supported the RequestStop API from macOS 11.0+. I mentioned to Linux Kernel one.
#124 (comment)

@ryancurrah
Copy link
Contributor

@ryancurrah I know supported the RequestStop API from macOS 11.0+. I mentioned to Linux Kernel one.
#124 (comment)

Ah that makes more sense now. Thanks!

@Code-Hex
Copy link
Owner

I'm working this!

@Code-Hex
Copy link
Owner

I've probably fixed it. I don't have the code to reproduce it at hand, which is worrying, but I'll see how this goes!

If there are still problems, we expect the issue to be opened again with reproduced code.

@Code-Hex
Copy link
Owner

Btw, I just published the latest version for this

https://github.com/Code-Hex/vz/releases/tag/v3.0.5

@balajiv113
Copy link
Contributor Author

Thank you for the fix :)
I can confirm that this fixes panic that happened with in StateChangeHandler()

But even then a panic occurs if we call Status() directly after the RequestStop() is called. But unfortunately i still don't have vz specific reproducible code. So i will keep this issue closed.

PS: Lima issue is hopefully fixed now by not using Status() method lima-vm/lima#1494

@Code-Hex
Copy link
Owner

@balajiv113 Oh...

What does it mean when we know this sequence and you don't reproduce it? 🤔

But even then a panic occurs if we call Status() directly after the RequestStop() is called. But unfortunately i still don't have vz specific reproducible code

@balajiv113
Copy link
Contributor Author

balajiv113 commented Apr 19, 2023

@Code-Hex
This issue is still happening even with StateChangeHandler()
I was able to identify the root cause, with that was able to come-up with a reproduction test as well

Root cause
StateHandle object here https://github.com/Code-Hex/vz/blob/main/virtualization.go#L106 gets GC'ed in go side.

Due to which, in change observer https://github.com/Code-Hex/vz/blob/main/virtualization.go#L157 while converting unsafe.pointer to cgo.Handler, the uintptr value is different / invalid

Possible Solution
Holding unsafe.Pointer of stateHandler in vm here https://github.com/Code-Hex/vz/blob/main/virtualization.go#L121 makes sure that this doesn't gets GC'ed

Reproduction Test

func TestRunIssue(t *testing.T) {
	container := newVirtualizationMachine(t,
		func(vmc *VirtualMachineConfiguration) error {
			return setupConsoleConfig(vmc)
		},
	)
	defer container.Close()

	sshSession := container.NewSession(t)
	defer sshSession.Close()

	vm := container.VirtualMachine

	if got := vm.State(); VirtualMachineStateRunning != got {
		t.Fatalf("want state %v but got %v", VirtualMachineStateRunning, got)
	}
	if got := vm.CanPause(); !got {
		t.Fatal("want CanPause is true")
	}
	if err := vm.Pause(); err != nil {
		t.Fatal(err)
	}

	timeout := 5 * time.Second
	waitState(t, timeout, vm, VirtualMachineStatePausing)
	waitState(t, timeout, vm, VirtualMachineStatePaused)

	if got := vm.State(); VirtualMachineStatePaused != got {
		t.Fatalf("want state %v but got %v", VirtualMachineStatePaused, got)
	}
	if got := vm.CanResume(); !got {
		t.Fatal("want CanPause is true")
	}
	if err := vm.Resume(); err != nil {
		t.Fatal(err)
	}

	waitState(t, timeout, vm, VirtualMachineStateResuming)
	waitState(t, timeout, vm, VirtualMachineStateRunning)

	if got := vm.CanRequestStop(); !got {
		t.Fatal("want CanRequestStop is true")
	}

	ch := make(chan bool)
	runtime.SetFinalizer(&vm.stateHandle, func(self *cgo.Handle) {
		ch <- true
	})

	runtime.GC()
	select {
	case <-ch:
	case <-time.After(5 * time.Minute):
		t.Errorf("finalizer didn't run")
	}

	runtime.GC()
	sshSession.Run("poweroff")
	waitState(t, timeout, vm, VirtualMachineStateStopped)

	if got := vm.State(); VirtualMachineStateStopped != got {
		t.Fatalf("want state %v but got %v", VirtualMachineStateStopped, got)
	}
}

Test will take time as it has to wait till stateHandle gets finalised.

Do let me know if you need any more input. I hope we can reopen this issue as we now have a test to reproduce

@Code-Hex Code-Hex reopened this Apr 23, 2023
@Code-Hex
Copy link
Owner

@balajiv113 Thanks! I handle this issue on Wednesday 🙏

@Code-Hex
Copy link
Owner

@balajiv113 I apologize for not being able to keep the promise I made earlier.

Due to a hectic workload, I was unable to complete today's tasks.
Unfortunately, I will be going on a trip for the next four days and it seems unlikely that I'll be able to work on this issue during that time.

It appears that the release will be pushed back to next week 🙇

@balajiv113
Copy link
Contributor Author

@Code-Hex No problem at all.

Please do enjoy your day off 🎉

@Code-Hex
Copy link
Owner

Code-Hex commented May 5, 2023

@balajiv113 I'm working on this!

Code-Hex added a commit that referenced this issue May 5, 2023
Code-Hex added a commit that referenced this issue May 5, 2023
@Code-Hex
Copy link
Owner

Code-Hex commented May 5, 2023

@balajiv113 I released this fix as https://github.com/Code-Hex/vz/releases/tag/v3.0.6
Thank you so much!

@balajiv113
Copy link
Contributor Author

@Code-Hex, Thanks a lot.
Will integrate with lima. Hopefully we should not see this issue again :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
4 participants