This repository has been archived by the owner on Jul 16, 2020. It is now read-only.

--hard-reset can hang #343

Closed
markdryan opened this issue Jul 8, 2016 · 1 comment

Comments

@markdryan
Contributor

markdryan commented Jul 8, 2016

I've seen --hard-reset hang on a node when shutting down an instance. I'm guessing the problem is in the function qemuKillInstance, in particular this loop:

    // Keep reading until the socket fails.  If we close the socket straight away, qemu does not
    // honour our quit command.

    scanner := bufio.NewScanner(conn)
    for scanner.Scan() {
    }

Perhaps for some reason qemu did not close the domain socket and this loop never terminates. It seems that the qemu instance was hung somehow. It might be best to quit the loop after a fixed number of iterations and try to kill the qemu process by sending it a signal instead.

@tpepper

tpepper commented Jul 14, 2016

A read on the socket can return data, fail, or block. My money here is on qemu being hung and the read simply blocking, rather than an infinite stream of data coming back from it.

qemuKillInstance() seems like it needs a monitor which spawns a helper goroutine to do the above "normal shutdown" and report success/failure via a channel to the monitor. The monitor can then select on that channel or a timer, which gives us a good timeout fallback. In the timeout case, another helper function, say qemuForceKillInstance(), could be implemented to kill the qemu process via a signal.
