-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Running pebble exec makes both client and server hang on server shutdown #163
Comments
Good bug report -- thanks. I've noticed this when testing as well. Implementation note: we use |
I spent an hour or so on this today, but this is going to require further investigation. Exec commands are tasks, and In any case, I'd like to look into this further, but have other things I need to work on now. In the meantime, if you can post specifics of your use case, or a workaround you've used, that might help others. |
I think this happens because the I created a patch to kill all child process managed by the reaper and wait for them to exit, which seems to fix this issue. diff --git a/internals/reaper/reaper.go b/internals/reaper/reaper.go
index fb1d7b6..f211ffd 100644
--- a/internals/reaper/reaper.go
+++ b/internals/reaper/reaper.go
@@ -21,6 +21,7 @@ import (
"os/exec"
"os/signal"
"sync"
+ "syscall"
"golang.org/x/sys/unix"
"gopkg.in/tomb.v2"
@@ -67,6 +68,23 @@ func Stop() error {
}
mutex.Unlock()
+ // kill all child process and wait them to die
+ mutex.Lock()
+ for len(pids) > 0 {
+ for p, ch := range pids {
+ syscall.Kill(p, syscall.SIGKILL)
+ mutex.Unlock()
+ // waiting for reaper finishing its job on this process
+ code, ok := <-ch
+ if ok {
+ ch <- code
+ }
+ mutex.Lock()
+ break
+ }
+ }
+ mutex.Unlock()
+
reaperTomb.Kill(nil)
reaperTomb.Wait()
reaperTomb = tomb.Tomb{}
@@ -201,6 +219,7 @@ func WaitCommand(cmd *exec.Cmd) (int, error) {
// Wait for reaper to reap this PID and send us the exit code.
exitCode := <-ch
+ close(ch)
// Remove PID from waits map once we've received exit code from reaper.
mutex.Lock()
|
And for use cases, we've identified an issue within our production environment where some workload containers is terminated during the execution of a command by charm. This problem prevents the container from restarting until the |
Okay, I understand better why this is happening now (and thanks for your pointer to the reaper, @weiiwang01). It's because the
Then when So I think the best fix is to move the This is closely related to #284, which suggests a similar fix. |
For example, if an exec command is running when you Ctrl-C the Pebble daemon, it hangs. It's hanging because the ServiceManager is stopping the reaper early in the process, before TaskRunner has had a chance to abort the exec tasks (aborting an exec task sends SIGKILL to its pid via the tomb and command context). Then when TaskRunner.Stop is called, it calls tomb.Kill and then tomb.Wait on each task (exec) tomb, and because the reaper's not running, the tomb.Wait hangs. So the fix is to move the reaper.Start and reaper.Stop to the top level (inside the "run" command, which is also used for "enter"), instead of putting them in servstate.NewManager and ServiceManager.Stop. After doing this, some of the tests also had to be modified to start and stop the reaper. Fixes canonical#163
In latest master (a1f5223), when a pebble exec command is running, shutting down the server will make both the client and server hang.
Steps to reproduce
The following steps are run in pebble source tree.
In terminal 1:
In terminal 2:
Now shutdown the server (in terminal 1) with Ctrl-C or sending SIGINT/SIGTERM directly to the pebble process. The server will send SIGKILL to the bash child, but the child will remain in zombie state. The client (terminal 2) will stop processing input, even escape sequences like ^C and ^Z, with last characters printed by bash remaining on the output.
To stop both client and server you can send SIGKILL to the server. The client will notice that websocket connection is gone and exit with failure.
It's not related to bash being started as interactive shell. The same behavior is observed when running e.g. "sleep 999" instead of bash.
The text was updated successfully, but these errors were encountered: