[JUJU-1887] Introduces service grace periods for shutdown and pebble shut down signals. #184
This is interesting. While spec-ing Pebble for ROCKs I came across a similar dilemma. Although your proposal does the job for the given scenario, I wonder if the following implementation would make a bit more sense and serve a more general purpose: what about a
@cjdcordeiro Thanks for the pointer to Docker's
Pebble calls this "shutdown" (for example, the "shutdown" action in the Pebble config), and the term "stop signal" in Pebble's case would more mean the signal Pebble sends to services to stop them, which is a different thing. So I suggest we call this To disable (as Tom wants here), we could just set it to an arbitrary signal like
Thanks for this.
c648bcc to 2f4dace
I like the --shutdown-signals flag with the list, thanks. I've left a few comments for potential improvements/simplifications.
Thanks: re-reviewed and added a few more comments. I think there are a few things we need to fix yet, particularly the handling of failWait and other uses of killWait.
cmd/pebble/cmd_run.go
Outdated
CreateDirs bool `long:"create-dirs"`
Hold bool `long:"hold"`
HTTP string `long:"http"`
ShutdownSignals string `long:"shutdown-signals" defaults:"INT,QUIT,TERM"`
The default tag isn't actually being used at the moment, because it's misspelled as defaults instead of default. Hmmm, might be tricky (might have to use os/exec to start a subprocess), but I wonder if we need a cmd_run test?
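A small standard-library sketch of why the misspelling matters: struct tags are looked up by exact key, so code that asks for the default key never sees a value stored under defaults. The options struct and the defaultFor helper below are illustrative stand-ins, not Pebble's types or the flag library's internals.

```go
package main

import (
	"fmt"
	"reflect"
)

// options mirrors the shape of the field under review.
// It is a stand-in for illustration, not Pebble's actual struct.
type options struct {
	Misspelled string `long:"shutdown-signals" defaults:"INT,QUIT,TERM"`
	Correct    string `long:"shutdown-signals" default:"INT,QUIT,TERM"`
}

// defaultFor (hypothetical helper) returns the value stored under the
// "default" tag key for the named field, and whether the key was present.
func defaultFor(v interface{}, field string) (string, bool) {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return "", false
	}
	return f.Tag.Lookup("default")
}

func main() {
	for _, name := range []string{"Misspelled", "Correct"} {
		value, ok := defaultFor(options{}, name)
		fmt.Printf("%s: default=%q found=%v\n", name, value, ok)
	}
}
```

Because an unknown tag key is not an error in Go, this kind of typo compiles cleanly, which is one argument for a test that exercises the flag's default.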
I was thinking about tests for it. I'll dive into it and see what I can come up with.
4d2a07d to 176632e
This looks good to me now, thanks! Just one small comment re the short*Wait constants.
return func() {
- killWait, failWait = old1, old2
+ defaultGracePeriod, killWaitDuration = old1, old2
Cool, thanks. One more thing. These (at the top of servstate/manager_test.go):
shortKillWait = 100 * time.Millisecond
shortFailWait = 200 * time.Millisecond
Should probably be changed to shortGracePeriod and shortKillWaitDuration, and the latter could be changed to 100 * time.Millisecond to match what it was before (now that it's additive).
Closing in favor of #190
This commit adds a new command-line argument to pebble run that lets the user specify what Pebble does when it receives SIGTERM. The two supported values are ignore and terminate.
This change is needed for Kubernetes and Juju to address bug lp1951415. When Kubernetes decides to remove a pod from the cluster to reschedule it elsewhere, it sends SIGTERM to all containers in the pod at once. Having Pebble ignore this SIGTERM, so that the charm can run its teardown hooks to completion before the workload under its control exits, is critical for the day-to-day operation of the charm.
The second change in this commit adds a new API endpoint to Pebble called shutdown. If Pebble is ignoring SIGTERM from controlling applications such as Kubernetes, then we need another way to tell Pebble to shut down.
The third change is the ability to define a grace period for services: how long they are given to handle SIGTERM and shut down before being killed by Pebble.
This work is being proposed as part of the overall fix in Juju for https://bugs.launchpad.net/juju/+bug/1951415
Testing:
The easiest way to perform manual testing against this change is to create a Pebble service definition and run it with pebble run --sigterm=ignore. From there you should be able to send SIGTERM to the process and watch the log output indicate that it is ignoring the signal:
kill -s TERM <pid>