-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: separate signals for passive, active, and forced shutdown #12358
Conversation
5715f44
to
e823bae
Compare
`SIGTERM`: Passive shutdown stopping provisioner daemons from accepting new jobs but waiting for existing jobs to successfully complete. `SIGINT` (old existing behavior): Notify provisioner daemons to cancel in-flight jobs, wait 5s for jobs to be exited, then force quit. `SIGKILL`: Untouched from before, will force-quit.
e823bae
to
8d160ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like there being an option to let in-flight jobs complete, but I think it would be worthwhile to consolidate around one shutdown method. (I'm fine with wait for jobs being the default.)
Also, I think we should merge back "interrupt" and "stop" signal slices so that we don't break other commands.
If we still want interrupt in coderd to be "faster" (only useful for development, really?), we could use two additional slices.
var StopSignals = []os.Signal{ // new default (prev. interruptsignals renamed)
os.Interrupt,
syscall.SIGTERM,
syscall.SIGHUP,
}
var StopSignalsNoInterrupt = []os.Signal{
syscall.SIGTERM,
syscall.SIGHUP,
}
var InterruptSignals = []os.Signal{
os.Interrupt,
}
That's my proposal, also moved HUP since there's no inherent reason for HUP to use the fast-sequence either.
(We don't even need to take HUP as a stop signal, but it can be useful for development. If we don't use it for stop we should probably ignore it though to protect users from abrupt shutdown.)
cli/server.go
Outdated
exitErr = notifyCtx.Err() | ||
case <-stopCtx.Done(): | ||
exitErr = stopCtx.Err() | ||
graceful = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel both methods are graceful, so perhaps a rename here is appropriate?
graceful = true | |
waitForProvisionerJobs = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's reasonable.
enterprise/cli/provisionerdaemons.go
Outdated
@@ -225,7 +225,7 @@ func (r *RootCmd) provisionerDaemonStart() *clibase.Cmd { | |||
cliui.Errorf(inv.Stderr, "Unexpected error, shutting down server: %s\n", exitErr) | |||
} | |||
|
|||
err = srv.Shutdown(ctx) | |||
err = srv.Shutdown(ctx, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we're changing the default behavior for interrupt, and since we split interrupt and stop, we're also not handling SIGTERM
here anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
os.Interrupt, | ||
syscall.SIGTERM, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change in all CLI commands that handle signals. They used to handle TERM
but no longer do.
It would be really fantastic to get this finished and included in a release! That would make deployments much safer since they wouldn't risk interrupting the provisioner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🎉
os.Interrupt, | ||
} | ||
|
||
var StopSignalsNoInterrupt = []os.Signal{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love how Windows leaves you SOL. 🥲
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No doubt!
This is from a customer request to gracefully drain instances instead of forcefully killing the jobs.
SIGTERM
: Passive shutdown stopping provisioner daemons from accepting newjobs but waiting for existing jobs to be successfully completed.
SIGINT
(old existing behavior): Notify provisioner daemons to cancel in-flight jobs, wait for 5s for jobs to be exited, then force quit.SIGKILL
: Untouched from before, will force-quit.