Handle SIGHUP, SIGTERM, SIGINT signals #21
Merged
Adds proper handling of signals.
On SIGTERM or SIGINT, it cleanly shuts down the goroutine that's querying the alerts and then does a graceful shutdown of the HTTP server (allowing it to finish serving in-flight requests and close the TCP connections first).
On SIGHUP, it does the same clean shutdown process, then re-reads the config file (note: it doesn't reload environment variables), and restarts things in place.
This relies on functionality added in Go 1.7 (the context library) and 1.8 (support for graceful shutdown on the HTTP server), so it now will only compile with Go 1.8+. The centurylink/golang-builder docker image that was being used is stuck on 1.5 and its corresponding github repo hasn't been touched in two years, so I also switched it to use the plain golang:1.8 docker image.
Since I know you're not really Go programmers, and this relies on some pretty Go-specific constructs, here's a bit of explanation of how the tricky bits work:
First, with:
It's setting up a channel, sigs, that can carry os.Signal values. That channel is passed to signal.Notify, along with a list of the signals that we're interested in being notified about. When the process receives one of those signals, it is sent on the sigs channel.
A couple lines later, it does:
<- is the "read from channel" operator. So that reads a signal value from the sigs channel and assigns it to signal. If a channel is empty, <- blocks until a value shows up. I.e., the main thread of execution in the program pretty much sits here waiting for the program to get one of the three signal types that we care about. Meanwhile the alertsCollection and HTTP server are running in background goroutines and doing their thing, so it's not like the program is hung. Once we get one of those signals, we proceed to shut things down and then, if it's a SIGHUP, reload the config and restart, or just exit if it's one of the other signals.
The other tricky part is the context stuff. You can think of a context in Go as a cancellable thing that can be chained together into structures, which can then all be cancelled at once. They are commonly used in Go for timeouts, deadlines, and explicit cancellation. It's probably helpful to think of a context as both the context and an "entangled" cancel function. If the cancel function is called, the associated context (and any child contexts that have been derived from it) knows that it has been cancelled.
The HTTP graceful shutdown API takes a context parameter to enable a hard timeout. Here:
A context ctx is created that will automatically be cancelled after one second. It is passed to the Shutdown() call, which will spend up to a second trying to finish serving in-flight requests and close connections, but no longer. The later call to cancel() explicitly makes sure the context has been cancelled, even if it never timed out (not strictly necessary, but good practice, since it releases the context's timer right away).
For the alertsCollection, which is also running in the background, polling Graphite every interval, we also pass it a context, and the associated cancel function is returned to our main thread as alertscancel. When we call alertscancel(), it cancels the context that the alertsCollection is holding onto.
alertsCollection's main loop is in Run(). Previously, it was just a plain loop that would poll Graphite, then sleep for an interval and repeat forever. It still does that, but now there's a select in there with two cases. A select on multiple channels in Go blocks until one of the channels is ready to be read. The context's Done() method returns a channel that is closed when the context has been cancelled, and a read from a closed channel returns immediately. So as soon as alertscancel() is called back in the main goroutine, that case unblocks and it can break out of the loop. The other channel that the select is trying to read from comes from time.After(), which just sends a value on the channel after a specified duration, so that's simply taking the place of the old sleep. If the context is cancelled while it is off polling Graphite or sending out alert emails, it will complete that work before it gets back to the select and sees that the context is cancelled. It would be possible to pass the context down to those functions so they could abort themselves immediately (and I might send that commit later), but it's probably fine to let it finish out its complete cycle.