Graceful HTTP Server (in Golang)
If you are building an HTTP service, occasional binary upgrades and configuration changes are almost unavoidable. It can cause serious problems if you do not realize the server should shut down/restart gracefully until the alarms sound!
Simply put, a graceful server/service should be capable of:
ensuring all in-progress requests are handled properly or timed out;
restarting itself without closing the listening socket, optionally with an upgraded binary or changed config.
This idea first came to me when one of my colleagues was talking about Nginx hot-reload; then I found this blog post, which explains it quite well. But when I tried to implement a basic version of it (here's my effort, grace), I realized there was still a lot to fill in, including some updates from the coming Go 1.8. Hence this post, to share my experiences :).
TL;DR
If all you need is to close the server regardless of open connections, you can just kill the process with a standard Unix signal. To handle all requests received before the process exits, though, the signal has to be caught to trigger the shutdown logic specified by the server.
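A minimal sketch of that signal catching (the channel name and the chosen signals are my own):

package main

import (
    "log"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    // Relay SIGINT/SIGTERM to a channel instead of letting the
    // default handler kill the process immediately.
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)

    // ... start the server in another goroutine ...

    <-stop // block here until a signal arrives
    log.Println("signal caught, running shutdown logic ...")
}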
A Go HTTP server runs as a forever-blocking goroutine (usually the main one), which internally performs an infinite loop in func (srv *Server) Serve(l net.Listener) error until an error occurs. So the shutdown logic should keep track of the completion of all open connections while stopping the main goroutine from exiting, which usually introduces another blocking point.
If a restart is required, just fork a new process that inherits the listening socket (through its file descriptor) and starts accepting connections on it before the shutdown.
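A minimal sketch of the forking part, assuming a hypothetical -graceful flag tells the child to inherit the socket (ExtraFiles entry 0 becomes fd 3 in the child):

import (
    "net"
    "os"
    "os/exec"
)

// forkChild re-execs the current binary and hands it the listening
// socket; ExtraFiles entry i shows up as fd 3+i in the child.
func forkChild(ln *net.TCPListener) (*exec.Cmd, error) {
    f, err := ln.File() // dup the listener's file descriptor
    if err != nil {
        return nil, err
    }
    cmd := exec.Command(os.Args[0], "-graceful") // hypothetical flag
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    cmd.ExtraFiles = []*os.File{f}
    return cmd, cmd.Start()
}

// inheritListener rebuilds the listener in the child from fd 3.
func inheritListener() (net.Listener, error) {
    return net.FileListener(os.NewFile(3, "listener"))
}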
Server, Listener & Conn
Before diving into the details of the shutdown logic, let's first figure out how a Go HTTP server works (HTTPS similarly).
Whether you start your server with http.ListenAndServe or srv.ListenAndServe, it all comes down to srv.Serve(l):
func (srv *Server) Serve(l net.Listener) error {
    defer l.Close()
    ...
    for {
        rw, e := l.Accept()
        if e != nil {
            ...
            return e
        }
        ...
        c := srv.newConn(rw)
        c.setState(c.rwc, StateNew) // before Serve can return
        go c.serve(ctx)
    }
}
l.Accept() waits for and returns the next connection (a net.Conn) to the listener; an error from it is the only way to break out of the loop;
srv.newConn(rw) converts a net.Conn to an internal conn which wraps the *Server and the net.Conn;
after setting the connection state, go c.serve(ctx) dispatches a goroutine to handle the connection.
The listener provides a Close() method to break the loop:
type Listener interface {
    ...
    // Close closes the listener.
    // Any blocked Accept operations will be unblocked and return errors.
    Close() error
}
Without other blocking code, the main goroutine returns after srv.Serve(l) returns, terminating the process along with all other goroutines, including those handling open connections. This is the underlying reason why just killing the server is not graceful.
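A quick way to see this (a toy sketch; the handler and timing are my own): close the listener from another goroutine, Serve returns, and main exits while the slow handler is still running:

package main

import (
    "log"
    "net"
    "net/http"
    "time"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(10 * time.Second) // a slow in-progress request
        w.Write([]byte("done"))
    })

    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        log.Fatal(err)
    }
    go func() {
        time.Sleep(5 * time.Second)
        ln.Close() // unblocks l.Accept() inside Serve
    }()

    // Serve returns with the listener's error; main then returns
    // and kills the handler goroutine mid-request.
    log.Println("Serve returned:", http.Serve(ln, nil))
}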
Graceful Shutdown
So the problem of graceful shutdown reduces to making the main goroutine wait/block until all connections are properly handled or timed out. To do so, the server needs a way to track all in-progress connections.
Periodic Polling in Go 1.8
The coming Go 1.8 ships with a graceful shutdown implementation (see this commit), which I think is worth looking into in detail.
First, let's look at the fields added to Server and conn:
type Server struct {
    ...
    inShutdown int32 // accessed atomically (non-zero means we're in Shutdown)
    mu         sync.Mutex
    listeners  map[net.Listener]struct{}
    activeConn map[*conn]struct{}
    doneChan   chan struct{}
}

type conn struct {
    ...
    curState atomic.Value // of ConnState
}
The Server uses two maps to hold the listeners and active connections, and each conn now holds its internal state (before 1.8, Server only provided a func(net.Conn, ConnState) hook, invoked by func (c *conn) setState(nc net.Conn, state ConnState)). Every time the ConnState changes, the activeConn map tracks it:
func (s *Server) trackConn(c *conn, add bool) {
    s.mu.Lock()
    defer s.mu.Unlock()
    if s.activeConn == nil {
        s.activeConn = make(map[*conn]struct{})
    }
    if add {
        s.activeConn[c] = struct{}{}
    } else {
        delete(s.activeConn, c)
    }
}

func (c *conn) setState(nc net.Conn, state ConnState) {
    srv := c.server
    switch state {
    case StateNew:
        srv.trackConn(c, true)
    case StateHijacked, StateClosed:
        srv.trackConn(c, false)
    }
    c.curState.Store(connStateInterface[state])
    if hook := srv.ConnState; hook != nil {
        hook(nc, state)
    }
}
The srv.Serve(l) method now tracks the listeners (similarly to trackConn, using the listeners map) and tries to identify the new ErrServerClosed:
func (srv *Server) Serve(l net.Listener) error {
    ...
    srv.trackListener(l, true)
    defer srv.trackListener(l, false)
    ...
    for {
        rw, e := l.Accept()
        if e != nil {
            select {
            case <-srv.getDoneChan():
                return ErrServerClosed
            default:
            }
            ...
Finally, Server exposes two APIs to either close (immediately) or shut down (gracefully) itself; the comments explain:
// Close immediately closes all active net.Listeners and connections,
// regardless of their state. For a graceful shutdown, use Shutdown.
func (s *Server) Close() error {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.closeDoneChanLocked()
    err := s.closeListenersLocked()
    for c := range s.activeConn {
        c.rwc.Close()
        delete(s.activeConn, c)
    }
    return err
}
// shutdownPollInterval is how often we poll for quiescence
// during Server.Shutdown. This is lower during tests, to
// speed up tests.
// Ideally we could find a solution that doesn't involve polling,
// but which also doesn't have a high runtime cost (and doesn't
// involve any contentious mutexes), but that is left as an
// exercise for the reader.
var shutdownPollInterval = 500 * time.Millisecond

// Shutdown gracefully shuts down the server without interrupting any
// active connections. Shutdown works by first closing all open
// listeners, then closing all idle connections, and then waiting
// indefinitely for connections to return to idle and then shut down.
// If the provided context expires before the shutdown is complete,
// then the context's error is returned.
func (s *Server) Shutdown(ctx context.Context) error {
    atomic.AddInt32(&s.inShutdown, 1)
    defer atomic.AddInt32(&s.inShutdown, -1)

    s.mu.Lock()
    lnerr := s.closeListenersLocked()
    s.closeDoneChanLocked()
    s.mu.Unlock()

    ticker := time.NewTicker(shutdownPollInterval)
    defer ticker.Stop()
    for {
        if s.closeIdleConns() {
            return lnerr
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-ticker.C:
        }
    }
}
s.closeDoneChanLocked() is used to signal an ErrServerClosed; s.closeListenersLocked calls l.Close() for all s.listeners; and s.closeIdleConns() periodically scans the state of every connection in s.activeConn:
// closeIdleConns closes all idle connections and reports whether the
// server is quiescent.
func (s *Server) closeIdleConns() bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    quiescent := true
    for c := range s.activeConn {
        st, ok := c.curState.Load().(ConnState)
        if !ok || st != StateIdle {
            quiescent = false
            continue
        }
        c.rwc.Close()
        delete(s.activeConn, c)
    }
    return quiescent
}
In conclusion, Go 1.8 blocks when you call srv.Shutdown(ctx) explicitly and waits for each connection to finish by polling their states.
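Putting it together, a typical usage sketch against the Go 1.8 API (the address, signals, and 30-second deadline are arbitrary choices of mine):

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    srv := &http.Server{Addr: ":8080"}

    go func() {
        // ErrServerClosed is the expected error after Shutdown.
        if err := srv.ListenAndServe(); err != nil {
            log.Println("ListenAndServe:", err)
        }
    }()

    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
    <-stop

    // Give in-progress requests up to 30 seconds to complete.
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Println("Shutdown:", err) // e.g. context.DeadlineExceeded
    }
}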
Other Ways to Track & Block
tylerb/graceful solves the problem by hooking ConnState and extensively using channels to avoid most mutexes: connections are still tracked by a map, and the process blocks on channel receives (inside srv.shutdown):
// Serve is equivalent to http.Server.Serve with graceful shutdown enabled.
func (srv *Server) Serve(listener net.Listener) error {
    ...
    srv.Server.ConnState = func(conn net.Conn, state http.ConnState) {
        switch state {
        case http.StateNew:
            add <- conn
        case http.StateActive:
            active <- conn
        case http.StateIdle:
            idle <- conn
        case http.StateClosed, http.StateHijacked:
            remove <- conn
        }

        srv.stopLock.Lock()
        defer srv.stopLock.Unlock()

        if srv.ConnState != nil {
            srv.ConnState(conn, state)
        }
    }
    ...
    go srv.handleInterrupt(interrupt, quitting, listener)

    // Serve with graceful listener.
    // Execution blocks here until listener.Close() is called, above.
    err := srv.Server.Serve(listener)
    ...
    srv.shutdown(shutdown, kill)
The other solution is to utilize sync.WaitGroup, where each accepted c net.Conn triggers wg.Add(1) and each call of c.Close() triggers wg.Done(); this is explained in the blog post above and used by package endless. It requires additional wrappers for net.Listener and net.Conn, and contentious mutexes. It can also be a problem when a connection is hijacked (through the Hijacker interface, which bypasses all the cleanup, including c.Close()).
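A minimal sketch of that wrapping (my own names; the sync.Once guards against wg.Done being called twice if Close is invoked more than once):

import (
    "net"
    "sync"
)

// gracefulListener counts accepted connections on a WaitGroup so the
// shutdown path can wg.Wait() for them all to finish.
type gracefulListener struct {
    net.Listener
    wg *sync.WaitGroup
}

func (l gracefulListener) Accept() (net.Conn, error) {
    c, err := l.Listener.Accept()
    if err != nil {
        return nil, err
    }
    l.wg.Add(1)
    return &gracefulConn{Conn: c, wg: l.wg}, nil
}

// gracefulConn decrements the WaitGroup exactly once on Close.
type gracefulConn struct {
    net.Conn
    wg   *sync.WaitGroup
    once sync.Once
}

func (c *gracefulConn) Close() error {
    c.once.Do(c.wg.Done)
    return c.Conn.Close()
}

Note that a hijacked connection never goes through this Close, which is exactly the caveat above: wg.Wait() could then block forever.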
Repo
https://github.com/ShevaXu/playground/tree/master/grace